Partial credit scoring for speech recognition tasks can improve measurement precision. However, assessing the magnitude of this improvement is challenging because meaningful speech contains contextual cues, which create correlations between the probabilities of correctly identifying each token in a stimulus. Here, beta-binomial distributions were used to estimate recognition accuracy and intraclass correlation for phonemes in words and words in sentences in listeners with cochlear implants (N = 20). Estimates demonstrated substantial intraclass correlation in recognition accuracy within stimuli. These correlations were invariant across individuals. Intraclass correlations should be addressed in power analyses of partial credit scoring.

Speech recognition tasks are used in clinical and experimental settings to assess an individual's speech recognition accuracy. For these tasks to be useful, they must collect sufficient data from the individual to provide a precise estimate of their ability. However, this need must be weighed against the cost of collecting the data. This cost is a particular concern in clinical settings, where clinicians have limited time to administer and score such assessments. One approach to improving measurement precision, or to decreasing the amount of data that need to be collected to achieve a target precision, is partial credit scoring, i.e., scoring a word recognition task by the number of phonemes the participant correctly identified rather than by whether the entire word was correct (Billings et al., 2016). While the precision of measurements obtained with whole item scoring is well-established (Thornton and Raffin, 1978), establishing the precision of partial credit scoring is less straightforward when recognizing each token within a stimulus depends on whether other tokens were recognized. When speech is meaningful, contextual cues from neighboring phonemes in words or words in sentences can support identification (Boothroyd and Nittrouer, 1988), which renders correct identification of any token in a stimulus dependent on whether other contextually related tokens were correctly identified. As a result, the amount of information obtained per stimulus for meaningful stimuli is somewhere between one binary success or failure (i.e., the amount of information obtained via whole item scoring) and a number of successes or failures equal to the number of tokens in the stimulus (i.e., recognition accuracy is independent for each token in the stimulus). In terms of measurement precision, the most information is obtained per trial when recognition accuracy is independent across tokens.
The least information is obtained per trial when there is so much contextual information that identifying any one token in the stimulus is sufficient to identify the entire stimulus, in which case partial credit scoring offers no advantage over whole item scoring. In most cases, meaningful speech stimuli fall somewhere in between, with some degree of contextual dependence.

Mathematically, partial credit scoring can be described as sampling from a binomial random process. For a given trial t and participant p, the participant will correctly identify Xtp tokens out of a total of nt tokens, with the probability of correctly identifying each token determined by the participant's speech recognition accuracy, πp,
Xtp ~ Binomial(nt, πp). (1)
When the probability of identifying a token is invariant across tokens, recognition accuracy can be modeled as a constant value, μp,
πp = μp. (2)
However, as described above, the assumption of independence is often violated in meaningful speech. This case can be modeled by randomly sampling πp from a beta process,
πp ~ Beta(α, β). (3)
Combined, sampling from a beta process to obtain πp and then sampling from a binomial process to obtain Xtp yields samples from a beta-binomial process. The beta-binomial distribution is a generalization of the binomial distribution. This generalization allows the beta-binomial distribution to take on a variety of shapes, ranging from being equivalent to the binomial distribution to a U-shaped function in which recognition accuracy is predominantly all-or-nothing. This latter shape is of particular interest here because it describes the case where recognition accuracy for any given token is dependent on recognition accuracy for other tokens within the trial, which would occur when the listener uses contextual information to accurately infer the identity of tokens they did not actually recognize. Formally, when the beta-binomial distribution deviates from the binomial distribution, it is because the variance of the beta-binomial distribution is greater than the variance of the binomial distribution, which is called overdispersion. While multiple parameterizations of the beta-binomial distribution exist, one that is theoretically intuitive for analyzing speech recognition data uses two parameters: μ is the mean probability of accurately identifying a token and ρ (alternatively denoted φ in some sources) is the intraclass correlation, which determines the overdispersion of the distribution,
μ = α / (α + β), (4)

ρ = 1 / (α + β + 1). (5)
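The generative process described by these equations can be sketched numerically. The following is a minimal Python/NumPy illustration (the analyses reported in this paper were run in R and Stan); it converts (μ, ρ) to the beta shape parameters implied by Eqs. (4) and (5), draws a per-trial accuracy π, draws the number of correct tokens, and checks the sample moments against the standard beta-binomial results E[X] = nμ and Var[X] = nμ(1 − μ)[1 + (n − 1)ρ]:

```python
import numpy as np

def sample_beta_binomial(n, mu, rho, size, rng):
    """Draw beta-binomial samples: first draw a per-trial accuracy pi
    from a Beta distribution with mean mu and intraclass correlation
    rho, then draw the number of correct tokens out of n."""
    alpha = mu * (1.0 - rho) / rho          # from mu = a/(a + b)
    beta = (1.0 - mu) * (1.0 - rho) / rho   # and rho = 1/(a + b + 1)
    pi = rng.beta(alpha, beta, size=size)
    return rng.binomial(n, pi)

rng = np.random.default_rng(1)
n, mu, rho = 3, 0.76, 0.46  # near the group-level CNC word estimates
x = sample_beta_binomial(n, mu, rho, 200_000, rng)

# Sample moments should match E[X] = n*mu = 2.28 and
# Var[X] = n*mu*(1 - mu)*(1 + (n - 1)*rho) ~ 1.05.
print(x.mean())
print(x.var())
```

Setting ρ near zero recovers the binomial moments, while increasing ρ inflates the variance without changing the mean, which is the overdispersion discussed below.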

Example distributions across varying levels of μ and ρ are shown in Fig. 1. At the limits, when ρ approaches zero, the beta-binomial distribution reduces to the binomial distribution, and when ρ approaches one, all tokens within a stimulus are either correctly identified or incorrectly identified. μ and ρ can vary orthogonally, such that increasing ρ makes the distribution more U-shaped (i.e., increased variance) without affecting the mean.

Fig. 1.

Example beta-binomial distributions of phoneme recognition accuracy within monosyllabic words across varying levels of correct response probability (μ) and intraclass correlation (ρ). Nonzero values of ρ indicate overdispersion of recognition accuracy across phonemes within words, which reflects the use of context to facilitate recognition.


In the current work, Bayesian analyses were used to compare binomial and beta-binomial models of speech recognition accuracy. Beta-binomial distributions have been shown to account for overdispersed psychophysical task performance in psychometric function estimation (Fründ et al., 2011; Schütt et al., 2016) and have been used to model speech reception psychometric functions for speech recognition in noise (Hu et al., 2015). The current work extends these previous studies beyond estimating psychometric functions to an examination of individual differences in a constant stimulus speech recognition task. Specifically, models were used to estimate group and individual values of μ and ρ for word and sentence recognition data from individuals with cochlear implants. Individuals with cochlear implants vary dramatically from one another in speech recognition accuracy (Gifford et al., 2018), which has been attributed to a wide variety of factors (Holden et al., 2013). Individual differences in the resolution of the auditory signal conveyed by the auditory pathway in response to electrical stimulation and in the cognitive ability of the listener to interpret such signals both affect speech recognition accuracy (O'Neill et al., 2019; Tamati et al., 2020). Cognitive ability plays a role in “top-down” restoration of the degraded signals provided by a cochlear implant (Moberly and Reed, 2019), so I hypothesize that individuals in this population will differ from one another in their use of context, which would manifest as individual differences in intraclass correlation ρ, in addition to known individual differences in the probability of accurate recognition μ. Participants in the current study were tested with minimum speech test battery consonant nucleus consonant (CNC) words (MSTB, 2011) because these are linguistically simple stimuli that are commonly used to assess clinical speech recognition outcomes. Participants were also tested with perceptually robust English sentence test open-set (PRESTO) sentences (Gilbert et al., 2013) because these sentences are long and semantically complex. I hypothesize that individual differences in ρ are most likely to be evident in PRESTO sentence recognition because the complexity of these sentences provides the opportunity to use semantic context to fill in missing information. Additionally, if ρ is substantially greater than zero, then the precision of recognition accuracy estimates will decrease more slowly as a function of the number of trials than if ρ is close to zero (Schütt et al., 2016), which reduces the effective amount of information obtained per trial (Fründ et al., 2011). I simulated the measurement precision of partial credit scoring for CNC words as a function of the number of tested words, which was compared to measurement precision with whole word scoring (Thornton and Raffin, 1978).

Twenty individuals with at least 1 year of cochlear implant use participated in this study (8 males, 12 females; mean age 60 years, range 22–76 years). Participant information is provided in the supplemental Open Science Foundation repository (see Data Availability). All participants provided informed consent and were compensated for their participation. The study was approved by the Boys Town National Research Hospital Institutional Review Board.

Participants listened to and repeated MSTB CNC words and PRESTO sentences that were presented from a loudspeaker in a sound-treated booth at a mean level of 65 dB SPL. All participants used their everyday cochlear implant processor set to their default clinical program. Participants who had residual acoustic hearing removed their hearing aids and wore an ear plug to render stimuli inaudible to the ear with residual hearing. Verbal responses were recorded for offline transcription and scoring.

A total of 250 CNC words (MSTB lists 1, 2, 4, 5, and 10) and 36 PRESTO sentences (lists 13 and 17; 152 keywords total) were presented to each participant in a fixed order. Prior to testing with each stimulus set, three practice MSTB words (“duck,” “bomb,” and “June”) and three sentences from PRESTO List 1 were presented to familiarize the participant with the stimuli. Participants were instructed to press a button on a computer mouse to initiate playback of each stimulus, then listen to the stimulus and repeat aloud what they thought was said. The task was self-paced, and breaks were offered between lists.

CNC word responses were transcribed in Klattese. Edit distance was used to calculate the number of edits required to transform the response into the target word, and the number of phonemes correct was calculated as three minus the number of edits, floored at zero phonemes correct (Gonthier, 2022). PRESTO sentences were scored by the number of keywords correct using the scoring criteria provided with those sentences. The PRESTO sentence recognition data were previously reported by Bosen et al. (2021).
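The scoring rule above (edit distance, with phonemes correct = 3 − edits, floored at zero) can be sketched in a few lines of Python. The function names are mine, and the transcription strings below are hypothetical one-character-per-phoneme stand-ins rather than actual Klattese output:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def phonemes_correct(target: str, response: str, n_phonemes: int = 3) -> int:
    """Score a CNC response as n_phonemes minus the edit distance,
    floored at zero, per the scoring rule described above."""
    return max(0, n_phonemes - edit_distance(target, response))

# Hypothetical transcriptions, one character per phoneme:
print(phonemes_correct("d^k", "d^k"))  # 3: exact match
print(phonemes_correct("d^k", "t^k"))  # 2: one substituted phoneme
print(phonemes_correct("d^k", "si"))   # 0: no phonemes shared
```

The floor at zero matters for short responses: a response sharing no phonemes with the target can have an edit distance greater than three, which would otherwise yield a negative score.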

Four models were used to estimate individual differences in model parameters μ and ρ for both stimulus sets. The first model assumed that the number of correct responses Xtp in each trial arose from a binomial distribution, with mean accuracy μp varying across individuals but constant across trials,
Xtp ~ Binomial(nt, μp). (6)
The second model assumed that the number of correct responses in each trial arose from a beta-binomial distribution. Recognition accuracy for each trial, πtp, was sampled from a beta distribution with mean accuracy and intraclass correlation μp and ρp varying across individuals,
Xtp ~ BetaBinomial(nt, μp, ρp). (7)
The third model also assumed a beta-binomial distribution, but fixed ρgroup to a constant value across all participants. Comparing the second and third model determined whether individual variability in intraclass correlation ρp provides a better explanation of the data than fixed intraclass correlation ρgroup,
Xtp ~ BetaBinomial(nt, μp, ρgroup). (8)
Finally, the fourth model fixed μgroup and ρgroup to constant values across all participants. This model was used to examine the group-level distribution of the data (see Fig. 2 below) but was not compared to the other models,
Xtp ~ BetaBinomial(nt, μgroup, ρgroup). (9)
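For reference, the beta-binomial probability mass implied by this fourth, group-level model can be written directly in terms of μ and ρ. This is a Python sketch with helper names of my own choosing (the actual fits used VGAM and Stan); it evaluates the predicted distribution of phonemes correct per three-phoneme word at the group-level CNC estimates reported in the results:

```python
from math import exp, lgamma

def log_beta_fn(a, b):
    """Log of the Beta function, computed via log-gamma for stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabinom_pmf(x, n, mu, rho):
    """Beta-binomial mass in the (mu, rho) parameterization:
    a = mu(1 - rho)/rho and b = (1 - mu)(1 - rho)/rho."""
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    log_choose = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return exp(log_choose + log_beta_fn(x + a, n - x + b) - log_beta_fn(a, b))

# Predicted probabilities of 0..3 phonemes correct per CNC word
# at the group-level estimates mu = 0.76, rho = 0.46:
probs = [betabinom_pmf(x, 3, 0.76, 0.46) for x in range(4)]
print([round(p, 3) for p in probs])
```

The probabilities sum to one and their mean equals nμ, so the parameterization preserves the mean accuracy while ρ controls how much mass is pushed toward all-or-nothing outcomes.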
Fig. 2.

Group-level distribution of correct phonemes within MSTB CNC words and words within PRESTO sentences are shown as bars. Beta-binomial distributions are shown as solid black lines and binomial distributions are shown as dashed gray lines. PRESTO sentences have a variable number of keywords per trial, so number of keywords correct is shown as a proportion out of total keywords.


Beta-binomial model parameters were estimated in two ways. First, the VGAM package (v1.1.8; Yee, 2010) in R (v4.2.0; R Core Team, 2022) was used to estimate group- and individual-level parameter values for each stimulus set, as an easy-to-use method of obtaining descriptive statistics to show group trends and as a reference for developing more complex Bayesian models. Second, the Stan programming language (v2.26.1; Carpenter et al., 2017), accessed via the RStan interface (v2.26.11; Stan Development Team, 2020), was used to compare the three models of task performance described above. Priors for μ and ρ were set to Beta(2,2) to provide a weakly informative but nonuniform prior, although a prior sensitivity analysis indicated that the choice of prior did not substantially affect model parameter estimates (see Data Availability for information about the supplemental OSF repository). Models were compared using Pareto smoothed importance-sampling leave-one-out cross-validation (Vehtari et al., 2017) to estimate expected log pointwise predictive density (ELPD), which quantifies goodness of fit. If two models have a difference in ELPD greater than four and the standard error of the difference is smaller than the difference, then the model with the higher ELPD is a better explanation for the data (Sivula et al., 2020). For an overview of the Bayesian modeling workflow and an example of its use in auditory research, see Gelman et al. (2020) and McMillan and Cannon (2019).
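The model comparison criterion can be expressed as a small helper function. This Python sketch (the function name is mine, not part of the loo package) applies the rule from Sivula et al. (2020) stated above, using the CNC comparisons from Table 1 as worked examples:

```python
def favors_higher_elpd(delta_elpd: float, se: float) -> bool:
    """Prefer the higher-ELPD model only when the magnitude of the
    ELPD difference exceeds 4 and also exceeds its standard error."""
    return abs(delta_elpd) > 4 and abs(delta_elpd) > se

# CNC words, varying-rho vs fixed-rho beta-binomial (Table 1):
print(favors_higher_elpd(-5.3, 6.2))     # False: difference within its SE
# CNC words, binomial vs best beta-binomial (Table 1):
print(favors_higher_elpd(-599.1, 39.3))  # True: binomial fits clearly worse
```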

To estimate measurement precision for mean accuracy using phoneme-level scoring in MSTB word lists when intraclass correlations are present, 10 000 draws from two distributions were made across a range of 10–100 words tested. Conditions simulated were whole word scoring,
Xt ~ Binomial(1, μ), (10)
and phoneme scoring using the most likely value for ρgroup (see results),
Xt ~ BetaBinomial(3, μ, ρgroup). (11)

The standard deviation of mean accuracy across draws for each number of words was used as a metric of measurement precision. For this simulation, a fixed recognition accuracy of μ = 0.5 was used because the standard deviation of the mean is greatest at this value for a binomial distribution (Thornton and Raffin, 1978). I then calculated the minimum number of words required to equate the standard deviation of the mean for phoneme scoring with the standard deviation for whole word scoring of responses to 50 words, because 50-word lists are often used in clinical assessment (MSTB, 2011).
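A minimal Python/NumPy sketch of this simulation follows (the seed and helper names are mine, and the original analysis was run in R, so details differ). It draws simulated test runs under each scoring scheme and finds the smallest phoneme-scored list whose standard deviation of the mean matches whole word scoring of 50 words; the exact crossover fluctuates by a word or two across random seeds:

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
mu, rho, draws = 0.5, 0.35, 10_000

def sd_of_mean_phoneme(n_words):
    """SD of estimated accuracy across simulated runs when each word
    is scored by phonemes correct (beta-binomial, n = 3 per word)."""
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    pi = rng.beta(a, b, size=(draws, n_words))  # per-word accuracy
    correct = rng.binomial(3, pi)               # phonemes correct per word
    return (correct.sum(axis=1) / (3 * n_words)).std()

def sd_of_mean_word(n_words):
    """SD of estimated accuracy with whole word scoring (binomial, n = 1)."""
    return rng.binomial(1, mu, size=(draws, n_words)).mean(axis=1).std()

target = sd_of_mean_word(50)  # precision of a 50-word whole-word-scored list
needed = next(w for w in range(10, 101) if sd_of_mean_phoneme(w) <= target)
print(needed)  # close to the roughly 30 words reported below
```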

Figure 2 shows the group-level distribution of phonemes correct in CNC words and words correct in PRESTO sentences. For CNC words, participants correctly identified an average of 60% of CNC words and μgroup = 76% of phonemes within those words. For PRESTO sentences, participants correctly identified an average of μgroup = 55% of keywords. Binomial distributions using these values of μgroup, shown as dashed gray lines, generally fail to capture trends in phonemes or keywords correct. For the CNC words, the binomial distribution overestimates the probability of correctly identifying two out of three phonemes and underestimates the probability of correctly identifying all three phonemes. In the majority of PRESTO sentences, participants either correctly identified every keyword or none of them (proportions of one or zero, respectively), although the binomial distribution predicts that either outcome should be unlikely. For beta-binomial distributions, the most likely values of the intraclass correlations were ρgroup = 0.46 for CNC words and ρgroup = 0.42 for PRESTO sentences. Beta-binomial distributions using these values, shown as solid black lines, capture these trends. This observation indicates that substantial intraclass correlations exist in responses to these stimuli, as expected.

Figure 3 shows individual differences in the most likely values of μp and ρp for CNC words and PRESTO sentences, along with estimated posterior probability densities for each model parameter. Individual data and model fits are provided in the supplemental OSF repository. As expected, individuals vary substantially in recognition accuracy μp, with a range of 29%–92% phonemes correct for CNC words and 3%–78% keywords correct for PRESTO sentences. Contrary to my hypothesis, there appears to be little variation in intraclass correlation ρp across individuals. Comparisons of the model fits that either varied or fixed ρ across participants, shown in Table 1, confirm this observation. Fixing the value of ρgroup across participants did not yield a substantially worse model fit for CNC words and improved the model fit for PRESTO sentences. Group intraclass correlations were ρgroup = 0.35 for CNC words and ρgroup = 0.25 for PRESTO sentences. These values of ρgroup are lower than when μgroup is fixed across participants, indicating that individual differences in accuracy μp affect the shape of the group-level distributions shown in Fig. 2. Binomial distributions had a substantially worse fit to the data than either beta-binomial model, confirming that including intraclass correlations is necessary to explain the distribution of recognition accuracy for phonemes in words or keywords in sentences.

Fig. 3.

Individual differences in recognition accuracy μp and intraclass correlation ρp estimated via Bayesian modeling. Most likely values for model parameters for each participant are shown as circles, with colors representing different participants. Shaded gradients around each most likely value show the corresponding posterior probability density for both model parameters for that participant, with darker shades corresponding to higher probability density. The dashed black line shows the most likely value for ρgroup.

TABLE 1.

Comparison of binomial and beta-binomial models of CNC word and PRESTO sentence data. Expected log pointwise predictive density (ELPDloo), number of effective model parameters (ploo), and the standard errors (SE) of both estimates were computed for each model fit using the loo package in R (Vehtari et al., 2017). Model fits were compared to obtain an estimated difference in expected log posterior density relative to the best-fitting model (ΔELPD) and the standard error of that difference.

CNC words
Model                                 ELPDloo   ELPDloo SE   ploo   ploo SE   ΔELPD    ΔELPD SE
Xtp ~ BetaBinomial(nt, μp, ρp)        −4811.8   57.3         38.1   1.0       —        —
Xtp ~ BetaBinomial(nt, μp, ρgroup)    −4817.1   56.9         20.6   0.4       −5.3     6.2
Xtp ~ Binomial(nt, μp)                −5410.9   74.6         32.9   0.9       −599.1   39.3

PRESTO sentences
Model                                 ELPDloo   ELPDloo SE   ploo   ploo SE   ΔELPD    ΔELPD SE
Xtp ~ BetaBinomial(nt, μp, ρgroup)    −990.5    17.9         19.7   0.8       —        —
Xtp ~ BetaBinomial(nt, μp, ρp)        −1002.0   18.3         33.2   1.6       −11.5    2.9
Xtp ~ Binomial(nt, μp)                −1090.5   27.1         33.4   1.7       −99.9    16.2

Figure 4 shows the standard deviation of the mean when sampling from a beta-binomial distribution with ρgroup = 0.35, which was the value obtained for CNC words. As shown, a minimum of 30 words is required to equate the standard deviation of the mean to that of a 50-word sample with whole word scoring.

Fig. 4.

The standard deviation of the mean for repeated draws from whole word scoring (n = 1, μ = 0.5) and a three-phoneme beta-binomial distribution (n = 3, μ = 0.5, ρ = 0.35) as a function of the number of words tested. Dashed lines show the comparison of the number of trials required to equate standard deviation of the mean for the beta-binomial distribution with whole word scoring for 50 words.


Beta-binomial distributions quantify intraclass correlations when multiple binary outcomes are measured within a sample, such as when scoring the proportion of correct phonemes in words or correct words in sentences. Figure 2 shows that context effects create substantial intraclass correlations that alter the distribution of the number of tokens correctly identified in each stimulus. While methods of quantifying the amount of context in a stimulus set have already been established (Boothroyd and Nittrouer, 1988), the beta-binomial model fits proposed here provide additional flexibility to incorporate individual differences in recognition accuracy and/or intraclass correlation and to compare models that fix or vary these parameters across individuals. My present findings, alongside previous work examining beta-binomial distributions as a model of speech recognition in varying levels of noise (Hu et al., 2015), indicate that recognition of meaningful speech is likely to generally follow this distribution. More broadly, the use of Bayesian statistics enables estimation of the range of likely values for model parameters (Morey et al., 2016) to characterize individual differences in those parameters, provides additional diagnostic information for identifying models that are good or poor explanations for the data (Vehtari et al., 2017), and provides the flexibility to fit any model that can be expressed as a set of statistical formulae.

The practical advantage of using beta-binomial distributions to analyze speech recognition data is that they can be used to quantify the benefit of partial credit scoring, as shown in Fig. 4. Here, recognition accuracy for MSTB CNC words in listeners with cochlear implants was examined because this is a common clinical assessment of speech recognition outcomes in a heterogeneous clinical population. Scoring responses by the number of phonemes correct within a word improved measurement precision, but because the intraclass correlation was nonzero, this improvement fell short of what would be expected if recognition accuracy were independent for each phoneme within a word. If phoneme recognition accuracy were independent, only 17 words (50 words/three phonemes) would be required to achieve precision equivalent to whole word scoring, but with the average intraclass correlation estimated for the CNC words, a total of 30 words is needed. Thus, power analyses, which estimate the amount of data required per participant, need to account for intraclass correlations to ensure that measures are not underpowered. The tools used to fit beta-binomial distributions in the present work are free and publicly available, and example source code is provided in the supplemental OSF repository, which should facilitate adoption of this type of analysis.
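The trade-off described here can also be approximated analytically with a design-effect calculation of my own (a rough check, not the paper's simulation): the variance of the mean per-token score over W words of n tokens under a beta-binomial is μ(1 − μ)[1 + (n − 1)ρ]/(nW), so matching whole word scoring of 50 words requires W = 50[1 + (n − 1)ρ]/n, which lands near the simulated value of 30:

```python
def words_needed(rho, n_tokens=3, reference_words=50):
    """Words needed for token-level (phoneme) scoring to match whole
    word scoring on reference_words words, by equating
    mu(1-mu)(1 + (n-1)rho)/(n*W) with mu(1-mu)/reference_words."""
    return reference_words * (1 + (n_tokens - 1) * rho) / n_tokens

print(words_needed(0.0))   # ~16.7: independent phonemes, i.e., ~17 words
print(words_needed(0.35))  # ~28.3: near the simulated value of 30
```

At ρ = 1, the formula returns the full 50 words, reflecting that fully redundant tokens carry no more information than whole word scoring.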

Contrary to my hypothesis, allowing intraclass correlations to vary across participants in this study did not provide a better explanation for the data than fixing the value of intraclass correlation across participants, as shown in Fig. 3 and Table 1. There were a few individuals for whom the estimated probability density of intraclass correlation for CNC words was unlikely to include the fixed group value, but the addition of individual intraclass correlation model parameters was not necessary to explain most participants' recognition accuracy. This finding could reflect the fact that only post-lingually deafened adults with no known cognitive impairments were recruited. Children who use cochlear implants differ from post-lingually deafened adults in the auditory cues they use to recognize speech (Gifford et al., 2018) and often have heterogeneous delays in language development (Niparko et al., 2010); thus, it is possible that individual differences in intraclass correlation would be more evident in children (see also Nittrouer and Boothroyd, 1990 for evidence in favor of this possibility in children with normal hearing). Additionally, any cognitive impairment that affects lexical access would likely also alter contextual use. Attentional control and hearing status also interact to determine the effect of masking speech or noise on recognition of target speech (Shinn-Cunningham and Best, 2008), so individual differences in intraclass correlations could be revealed by adding maskers that would produce more all-or-nothing responses (Boothroyd and Nittrouer, 1988). Examining the cases in which intraclass correlations do or do not vary across individuals could inform theories of stream segregation and speech recognition.

A larger sample of clinical data could potentially be used to establish normative values for intraclass correlations in clinical assessments and thereby develop criteria to flag outliers as potential indicators of cognitive or linguistic issues. No additional clinical testing time would be needed to use these criteria, although scoring responses by phonemes correct within words requires both more work than whole word scoring and consistency across testers to be meaningful (Billings et al., 2016). To ensure clinical test–retest differences are reliable, it may also be necessary to test for variability in intraclass correlations across test lists, in addition to the previously reported variability in recognition accuracy across MSTB CNC lists (Bierer et al., 2016). The phonemes within words may also affect intraclass correlations, because phonemes that tend to be hard to identify with a cochlear implant (Munson et al., 2003) may be inferred from context more often than phonemes that are easier to identify. While the data reported in the present study are insufficiently powered to address these questions, they establish proof of concept for using beta-binomial distributions as a tool that could be applied to extant datasets to provide answers.

This work was supported by a Centers of Biomedical Research Excellence (COBRE) Grant No. NIH-NIGMS/5P20GM109023-05 and a student training Grant No. NIH-NIDCD/5T35DC008757-14. David Pisoni provided recordings of the Perceptually Robust English Sentence Test Open-set (PRESTO) sentences. Victoria Sevich and Shauntelle Cannon led data collection. Aditya Kulkarni assisted with participant recruitment.

The author has no conflicts to disclose.

All participants provided informed consent at the start of the experimental session and had the option to leave the study at any time. The study was approved by the Boys Town National Research Hospital Institutional Review Board.

The data and analysis that support the findings of this study are openly available in the Open Science Foundation at https://osf.io/y8t5k/.

1. Bierer, J. A., Spindler, E., Bierer, S. M., and Wright, R. (2016). “An examination of sources of variability across the consonant-nucleus-consonant test in cochlear implant listeners,” Trends Hear. 20, 1–8.
2. Billings, C. J., Penman, T. M., Ellis, E. M., Baltzell, L. S., and McMillan, G. P. (2016). “Phoneme and word scoring in speech-in-noise audiometry,” Am. J. Audiol. 25(March), 75–83.
3. Boothroyd, A., and Nittrouer, S. (1988). “Mathematical treatment of context effects in phoneme and word recognition,” J. Acoust. Soc. Am. 84(1), 101–114.
4. Bosen, A. K., Sevich, V. A., and Cannon, S. A. (2021). “Forward digit span and word familiarity do not correlate with differences in speech recognition in individuals with cochlear implants after accounting for auditory resolution,” J. Speech. Lang. Hear. Res. 64(8), 3330–3342.
5. Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M. A., Guo, J., Li, P., and Riddell, A. (2017). “Stan: A probabilistic programming language,” J. Stat. Softw. 76(1), 1–32.
6. Fründ, I., Haenel, N. V., and Wichmann, F. A. (2011). “Inference for psychometric functions in the presence of nonstationary behavior,” J. Vision 11(6), 16–19.
7. Gelman, A., Vehtari, A., Simpson, D., Margossian, C. C., Carpenter, B., Yao, Y., Kennedy, L., Gabry, J., Bürkner, P.-C., and Modrák, M. (2020). “Bayesian workflow,” arXiv:2011.01808.
8. Gifford, R. H., Noble, J. H., Camarata, S. M., Sunderhaus, L. W., Dwyer, R. T., Dawant, B. M., Dietrich, M. S., and Labadie, R. F. (2018). “The relationship between spectral modulation detection and speech recognition: Adult versus pediatric cochlear implant recipients,” Trends Hear. 22, 1–14.
9. Gilbert, J. L., Tamati, T. N., and Pisoni, D. B. (2013). “Development, reliability, and validity of PRESTO: A new high-variability sentence recognition test,” J. Am. Acad. Audiol. 24(1), 026–036.
10. Gonthier, C. (2022). “An easy way to improve scoring of memory span tasks: The edit distance, beyond ‘correct recall in the correct serial position,’” Behav. Res. 55(4), 2021–2036.
11. Holden, L. K., Finley, C. C., Firszt, J. B., Holden, T. A., Brenner, C., Potts, L. G., Gotter, B. D., Vanderhoof, S. S., Mispagel, K., Heydebrand, G., and Skinner, M. W. (2013). “Factors affecting open-set word recognition in adults with cochlear implants,” Ear Hear. 34(3), 342–360.
12. Hu, W., Swanson, B. A., and Heller, G. Z. (2015). “A statistical method for the analysis of speech intelligibility tests,” PLoS One 10(7), e0132409.
13. McMillan, G. P., and Cannon, J. B. (2019). “Bayesian applications in auditory research,” J. Speech. Lang. Hear. Res. 62(3), 577–586.
14. Moberly, A. C., and Reed, J. (2019). “Making sense of sentences: Top-down processing of speech by adult cochlear implant users,” J. Speech. Lang. Hear. Res. 62(8), 2895–2905.
15. Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., and Wagenmakers, E. J. (2016). “The fallacy of placing confidence in confidence intervals,” Psychon. Bull. Rev. 23(1), 103–123.
16. MSTB (2011). “Minimum Speech Test Battery (MSTB) for adult cochlear implant users 2011 new MSTB user manual,” https://www.auditorypotential.com/MSTBfiles/MSTBManual2011-06-20%20.pdf (Last viewed January 25, 2024).
17. Munson, B., Donaldson, G. S., Allen, S. L., Collison, E. A., and Nelson, D. A. (2003). “Patterns of phoneme perception errors by listeners with cochlear implants as a function of overall speech perception ability,” J. Acoust. Soc. Am. 113(2), 925–935.
18. Niparko, J. K., Tobey, E. A., Thal, D. J., Eisenberg, L. S., Wang, N.-Y., Quittner, A. L., and Fink, N. E. (2010). “Spoken language development in children following cochlear implantation,” JAMA 303(15), 1498–1506.
19. Nittrouer, S., and Boothroyd, A. (1990). “Context effects in phoneme and word recognition by young children and older adults,” J. Acoust. Soc. Am. 87(6), 2705–2715.
20. O'Neill, E. R., Kreft, H. A., and Oxenham, A. J. (2019). “Cognitive factors contribute to speech perception in cochlear-implant users and age-matched normal-hearing listeners under vocoded conditions,” J. Acoust. Soc. Am. 146(1), 195–210.
21. R Core Team (2022). “R: A language and environment for statistical computing” (R Foundation for Statistical Computing, Vienna, Austria), https://www.R-project.org/.
22. Schütt, H. H., Harmeling, S., Macke, J. H., and Wichmann, F. A. (2016). “Painfree and accurate Bayesian estimation of psychometric functions for (potentially) overdispersed data,” Vision Res. 122, 105–123.
23. Shinn-Cunningham, B. G., and Best, V. (2008). “Selective attention in normal and impaired hearing,” Trends Amplif. 12(4), 283–299.
24. Sivula, T., Magnusson, M., Matamoros, A. A., and Vehtari, A. (2020). “Uncertainty in Bayesian leave-one-out cross-validation based model comparison,” arXiv:2008.10296.
25. Stan Development Team (2020). “RStan: The R Interface to Stan” (R package version 2.26.11), https://mc-stan.org/.
26. Tamati, T. N., Ray, C., Vasil, K. J., Pisoni, D. B., and Moberly, A. C. (2020). “High- and low-performing adult cochlear implant users on high-variability sentence recognition: Differences in auditory spectral resolution and neurocognitive functioning,” J. Am. Acad. Audiol. 31(5), 324–335.
27. Thornton, A. R., and Raffin, M. J. M. (1978). “Speech-discrimination scores modeled as a binomial variable,” J. Speech Hear. Res. 21(3), 507–518.
28. Vehtari, A., Gelman, A., and Gabry, J. (2017). “Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC,” Stat. Comput. 27(5), 1413–1432.
29. Yee, T. W. (2010). “The VGAM package for categorical data analysis,” J. Stat. Softw. 32(10), 1–34.