Several filterbank-based metrics have been proposed to predict speech intelligibility (SI). However, these metrics incorporate little knowledge of the auditory periphery. Neurogram-based metrics provide an alternative, incorporating knowledge of the physiology of hearing by using a mathematical model of the auditory nerve response. In this work, SI was assessed using filterbank-based metrics (the speech intelligibility index and the speech-based envelope power spectrum model) and neurogram-based metrics. The latter used the biologically inspired model of the auditory nerve proposed by Zilany, Bruce, Nelson, and Carney [(2009), J. Acoust. Soc. Am. 126(5), 2390–2412] as a front-end and the neurogram similarity metric and spectro-temporal modulation index as back-ends. The correlations between these objective metrics and behavioural scores were then computed. At the word level, neurogram-based metrics representing the speech envelope showed higher correlations with the behavioural scores. At the per-phoneme level, phoneme transitions were found to contribute to higher correlations between behavioural data and objective measures that use speech envelope information at the level of the auditory periphery. The presented framework could serve as a useful tool for the validation and tuning of speech materials, as well as a benchmark for the development of speech processing algorithms.

1. Allen, J. B. (2005). “Articulation and intelligibility,” Synth. Lect. Speech Audio Process. 1(1), 1–124.
2. ANSI (1997). American National Standard: Methods for Calculation of the Speech Intelligibility Index (Acoustical Society of America, New York), Vol. 19.
3. Benzeghiba, M., De Mori, R., Deroo, O., Dupont, S., Erbes, T., Jouvet, D., Fissore, L., Laface, P., Mertins, A., Ris, C., Rose, R., Tyagi, V., and Wellekens, C. (2007). “Automatic speech recognition and speech variability: A review,” Speech Commun. 49(10), 763–786.
4. Bidelman, G. M., and Heinz, M. G. (2011). “Auditory-nerve responses predict pitch attributes related to musical consonance-dissonance for normal and impaired hearing,” J. Acoust. Soc. Am. 130(3), 1488–1502.
5. Boersma, P., and Weenink, D. (2014). “Praat: Doing phonetics by computer,” Version 5.3.16. http://www.praat.org/ (Last viewed October 23, 2014).
6. Boothroyd, A., and Nittrouer, S. (1988). “Mathematical treatment of context effects in phoneme and word recognition,” J. Acoust. Soc. Am. 84(1), 101–114.
7. Bradley, J. S. (1986). “Predictors of speech intelligibility in rooms,” J. Acoust. Soc. Am. 80(3), 837–845.
8. Bruce, I. C., Léger, A. C., Moore, B. C., and Lorenzi, C. (2013). “Physiological prediction of masking release for normal-hearing and hearing-impaired listeners,” in Proceedings of Meetings on Acoustics (Acoustical Society of America, Montreal, Canada), Vol. 19, pp. 1–8.
9. Bruce, I. C., Sachs, M. B., and Young, E. D. (2003). “An auditory-periphery model of the effects of acoustic trauma on auditory nerve responses,” J. Acoust. Soc. Am. 113(1), 369–388.
10. Chi, T., Gao, Y., Guyton, M. C., Ru, P., and Shamma, S. (1999). “Spectro-temporal modulation transfer functions and speech intelligibility,” J. Acoust. Soc. Am. 106(5), 2719–2732.
11. Cole, R., Yan, Y., Mak, B., Fanty, M., and Bailey, T. (1996). “The contribution of consonants versus vowels to word recognition in fluent speech,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, 1996 (IEEE, Atlanta, GA), Vol. 2, pp. 853–856.
12. Dau, T., Verhey, J., and Kohlrausch, A. (1999). “Intrinsic envelope fluctuations and modulation-detection thresholds for narrow-band noise carriers,” J. Acoust. Soc. Am. 106(5), 2752–2760.
13. Drullman, R. (1995). “Temporal envelope and fine structure cues for speech intelligibility,” J. Acoust. Soc. Am. 97(1), 585–592.
14. Elhilali, M., Chi, T., and Shamma, S. A. (2003). “A spectro-temporal modulation index (STMI) for assessment of speech intelligibility,” Speech Commun. 41(2), 331–348.
15. Ewert, S. D., and Dau, T. (2000). “Characterizing frequency selectivity for envelope fluctuations,” J. Acoust. Soc. Am. 108(3), 1181–1196.
16. Field, A., Miles, J., and Field, Z. (2012). Discovering Statistics Using R (SAGE, California).
17. Fogerty, D., and Kewley-Port, D. (2007). “Investigating the consonant-vowel boundary: Perceptual contributions to sentence intelligibility,” Proc. Mtgs. Acoust. 2, 060001.
18. Fogerty, D., and Kewley-Port, D. (2009). “Perceptual contributions of the consonant-vowel boundary to sentence intelligibility,” J. Acoust. Soc. Am. 126(2), 847–857.
19. Francart, T., Moonen, M., and Wouters, J. (2009). “Automatic testing of speech recognition,” Int. J. Audiol. 48(2), 80–90.
20. Francart, T., Van Wieringen, A., and Wouters, J. (2008). “Apex 3: A multi-purpose test platform for auditory psychophysical experiments,” J. Neurosci. Methods 172(2), 283–293.
21. French, N., and Steinberg, J. (1947). “Factors governing the intelligibility of speech sounds,” J. Acoust. Soc. Am. 19(1), 90–119.
22. Ganong, W. F. (1980). “Phonetic categorization in auditory word perception,” J. Exp. Psychol.: Hum. Percept. Perform. 6(1), 110–125.
23. Gelfand, S. A. (1998). “Optimizing the reliability of speech recognition scores,” J. Speech Lang. Hear. Res. 41(5), 1088–1102.
24. Green, D. M., and Birdsall, T. G. (1964). Signal Detection and Recognition by Human Observers, edited by J. A. Swets (Wiley, New York).
25. Heinz, M. G., and Swaminathan, J. (2009). “Quantifying envelope and fine-structure coding in auditory nerve responses to chimaeric speech,” J. Assoc. Res. Otolaryngol. 10(3), 407–423.
26. Hines, A., and Harte, N. (2010). “Speech intelligibility from image processing,” Speech Commun. 52(9), 736–752.
27. Hines, A., and Harte, N. (2012). “Speech intelligibility prediction using a neurogram similarity index measure,” Speech Commun. 54(2), 306–320.
28. Hornsby, B. W. (2004). “The speech intelligibility index: What is it and what's it good for?,” Hear. J. 57(10), 10–17.
29. Hossain, M. E., Jassim, W. A., and Zilany, M. S. (2016). “Reference-free assessment of speech intelligibility using bispectrum of an auditory neurogram,” PLoS One 11(3), e0150415.
30. Jenkins, J. J., Strange, W., and Miranda, S. (1994). “Vowel identification in mixed-speaker silent-center syllables,” J. Acoust. Soc. Am. 95(2), 1030–1043.
31. Jennings, S. G., Heinz, M. G., and Strickland, E. A. (2011). “Evaluating adaptation and olivocochlear efferent feedback as potential explanations of psychophysical overshoot,” J. Assoc. Res. Otolaryngol. 12(3), 345–360.
32. Jennings, S. G., and Strickland, E. A. (2012). “Evaluating the effects of olivocochlear feedback on psychophysical measures of frequency selectivity,” J. Acoust. Soc. Am. 132(4), 2483–2496.
33. Jørgensen, S., and Dau, T. (2011). “Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing,” J. Acoust. Soc. Am. 130(3), 1475–1487.
34. Jørgensen, S., Ewert, S. D., and Dau, T. (2013). “A multi-resolution envelope-power based model for speech intelligibility,” J. Acoust. Soc. Am. 134(1), 436–446.
35. Kryter, K. D. (1962). “Methods for the calculation and use of the articulation index,” J. Acoust. Soc. Am. 34(11), 1689–1697.
36. Lee, J. H., and Kewley-Port, D. (2009). “Intelligibility of interrupted sentences at subsegmental levels in young normal-hearing and elderly hearing-impaired listeners,” J. Acoust. Soc. Am. 125(2), 1153–1163.
37. Lyon, R., and Shamma, S. (1996). Auditory Computation (Springer, New York), pp. 221–270.
38. Mamun, N., Jassim, W. A., and Zilany, M. S. (2015). “Prediction of speech intelligibility using a neurogram orthogonal polynomial measure (NOPM),” IEEE/ACM Trans. Audio, Speech, Lang. Process. 23(4), 760–773.
39. Miller, N. (2013). “Measuring up to speech intelligibility,” Int. J. Lang. Commun. Disord. 48(6), 601–612.
40. Müsch, H., and Buus, S. (2001). “Using statistical decision theory to predict speech intelligibility. I. Model structure,” J. Acoust. Soc. Am. 109(6), 2896–2909.
41. Pavlovic, C. V. (1987). “Derivation of primary parameters and procedures for use in speech intelligibility predictions,” J. Acoust. Soc. Am. 82(2), 413–422.
42. Rosen, S. (1992). “Temporal information in speech: Acoustic, auditory and linguistic aspects,” Philos. Trans. R. Soc. London, B 336(1278), 367–373.
43. Shannon, R. V., Zeng, F.-G., Kamath, V., Wygonski, J., and Ekelid, M. (1995). “Speech recognition with primarily temporal cues,” Science 270(5234), 303–304.
44. Sidwell, A., and Summerfield, Q. (1986). “The auditory representation of symmetrical CVC syllables,” Speech Commun. 5(3), 283–297.
45. Smith, Z. M., Delgutte, B., and Oxenham, A. J. (2002). “Chimaeric sounds reveal dichotomies in auditory perception,” Nature 416(6876), 87–90.
46. Steeneken, H. J., and Houtgast, T. (1980). “A physical method for measuring speech-transmission quality,” J. Acoust. Soc. Am. 67(1), 318–326.
47. Stilp, C. E., and Kluender, K. R. (2010). “Cochlea-scaled entropy, not consonants, vowels, or time, best predicts speech intelligibility,” Proc. Natl. Acad. Sci. 107(27), 12387–12392.
48. Strange, W., and Bohn, O.-S. (1998). “Dynamic specification of coarticulated German vowels: Perceptual and acoustical studies,” J. Acoust. Soc. Am. 104(1), 488–504.
49. Swaminathan, J., and Heinz, M. G. (2012). “Psychophysiological analyses demonstrate the importance of neural envelope coding for speech perception in noise,” J. Neurosci. 32(5), 1747–1756.
50. Wang, K., and Shamma, S. (1995). “Spectral shape analysis in the central auditory system,” IEEE Trans. Speech Audio Process. 3(5), 382–395.
51. Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P. (2004). “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process. 13(4), 600–612.
52. Williams, E. J., and Williams, E. (1959). Regression Analysis (Wiley, New York), Vol. 14.
53. Young, E. D. (2008). “Neural representation of spectral and temporal information in speech,” Philos. Trans. R. Soc., B 363(1493), 923–945.
54. Zhang, X., Heinz, M. G., Bruce, I. C., and Carney, L. H. (2001). “A phenomenological model for the responses of auditory-nerve fibers: I. Nonlinear tuning with compression and suppression,” J. Acoust. Soc. Am. 109(2), 648–670.
55. Zilany, M. S., and Bruce, I. C. (2006). “Modeling auditory-nerve responses for high sound pressure levels in the normal and impaired auditory periphery,” J. Acoust. Soc. Am. 120(3), 1446–1466.
56. Zilany, M. S., and Bruce, I. C. (2007). “Predictions of speech intelligibility with a model of the normal and impaired auditory-periphery,” in Neural Engineering, 2007, CNE'07, 3rd International IEEE/EMBS Conference (IEEE, Kohala Coast, HI), pp. 481–485.
57. Zilany, M. S., Bruce, I. C., and Carney, L. H. (2014). “Updated parameters and expanded simulation options for a model of the auditory periphery,” J. Acoust. Soc. Am. 135(1), 283–286.
58. Zilany, M. S., Bruce, I. C., Nelson, P. C., and Carney, L. H. (2009). “A phenomenological model of the synapse between the inner hair cell and auditory nerve: Long-term adaptation with power-law dynamics,” J. Acoust. Soc. Am. 126(5), 2390–2412.
59. Zilany, M. S., and Carney, L. H. (2010). “Power-law dynamics in an auditory-nerve model can account for neural adaptation to sound-level statistics,” J. Neurosci. 30(31), 10380–10390.