The articulation index (AI), speech-transmission index (STI), and coherence-based intelligibility metrics have been evaluated primarily in steady-state noise and have not been tested extensively in fluctuating noise. The aim of the present work is to evaluate the performance of new speech-based STI measures, modified coherence-based measures, and AI-based measures operating on short-term (30-ms) intervals in realistic noisy conditions. Much emphasis is placed on the design of new band-importance weighting functions that can be used when speech is corrupted by fluctuating maskers. The proposed measures were evaluated against intelligibility scores obtained from normal-hearing listeners in 72 noisy conditions involving noise-suppressed speech (consonants and sentences) corrupted by four different maskers (car, babble, train, and street interferences). Of all the measures considered, the modified coherence-based measures and the speech-based STI measures incorporating signal-specific band-importance functions yielded the highest correlations (r = 0.89–0.94). In particular, the modified coherence measure that included only vowel/consonant transitions and weak-consonant information yielded the highest correlation (r = 0.94) with sentence recognition scores. The results of this study clearly suggest that the traditional AI and STI indices could benefit from the proposed signal- and segment-dependent band-importance functions.
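To make the idea of a short-term, band-importance-weighted AI-style measure concrete, the following is a minimal Python sketch, not the study's implementation. It assumes the clean speech and the noise (or processing distortion) have already been split into 30-ms frames and a set of analysis bands, so that per-frame, per-band power estimates are available. Each band SNR is clipped to the conventional AI/SII range of -15 to +15 dB and mapped linearly to [0, 1], weighted by band importance, averaged within each frame, and then averaged across frames. The option of deriving signal-dependent weights by raising the clean band magnitude to a power `p` is a hypothetical illustration of "signal-specific" weighting, not necessarily the weighting proposed in the study. The function names `band_snr_to_audibility`, `short_time_ai`, and `pearson_r` are illustrative.

```python
import math
from typing import List, Optional, Sequence


def band_snr_to_audibility(snr_db: float) -> float:
    """Map a band SNR in dB to [0, 1] with the AI/SII-style linear ramp:
    -15 dB (or less) -> 0, +15 dB (or more) -> 1."""
    clipped = min(max(snr_db, -15.0), 15.0)
    return (clipped + 15.0) / 30.0


def short_time_ai(
    speech_power: Sequence[Sequence[float]],  # [frame][band] clean-speech power
    noise_power: Sequence[Sequence[float]],   # [frame][band] noise/distortion power
    weights: Optional[Sequence[float]] = None,  # fixed band-importance weights
    p: Optional[float] = None,  # hypothetical exponent for signal-dependent weights
) -> float:
    """Average over frames of the band-importance-weighted audibility."""
    eps = 1e-12  # guards against log(0) and division by zero
    frame_scores: List[float] = []
    for s_frame, n_frame in zip(speech_power, noise_power):
        if p is not None:
            # Assumed signal-dependent weighting: clean band magnitude ** p.
            w = [math.sqrt(s) ** p for s in s_frame]
        elif weights is not None:
            w = list(weights)
        else:
            w = [1.0] * len(s_frame)  # uniform weights as a fallback
        total_w = sum(w)
        if total_w <= 0.0:
            continue  # skip frames with no usable weight (e.g. silence)
        score = sum(
            wj * band_snr_to_audibility(10.0 * math.log10((s + eps) / (n + eps)))
            for wj, s, n in zip(w, s_frame, n_frame)
        )
        frame_scores.append(score / total_w)  # normalize weights per frame
    return sum(frame_scores) / len(frame_scores) if frame_scores else 0.0


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between objective scores and listener scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy 1-frame, 2-band example: band 1 at about +30 dB SNR, band 2 at about -30 dB.
    speech = [[1000.0, 1.0]]
    noise = [[1.0, 1000.0]]
    print(short_time_ai(speech, noise))                      # equal weights
    print(short_time_ai(speech, noise, weights=[0.8, 0.2]))  # fixed importance
    print(short_time_ai(speech, noise, p=2.0))               # signal-dependent
```

In an evaluation like the one described, one such index would be computed per noisy condition, and `pearson_r` would then be applied to the list of index values and the corresponding mean intelligibility scores. Normalizing the weights within each frame keeps every frame's score in [0, 1] regardless of how the weights are formed.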
