Using the data presented in the accompanying paper [Hilkhuysen et al., J. Acoust. Soc. Am. 131, 531–539 (2012)], the ability of six metrics to predict the intelligibility of speech in noise before and after noise suppression was studied. The metrics considered were the Speech Intelligibility Index (SII), the fractional Articulation Index (fAI), the coherence intelligibility index based on the mid-levels in speech (CSIImid), an extension of the Normalized Coherence Metric (NCM+), a part of the speech-based envelope power model (pre-sEPSM), and the Short Term Objective Intelligibility measure (STOI). Three of the measures, SII, CSIImid, and NCM+, overpredicted intelligibility after noise reduction, whereas fAI underpredicted it. The pre-sEPSM metric worked well for speech in babble but failed with car noise. STOI gave the best predictions, but overall the size of the intelligibility prediction errors was greater than the change in intelligibility caused by noise suppression. Suggestions for improvements of the metrics are discussed.

1. ANSI (1997). S3.5-1997. Methods for Calculation of the Speech Intelligibility Index (American National Standards Institute, New York).
2. Arehart, K. H., Hansen, J. H. L., Gallant, S., and Kalstein, L. (2003). “Evaluation of an auditory masked threshold noise suppression algorithm in normal-hearing and hearing-impaired listeners,” Speech Commun. 40, 575–592.
3. Arehart, K. H., Kates, J. M., Anderson, M. C., and Harvey, L. O., Jr. (2007). “Effects of noise and distortion on speech quality judgments in normal-hearing and hearing-impaired listeners,” J. Acoust. Soc. Am. 122, 1150–1164.
4. Brady, P. T. (1968). “Equivalent peak level: A threshold-independent speech-level measure,” J. Acoust. Soc. Am. 44, 695–699.
5. Brungart, D. S. (2001). “Informational and energetic masking effects in the perception of two simultaneous talkers,” J. Acoust. Soc. Am. 109, 1101–1109.
6. Brungart, D. S., Simpson, B. D., Ericson, M. A., and Scott, K. R. (2001). “Informational and energetic masking effects in the perception of multiple simultaneous talkers,” J. Acoust. Soc. Am. 110, 2527–2538.
7. Dau, T., Kollmeier, B., and Kohlrausch, A. (1997a). “Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers,” J. Acoust. Soc. Am. 102, 2892–2905.
8. Dau, T., Kollmeier, B., and Kohlrausch, A. (1997b). “Modeling auditory processing of amplitude modulation. II. Spectral and temporal integration,” J. Acoust. Soc. Am. 102, 2906–2919.
9. Dubbelboer, F., and Houtgast, T. (2008). “The concept of signal-to-noise ratio in the modulation domain and speech intelligibility,” J. Acoust. Soc. Am. 124, 3937–3946.
10. Ewert, S., and Dau, T. (2000). “Characterizing frequency selectivity for envelope fluctuations,” J. Acoust. Soc. Am. 108, 1181–1196.
11. French, N. R., and Steinberg, J. C. (1947). “Factors governing the intelligibility of speech sounds,” J. Acoust. Soc. Am. 19, 90–119.
12. Glasberg, B. R., and Moore, B. C. (1990). “Derivation of auditory filter shapes from notched-noise data,” Hear. Res. 47, 103–138.
13. Goldsworthy, R. L., and Greenberg, J. E. (2004). “Analysis of speech-based Speech Transmission Index methods with implications for nonlinear operations,” J. Acoust. Soc. Am. 116, 3679–3689.
14. Greenberg, S., and Arai, T. (2001). “The relation between speech intelligibility and the complex modulation spectrum,” Proceedings of the 7th European Conference on Speech Communication and Technology, Aalborg, Denmark, pp. 473–476.
15. Healy, E. W., and Warren, R. M. (2003). “The role of contrasting temporal amplitude patterns in the perception of speech,” J. Acoust. Soc. Am. 113, 1676–1688.
16. Hilkhuysen, G., Gaubitch, N., Brookes, M., and Huckvale, M. (2012). “Effects of noise suppression on intelligibility: Dependency on signal-to-noise ratios,” J. Acoust. Soc. Am. 131, 531–539.
17. Hogan, C. A., and Turner, C. W. (1998). “High-frequency audibility: Benefits for hearing-impaired listeners,” J. Acoust. Soc. Am. 104, 432–441.
18. Holube, I., and Kollmeier, B. (1996). “Speech intelligibility prediction in hearing-impaired listeners based on a psychoacoustically motivated perception model,” J. Acoust. Soc. Am. 100, 1703–1716.
19. Hornsby, B. W., and Ricketts, T. A. (2003). “The effects of hearing loss on the contribution of high- and low-frequency speech information to speech understanding,” J. Acoust. Soc. Am. 113, 1706–1717.
19. Hu, Y., and Loizou, P. C. (2007). “A comparative intelligibility study of single-microphone noise reduction algorithms,” J. Acoust. Soc. Am. 122, 1777–1786.
20. ISO (2003). ISO CD226-2003. Acoustics-Normal Equal-loudness Level Contours (International Organization for Standardization, Geneva, Switzerland).
21. ISO (2004). ISO 389-8:2004. Reference Zero for the Calibration of Audiometric Equipment – Part 8: Reference Equivalent Threshold Sound Pressure Levels for Pure Tones and Circumaural Earphones (International Organization for Standardization, Geneva, Switzerland).
22. ITU (1994). ITU-T P.56. Objective Measurement of Active Speech Level (International Telecommunication Union, Geneva, Switzerland).
23. Jorgensen, S., and Dau, T. (2011). “Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing,” J. Acoust. Soc. Am. 130, 1475–1487.
24. Kanedera, N., Arai, T., Hermansky, H., and Pavel, M. (1999). “On the relative importance of various components of the modulation spectrum for automatic speech recognition,” Speech Commun. 28, 43–55.
25. Kates, J. M., and Arehart, K. H. (2005). “Coherence and the speech intelligibility index,” J. Acoust. Soc. Am. 117, 2224–2237.
26. Koch, R. (1992). “Gehörgerechte Schallanalyse zur Vorhersage und Verbesserung der Sprachverständlichkeit” (“Auditory sound analysis for the prediction and improvement of speech intelligibility”), Ph.D. dissertation, Universität Göttingen, Göttingen, Germany.
27. Kollmeier, B. (1990). “Meßmethodik, Modellierung und Verbesserung der Verständlichkeit von Sprache” (“Measurement methods, modelling and improvement of the intelligibility of speech”), Habilitationsschrift, Universität Göttingen, Göttingen, Germany.
28. Loizou, P. C., and Kim, G. (2011). “Reasons why current speech-enhancement algorithms do not improve speech intelligibility and suggested solutions,” IEEE Trans. Audio Speech Lang. Process. 19, 47–56.
29. Loizou, P. C., and Ma, J. (2011). “Extending the articulation index to account for non-linear distortions introduced by noise-suppression algorithms,” J. Acoust. Soc. Am. 130, 986–995.
30. Ludvigsen, C. (1987). “Prediction of speech intelligibility for normal-hearing and cochlearly hearing-impaired listeners,” J. Acoust. Soc. Am. 82, 1162–1171.
31. Ludvigsen, C., Elberling, C., and Keidser, G. (1993). “Evaluation of noise reduction method: Comparison between observed scores and scores predicted from STI,” Scand. Audiol. 38, 50–55.
32. Ludvigsen, C., Elberling, C., Keidser, G., and Poulsen, T. (1990). “Prediction of intelligibility of non-linearly processed speech,” Acta Otolaryngol. Suppl. 469, 190–195.
33. Ma, J., Hu, Y., and Loizou, P. C. (2009). “Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions,” J. Acoust. Soc. Am. 125, 3387–3405.
34. Magnusson, L. (1996). “Predicting the speech recognition performance of elderly individuals with sensorineural hearing impairment. A procedure based on the Speech Intelligibility Index,” Scand. Audiol. 25, 215–222.
35. Pavlovic, C. V. (1987). “Derivation of primary parameters and procedures for use in speech intelligibility predictions,” J. Acoust. Soc. Am. 82, 413–422.
37. Rhebergen, K. S., and Versfeld, N. J. (2005). “A Speech Intelligibility Index-based approach to predict the speech reception threshold for sentences in fluctuating noise for normal-hearing listeners,” J. Acoust. Soc. Am. 117, 2181–2192.
38. Steeneken, H. J., and Houtgast, T. (1980). “A physical method for measuring speech-transmission quality,” J. Acoust. Soc. Am. 67, 318–326.
39. Stone, M. A., Füllgrabe, C., Mackinnon, R. C., and Moore, B. C. J. (2011). “The importance for speech intelligibility of random fluctuations in ‘steady’ background noise,” J. Acoust. Soc. Am. 130, 2874–2881.
40. Stone, M. A., Füllgrabe, C., and Moore, B. C. J. (2012). “Notionally steady background noise acts primarily as a modulation masker of speech,” J. Acoust. Soc. Am. 132, 317–326.
41. Stone, M. A., and Moore, B. C. (2007). “Quantifying the effects of fast-acting compression on the envelope of speech,” J. Acoust. Soc. Am. 121, 1654–1664.
42. Stone, M. A., and Moore, B. C. (2008). “Effects of spectro-temporal modulation changes produced by multi-channel compression on intelligibility in a competing-speech task,” J. Acoust. Soc. Am. 123, 1063–1076.
44. Taal, C. H., Hendriks, R. C., Heusdens, R., and Jensen, J. (2011). “An algorithm for intelligibility prediction of time-frequency weighted noisy speech,” IEEE Trans. Audio Speech Lang. Process. 19, 2125–2136.
45. Tsoukalas, D. E., Mourjopoulos, J. N., and Kokkinakis, G. (1997). “Speech enhancement based on audible noise suppression,” IEEE Trans. Speech Audio Process. 5, 497–514.