Many objective measures have been proposed for predicting speech intelligibility in noise, most of which were designed and evaluated with English speech corpora. Because native listeners of different languages rely on different perceptual cues, it is of great interest to examine whether a language effect arises when the same objective measure is used to predict speech intelligibility in different languages, particularly when non-linear noise-reduction processing is involved. In the present study, an extensive evaluation of objective measures was conducted for predicting the intelligibility of noisy speech processed by noise-reduction algorithms in Chinese, Japanese, and English. Of all the objective measures tested, the short-time objective intelligibility (STOI) measure produced the most accurate intelligibility predictions for Chinese, while the normalized covariance metric (NCM) and the middle-level coherence speech intelligibility index (CSIIm) incorporating signal-dependent band-importance functions (BIFs) produced the most accurate predictions for Japanese and English, respectively. The objective measures that best predicted the effect of non-linear noise-reduction processing on speech intelligibility were the BIF-modified NCM measure for Chinese, the STOI measure for Japanese, and the BIF-modified CSIIm measure for English. Most of the objective measures examined performed differently across languages even under the same conditions.
