The present work assessed the contributions of high root-mean-square (RMS) level segments (H-level, containing primarily vowels) and middle-RMS-level segments (M-level, containing mostly consonants and vowel-consonant transitions) to the intelligibility of noise-masked and noise-suppressed sentences. In experiment 1, Mandarin sentences masked by speech-spectrum-shaped noise and 6-talker babble were edited to preserve only the H-level or only the M-level segments, with the non-target segments replaced by silence. In experiment 2, Mandarin sentences were processed by four commonly used single-channel noise-suppression algorithms before H-level-only and M-level-only noise-suppressed sentences were generated. To test the influence of the effective signal-to-noise ratio (SNR) on intelligibility, both experiments included a condition in which the SNRs of the H-level and M-level segments were matched. The processed sentences were presented to normal-hearing listeners for recognition. The results showed that (1) H-level-only sentences carried more perceptual information than M-level-only sentences under both noise-masked and noise-suppressed conditions; and (2) this intelligibility advantage of H-level-only over M-level-only sentences persisted even when the effective SNRs were matched, which might be attributed to the perceptual advantage of vowels in speech intelligibility. In addition, noise-suppression processing introduced less distortion in H-level segments than in M-level segments, suggesting that differential processing distortion might contribute to the observed H-level advantage.
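The segment-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no level boundaries or frame length, so the thresholds here (H-level at or above the overall sentence RMS, M-level from 0 dB down to -10 dB relative to it) are assumptions, as are the function name `keep_level_segments` and its parameters.

```python
import math


def rms(samples):
    """Root-mean-square value of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def keep_level_segments(signal, frame_len, target, m_floor_db=-10.0):
    """Return a copy of `signal` with every non-target frame set to silence.

    target: "H" keeps frames whose RMS is at or above the overall RMS;
            "M" keeps frames between m_floor_db and 0 dB relative to it.
    The 0 dB and -10 dB boundaries are assumed for illustration only;
    they are not stated in the abstract.
    """
    if not signal:
        return []
    overall = rms(signal)
    out = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        frame_rms = rms(frame)
        # Frame level in dB relative to the whole sentence; silent frames get -inf.
        rel_db = 20.0 * math.log10(frame_rms / overall) if frame_rms > 0 else -math.inf
        if rel_db >= 0.0:
            label = "H"
        elif rel_db >= m_floor_db:
            label = "M"
        else:
            label = "L"
        # Keep the frame only if it belongs to the requested level class.
        out.extend(frame if label == target else [0.0] * len(frame))
    return out
```

Calling `keep_level_segments(sentence, frame_len, "H")` and `keep_level_segments(sentence, frame_len, "M")` on the same noisy sentence would give the two stimulus types, each the same length as the input with silence in place of the non-target frames.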
