This study investigates how the intelligibility advantage of ideal binary mask (IBM) processing is affected when speech is synthesized from only a small number of the most energetic channels. In experiment 1, Mandarin speech corrupted by speech spectrum-shaped noise or two-talker babble was IBM-processed and synthesized using as few as four of the most energetic target-dominated channels in each frame. This approach yielded intelligibility comparable to that of speech synthesized with all of the target-dominated channels. Experiments 2, 3, and 4 examined how the intelligibility advantage found in experiment 1 was affected by the local signal-to-noise ratio (SNR) threshold, the low-frequency region (LFR) cutoff frequency, and vowel-based segmentation, respectively. Experiments 2 and 3 showed that a local SNR threshold of 0 dB and an LFR cutoff of 3000 Hz were optimal choices for improving the intelligibility of IBM processing based on the most energetic channels. Experiment 4 found that the intelligibility advantage of IBM processing with the most energetic channels was preserved at the segmental level, in vowel-only IBM-processed speech. Taken together, the results suggest that Mandarin speech synthesized from a small number of the most energetic target-dominated channels can achieve intelligibility similar to that of IBM-processed speech synthesized with all of the target-dominated channels.
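As a rough illustration of the channel-selection idea described above, the Python sketch below builds a per-frame binary mask that keeps at most `k` target-dominated channels. It is a minimal sketch under stated assumptions, not the study's implementation: it assumes per-channel target and masker energies are already available from some filterbank analysis, treats a channel as target-dominated when its local SNR exceeds the threshold `lc_db`, and interprets "most energetic" as the highest mixture energy, approximated as target plus masker energy. The function name `ibm_most_energetic` and its parameters are hypothetical.

```python
import math
from typing import List


def ibm_most_energetic(
    target_energy: List[List[float]],  # [frame][channel] energy of the clean target (assumed input)
    masker_energy: List[List[float]],  # [frame][channel] energy of the masker (assumed input)
    k: int = 4,                        # max channels kept per frame (hypothetical parameter)
    lc_db: float = 0.0,                # local SNR threshold in dB (hypothetical parameter)
) -> List[List[int]]:
    """Return a [frame][channel] binary mask keeping up to k target-dominated channels per frame."""
    eps = 1e-12  # guards against log(0) and division by zero
    mask: List[List[int]] = []
    for t_row, m_row in zip(target_energy, masker_energy):
        # Local SNR of each channel in this frame, in dB.
        snr_db = [10.0 * math.log10((t + eps) / (m + eps)) for t, m in zip(t_row, m_row)]
        # Target-dominated channels: local SNR strictly above the threshold.
        candidates = [c for c, s in enumerate(snr_db) if s > lc_db]
        # Assumption: rank by mixture energy, approximated as target + masker energy
        # (valid only if target and masker are uncorrelated).
        candidates.sort(key=lambda c: t_row[c] + m_row[c], reverse=True)
        keep = set(candidates[:k])
        mask.append([1 if c in keep else 0 for c in range(len(t_row))])
    return mask


# Usage: one frame, six channels.
target = [[4.0, 1.0, 9.0, 0.5, 2.0, 8.0]]
masker = [[1.0, 2.0, 1.0, 1.0, 1.0, 1.0]]
print(ibm_most_energetic(target, masker, k=2))
```

With `k` set to the number of channels, the sketch reduces to a conventional IBM with local criterion `lc_db`; smaller `k` discards the weaker target-dominated channels in each frame. Synthesis (for example, applying the mask to filterbank outputs and summing) is left out here.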
