This study investigates how the intelligibility advantage of ideal binary mask (IBM) processing is affected when speech is synthesized from only a small number of the most energetic channels. In experiment 1, Mandarin speech corrupted by speech-spectrum-shaped noise or two-talker babble was IBM-processed and synthesized using as few as four of the most energetic target-dominated channels in each frame. This approach provided intelligibility comparable to that of speech synthesized with all of the target-dominated channels. Experiments 2, 3, and 4 examined how the intelligibility advantage observed in experiment 1 was affected by the local SNR threshold, the low-frequency region (LFR) cutoff frequency, and vowel-based segmentation, respectively. Experiments 2 and 3 showed that a local SNR threshold of 0 dB and an LFR cutoff of 3000 Hz were optimal choices for improving the intelligibility of IBM processing based on the most energetic channels. Experiment 4 found that the intelligibility advantage of IBM processing with the most energetic channels was preserved at the segmental level, in vowel-only IBM-processed speech. Taken together, the results suggest that Mandarin speech synthesized from a small number of the most energetic target-dominated channels can be about as intelligible as IBM-processed speech synthesized with all of the target-dominated channels.
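The channel-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function name `ibm_top_channels`, its parameters, and the choice to rank channels by target energy are assumptions made for illustration, and the filterbank analysis and resynthesis stages are omitted. The sketch marks a time-frequency unit as target-dominated when its local SNR exceeds a threshold (0 dB by default), then keeps at most `n_keep` of the most energetic target-dominated channels in each frame.

```python
import numpy as np


def ibm_top_channels(target_energy, masker_energy, n_keep=4, lc_db=0.0):
    """Binary mask keeping the n_keep most energetic target-dominated channels.

    Hypothetical helper for illustration only. Both inputs are non-negative
    arrays of shape (channels, frames) holding the energies of the premixed
    target and masker in each time-frequency unit.
    """
    target_energy = np.asarray(target_energy, dtype=float)
    masker_energy = np.asarray(masker_energy, dtype=float)

    # Local SNR in dB; a tiny offset avoids division by zero and log(0).
    eps = np.finfo(float).tiny
    local_snr_db = 10.0 * np.log10((target_energy + eps) / (masker_energy + eps))

    # Ideal binary mask: a unit is target-dominated if local SNR > threshold.
    ibm = local_snr_db > lc_db

    # Assumption: channels are ranked by target energy. Units that are not
    # target-dominated get -inf so they sort below every retained unit.
    ranked = np.where(ibm, target_energy, -np.inf)

    # Indices of the k largest entries in each frame (column).
    k = min(n_keep, ranked.shape[0])
    top_idx = np.argsort(ranked, axis=0)[-k:, :]

    selected = np.zeros_like(ibm)
    np.put_along_axis(selected, top_idx, True, axis=0)

    # Frames with fewer than k target-dominated channels keep only those.
    return selected & ibm
```

Applying the returned mask to the mixture's channel envelopes before resynthesis would give the reduced-channel condition; passing `n_keep` equal to the number of channels reduces the function to the ordinary IBM.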
December 2016
December 06 2016
Representing the intelligibility advantage of ideal binary masking with the most energetic channels
a) Electronic mail: fchen@sustc.edu.cn
J. Acoust. Soc. Am. 140, 4161–4169 (2016)
Article history
Received: April 07 2016
Accepted: November 17 2016
Citation
Fei Chen; Representing the intelligibility advantage of ideal binary masking with the most energetic channels. J. Acoust. Soc. Am. 1 December 2016; 140 (6): 4161–4169. https://doi.org/10.1121/1.4971206