A listener’s ability to understand a target speaker in the presence of one or more simultaneous competing speakers is subject to two types of masking: energetic and informational. Energetic masking takes place when the target and interfering signals overlap in time and frequency, rendering portions of the target inaudible. Informational masking occurs when the listener is unable to distinguish the target from the interference even though both are audible. A computational model of multitalker speech perception is presented that accounts for both types of masking. Human perception under energetic masking is modeled using a speech recognizer that treats the masked time-frequency units of the target as missing data. The effects of informational masking are modeled as errors in target segregation by a speech separation system. In a systematic evaluation, the performance of the proposed model is in broad agreement with the results of a recent perceptual study.
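The two notions in the abstract can be sketched concretely. The following Python snippet is a minimal illustration, not the authors' implementation: it assumes an STFT-based time-frequency representation, builds an ideal binary mask that marks units where the interference dominates the target (energetic masking), and simulates segregation errors by randomly flipping mask entries (a stand-in for informational masking). All function and parameter names here are illustrative assumptions.

```python
# Minimal sketch (not the paper's actual system) of the masking notions in the
# abstract. Assumption: an STFT-based T-F representation; a full model might
# instead use an auditory filterbank and a missing-data recognizer.
import numpy as np
from scipy.signal import stft

def ideal_binary_mask(target, interference, fs=16000, snr_threshold_db=0.0):
    """Return a binary T-F mask: 1 where the target dominates, 0 otherwise."""
    _, _, T = stft(target, fs=fs, nperseg=320, noverlap=160)
    _, _, I = stft(interference, fs=fs, nperseg=320, noverlap=160)
    # Local SNR per T-F unit; small constant avoids division by zero.
    local_snr_db = 10 * np.log10((np.abs(T) ** 2 + 1e-12) /
                                 (np.abs(I) ** 2 + 1e-12))
    return (local_snr_db > snr_threshold_db).astype(float)

def simulate_segregation_errors(mask, flip_prob=0.1, seed=None):
    """Flip a fraction of mask entries to mimic imperfect target segregation."""
    rng = np.random.default_rng(seed)
    flips = rng.random(mask.shape) < flip_prob
    return np.where(flips, 1.0 - mask, mask)
```

In this sketch, units with mask value 0 would be treated as missing data by a downstream recognizer, and raising `flip_prob` corresponds to stronger informational masking from segregation errors.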
November 01 2008
A model for multitalker speech perception
Soundararajan Srinivasan a)
Biomedical Engineering Department, The Ohio State University, Columbus, Ohio 43210
DeLiang Wang b)
Department of Computer Science and Engineering and Center for Cognitive Science, The Ohio State University, Columbus, Ohio 43210
a) Present address: Robert Bosch LLC, Research and Technology Center North America, Pittsburgh, PA 15212. Electronic mails: [email protected] and [email protected]
b) Electronic mail: [email protected]
J. Acoust. Soc. Am. 124, 3213–3224 (2008)
Article history
Received: November 05 2007
Accepted: August 18 2008
Citation
Soundararajan Srinivasan, DeLiang Wang; A model for multitalker speech perception. J. Acoust. Soc. Am. 1 November 2008; 124 (5): 3213–3224. https://doi.org/10.1121/1.2982413