Listeners can reliably perceive speech in noisy conditions, but it is not well understood which specific features of speech they use to do this. This paper introduces a data-driven framework to identify the time-frequency locations of these features. Using the same speech utterance mixed with many different noise instances, the framework computes the importance of each time-frequency point in the utterance to its intelligibility. The mixtures have approximately the same global signal-to-noise ratio at each frequency, but very different recognition rates. What distinguishes the intelligible from the unintelligible mixtures is the alignment between the speech and the spectro-temporally modulated noise, which exposes a different combination of “glimpses” of the speech in each mixture. The current results reveal the locations of these important noise-robust phonetic features in a restricted set of syllables. Classification models trained to predict whether an individual mixture is intelligible from the locations of its glimpses successfully predict the intelligibility of novel mixtures. They generalize to novel noise instances, novel productions of the same word by the same talker, novel utterances of the same word spoken by different talkers, and, to some extent, novel consonants.
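The importance computation described above can be illustrated with a small sketch. This is not the authors' implementation; it assumes that each mixture is summarized by a binary glimpse mask over time-frequency points (1 where the speech is audible through a "bubble" in the noise) and a binary intelligibility label from listener responses, and it scores each point by the point-biserial correlation between its mask value and intelligibility across mixtures. The function names `point_biserial` and `importance_map` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def point_biserial(x: Sequence[int], y: Sequence[int]) -> float:
    """Pearson correlation between two binary sequences (equivalent to the
    point-biserial correlation). Returns 0.0 if either sequence is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # no variation, so no evidence either way
    return cov / math.sqrt(vx * vy)


def importance_map(masks: List[List[List[int]]],
                   intelligible: Sequence[int]) -> List[List[float]]:
    """masks[k][f][t] is 1 if point (f, t) is glimpsed in mixture k;
    intelligible[k] is 1 if mixture k was identified correctly.
    Returns a frequency-by-time map of correlations in [-1, 1]."""
    n_freq = len(masks[0])
    n_time = len(masks[0][0])
    result = []
    for f in range(n_freq):
        row = []
        for t in range(n_time):
            # Visibility of this one point across all mixtures.
            visible = [m[f][t] for m in masks]
            row.append(point_biserial(visible, intelligible))
        result.append(row)
    return result


# Hypothetical toy data: four mixtures, each a 2 x 2 (frequency x time) mask.
masks = [
    [[1, 1], [0, 0]],
    [[1, 1], [1, 0]],
    [[0, 1], [0, 1]],
    [[0, 1], [1, 1]],
]
intelligible = [1, 1, 0, 0]
imp = importance_map(masks, intelligible)
print(imp)  # [[1.0, 0.0], [0.0, -1.0]]
```

In the toy data, point (0, 0) is glimpsed exactly when the mixture is intelligible and receives the maximum score, while point (0, 1) is always glimpsed and so carries no information. With real listener data, many more mixtures per utterance would be needed for the correlations to be stable.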
October 2016
October 13 2016
Measuring time-frequency importance functions of speech with bubble noise a)
Michael I. Mandel b)
Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA
Sarah E. Yoho
Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
Eric W. Healy
Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
b) Current address: Department of Computer and Information Science, Brooklyn College, CUNY, Brooklyn, NY 11210, USA.
a) Portions of this work were presented at the 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, the 2014 ISCA Interspeech conference, and the 169th meeting of the Acoustical Society of America.
J. Acoust. Soc. Am. 140, 2542–2553 (2016)
Article history: Received March 2, 2016; accepted September 12, 2016.
Citation
Michael I. Mandel, Sarah E. Yoho, Eric W. Healy; Measuring time-frequency importance functions of speech with bubble noise. J. Acoust. Soc. Am. 1 October 2016; 140 (4): 2542–2553. https://doi.org/10.1121/1.4964102