Listeners can reliably perceive speech in noisy conditions, but it is not well understood what specific features of speech they use to do this. This paper introduces a data-driven framework to identify the time-frequency locations of these features. Using the same speech utterance mixed with many different noise instances, the framework is able to compute the importance of each time-frequency point in the utterance to its intelligibility. The mixtures have approximately the same global signal-to-noise ratio at each frequency, but very different recognition rates. The difference between these intelligible vs unintelligible mixtures is the alignment between the speech and spectro-temporally modulated noise, providing different combinations of “glimpses” of speech in each mixture. The current results reveal the locations of these important noise-robust phonetic features in a restricted set of syllables. Classification models trained to predict whether individual mixtures are intelligible based on the location of these glimpses can generalize to new conditions, successfully predicting the intelligibility of novel mixtures. They are able to generalize to novel noise instances, novel productions of the same word by the same talker, novel utterances of the same word spoken by different talkers, and, to some extent, novel consonants.
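As a rough illustration of the idea described above, the following minimal sketch estimates a time-frequency importance map by correlating, at each time-frequency point, whether speech was glimpsed in a mixture with whether that mixture was identified correctly. The abstract does not specify the statistic used, so the correlation measure, the binary-mask representation, the function name `importance_map`, and the synthetic data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def importance_map(masks, correct):
    """Correlate per-point glimpse visibility with listener correctness.

    masks:   (n_mixtures, n_freq, n_time) array, 1 where speech is glimpsed
    correct: (n_mixtures,) array, 1 if the mixture was identified correctly
    Both inputs are hypothetical representations chosen for this sketch.
    """
    masks = np.asarray(masks, dtype=float)
    correct = np.asarray(correct, dtype=float)
    m = masks - masks.mean(axis=0)      # center each time-frequency point
    c = correct - correct.mean()        # center the outcomes
    cov = np.tensordot(c, m, axes=(0, 0)) / len(correct)
    denom = masks.std(axis=0) * correct.std()
    out = np.zeros_like(cov)
    # Leave the importance at 0 wherever the correlation is undefined.
    np.divide(cov, denom, out=out, where=denom > 0)
    return out

# Synthetic data: a mixture is "intelligible" exactly when point (2, 3) is glimpsed.
rng = np.random.default_rng(0)
masks = rng.integers(0, 2, size=(200, 4, 5))
correct = masks[:, 2, 3].copy()
imp = importance_map(masks, correct)
```

In this toy setting the map peaks at the one point that determines the outcome. With real listener data the map would be noisier, and a significance test or a trained classifier would likely be needed to separate important regions from chance correlations.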
October 2016
October 13 2016
Measuring time-frequency importance functions of speech with bubble noise a)
Michael I. Mandel b)
Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA
Sarah E. Yoho
Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
Eric W. Healy
Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
b) Current address: Department of Computer and Information Science, Brooklyn College, CUNY, Brooklyn, NY 11210, USA. Electronic mail: [email protected]
a) Portions of this work were presented at the 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, the 2014 ISCA Interspeech conference, and the 169th meeting of the Acoustical Society of America.
J. Acoust. Soc. Am. 140, 2542–2553 (2016)
Article history
Received: March 02 2016
Accepted: September 12 2016
Citation
Michael I. Mandel, Sarah E. Yoho, Eric W. Healy; Measuring time-frequency importance functions of speech with bubble noise. J. Acoust. Soc. Am. 1 October 2016; 140 (4): 2542–2553. https://doi.org/10.1121/1.4964102