A major determinant of success when separating a speech signal from a noisy environment is the intelligibility of the extracted speech. The fraction of words correctly recognized by listeners is often used as the “gold standard”, but reliable computational metrics are preferable because they are less labor-intensive than collecting listener judgements. We compare listener intelligibility to three acoustically derived metrics: (a) speech-to-interference ratios (SIRs) estimated from the processed mixtures (ESIR), (b) average coherence (Coh), and (c) the speech-based Speech Transmission Index (sSTI). Sentences were recorded by four microphones against restaurant babble, white Gaussian noise, and nonstationary noise at SNRs ranging from +6 dB to −8 dB. Treatments included (1) the original mixture; (2) the mixture processed by a critically determined 4-channel blind source separation (BSS) algorithm; (3) the speech component extracted from the mixture using a least-mean-squares (LMS) algorithm; (4) an LMS algorithm used to remove the two noises, followed by 2-channel BSS to separate the sentences from the babble; and (5) pristine speech recorded with no noise. The metrics are compared to intelligibility results from listening tests. The Coh and sSTI metrics show the best fit to listener intelligibility across talkers.
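The abstract does not say how the LMS stage in treatments (3) and (4) was configured. As a rough illustration only, the sketch below assumes the classic adaptive noise canceller arrangement, in which a noise-only reference channel is filtered and subtracted from the primary microphone signal. The function names, filter length, step size, and synthetic signals are all hypothetical and are not taken from the paper.

```python
import math
import random


def lms_cancel(primary, reference, num_taps=4, step=0.01):
    """Adaptive noise canceller using the standard LMS weight update.

    primary   -- samples of speech plus noise (assumed setup, not from the paper)
    reference -- samples of the noise alone (assumed to be available)
    Returns the error signal, which serves as the speech estimate.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent reference samples, newest first
    output = []
    for desired, ref_sample in zip(primary, reference):
        history = [ref_sample] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = desired - noise_estimate
        # LMS update: move the weights along the instantaneous gradient.
        weights = [w + 2.0 * step * error * h for w, h in zip(weights, history)]
        output.append(error)
    return output


def mean_square(signal):
    """Average power of a list of samples."""
    return sum(s * s for s in signal) / len(signal)


# Synthetic stand-ins: a sinusoid plays the role of speech, and white
# Gaussian noise reaches the primary microphone through a hypothetical
# 2-tap acoustic path.
rng = random.Random(0)
n = 4000
speech = [0.5 * math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
leaked = [0.8 * noise[k] + (0.3 * noise[k - 1] if k > 0 else 0.0) for k in range(n)]
primary = [s + v for s, v in zip(speech, leaked)]

cleaned = lms_cancel(primary, noise)

# Compare the residual noise power before and after, skipping the first
# half of the signal so the filter has time to converge.
tail = slice(n // 2, n)
err_before = mean_square([p - s for p, s in zip(primary[tail], speech[tail])])
err_after = mean_square([c - s for c, s in zip(cleaned[tail], speech[tail])])
print(f"residual noise power before: {err_before:.4f}, after: {err_after:.4f}")
```

In this toy setting the reference is perfectly correlated with the interfering noise, so the canceller removes most of it. Real recordings, where the reference also picks up speech or the acoustic path changes over time, would behave less favorably.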
13 May 2024
186th Meeting of the Acoustical Society of America and the Canadian Acoustical Association
13–17 May 2024
Ottawa, Ontario, Canada
Psychological and Physiological Acoustics: Paper 2pPP9
September 17 2024
Relationship between objective measures and listener intelligibility of speech processed by source-separation algorithms
Karen Payton, Department of Research, Speech Technology & Applied Research Corp., Lexington, MA, 02421, USA; kpayton@umassd.edu
Behdad Dousti, Department of Communication Science & Disorders, University of Cincinnati, Cincinnati, OH, USA; doustibd@mail.uc.edu
Sarah Dugan, Rehabilitation, Exercise & Nutrition Science, University of Cincinnati, Cincinnati, OH, USA; hamilsm@ucmail.uc.edu
Suzanne Boyce, Department of Communication Science & Disorders, University of Cincinnati, Cincinnati, OH, USA; boycese@ucmail.uc.edu
Joel MacAuslan, Speech Technology & Applied Research Corp., Lexington, MA, 02421, USA; JoelM@STARAnalyticalServices.com
Proc. Mtgs. Acoust. 54, 050003 (2024)
Article history
Received: August 20 2024
Accepted: September 04 2024
Citation
Karen Payton, Richard Goldhor, Behdad Dousti, Sarah Dugan, Suzanne Boyce, Joel MacAuslan; Relationship between objective measures and listener intelligibility of speech processed by source-separation algorithms. Proc. Mtgs. Acoust. 13 May 2024; 54 (1): 050003. https://doi.org/10.1121/2.0001952