Human-generated measures of speech intelligibility are time-intensive to obtain. The present study seeks to automate the assessment of speech intelligibility by developing a deep neural network that estimates a standardized intelligibility score from acoustic input. Mel-frequency cepstral coefficients (MFCCs) were extracted from the UW/NU IEEE sentence corpus, which had been manipulated at three signal-to-noise ratios (-2, 0, and 2 dB). Listener transcriptions were obtained from the UAW speech intelligibility dataset, and the Levenshtein distance was calculated between each transcription and the speaker's prompt. The neural network was trained to predict the Levenshtein distance from the MFCC representations of the sentences. Ten-fold cross-validation was used to verify the accuracy of the model and to measure the correlation of its predictions with the average human responses. Model accuracy was also compared against Levenshtein distances computed from transcriptions produced by the DeepSpeech ASR model. This study investigates the reliability of deep neural networks as an alternative to human-based inference in quantifying the intelligibility of speech, and the advantages and disadvantages of the different approaches to assessing speech intelligibility are discussed.
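As context for the scoring target described above, the Levenshtein distance between a listener's transcription and the speaker's prompt can be sketched at the word level as follows. This is a minimal illustration only: the example sentence and the choice of word-level (rather than character-level) tokens and of normalization are assumptions, not the study's actual preprocessing.

```python
def levenshtein(ref, hyp):
    """Edit distance between two token sequences:
    minimum number of insertions, deletions, and substitutions
    needed to turn ref into hyp."""
    m, n = len(ref), len(hyp)
    # dp[i][j] = distance between ref[:i] and hyp[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all i tokens of ref
    for j in range(n + 1):
        dp[0][j] = j          # insert all j tokens of hyp
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # match / substitution
    return dp[m][n]

# Hypothetical prompt/transcription pair (an IEEE-style sentence):
prompt = "the birch canoe slid on the smooth planks".split()
transcription = "the birch canoe slid on smooth planks".split()
print(levenshtein(prompt, transcription))  # 1 (one word deleted)
```

A distance of 0 indicates a verbatim transcription; larger values indicate lower intelligibility, which is the quantity the network is trained to predict from the acoustics.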
183rd Meeting of the Acoustical Society of America
5–9 December 2022, Nashville, Tennessee
Speech Communication: Paper 5aSC16
Published online: November 5, 2024
Acoustic-based automatic speech intelligibility scoring using deep neural networks
Benjamin V. Tucker
Department of Communication Sciences and Disorders, Northern Arizona University, Flagstaff, AZ 86011, USA; benjamin.tucker@nau.edu
Richard A. Wright
Proc. Mtgs. Acoust. 50, 060010 (2022)
Article history: Received September 25, 2024; Accepted October 23, 2024
Connected content: this article is a companion to "Acoustic-based automatic speech intelligibility scoring using deep neural networks."
Citation: Nikita B. Emberi, Tyler T. Schnoor, Benjamin V. Tucker, Richard A. Wright; Acoustic-based automatic speech intelligibility scoring using deep neural networks. Proc. Mtgs. Acoust. 5 December 2022; 50 (1): 060010. https://doi.org/10.1121/2.0001973