Speech technology harnessing information from a speaker's voice promises to enhance security and assist in everyday tasks. Automatic speech recognition (ASR) converts spoken words into text, facilitating interaction with electronic devices. ASR is also increasingly used in educational programs that assess students' learning through interaction with computers. However, ASR may not work equally well for underrepresented accent groups. Several studies in recent years (e.g., Tatman, 2017; Koenecke et al., 2020) have shown that ASR performs particularly poorly on African American English (AAE). This performance drop is likely due to imbalances in accent representation in training data. Here, we assess vocal tract length adjustment as a data augmentation method for increasing the representation of AAE speech in the training data, with the aim of improving ASR performance on AAE. We compare this method with standard data augmentation methods (e.g., environmental augmentation).
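The abstract does not specify how the vocal tract length adjustment is implemented. One widely used formulation, vocal tract length perturbation (VTLP), rescales the frequency axis of each spectral frame by a warp factor alpha below a boundary frequency and uses a second linear segment above it so that the Nyquist frequency stays fixed. The sketch below assumes that piecewise-linear warp applied to magnitude-spectrum frames; the function names, the warp factor `alpha`, and the boundary frequency `f_hi` are illustrative assumptions, not the authors' method or settings.

```python
def _breakpoints(alpha: float, f_hi: float, nyquist: float) -> tuple[float, float]:
    """Return the warp breakpoint on the input axis and where it lands after warping."""
    if alpha <= 0 or not 0 < f_hi < nyquist:
        raise ValueError("need alpha > 0 and 0 < f_hi < nyquist")
    scale = min(alpha, 1.0)
    return f_hi * scale / alpha, f_hi * scale


def vtlp_warp(freq_hz: float, alpha: float, f_hi: float, nyquist: float) -> float:
    """Map an input frequency (Hz) to its warped frequency (assumed VTLP-style warp)."""
    b_in, b_out = _breakpoints(alpha, f_hi, nyquist)
    if freq_hz <= b_in:
        return alpha * freq_hz  # lower segment: plain rescaling by alpha
    # Upper segment: straight line from (b_in, b_out) to (nyquist, nyquist),
    # so the warped axis still ends exactly at the Nyquist frequency.
    slope = (nyquist - b_out) / (nyquist - b_in)
    return nyquist - slope * (nyquist - freq_hz)


def vtlp_unwarp(freq_hz: float, alpha: float, f_hi: float, nyquist: float) -> float:
    """Inverse of vtlp_warp: which input frequency lands on freq_hz after warping?"""
    b_in, b_out = _breakpoints(alpha, f_hi, nyquist)
    if freq_hz <= b_out:
        return freq_hz / alpha
    slope = (nyquist - b_out) / (nyquist - b_in)
    return nyquist - (nyquist - freq_hz) / slope


def warp_spectrum(magnitudes: list[float], alpha: float, f_hi: float,
                  sample_rate: float) -> list[float]:
    """Resample one magnitude-spectrum frame (bins 0..Nyquist) onto the warped axis."""
    n_bins = len(magnitudes)
    nyquist = sample_rate / 2.0
    bin_width = nyquist / (n_bins - 1)
    warped = []
    for k in range(n_bins):
        # Each output bin reads the (fractional) input bin that warps onto it,
        # using linear interpolation between the two neighbouring input bins.
        src = vtlp_unwarp(k * bin_width, alpha, f_hi, nyquist) / bin_width
        lo = min(int(src), n_bins - 2)
        frac = src - lo
        warped.append((1 - frac) * magnitudes[lo] + frac * magnitudes[lo + 1])
    return warped


if __name__ == "__main__":
    # Toy 16 kHz frame with 257 bins (31.25 Hz apart) and a single peak at 2000 Hz.
    frame = [0.0] * 257
    frame[64] = 1.0
    longer_tract = warp_spectrum(frame, alpha=1.1, f_hi=4800.0, sample_rate=16000)
    print(max(range(len(longer_tract)), key=lambda i: longer_tract[i]))
```

With `alpha=1.1` the peak moves from bin 64 (2000 Hz) to bin 70, the bin nearest 2200 Hz, mimicking a shorter vocal tract (higher formants); `alpha < 1` shifts energy downward instead. When a warp like this is used for augmentation, a fresh `alpha` is typically drawn per utterance from a narrow range around 1 (for example 0.9 to 1.1), so each copy of a training utterance sounds like a slightly different speaker. Other implementations warp the filterbank center frequencies or resample the waveform rather than interpolating spectra; this sketch only illustrates the frequency-warping idea.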
March 2023
March 01 2023
Towards improving automatic speech recognition for underrepresented dialects with data augmentation
Sarah Bakst
SRI Int., 333 Ravenswood Ave., Menlo Park, CA 94025
Diego Castan
SRI Int., Menlo Park, CA
J. Acoust. Soc. Am. 153, A297 (2023)
A companion article has been published:
Towards improving automatic speech recognition for underrepresented dialects with data augmentation
Citation
Sarah Bakst, Emre Yilmaz, Diego Castan; Towards improving automatic speech recognition for underrepresented dialects with data augmentation. J. Acoust. Soc. Am. 1 March 2023; 153 (3_supplement): A297. https://doi.org/10.1121/10.0018920