This work addresses speaker/environment adaptation as a means of improving recognition accuracy and thereby making recognizers more robust. A fast on‐line adaptation algorithm is developed that needs no separate adaptation training data and adapts the acoustic models quickly enough to achieve near real‐time recognition. The technique is based on stochastic matching in the model space, similar to [A. Shankar and C.‐H. Lee, IEEE Trans. Signal Process. 4, 190–202 (1996)]. For fast adaptation, only the models and mixture components that need to be adapted are selected, based on cluster formation and Euclidean distance. The adaptation algorithm is implemented as part of a GMM‐based continuous speech recognizer and tested on a non‐native speakers’ dataset. As an example, consider the five‐best hypotheses output by the recognizer for the utterance ‘‘none of the earth’’: before adaptation, the correct answer was not the best hypothesis but the third best; after adaptation, all five hypotheses converged to the correct answer. Tests of the technique on a larger non‐native speakers’ dataset show a 70% to 75% relative improvement in word error rate (WER).
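The abstract gives no equations for the selection and adaptation steps, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that mixture components are selected when their mean lies within a Euclidean distance threshold of at least one observed feature frame, and that the selected means are shifted by a single additive bias estimated with hard nearest‐mean assignment (a simplification of a maximum likelihood bias estimate under identical, identity covariances). The function names `select_components` and `adapt_means` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def select_components(means, frames, threshold):
    """Return indices of mixture means that lie within `threshold`
    (Euclidean distance) of at least one observed frame.

    Assumption: a plain distance threshold stands in for the
    cluster-based selection mentioned in the abstract.
    """
    means = np.asarray(means, dtype=float)
    frames = np.asarray(frames, dtype=float)
    # Pairwise distances, shape (n_components, n_frames).
    dists = np.linalg.norm(means[:, None, :] - frames[None, :, :], axis=2)
    return np.where(dists.min(axis=1) <= threshold)[0]


def adapt_means(means, frames, selected):
    """Shift only the selected means by one shared bias vector.

    Assumption: the bias is the average residual between each frame
    and its nearest selected mean (hard assignment, identity
    covariances). Unselected means are returned unchanged.
    """
    adapted = np.array(means, dtype=float)  # copy; input is not modified
    frames = np.asarray(frames, dtype=float)
    selected = np.asarray(selected, dtype=int)
    if selected.size == 0:
        return adapted
    sel_means = adapted[selected]
    # Distances from every frame to every selected mean.
    dists = np.linalg.norm(frames[:, None, :] - sel_means[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    bias = (frames - sel_means[nearest]).mean(axis=0)
    adapted[selected] += bias
    return adapted


if __name__ == "__main__":
    means = np.array([[0.0, 0.0], [10.0, 10.0]])
    frames = np.array([[1.0, 1.0], [1.0, 3.0]])
    selected = select_components(means, frames, threshold=4.0)
    print(selected)
    print(adapt_means(means, frames, selected))
```

Restricting the update to the selected components is what keeps the per‐utterance cost low in this sketch: components far from the observed frames are never touched.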
November 2002
Meeting abstract. No PDF available.
October 25 2002
Fast on‐line speaker/environment adaptation using modified maximum likelihood stochastic matching
Shubha L. Kadambe
HRL Labs., LLC, 3011 Malibu Canyon Rd., Malibu, CA 90265
Marcus Iseli
UCLA, Westwood, CA
J. Acoust. Soc. Am. 112, 2321 (2002)
Citation
Shubha L. Kadambe, Marcus Iseli; Fast on‐line speaker/environment adaptation using modified maximum likelihood stochastic matching. J. Acoust. Soc. Am. 1 November 2002; 112 (5_Supplement): 2321. https://doi.org/10.1121/1.4779361