This paper describes advances in acoustic modeling for a Chinese spontaneous Conversational Telephone Speech (CTS) recognition task. Several approaches were investigated, including Heteroscedastic Linear Discriminant Analysis (HLDA), Vocal Tract Length Normalization (VTLN), Gaussianization, Minimum Phone Error (MPE) training, and feature-space MPE (fMPE). To account for pronunciation variation in continuous speech, tones in the recognition vocabulary were modified according to the tone sandhi rules. The acoustic models were trained on over 200 hours of audio data from standard LDC corpora. The improved acoustic models reduce the Character Error Rate (CER) by about 25% relative to the baseline acoustic models on a standard LDC test set and on the China 863 program evaluation data set. Acknowledgment: This work is partially supported by the National Natural Science Foundation of China (Nos. 10925419, 90920302, 10874203, 60875014, 61072124, 11074275, 11161140319).
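The abstract does not specify which sandhi rules were applied to the vocabulary; the best-known case in Mandarin is third-tone sandhi, where a third tone preceding another third tone is realized as a second tone (e.g. "ni3 hao3" is pronounced "ni2 hao3"). A minimal, hypothetical sketch of how such a rule could rewrite the tone sequence of a lexicon entry (the function name and the simple left-to-right scan are assumptions; real systems condition sandhi on prosodic grouping):

```python
def apply_third_tone_sandhi(tones):
    """Apply Mandarin third-tone sandhi left to right.

    `tones` is a list of tone numbers (1-4, with 5 for the neutral tone),
    one per syllable. Whenever a third tone is immediately followed by
    another third tone, the first is rewritten as a second tone.
    """
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out

print(apply_third_tone_sandhi([3, 3]))     # "ni3 hao3" -> [2, 3]
print(apply_third_tone_sandhi([1, 3]))     # no adjacent third tones -> [1, 3]
```

In a recognition vocabulary, such rewritten tone sequences would yield alternative tonal pronunciations for multi-syllable entries, so the acoustic models see the tones that are actually spoken.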
Meeting abstract.
April 01 2012
Improved acoustic models for spontaneous speech recognition
Qingqing Zhang, Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences, [email protected]
Shang Cai, Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences, [email protected]
Jielin Pan, Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences, [email protected]
Yonghong Yan, Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences, [email protected]
J. Acoust. Soc. Am. 131, 3236 (2012)
Citation
Qingqing Zhang, Shang Cai, Jielin Pan, Yonghong Yan; Improved acoustic models for spontaneous speech recognition. J. Acoust. Soc. Am. 1 April 2012; 131 (4_Supplement): 3236. https://doi.org/10.1121/1.4708075