Previous studies have proposed ways to estimate articulatory information from the acoustic speech signal and have shown that, when used together with standard cepstral features, such information helps improve word recognition performance in noise for a connected-digit recognition task. In this paper, we present results from word recognition and phone recognition experiments in noise that use two sets of articulatory representations, continuous (tract variable trajectories) and discrete (articulatory gestures), along with standard mel-frequency cepstral features for acoustic modeling. The acoustic model is a dynamic Bayesian network (DBN) that treats the continuous articulatory information as observed random variables and the discrete articulatory representation as hidden random variables. Our results indicate that the use of articulatory information substantially improved noise robustness for both the word recognition and phone recognition tasks.
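The DBN described above can be illustrated with a deliberately simplified sketch. It assumes a single hidden discrete gesture-state variable per frame with first-order Markov transitions, and an observation model in which the MFCC vector and the tract-variable vector are conditionally independent diagonal Gaussians given that state. These modeling choices, and all names used here (`forward_loglik`, `log_gauss_diag`, `mfcc_params`, `tv_params`), are hypothetical illustrations, not the network used in this work, which may have a different topology, more hidden variables, and different distributions. The forward recursion sums over the hidden gesture state while conditioning on both observed streams.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(vals):
    """Numerically stable log(sum(exp(vals)))."""
    m = max(vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in vals))


def forward_loglik(mfcc, tvs, log_pi, log_A, mfcc_params, tv_params):
    """Log p(mfcc, tvs) with the discrete gesture state marginalized out.

    Hypothetical simplification: one hidden state per frame; the MFCC frame
    and the (observed) tract-variable frame are assumed conditionally
    independent given that state, each a diagonal Gaussian.
    mfcc_params[k] and tv_params[k] are (mean, var) pairs for state k.
    """
    K = len(log_pi)

    def emit(t, k):
        # Joint emission log-likelihood of both observed streams in state k.
        mu_x, var_x = mfcc_params[k]
        mu_y, var_y = tv_params[k]
        return (log_gauss_diag(mfcc[t], mu_x, var_x)
                + log_gauss_diag(tvs[t], mu_y, var_y))

    # Initialization: alpha_0(k) = log pi_k + log p(x_0, y_0 | k).
    alpha = [log_pi[k] + emit(0, k) for k in range(K)]
    # Recursion: alpha_t(k) = logsum_j [alpha_{t-1}(j) + log A_jk] + emission.
    for t in range(1, len(mfcc)):
        alpha = [logsumexp([alpha[j] + log_A[j][k] for j in range(K)])
                 + emit(t, k)
                 for k in range(K)]
    # Termination: marginalize over the final hidden state.
    return logsumexp(alpha)


if __name__ == "__main__":
    # Toy two-state example with 1-D features in each stream (made-up values).
    h = math.log(0.5)
    mfcc = [[0.1], [0.9], [1.2]]
    tvs = [[-1.0], [1.0], [1.1]]
    params_x = [([0.0], [1.0]), ([1.0], [1.0])]
    params_y = [([-1.0], [0.5]), ([1.0], [0.5])]
    log_A = [[math.log(0.8), math.log(0.2)], [math.log(0.2), math.log(0.8)]]
    print(forward_loglik(mfcc, tvs, [h, h], log_A, params_x, params_y))
```

In this sketch, the tract-variable term contributes to every frame's score alongside the MFCC term, which is one plausible way extra articulatory evidence can offset noise-corrupted cepstra. The sketch does not model how the tract variables themselves are estimated from the speech signal.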
Meeting abstract.
October 01 2011
Robust speech recognition with articulatory features using dynamic Bayesian networks
Vikramjit Mitra;
Speech Technol. and Res. Lab., SRI Int., 333 Ravenswood Ave., Menlo Park, CA 94025
Carol Espy-Wilson;
Univ. of Maryland, College Park, MD 20742
Elliot Saltzman;
Boston Univ., Boston, MA 02115
Louis Goldstein
Univ. of Southern California, Los Angeles, CA 90089
J. Acoust. Soc. Am. 130, 2408 (2011)
Citation
Vikramjit Mitra, Hosung Nam, Carol Espy-Wilson, Elliot Saltzman, Louis Goldstein; Robust speech recognition with articulatory features using dynamic Bayesian networks. J. Acoust. Soc. Am. 1 October 2011; 130 (4_Supplement): 2408. https://doi.org/10.1121/1.3654653