Previous studies have proposed ways to estimate articulatory information from the acoustic speech signal and have shown that, when used with standard cepstral features, this information helps improve word recognition performance in noise for a connected digit recognition task. In this paper, we present results from word recognition and phone recognition experiments in noise that use two sets of articulatory representations, continuous (tract variable trajectories) and discrete (articulatory gestures), along with standard mel cepstral features for acoustic modeling. The acoustic model is a dynamic Bayesian network (DBN) that treats the continuous articulatory information as observed random variables and the discrete articulatory representation as hidden random variables. Our results indicate that the use of articulatory information substantially improves noise robustness for both the word recognition and phone recognition tasks.
Meeting abstract. No PDF available.
Robust speech recognition with articulatory features using dynamic Bayesian networks
Vikramjit Mitra, Hosung Nam, Carol Espy-Wilson, Elliot Saltzman, Louis Goldstein; Robust speech recognition with articulatory features using dynamic Bayesian networks. J. Acoust. Soc. Am. 1 October 2011; 130 (4_Supplement): 2408. https://doi.org/10.1121/1.3654653