In this paper, a method is developed to employ vowel duration properties in a hidden Markov model (HMM)‐based large vocabulary speaker‐trained recognition system. It is found that each of the vowel phonemes spoken in isolated words can be divided into three allophones, each corresponding to a largely distinctive range of vowel durations. Such a division is based upon the phonetic context where the vowel occurs. In order to incorporate the durational information, each vowel’s HMM is trained using a maximum‐likelihood method with three separate sets of transition probabilities, corresponding to the three allophones. The output distributions of the HMM are assumed to be the same for all three allophones and trained jointly, to make best use of the limited number of available training tokens. The duration‐specific HMMs for vowel allophones have been evaluated in isolated word recognition experiments for two male speakers. The results show that the performance of the recognizer is improved, reducing the error rate by approximately 14% compared with recognition results without the use of the vowel durational models. The performance improvement resulting from use of the vowel durational models is due to a reduction in postvocalic consonant errors arising from their contextual correlation with vowels of different durations, as well as to improved discrimination between vowel phonemes.
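The idea of sharing output distributions while giving each allophone its own transition probabilities can be illustrated with a minimal sketch. It is not the authors' implementation: the three-state left-to-right topology without skips, the allophone labels, the self-loop values, and the uninformative (flat) output scores are all assumptions chosen for illustration, and the sketch only scores tokens rather than performing maximum-likelihood training. It relies on the standard fact that a state with self-loop probability a has a geometric duration distribution with mean 1/(1 - a).

```python
import math

NEG_INF = float("-inf")

def expected_duration(self_loop_prob):
    """Mean number of frames spent in one state (geometric duration)."""
    return 1.0 / (1.0 - self_loop_prob)

def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_forward(emission_logprobs, self_loops):
    """Log-likelihood of a token under a left-to-right HMM without skips.

    emission_logprobs[t][s]: log output probability of frame t in state s,
        shared by all allophones of the vowel.
    self_loops[s]: allophone-specific self-loop probability of state s;
        the remaining mass advances to state s + 1 (or exits the model).
    """
    n_states = len(self_loops)
    alpha = [NEG_INF] * n_states
    alpha[0] = emission_logprobs[0][0]  # the token must start in state 0
    for frame in emission_logprobs[1:]:
        new_alpha = [NEG_INF] * n_states
        for s in range(n_states):
            stay = alpha[s] + math.log(self_loops[s])
            if s > 0:
                move = alpha[s - 1] + math.log(1.0 - self_loops[s - 1])
            else:
                move = NEG_INF
            new_alpha[s] = log_add(stay, move) + frame[s]
        alpha = new_alpha
    # The token must end by leaving the last state.
    return alpha[-1] + math.log(1.0 - self_loops[-1])

# Hypothetical transition sets, one per duration-specific allophone.
ALLOPHONE_SELF_LOOPS = {
    "short": [0.5, 0.5, 0.5],
    "medium": [0.75, 0.75, 0.75],
    "long": [0.9, 0.9, 0.9],
}

def best_allophone(emission_logprobs):
    """Allophone whose transition set gives the highest likelihood."""
    return max(
        ALLOPHONE_SELF_LOOPS,
        key=lambda name: log_forward(emission_logprobs, ALLOPHONE_SELF_LOOPS[name]),
    )

def flat_emissions(n_frames, n_states=3):
    """Uninformative shared output scores, so only duration matters."""
    return [[0.0] * n_states for _ in range(n_frames)]

if __name__ == "__main__":
    for n_frames in (6, 30):
        print(n_frames, "frames ->", best_allophone(flat_emissions(n_frames)))
```

With flat output scores the likelihood depends on token length alone, so the sketch isolates the effect of the transition probabilities: short tokens favor the set with small self-loop probabilities and long tokens favor the set with large ones. In a real recognizer the shared output distributions would contribute the spectral evidence on top of this durational term.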
August 01 1989
Use of vowel duration information in a large vocabulary word recognizer
L. Deng, M. Lennig, P. Mermelstein
INRS‐Télécommunications, 3 Place du Commerce, Montreal, Quebec H3E 1H6, Canada
J. Acoust. Soc. Am. 86, 540–548 (1989)
Article history
Received: July 06 1988
Accepted: March 21 1989
Citation
L. Deng, M. Lennig, P. Mermelstein; Use of vowel duration information in a large vocabulary word recognizer. J. Acoust. Soc. Am. 1 August 1989; 86 (2): 540–548. https://doi.org/10.1121/1.398233