The use of very large vocabularies (∼20 000 words) imposes two major constraints on the design of isolated- or connected-word recognition systems: the large search space must be reduced efficiently to a subvocabulary of manageable size [V. Zue and D. Shipman, J. Acoust. Soc. Am. Suppl. 1 71, C7 (1982)], and the heuristics used for that reduction must be robust. Psychological evidence suggests that prosodic and robust segmental features may be used as preliminary decision criteria in human speech perception. In this paper we present attempts to apply such heuristics to the design of very large vocabulary recognition (VLVR) systems. As a database, a 20 000-word vocabulary has been compiled that provides phonemic, prosodic, and pragmatic information. Based on this corpus, the tradeoffs between the robustness of certain features and their power to reduce the search space have been studied. Our results indicate that combining prosodic information (syllable counts, stress patterns) with a set of robustly detectable features (frication, stops, the vowel nucleus of the stressed syllable) can reduce the vocabulary to groups of fewer than 400 words. Additional potentially useful prosodic features, e.g., rhythmic patterns, are currently being investigated.
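The search space reduction described above can be pictured as partitioning a lexicon by a coarse feature signature and then measuring how large the resulting word groups are. The following is a minimal sketch under stated assumptions, not the method used in the study: the eight-word lexicon, the `Entry` fields, the feature labels, and the helper names `partition` and `cohort_stats` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    """Coarse description of one word (hypothetical annotation scheme)."""
    syllables: int        # syllable count
    stress: str           # "1" = stressed syllable, "0" = unstressed
    has_frication: bool   # any fricative present
    has_stop: bool        # any stop present
    stressed_vowel: str   # broad class of the stressed vowel nucleus


# Toy lexicon with illustrative labels; not data from the 20 000-word corpus.
TOY_LEXICON = {
    "paper":    Entry(2, "10", False, True, "front"),
    "pepper":   Entry(2, "10", False, True, "front"),
    "sister":   Entry(2, "10", True, True, "front"),
    "about":    Entry(2, "01", False, True, "back"),
    "machine":  Entry(2, "01", True, False, "front"),
    "banana":   Entry(3, "010", False, True, "low"),
    "computer": Entry(3, "010", False, True, "back"),
    "cat":      Entry(1, "1", False, True, "low"),
}


def partition(lexicon, fields):
    """Group words that share the same values on the selected fields."""
    groups = defaultdict(list)
    for word, entry in lexicon.items():
        key = tuple(getattr(entry, name) for name in fields)
        groups[key].append(word)
    return dict(groups)


def cohort_stats(groups):
    """Return (largest group size, expected group size of a random word)."""
    sizes = [len(words) for words in groups.values()]
    total = sum(sizes)
    # A word drawn uniformly from the lexicon lands in a group of size s
    # with probability s / total, hence the sum of squares.
    expected = sum(s * s for s in sizes) / total
    return max(sizes), expected


PROSODIC = ("syllables", "stress")
ALL_FEATURES = PROSODIC + ("has_frication", "has_stop", "stressed_vowel")

if __name__ == "__main__":
    for label, fields in [("syllables only", ("syllables",)),
                          ("prosodic", PROSODIC),
                          ("prosodic + segmental", ALL_FEATURES)]:
        largest, expected = cohort_stats(partition(TOY_LEXICON, fields))
        print(f"{label}: largest group {largest}, expected size {expected:.2f}")
```

Adding features to the signature can only split groups, never merge them, so group sizes shrink as features are added. The tradeoff named in the abstract is that each added feature must also be detected reliably; a misdetected feature sends the search to the wrong group, which this sketch does not model.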
November 1982
August 12 2005
Very large vocabulary recognition (VLVR): using prosodic and spectral filters
Alex Waibel
Computer Science Department, Carnegie‐Mellon University, Pittsburgh, PA 15213
J. Acoust. Soc. Am. 72, S32 (1982)
Citation
Alex Waibel; Very large vocabulary recognition (VLVR): using prosodic and spectral filters. J. Acoust. Soc. Am. 1 November 1982; 72 (S1): S32. https://doi.org/10.1121/1.2019828