The use of very large vocabularies (∼20 000 words) imposes two major constraints on the design of isolated- or connected-word recognition systems: the efficient reduction of the large search space to a subvocabulary of manageable size [V. Zue and D. Shipman, J. Acoust. Soc. Am. Suppl. 1 71, C7 (1982)] and the robustness of the search space reduction heuristics involved. Psychological evidence suggests that prosodic and robust segmental features may be used as preliminary decision criteria in human speech perception. In this paper we present attempts to apply such heuristics to the design of very-large-vocabulary recognition (VLVR) systems. As a database, a 20 000-word vocabulary has been compiled that provides phonemic, prosodic, and pragmatic information. Based on this corpus, the tradeoffs between the robustness of certain features and their power to reduce the search space have been studied. Our results indicate that combining prosodic information (syllable counts, stress patterns) with a set of robustly detectable features (frication, stops, vowel nuclei of stressed syllables) can reduce the vocabulary to groups of fewer than 400 words. Additional potentially useful prosodic features, e.g., rhythmic patterns, are currently being investigated.
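
The search space reduction described above can be pictured as partitioning a lexicon into groups of words that share the same coarse description. The following is a minimal sketch of that idea and not the authors' implementation. The toy lexicon, the ARPAbet-like phoneme symbols, and the names `coarse_key` and `partition` are all assumptions made for illustration; the abstract does not specify how the features were encoded or combined.

```python
from collections import defaultdict

# Hypothetical toy lexicon (NOT the 20 000-word corpus from the abstract).
# Each word maps to a list of syllables: (phonemes, is_stressed).
TOY_LEXICON = {
    "paper": [(["P", "EY"], True), (["P", "ER"], False)],
    "taper": [(["T", "EY"], True), (["P", "ER"], False)],
    "safer": [(["S", "EY"], True), (["F", "ER"], False)],
    "begin": [(["B", "IH"], False), (["G", "IH", "N"], True)],
    "cat":   [(["K", "AE", "T"], True)],
    "pat":   [(["P", "AE", "T"], True)],
    "sat":   [(["S", "AE", "T"], True)],
}

# Assumed broad phoneme classes (ARPAbet-like symbols).
STOPS = {"P", "B", "T", "D", "K", "G"}
FRICATIVES = {"F", "V", "S", "Z", "SH", "ZH", "TH", "DH", "HH"}
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def coarse_key(syllables):
    """Build a coarse description from prosodic and broad segmental features."""
    phonemes = [p for phones, _ in syllables for p in phones]
    # Prosodic features: syllable count and stress pattern ("1" = stressed).
    stress_pattern = "".join("1" if stressed else "0" for _, stressed in syllables)
    # Broad segmental features: presence of frication and of stops.
    has_frication = any(p in FRICATIVES for p in phonemes)
    has_stop = any(p in STOPS for p in phonemes)
    # Vowel nucleus of the first stressed syllable, or None if there is none.
    stressed_vowel = next(
        (p for phones, stressed in syllables if stressed
         for p in phones if p in VOWELS),
        None,
    )
    return (len(syllables), stress_pattern, has_frication, has_stop, stressed_vowel)


def partition(lexicon):
    """Group words that are indistinguishable under the coarse description."""
    groups = defaultdict(list)
    for word, syllables in lexicon.items():
        groups[coarse_key(syllables)].append(word)
    return {key: sorted(words) for key, words in groups.items()}


groups = partition(TOY_LEXICON)
largest_group = max(len(words) for words in groups.values())
for key, words in sorted(groups.items(), key=lambda kv: kv[1]):
    print(key, words)
print("largest group:", largest_group)
```

In this sketch the size of the largest group plays the role of the "groups of fewer than 400 words" figure: a recognizer would only need detailed matching within the group selected by the coarse features. Dropping a feature from the key makes the description more robust to detection errors but enlarges the groups, which is the tradeoff the abstract refers to.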
