Human-level speech recognition has proved to be an elusive goal because of the many sources of variability that affect speech: from stationary and dynamic noise, microphone variability, and speaker variability to variability at the phonetic, prosodic, and grammatical levels. Over the past 50 years, Jim Flanagan has been a continuous source of encouragement and inspiration to the speech recognition community. While early isolated-word systems primarily used acoustic knowledge, systems in the 1970s found mechanisms to represent and utilize syntactic (e.g., information retrieval) and semantic knowledge (e.g., Chess) in speech recognition systems. As vocabularies became larger, leading to greater ambiguity and perplexity, we had to explore the use of task-specific and context-specific knowledge to reduce the branching factors. As the need arose for systems that can be used by open populations using telephone-quality speech, we developed learning techniques that use very large data sets and noise adaptation methods. We still have a long way to go before we can satisfactorily handle unrehearsed spontaneous speech, speech from non-native speakers, and dynamic learning of new words, phrases, and grammatical forms.
Meeting abstract.
October 01 2004
Fifty years of progress in speech recognition
J. Acoust. Soc. Am. 116, 2498 (2004)
Raj Reddy; Fifty years of progress in speech recognition. J. Acoust. Soc. Am. 1 October 2004; 116 (4_Supplement): 2498. https://doi.org/10.1121/1.4784968