This paper reports the results of representing the 1 000 000‐word Brown Corpus by a set of morphs and their underlying morphemes. The detailed procedure, including practical considerations for handling such a large data base, is discussed, and the algorithm that decomposes words into morphs is presented. It was found that many words decompose into morphs ambiguously, so that a set of selectional rules is needed to indicate the correct morph sequence. For example, affixation is generally preferred over compounding in English. The data base for obtaining these rules is discussed. Some statistics for the resulting lexicon are also presented. Finally, a procedure for merging the morph lexicon (spellings) with the Merriam Pocket Dictionary to obtain pronunciations and parts of speech is presented, together with a description of the resulting dictionary. [This work was supported in part by the National Institutes of Health.]
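The ambiguity described in the abstract, and the preference for affixation over compounding, can be illustrated with a minimal sketch. It is not the algorithm presented in the paper: the five-entry lexicon, the function names, and the "fewest roots wins" scoring are all assumptions made for illustration, and spelling changes at morph boundaries are ignored.

```python
from typing import List, Optional, Tuple

# Hypothetical toy lexicon (morph spelling -> morph type); not taken from the paper.
TOY_LEXICON = {
    "un": "prefix",
    "sea": "root",
    "seal": "root",
    "led": "root",
    "ed": "suffix",
}


def decompositions(word: str) -> List[Tuple[str, ...]]:
    """Return every way to split `word` into morphs listed in TOY_LEXICON."""
    if not word:
        return [()]  # one way to split the empty string: no morphs
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in TOY_LEXICON:
            # Extend each decomposition of the remainder with this morph.
            for rest in decompositions(word[i:]):
                results.append((head,) + rest)
    return results


def root_count(morphs: Tuple[str, ...]) -> int:
    """Count the root morphs in a candidate decomposition."""
    return sum(TOY_LEXICON[m] == "root" for m in morphs)


def select(word: str) -> Optional[Tuple[str, ...]]:
    """Assumed selectional rule: among candidates with at least one root,
    prefer the one with the fewest roots (affixation over compounding)."""
    candidates = [d for d in decompositions(word) if root_count(d) >= 1]
    if not candidates:
        return None
    return min(candidates, key=root_count)


print(decompositions("sealed"))  # both the compound and the affixed reading
print(select("sealed"))
print(select("unsealed"))
```

In this toy setting "sealed" splits either as the compound "sea" + "led" or as the affixed form "seal" + "ed", and the assumed rule keeps the affixed form because it contains only one root.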
August 11, 2005
Morph Lexicon for Speech Synthesis by Rule
Jonathan Allen
Department of Electrical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
J. Acoust. Soc. Am. 51, 111 (1972)
Citation
Jonathan Allen; Morph Lexicon for Speech Synthesis by Rule. J. Acoust. Soc. Am. 1 January 1972; 51 (1A_Supplement): 111. https://doi.org/10.1121/1.1981295