This paper reports the results of representing the 1 000 000‐word Brown Corpus by a set of morphs and their underlying morphemes. The detailed procedure, including practical considerations for handling such a large data base, is discussed, and the algorithm which decomposes words into morphs is presented. It was found that many words decompose into morphs ambiguously, so that a set of selectional rules is needed to indicate the correct morph sequence. For example, affixation is generally preferred over compounding in English. The data base for obtaining these rules is discussed. Some statistics for the resulting lexicon are also presented. Finally, a procedure for merging the morph lexicon (spellings) with the Merriam Pocket Dictionary to obtain pronunciations and parts of speech is presented, together with a description of the resulting dictionary. [This work was supported in part by the National Institutes of Health.]
