A system has been developed for speech synthesis from Japanese orthographic text. The system consists of four processing stages. The linguistic processing stage utilizes natural language processing techniques to extract lexical, syntactic, semantic, and discourse information from each paragraph of the input text. The phonetic processing stage utilizes this information to derive a string of segmental and prosodic symbols for the entire paragraph. The acoustic processing stage generates time‐varying patterns of parameters from these symbols to control the final stage, which is a formant‐type synthesizer. The Fujisaki‐Ljungqvist model is adopted for the excitation of the voiced sounds [Proc. ICASSP 86, 1605–1608 (1986)], and its fundamental frequency is controlled by a model of F0 contour generation [H. Fujisaki and K. Hirose, J. Acoust. Soc. Jpn. (E) 5, 233–242 (1984)]. The segmental features, on the other hand, are synthesized by concatenating pole‐zero frequency patterns prestored for each syllable. The validity of the system, especially of the prosodic feature synthesis, was confirmed by the naturalness of the accent and intonation of the synthesized speech. [Work supported by a Grant‐in‐Aid for Scientific Research on Priority Areas from the Ministry of Education, Science and Culture of Japan, No. 63608002.]
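The F0 contour model cited above (Fujisaki and Hirose, 1984) represents the logarithmic F0 contour as a baseline value plus phrase components (impulse responses of a critically damped second-order linear filter) and accent components (responses of a similar filter to stepwise commands). A minimal sketch of that formulation follows; the specific parameter values (alpha, beta, gamma, baseline frequency) and command timings are illustrative assumptions, not values taken from the abstract or the cited papers.

```python
import math

# Hedged sketch of the Fujisaki model of F0 contour generation:
#   ln F0(t) = ln Fb + sum_i Ap_i * Gp(t - T0_i)
#                    + sum_j Aa_j * [Ga(t - T1_j) - Ga(t - T2_j)]
# All numeric defaults below are illustrative, not from the paper.

def phrase_component(t, alpha=3.0):
    # Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0
    return alpha ** 2 * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_component(t, beta=20.0, gamma=0.9):
    # Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def f0_contour(t, fb=100.0, phrase_cmds=(), accent_cmds=()):
    """F0 (Hz) at time t, given a baseline fb, phrase commands
    (amplitude, onset time), and accent commands (amplitude, onset, offset)."""
    ln_f0 = math.log(fb)
    for ap, t0 in phrase_cmds:
        ln_f0 += ap * phrase_component(t - t0)
    for aa, t1, t2 in accent_cmds:
        ln_f0 += aa * (accent_component(t - t1) - accent_component(t - t2))
    return math.exp(ln_f0)
```

For example, a single phrase command at t = 0 raises F0 above the baseline and then lets it decay back, while an accent command produces a local hump between its onset and offset times, which is how the model superposes sentence-level intonation and word-level accent.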
