The concept of a formant as a peak in the pressure spectrum is assumed to apply to both voiced and unvoiced speech. A scheme has been developed that estimates the frequencies and amplitudes of five formants. The cepstrum technique is used, along with intensity, zero‐crossing, and slope‐change information, to make voiced‐unvoiced decisions and to estimate the fundamental voicing frequency. An attempt is made to detect bursts of energy (due primarily to stop consonants) in the time waveform and to analyze them with sufficient time resolution to preserve the burst characteristic. The formant‐estimating procedure assumes that each formant occupies its own exclusive domain in frequency space. Several smoothing procedures are used to remove discontinuities from the formant and fundamental‐frequency data, and the smoothed values are used to control a five‐pole parallel synthesizer. The synthesizer is excited with a pulse train, noise, or a mixture of the two. Examples of natural and synthetic speech are presented.
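
As an illustration of the cepstrum-based pitch estimation and voiced-unvoiced decision mentioned above, the following is a minimal sketch and not the scheme described in the paper. It covers only the cepstral peak and the zero-crossing rate; the intensity and slope-change cues are omitted. The function names, the 60 to 400 Hz search range, and both thresholds are assumptions chosen for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    twiddle = [cmath.exp(-2j * math.pi * i / n) for i in range(n)]
    return [sum(x[k] * twiddle[(k * m) % n] for k in range(n)) for m in range(n)]


def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    n = len(frame)
    # Hamming window to reduce spectral leakage.
    windowed = [
        s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    # Small offset guards against log(0).
    log_mag = [math.log(abs(c) + 1e-12) for c in dft(windowed)]
    # log_mag is real and symmetric, so its inverse DFT equals
    # the real part of its forward DFT divided by n.
    return [c.real / n for c in dft(log_mag)]


def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Return (f0 in Hz, cepstral peak height) from the strongest peak
    in the quefrency range that corresponds to f_min..f_max (assumed range)."""
    cep = real_cepstrum(frame)
    q_lo = int(sample_rate / f_max)
    q_hi = min(int(sample_rate / f_min), len(frame) // 2 - 1)
    peak = max(range(q_lo, q_hi + 1), key=lambda q: cep[q])
    return sample_rate / peak, cep[peak]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frame(frame, sample_rate, peak_threshold=0.1, zcr_threshold=0.3):
    """Hypothetical decision rule: voiced if the cepstral peak is strong
    and the zero-crossing rate is low. Both thresholds are illustrative."""
    f0, peak = estimate_f0(frame, sample_rate)
    voiced = peak > peak_threshold and zero_crossing_rate(frame) < zcr_threshold
    return voiced, (f0 if voiced else None)


if __name__ == "__main__":
    fs = 8000
    # Synthetic "voiced" frame: impulse train with an 80-sample period.
    pulses = [1.0 if i % 80 == 0 else 0.0 for i in range(400)]
    print(classify_frame(pulses, fs))
```

For the synthetic impulse train in the usage example (an 80-sample period at 8 kHz), the cepstral peak falls at quefrency 80, which gives an estimate of 100 Hz. Real speech would need thresholds tuned on data and, as the abstract indicates, additional cues and smoothing across frames.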
