A method for synthesizing vocal-tract spectra from phoneme sequences by mimicking the speech production process of humans is presented. The model consists of four main processes and is particularly characterized by an adaptive formation of articulatory movements. First, our model determines the time when each phoneme is articulated. Next, it generates articulatory constraints that must be met for the production of each phoneme, and then it generates trajectories of the articulatory movements that satisfy the constraints. Finally, the time sequence of spectra is estimated from the produced articulatory trajectories. The articulatory constraint of each phoneme does not change with the phonemic context, but the contextual variability of speech is nevertheless reproduced by the dynamic articulatory model. The accuracy of the synthesis model was evaluated using data collected by the simultaneous measurement of speech and articulatory movements. The accuracy of the phonemic timing estimates was measured, and the synthesized results were compared with the measured results. Experimental results showed that the model captured the contextual variability of both the articulatory movements and speech acoustics.
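The idea that context-independent constraints can still yield context-dependent movements may be easier to see in a toy example. The sketch below is not the authors' model: it substitutes a simple smoothness-regularized least-squares fit for the dynamic articulatory model, uses a single hypothetical articulator, and assumes made-up target values (`TARGETS`), frame timings, and a function name (`smooth_trajectory`) chosen only for illustration.

```python
import numpy as np

# Hypothetical, context-independent targets for ONE articulator (arbitrary units).
# None means the phoneme places no constraint on this articulator.
TARGETS = {"a": 0.0, "i": 1.0, "u": 0.6, "h": None}


def smooth_trajectory(phonemes, times, n_frames, smooth=50.0):
    """Fit a trajectory that meets each phoneme's target at its frame
    while keeping frame-to-frame acceleration small.

    Minimizes  sum_k (x[t_k] - target_k)^2 + smooth * ||D x||^2,
    where D is the second-difference operator. This is an assumed
    stand-in for a dynamic articulatory model, not the published one.
    Requires at least two constrained phonemes at distinct frames.
    """
    # Second-difference operator, shape (n_frames - 2, n_frames).
    D = np.zeros((n_frames - 2, n_frames))
    for r in range(n_frames - 2):
        D[r, r:r + 3] = [1.0, -2.0, 1.0]

    A = smooth * D.T @ D          # smoothness term of the normal equations
    b = np.zeros(n_frames)
    for ph, t in zip(phonemes, times):
        target = TARGETS[ph]
        if target is not None:    # unconstrained phonemes add nothing
            A[t, t] += 1.0
            b[t] += target
    return np.linalg.solve(A, b)


# Same phoneme /h/ at the same frame, two different contexts.
timing = [5, 15, 25]              # assumed articulation frames
x_ahi = smooth_trajectory(["a", "h", "i"], timing, n_frames=30)
x_uhi = smooth_trajectory(["u", "h", "i"], timing, n_frames=30)
print("articulator position during /h/ in a-h-i:", round(float(x_ahi[15]), 3))
print("articulator position during /h/ in u-h-i:", round(float(x_uhi[15]), 3))
```

In this toy setup the constraint table never changes, yet the articulator position during /h/ differs between the two sequences (about 0.5 versus about 0.8) because the smooth trajectory must connect different neighbouring targets. The four-stage model described in the abstract additionally estimates the phoneme timing and maps the trajectories to spectra, which this sketch does not attempt.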
January 2007
January 01 2007
Generation of the vocal tract spectrum from the underlying articulatory mechanism
Tokihiko Kaburagi
Department of Acoustic Design, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka, 815-8540 Japan
Jiji Kim
NTT Communications Corporation, 26-1 Sakuragaoka-cho, Shibuya-ku, Tokyo, 150-8512 Japan
J. Acoust. Soc. Am. 121, 456–468 (2007)
Article history
Received: April 02 2006
Accepted: October 09 2006
Citation
Tokihiko Kaburagi, Jiji Kim; Generation of the vocal tract spectrum from the underlying articulatory mechanism. J. Acoust. Soc. Am. 1 January 2007; 121 (1): 456–468. https://doi.org/10.1121/1.2384847