This paper investigates the theoretical basis for estimating vocal-tract length (VTL) from the formant frequencies of vowel sounds. A statistical-inference model was developed to characterize the relationship between vowel type and VTL, on the one hand, and formant frequency and vocal cavity size, on the other. The model was applied to two well-known developmental studies of formant frequency. The results show that VTL is the major source of variability after vowel type, and that the contribution of other factors, such as developmental changes in oral-pharyngeal ratio, is small relative to the residual measurement noise. The results suggest that speakers adjust the shape of the vocal tract as they grow to maintain a specific pattern of formant frequencies for individual vowels. This formant-pattern hypothesis motivates the development of a statistical-inference model for estimating VTL from formant-frequency data. The technique is illustrated using a third developmental study of formant frequencies. The VTLs of the speakers are estimated and used to provide a more accurate description of the complicated relationship between VTL and glottal pulse rate as children mature into adults.
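The formant-pattern hypothesis can be illustrated with a small numerical sketch. It is not the statistical-inference model developed in the paper. It assumes only that every formant of a given vowel scales by one common factor, inversely proportional to VTL, so that a speaker's VTL can be recovered by averaging log-frequency ratios against a reference pattern. The function name `estimate_vtl`, the reference formant values, and the reference VTL of 17 cm are all hypothetical and chosen for illustration.

```python
import math


def estimate_vtl(observed, reference, reference_vtl_cm):
    """Estimate VTL (cm) from formants, assuming uniform formant scaling.

    observed, reference: dicts mapping a vowel label to a tuple of formant
    frequencies in Hz. Assumption (not from the paper): each observed formant
    equals the reference formant times reference_vtl / speaker_vtl, up to noise.
    The mean of the log ratios is the least-squares estimate of that log scale.
    """
    log_ratios = [
        math.log(f_ref / f_obs)
        for vowel, formants in observed.items()
        for f_obs, f_ref in zip(formants, reference[vowel])
    ]
    mean_log_ratio = sum(log_ratios) / len(log_ratios)
    return reference_vtl_cm * math.exp(mean_log_ratio)


# Hypothetical reference formant patterns (F1, F2, F3 in Hz) and reference VTL.
REFERENCE = {
    "i": (270.0, 2290.0, 3010.0),
    "a": (730.0, 1090.0, 2440.0),
}
REFERENCE_VTL_CM = 17.0

# Simulate a smaller speaker whose formants are all 25% higher than the reference.
scale = 1.25
child = {vowel: tuple(f * scale for f in formants)
         for vowel, formants in REFERENCE.items()}

child_vtl = estimate_vtl(child, REFERENCE, REFERENCE_VTL_CM)
print(f"Estimated VTL: {child_vtl:.2f} cm")
```

Working in log frequency turns the multiplicative scale factor into an additive offset, which is why a simple mean suffices here. Real formant data would also contain measurement noise and vowel-dependent deviations from uniform scaling, which this sketch ignores.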
