Formant frequencies of vowels are among the most commonly measured quantities in speech studies, but the measurements are known to be biased by the fundamental frequency (F0) exciting the formants. Approaches to reducing these errors were assessed in two experiments. In the first, synthetic vowels were constructed with five different first formant (F1) values and nine different F0 values; formant bandwidths and higher formant frequencies were held constant. Input formant values were compared to manual and automatic measurements made using the linear prediction coding-Burg algorithm, linear prediction closed-phase covariance, the weighted linear prediction-attenuated main excitation (WLP-AME) algorithm [Alku, Pohjalainen, Vainio, Laukkanen, and Story (2013). J. Acoust. Soc. Am. 134(2), 1295–1313], and spectra smoothed cepstrally or by averaging repeated discrete Fourier transforms. Formants were also measured manually from pruned reassigned spectrograms (RSs) [Fulop (2011). Speech Spectrum Analysis (Springer, Berlin)]. All methods except WLP-AME and RS produced large errors in the direction of the strongest harmonic; the smallest errors occurred with WLP-AME and RS. In the second experiment, the same methods were applied to vowels in isolated words spoken by four speakers. Results for the natural speech show that F0 bias affects all automatic methods, including WLP-AME; only the formants measured manually from RS appeared to be accurate. In addition, RS coped better with weaker formants and glottal fry.
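
The bias toward the strongest harmonic can be illustrated with a toy calculation. The following is a minimal sketch, not the procedure used in the study: it assumes a single two-pole resonance for F1, a flat (untilted) harmonic source, and an 80 Hz bandwidth. The function names `resonance_gain` and `strongest_harmonic` and all numeric values are hypothetical choices made for illustration. It reports which harmonic of F0 is amplified most by the resonance and how far that harmonic lies from the true F1.

```python
import math


def resonance_gain(f, fc, bw):
    """Magnitude response at f (Hz) of one two-pole resonance with
    center frequency fc and bandwidth bw (Hz), scaled to unit gain at 0 Hz.
    Illustrative continuous-time model, not the synthesizer used in the study."""
    sigma = math.pi * bw          # pole real part (rad/s)
    w0 = 2.0 * math.pi * fc       # pole imaginary part (rad/s)
    w = 2.0 * math.pi * f
    num = sigma ** 2 + w0 ** 2
    den = math.sqrt((sigma ** 2 + (w - w0) ** 2) * (sigma ** 2 + (w + w0) ** 2))
    return num / den


def strongest_harmonic(f0, f1, bw=80.0):
    """Return the harmonic of f0 (Hz) with the largest gain under the
    resonance at f1, assuming a flat source spectrum (an assumption)."""
    n_harmonics = max(1, int(2.0 * f1 / f0))  # search up to about 2 * F1
    harmonics = [k * f0 for k in range(1, n_harmonics + 1)]
    return max(harmonics, key=lambda h: resonance_gain(h, f1, bw))


# Hypothetical F1 and F0 values, chosen only to show the effect.
f1_true = 500.0
for f0 in (110.0, 220.0, 300.0):
    peak = strongest_harmonic(f0, f1_true)
    print(f"F0={f0:.0f} Hz: strongest harmonic {peak:.0f} Hz, "
          f"offset from F1 {peak - f1_true:+.0f} Hz")
```

A method that locks onto the strongest harmonic would misestimate F1 by the printed offset, and the offset generally grows as F0 rises and the harmonics become more widely spaced.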

1. Alku, P., Pohjalainen, J., Vainio, M., Laukkanen, A.-M., and Story, B. H. (2013). “Formant frequency estimation of high-pitched vowels using weighted linear prediction,” J. Acoust. Soc. Am. 134(2), 1295–1313.
2. Allen, J., Hunnicutt, M. S., and Klatt, D. H. (1987). From Text to Speech (Cambridge University Press, Cambridge), pp. 108–122.
3. Andersen, N. (1974). “On the calculation of filter coefficients for maximum entropy spectral analysis,” Geophysics 39, 69–72.
4. ANSI (2013). ANSI/ASA S1.1-2013, American National Standard Acoustical Terminology (Acoustical Society of America, Melville, NY).
5. Assmann, P. F., and Katz, W. F. (2000). “Time-varying spectral change in the vowels of children and adults,” J. Acoust. Soc. Am. 108, 1856–1866.
6. Atal, B. S. (1975). “Linear prediction of speech—Recent applications to speech analysis,” in Speech Recognition, edited by R. D. Reddy (Elsevier, New York), pp. 221–230.
7. Atal, B. S., and Schroeder, M. R. (1978). “Linear prediction analysis of speech based on a pole-zero representation,” J. Acoust. Soc. Am. 64, 1310–1318.
8. Baer, T., Gore, J. C., Gracco, L. C., and Nye, P. W. (1991). “Analysis of vocal tract shape and dimensions using magnetic resonance imaging: Vowels,” J. Acoust. Soc. Am. 90(2), 799–828.
9. Boersma, P., and Weenink, D. (2013). “Praat: Doing phonetics by computer (version 5.3.56) [computer program],” http://www.praat.org.
10. Bradlow, A. (2002). “Confluent talker- and listener-oriented forces in clear speech production,” in Lab Phonology 7, edited by C. Gussenhoven and N. Warner (Mouton de Gruyter, Berlin, Germany), pp. 241–273.
11. Burg, J. P. (1967). “Maximum entropy spectral analysis,” in Proc. 37th Meeting of the Society of Exploration Geophysicists, October 31, Oklahoma City, OK.
12. Burris, C., Vorperian, H. K., Fourakis, M., Kent, R. D., and Bolt, D. M. (2014). “Quantitative and descriptive comparison of four acoustic analysis systems: Vowel measurements,” J. Speech Lang. Hear. Res. 57, 26–45.
13. Castelli, E., and Badin, P. (1988). “Vocal tract transfer functions with white noise excitation—Application to the naso-pharyngeal tract,” in Proc. 7th FASE Symp., Edinburgh, pp. 415–422.
14. Chiba, T., and Kajiyama, M. (1941). The Vowel: Its Nature and Structure (Tokyo-Kaiseikan, Tokyo), pp. 115–154.
15. Childers, D. G. (1978). Modern Spectrum Analysis (IEEE, New York), pp. 34–41, 252–255.
16. Clopper, C., and Pierrehumbert, J. (2008). “Effects of semantic predictability and regional dialect on vowel space reduction,” J. Acoust. Soc. Am. 124(3), 1682–1688.
17. Davies, P. O. A. L., McGowan, R. S., and Shadle, C. H. (1992). “Practical flow duct acoustics applied to the vocal tract,” in Vocal Fold Physiology: Frontiers in Basic Science, edited by I. R. Titze (Singular Pub. Group, Inc., San Diego), pp. 93–142.
18. de Cheveigné, A., and Kawahara, H. (1999). “Missing-data model of vowel identification,” J. Acoust. Soc. Am. 105, 3497–3508.
19. Deng, L., Lee, L. J., Attias, H., and Acero, A. (2007). “Adaptive Kalman filtering and smoothing for tracking vocal tract resonances using a continuous-valued hidden dynamic model,” IEEE Trans. Audio Speech Lang. Process. 15(1), 13–23.
20. Djeradi, A., Guerin, B., Badin, P., and Perrier, P. (1991). “Measurement of the acoustic transfer function of the vocal tract: A fast and accurate method,” J. Phonetics 19, 387–395.
21. Drugman, T., Thomas, M., Gudnason, J., Naylor, P., and Dutoit, T. (2012). “Detection of glottal closure instants from speech signals: A quantitative review,” IEEE Trans. Audio, Speech, Lang. Process. 20(3), 994–1006.
22. Epps, J., Smith, J. R., and Wolfe, J. (1997). “A novel instrument to measure acoustic resonances of the vocal tract during phonation,” Meas. Sci. Technol. 8, 1112–1121.
23. Fant, G. (1960). The Acoustic Theory of Speech Production (Mouton, The Hague), pp. 20, 53.
24. Fujimura, O., and Lindqvist, J. (1971). “Sweep-tone measurements of vocal tract characteristics,” J. Acoust. Soc. Am. 49, 541–557.
25. Fulop, S. A. (2007). “Phonetic applications of the time-corrected instantaneous frequency spectrogram,” Phonetica 64, 237–262.
26. Fulop, S. A. (2010). “Accuracy of formant measurement for synthesized vowels using the reassigned spectrogram and comparison with linear prediction,” J. Acoust. Soc. Am. 127(4), 2114–2117.
27. Fulop, S. A. (2011). Speech Spectrum Analysis (Springer, Berlin), pp. 127–201.
28. Fulop, S. A., and Disner, S. F. (2012). “Examining the voice bar,” POMA 14, 060002.
29. Harrington, J., and Cassidy, S. (1999). Techniques in Speech Acoustics (Kluwer Academic, Dordrecht, The Netherlands), pp. 174–177, 222–225.
30. Henrich, N., Smith, J., and Wolfe, J. (2011). “Vocal tract resonances in singing: Strategies used by sopranos, altos, tenors, and baritones,” J. Acoust. Soc. Am. 129(2), 1024–1035.
31. Hillenbrand, J., Getty, L. A., Clark, M. J., and Wheeler, K. (1995). “Acoustic characteristics of American English vowels,” J. Acoust. Soc. Am. 97(5), 3099–3111.
32. Holmes, J. N. (1976). “Formant excitation before and after glottal closure,” in Proc. International Conf. Acoust. Speech and Signal Proc., pp. 39–42.
33. Joliveau, E., Smith, J., and Wolfe, J. (2004). “Vocal tract resonances in singing: The soprano voice,” J. Acoust. Soc. Am. 116(4), 2434–2439.
34. Klatt, D. (1986). “Representation of the first formant in speech recognition and LF models of the auditory periphery,” in Proc. Montreal Satellite Symposium on Speech Recognition, 12th International Cong. on Acoust., edited by P. Mermelstein, Toronto (July).
35. Liberman, A. M., and Whalen, D. H. (2000). “On the relation of speech to language,” Trends Cognit. Sci. 4(5), 187–196.
36. Mehta, D. D., Rudoy, D., and Wolfe, P. J. (2012). “Kalman-based autoregressive moving average modeling and inference for formant and antiformant tracking,” J. Acoust. Soc. Am. 132(3), 1732–1746.
37. Monsen, R. B. (1976). “The production of English vowels by deaf adolescents,” J. Phonetics 4, 189–198.
38. Monsen, R. B., and Engebretson, A. M. (1983). “The accuracy of formant frequency measurements: A comparison of spectrographic analysis and linear prediction,” J. Speech Hear. Res. 26(3), 89–97.
39. Narayanan, S., Alwan, A., and Song, Y. (1997). “New results in vowel production: MRI, EPG, and acoustic data,” in Proc. Eurospeech 97, Rhodes, Greece, Vol. 2, pp. 1007–1010.
40. Pham Thi Ngoc, Y., and Badin, P. (1994). “Vocal tract acoustic transfer function measurements: Further developments and applications,” J. Phys. IV C 5, 549–552.
41. Peterson, G. E., and Barney, H. L. (1952). “Control methods used in a study of the vowels,” J. Acoust. Soc. Am. 24(2), 175–184.
42. Potter, R. K., and Steinberg, J. C. (1950). “Toward the specification of speech,” J. Acoust. Soc. Am. 22(6), 807–820.
43. Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1986). Numerical Recipes in C: The Art of Scientific Computing (Cambridge University Press, Cambridge), pp. 430–435.
44. Quatieri, T. F. (2002). Discrete-Time Speech Signal Processing (Prentice-Hall, Upper Saddle River, NJ), pp. 158–159.
45. Rabiner, L. R., Cheng, M. J., Rosenberg, A. E., and McGonegal, C. A. (1976). “A comparative study of several pitch detection algorithms,” IEEE Trans. Acoust., Speech, Signal Process. 24, 399–417.
46. Shue, Y.-L., Keating, P., Vicenik, C., and Yu, K. (2011). “Voicesauce: A program for voice analysis,” in Proceedings of the Seventeenth International Congress of Phonetic Sciences, Hong Kong, pp. 1846–1849.
47. Story, B. H., Titze, I. R., and Hoffman, E. A. (1996). “Vocal tract area functions from magnetic resonance imaging,” J. Acoust. Soc. Am. 100(1), 537–554.
48. Story, B. H., Titze, I. R., and Hoffman, E. A. (1998). “Vocal tract area functions for an adult female speaker based on volumetric imaging,” J. Acoust. Soc. Am. 104(1), 471–487.
49. Swerdlin, Y., Smith, J., and Wolfe, J. (2010). “The effect of whisper and creak vocal mechanisms on vocal tract resonances,” J. Acoust. Soc. Am. 127(4), 2590–2598.
50. Titze, I. R., Baken, R. J., Bozeman, K. W., Granqvist, S., Henrich, N., Herbst, C. T., Howard, D. M., Hunter, E. J., Kaelin, D., Kent, R. D., Kreiman, J., Kob, M., Löfqvist, A., McCoy, S., Miller, D. G., Noé, H., Scherer, R. C., Smith, J. R., Story, B. H., Švec, J. G., Ternström, S., and Wolfe, J. (2015). “Toward a consensus on symbolic notation of harmonics, resonances and formants in vocalization,” J. Acoust. Soc. Am. 137(5), 3005–3007.
51. Vallabha, G. K., and Tuller, B. (2002). “Systematic errors in the formant analysis of steady-state vowels,” Speech Commun. 38, 141–160.
52. Woehrling, C., and de Maureuil, P. B. (2007). “Comparing Praat and Snack formant measurements on two large corpora of northern and southern French,” in Proc. Interspeech 2007, August, Antwerp, Belgium, pp. 1006–1009.
53. Yuan, J., and Liberman, M. (2008). “Speaker identification on the SCOTUS corpus,” in Proceedings of Acoustics'08, Paris, France, pp. 5685–5688.
54. Zhang, C., Morrison, G. S., Ochoa, F., and Enzinger, E. (2013). “Reliability of human-supervised formant-trajectory measurement for forensic voice comparison,” J. Acoust. Soc. Am. 133(1), EL54–EL60.