All-pole modeling is a widely used formant estimation method, but its performance is known to deteriorate for high-pitched voices. To address this problem, several all-pole modeling methods that are robust to fundamental frequency have been proposed. This study compares five such previously known methods and introduces a new technique, Weighted Linear Prediction with Attenuated Main Excitation (WLP-AME). WLP-AME uses temporally weighted linear prediction (LP), in which the square of the prediction error is multiplied by a given parametric weighting function. The weighting reduces the contribution of the main excitation of the vocal tract when the filter coefficients are optimized. Consequently, the resulting all-pole model is affected more by the characteristics of the vocal tract, which leads to less biased formant estimates. Experiments on synthetic vowels created with a physical modeling approach showed that WLP-AME yields more accurate formant frequencies for high-pitched sounds than the previously known methods (e.g., the relative error in the first formant of the vowel [a] decreased from 11% to 3% when conventional LP was replaced with WLP-AME). Experiments on natural vowels indicate that the formants detected by WLP-AME change more regularly between repetitions of different pitch than those computed by conventional LP.
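The weighted criterion described above can be illustrated with a minimal sketch. It minimizes the weighted squared prediction error, sum over n of w[n] * (x[n] - sum_k a[k] * x[n-k])^2, by ordinary least squares on rows scaled by sqrt(w[n]). The function names `ame_weight` and `weighted_lp`, the rectangular shape of the attenuation, and the parameters `width` and `floor` are assumptions made for illustration only; the abstract does not specify the parametric weighting function, and the excitation instants are assumed to be known in advance.

```python
import numpy as np

def ame_weight(n_samples, excitation_instants, width, floor=0.01):
    """Hypothetical weighting function: 1 everywhere, except for `width`
    samples starting at each main-excitation instant, where it drops to `floor`.
    The rectangular shape is an assumption, not the published function."""
    w = np.ones(n_samples)
    for g in excitation_instants:
        w[max(g, 0):min(g + width, n_samples)] = floor
    return w

def weighted_lp(x, order, w):
    """Return predictor coefficients a[0..order-1] minimizing
    sum_n w[n] * (x[n] - sum_k a[k-1] * x[n-k])**2 over n = order..N-1."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = np.arange(order, len(x))
    # Row for time n holds the past samples x[n-1], ..., x[n-order].
    X = np.stack([x[n - k] for k in range(1, order + 1)], axis=1)
    # Scaling each row by sqrt(w[n]) turns the weighted problem into plain least squares.
    s = np.sqrt(w[n])
    a, *_ = np.linalg.lstsq(X * s[:, None], x[n] * s, rcond=None)
    return a

if __name__ == "__main__":
    # Toy signal: a two-pole resonator driven by an impulse train (short pitch period).
    a_true = np.array([1.3, -0.8])
    N, period = 200, 20
    e = np.zeros(N)
    e[::period] = 1.0
    x = np.zeros(N)
    for i in range(N):
        x[i] = e[i] + sum(a_true[k] * x[i - k - 1] for k in range(2) if i - k - 1 >= 0)

    w = ame_weight(N, range(0, N, period), width=1, floor=0.0)
    print("conventional LP:", weighted_lp(x, 2, np.ones(N)))
    print("attenuated main excitation:", weighted_lp(x, 2, w))
```

In this toy setting, setting the weight to zero at the excitation instants leaves only samples that follow the resonator's recursion, so the weighted fit is driven by the filter rather than by the excitation. Real speech would need a softer attenuation and an estimate of the excitation instants, both of which are outside the scope of this sketch.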
