Recently, a quasi-closed phase (QCP) analysis of speech signals for accurate glottal inverse filtering was proposed. However, the QCP analysis, which belongs to the family of temporally weighted linear prediction (WLP) methods, uses the conventional forward type of sample prediction. This may not be the best choice, especially when computing WLP models with a hard-limiting weighting function: a sample-selective minimization of the prediction error in WLP reduces the effective number of samples available within a given window frame. To counter this problem, a modified quasi-closed phase forward-backward (QCP-FB) analysis is proposed, wherein each sample is predicted from its past as well as its future samples, thereby using the available samples more effectively. Formant detection and estimation experiments on natural speech utterances and on synthetic vowels generated with a physical modeling approach show that the proposed QCP-FB method yields statistically significant improvements over the conventional linear prediction and QCP methods.
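
The idea of combining temporal weighting with forward-backward prediction can be illustrated with a minimal NumPy sketch. It is not the authors' QCP-FB implementation. It assumes a covariance-style formulation in which the same weight w[n] multiplies both the forward and the backward squared error of sample n. It also uses a hypothetical binary "hard-limiting" weight (`hard_limit_weight`, with made-up parameters `gcis`, `width`, `floor`) in place of the actual QCP weighting function, whose shape is not specified here. All function names are illustrative.

```python
import numpy as np


def weighted_fb_lp(x, w, p):
    """Temporally weighted forward-backward LP (covariance-style sketch).

    Minimizes, over predictor coefficients a_1..a_p,
        sum_n w[n] * (x[n] - sum_k a_k * x[n - k])**2   (forward error)
      + sum_n w[n] * (x[n] - sum_k a_k * x[n + k])**2   (backward error)
    using only samples inside the frame. Assumption: the same weight w[n]
    is applied to sample n in both directions. Returns the inverse filter
    A(z) = 1 - sum_k a_k z^-k as the array [1, -a_1, ..., -a_p].
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    N = len(x)
    R = np.zeros((p, p))  # weighted normal-equation matrix
    r = np.zeros(p)       # weighted cross-correlation vector
    # Forward terms: predict x[n] from x[n-1], ..., x[n-p].
    for n in range(p, N):
        past = x[n - p:n][::-1]
        R += w[n] * np.outer(past, past)
        r += w[n] * x[n] * past
    # Backward terms: predict x[n] from x[n+1], ..., x[n+p].
    for n in range(N - p):
        future = x[n + 1:n + p + 1]
        R += w[n] * np.outer(future, future)
        r += w[n] * x[n] * future
    a = np.linalg.solve(R, r)
    return np.concatenate(([1.0], -a))


def hard_limit_weight(N, gcis, width, floor=1e-4):
    """Hypothetical binary weight: `floor` for `width` samples from each GCI, else 1."""
    w = np.ones(N)
    for g in gcis:
        w[max(g, 0):min(g + width, N)] = floor
    return w


def formants_from_lpc(A, fs):
    """Pole frequencies (Hz) of 1/A(z), upper half-plane only, sorted."""
    roots = np.roots(A)
    roots = roots[np.imag(roots) > 0]
    return np.sort(np.angle(roots) * fs / (2 * np.pi))


if __name__ == "__main__":
    fs = 8000
    n = np.arange(240)
    # Two undamped sinusoids obey the same AR(4) recursion forwards and backwards.
    x = np.cos(2 * np.pi * 500 * n / fs) + 0.5 * np.cos(2 * np.pi * 1500 * n / fs)
    w = hard_limit_weight(len(x), gcis=[40, 120, 200], width=10)
    A = weighted_fb_lp(x, w, p=4)
    print(formants_from_lpc(A, fs))  # close to [500. 1500.]
```

Stacking the backward equations on top of the forward ones roughly doubles the number of error terms that contribute to the normal equations for a given frame. This matters most when a hard-limiting weight nearly removes many of the forward terms.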
