This paper introduces a combined feature extraction approach to improve speech recognition systems. The main idea is to use, side by side, features obtained from Poincaré sections of the reconstructed phase space (RPS) of speech and conventional Mel-frequency cepstral coefficients (MFCCs), which have a proven role in speech recognition. With an appropriate embedding dimension, the RPS of a speech signal is guaranteed to be topologically equivalent to the dynamics of the speech production system, and can therefore contain information that linear analysis approaches may miss. Moreover, complex systems such as the speech production system can exhibit cyclic and oscillatory patterns, and Poincaré sections are an effective tool for analyzing such trajectories. In this research, a statistical modeling approach based on Gaussian mixture models (GMMs) is applied to the Poincaré sections of the speech RPS. A final pruned feature set is obtained by applying an efficient feature selection approach to the combination of the GMM parameters and the MFCC-based features. A hidden Markov model (HMM)-based speech recognition system and the TIMIT speech database are used to evaluate the proposed feature set in isolated and continuous speech recognition experiments. The proposed feature set yields a 5.7% absolute improvement in isolated phoneme recognition over MFCC-based features alone.
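To make the pipeline above more concrete, here is a minimal pure-Python sketch of one plausible reading of it. It is not the authors' implementation. Every helper name (`delay_embed`, `poincare_section`, `fit_diag_gmm`, `gmm_features`) is hypothetical, and every setting is an illustrative assumption rather than the paper's configuration:

- embedding dimension `dim=3` and delay `tau=5`;
- a section hyperplane at zero on the first RPS coordinate, keeping only upward crossings;
- a diagonal-covariance GMM with `k=2` components, fitted per frame by EM with quantile initialization;
- components sorted by weight to produce a fixed-length vector.

The MFCC vector is only a placeholder. Neither the feature selection step nor the HMM recognizer is shown.

```python
import math
import random

# Hypothetical helpers; all parameter values are illustrative assumptions,
# not the settings used in the paper.


def delay_embed(x, dim, tau):
    """Time-delay embedding: point n is (x[n], x[n + tau], ..., x[n + (dim - 1) * tau])."""
    n_points = len(x) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("signal is too short for this dim/tau")
    return [tuple(x[n + i * tau] for i in range(dim)) for n in range(n_points)]


def poincare_section(points, level=0.0):
    """Upward crossings of the hyperplane {first coordinate == level}.

    Each crossing is located by linear interpolation between consecutive
    RPS points and projected onto the remaining dim - 1 coordinates.
    """
    section = []
    for p, q in zip(points, points[1:]):
        if p[0] < level <= q[0]:  # guarantees q[0] - p[0] > 0
            t = (level - p[0]) / (q[0] - p[0])
            section.append(tuple(a + t * (b - a) for a, b in zip(p[1:], q[1:])))
    return section


def fit_diag_gmm(data, k, iters=100, var_floor=1e-6):
    """Fit a k-component, diagonal-covariance GMM to `data` with EM."""
    n, d = len(data), len(data[0])
    if n < k:
        raise ValueError("need at least k points on the section")
    # Deterministic init: means at evenly spaced quantiles of the sorted data,
    # variances at the global per-dimension variance, equal weights.
    ordered = sorted(data)
    means = [list(ordered[(2 * c + 1) * n // (2 * k)]) for c in range(k)]
    mu = [sum(p[j] for p in data) / n for j in range(d)]
    var0 = [max(var_floor, sum((p[j] - mu[j]) ** 2 for p in data) / n) for j in range(d)]
    variances = [list(var0) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in the log domain for stability.
        resp = []
        for p in data:
            logs = []
            for c in range(k):
                ll = math.log(weights[c])
                for j in range(d):
                    v = variances[c][j]
                    ll -= 0.5 * (math.log(2 * math.pi * v) + (p[j] - means[c][j]) ** 2 / v)
                logs.append(ll)
            top = max(logs)
            e = [math.exp(lg - top) for lg in logs]
            total = sum(e)
            resp.append([w / total for w in e])
        # M-step: re-estimate weights, means and (floored) variances.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            means[c] = [sum(r[c] * p[j] for r, p in zip(resp, data)) / nc for j in range(d)]
            variances[c] = [
                max(var_floor, sum(r[c] * (p[j] - means[c][j]) ** 2 for r, p in zip(resp, data)) / nc)
                for j in range(d)
            ]
    return weights, means, variances


def gmm_features(weights, means, variances):
    """Flatten GMM parameters, components sorted by descending weight so the
    layout does not depend on EM's arbitrary component labelling."""
    order = sorted(range(len(weights)), key=lambda c: -weights[c])
    vec = []
    for c in order:
        vec.append(weights[c])
        vec.extend(means[c])
        vec.extend(variances[c])
    return vec


# Usage on a synthetic quasi-periodic "voiced" frame (a stand-in for real speech).
fs = 16000
rng = random.Random(1)
frame = [
    math.sin(2 * math.pi * 120 * n / fs)
    + 0.5 * math.sin(2 * math.pi * 360 * n / fs + 0.3)
    + rng.gauss(0.0, 0.05)
    for n in range(1600)
]
section = poincare_section(delay_embed(frame, dim=3, tau=5))
rps_features = gmm_features(*fit_diag_gmm(section, k=2))
mfcc = [0.0] * 13  # placeholder: real MFCCs would come from a standard front end
combined = mfcc + rps_features  # 13 + 2 * (1 + 2 + 2) = 23 values
```

Sorting components by weight is just one simple way to make the flattened GMM parameters comparable across frames; any consistent ordering would serve. In practice, a library EM implementation would likely replace the hand-written `fit_diag_gmm`.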
