Normal vowels are known to exhibit irregular pitch-to-pitch variation, which is important for speech signals to be perceived as natural human sound. This pitch-to-pitch variation is studied here in the light of nonlinear dynamics. The analysis uses five normal vowels recorded from three male and two female subjects; the vowel signals are shown to have normal levels of pitch-to-pitch variation. First, false nearest-neighbor analysis shows that the nonlinear dynamics of the vowels can be analyzed well with a relatively low reconstruction dimension of 4⩽d⩽7. Then, the nonlinear dynamics of the vowels are studied further by spike-and-wave surrogate analysis. The results imply that there exists a nonlinear dynamical correlation between one pitch-waveform pattern and another in the vowel signals. On the basis of these results, the applicability of nonlinear prediction techniques to vowel synthesis is discussed.
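To make the false nearest-neighbor step concrete, the following is a minimal sketch of one common form of the test, written with NumPy. It is not the authors' implementation. The function names `delay_embed` and `fnn_fraction`, the delay `tau`, and the threshold `rtol=15.0` are assumptions chosen for illustration, since the abstract does not state the parameters actually used. A point's nearest neighbor in dimension d is counted as false if adding the (d+1)-th delay coordinate increases their separation by more than `rtol` times their distance in dimension d.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Return delay vectors [x[i], x[i+tau], ..., x[i+(dim-1)*tau]] as rows."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[k * tau : k * tau + n] for k in range(dim)])

def fnn_fraction(x, dim, tau=1, rtol=15.0):
    """Fraction of false nearest neighbors at embedding dimension `dim`.

    tau and rtol are illustrative assumptions, not values from the paper.
    Uses brute-force distances, so it suits short signals only.
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - dim * tau                    # points that also exist in dim+1
    emb = delay_embed(x, dim, tau)[:n]
    extra = x[dim * tau : dim * tau + n]      # the (dim+1)-th coordinate

    # Squared Euclidean distances between all pairs of delay vectors.
    d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)              # exclude self-matches
    nn = d2.argmin(axis=1)
    dist = np.sqrt(d2[np.arange(n), nn])

    valid = dist > 0                          # skip exact duplicates
    ratio = np.abs(extra[valid] - extra[nn[valid]]) / dist[valid]
    return float(np.mean(ratio > rtol))

# Usage: a pure sinusoid unfolds in two dimensions, so the fraction
# should drop sharply between dim=1 and dim=2.
t = np.arange(500)
x = np.sin(0.3 * t)
for d in (1, 2, 3):
    print(d, fnn_fraction(x, d))
```

Under this sketch, a reconstruction dimension would be judged sufficient once the fraction falls close to zero; for a recorded vowel one would scan `dim` over a range of values and look for where the curve flattens.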
