This study presents a method for analyzing speech rhythm using empirical mode decomposition of the speech amplitude envelope, which allows for extraction and quantification of syllabic- and supra-syllabic time-scale components of the envelope. The method of empirical mode decomposition of a vocalic energy amplitude envelope is illustrated in detail, and several types of rhythm metrics derived from this method are presented. Spontaneous speech extracted from the Buckeye Corpus is used to assess the effect of utterance length on metrics, and it is shown how metrics representing variability in the supra-syllabic time-scale components of the envelope can be used to identify stretches of speech with targeted rhythmic characteristics. Furthermore, the envelope-based metrics are used to characterize cross-linguistic differences in speech rhythm in the UC San Diego Speech Lab corpus of English, German, Greek, Italian, Korean, and Spanish speech elicited in read sentences, read passages, and spontaneous speech. The envelope-based metrics exhibit significant effects of language and elicitation method that argue for a nuanced view of cross-linguistic rhythm patterns.
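As a rough illustration of the decomposition idea summarized above, the following minimal sketch runs a simplified empirical mode decomposition on a toy amplitude envelope. It is not the study's implementation. The function names (`local_extrema`, `linear_envelope`, `sift`, `emd`), the fixed number of sifting passes, and the toy signal (a 5 Hz "syllabic" component plus a 1.5 Hz "supra-syllabic" component sampled at 100 Hz) are all assumptions made for illustration. The upper and lower envelopes are interpolated linearly to keep the example dependency-free, whereas standard EMD formulations typically use cubic splines.

```python
import math


def local_extrema(x):
    """Return index lists of interior local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def linear_envelope(knots, x):
    """Piecewise-linear curve through x at the knot indices.

    Simplification (assumption): values are held constant outside the
    first and last knot, and linear rather than spline interpolation is used.
    """
    env = []
    k = 0
    for i in range(len(x)):
        if i <= knots[0]:
            env.append(x[knots[0]])
        elif i >= knots[-1]:
            env.append(x[knots[-1]])
        else:
            while knots[k + 1] < i:
                k += 1
            a, b = knots[k], knots[k + 1]
            w = (i - a) / (b - a)
            env.append((1 - w) * x[a] + w * x[b])
    return env


def sift(x, n_sift=10):
    """Repeatedly subtract the mean of the upper and lower envelopes."""
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper = linear_envelope(maxima, h)
        lower = linear_envelope(minima, h)
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
    return h


def emd(x, max_imfs=4, n_sift=10):
    """Split x into oscillatory components (fastest first) plus a residue."""
    residue = list(x)
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few oscillations left to extract another component
        imf = sift(residue, n_sift)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue


# Toy envelope (hypothetical values): 4 s sampled at 100 Hz.
fs = 100
toy_envelope = [
    math.sin(2 * math.pi * 5.0 * n / fs) + 0.6 * math.sin(2 * math.pi * 1.5 * n / fs)
    for n in range(4 * fs)
]
imfs, residue = emd(toy_envelope)
```

By construction, the extracted components and the residue sum back to the input signal. Rhythm metrics of the kind described above would then be computed from the individual components, which this sketch does not attempt.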
