Prosodic rhythm in speech [the alternation of “Strong” (S) and “weak” (w) syllables] is cued, among other things, by slow rates of amplitude modulation (AM) within the speech envelope. However, it is unclear exactly which envelope modulation rates and statistics are the most important for the rhythm percept. Here, the hypothesis that the phase relationship between “Stress” rate (∼2 Hz) and “Syllable” rate (∼4 Hz) AMs provides a perceptual cue for speech rhythm is tested. In a rhythm judgment task, adult listeners identified AM tone-vocoded nursery rhyme sentences that carried either trochaic (S-w) or iambic (w-S) patterning. Manipulation of listeners' rhythm perception was attempted by parametrically phase-shifting the Stress AM and Syllable AM in the vocoder. It was expected that a π radian phase shift (half a cycle) would reverse the perceived rhythm pattern (i.e., trochaic → iambic), whereas a 2π radian phase shift (full cycle) would retain the perceived rhythm pattern (i.e., trochaic → trochaic). The results confirmed these predictions. Listeners' judgments of rhythm systematically followed Stress-Syllable AM phase shifts, but were unaffected by phase shifts between the Syllable AM and the Sub-beat AM (∼14 Hz) in a control condition. It is concluded that the Stress-Syllable AM phase relationship is an envelope-based modulation statistic that supports speech rhythm perception.
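
The phase-shift prediction can be illustrated with a minimal numerical sketch. It is a toy model under stated assumptions, not the vocoder or analysis used in the study: the Stress and Syllable AMs are idealized as pure cosines at exactly 2 Hz and 4 Hz, and a syllable is labeled "S" when the Stress AM is positive at the Syllable AM peak and "w" otherwise. The function name `rhythm_pattern` and this labeling rule are hypothetical and introduced only for illustration.

```python
import numpy as np

STRESS_HZ = 2.0    # idealized "Stress" rate AM (assumption: pure cosine)
SYLLABLE_HZ = 4.0  # idealized "Syllable" rate AM (assumption: pure cosine)


def rhythm_pattern(stress_phase_shift, n_syllables=4):
    """Label successive syllables as S or w for a given Stress AM phase shift.

    Toy rule (an assumption, not the study's method): a syllable counts as
    Strong when the Stress AM is positive at the peak of the Syllable AM.
    """
    # Peaks of cos(2*pi*SYLLABLE_HZ*t) fall at integer multiples of the period.
    syllable_peak_times = np.arange(n_syllables) / SYLLABLE_HZ
    # Sample the phase-shifted Stress AM at each syllable peak.
    stress_at_peaks = np.cos(
        2 * np.pi * STRESS_HZ * syllable_peak_times + stress_phase_shift
    )
    return "-".join("S" if value > 0 else "w" for value in stress_at_peaks)


if __name__ == "__main__":
    for label, shift in [("0", 0.0), ("pi", np.pi), ("2*pi", 2 * np.pi)]:
        print(f"shift = {label}: {rhythm_pattern(shift)}")
```

In this idealization, a shift of 0 gives a trochaic sequence, a shift of π swaps the strong and weak positions to give an iambic sequence, and a shift of 2π restores the original pattern. This is the same qualitative outcome the abstract predicts for listeners' judgments.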
