Languages show systematic variation in their sound patterns and grammars. Accordingly, they have been classified into typological categories such as stress-timed vs. syllable-timed, or Head-Complement (HC) vs. Complement-Head (CH). It remains incompletely understood how these linguistic properties are reflected in the acoustic characteristics of speech across languages. In the present study, the amplitude-modulation (AM) and frequency-modulation (FM) spectra of 1797 utterances in ten languages were analyzed. Overall, the spectra were similar in shape across languages. However, linguistic factors had significant effects on the AM spectra. These differences were magnified by a perceptually plausible representation based on the modulation index (a measure of the signal-to-noise ratio at the output of a logarithmic modulation filterbank): the maximum value of the modulation index distinguished HC from CH languages, with the exception of Turkish, while the exact frequency of this maximum differed between stress-timed and syllable-timed languages. An additional study conducted on a semi-spontaneous speech corpus showed that these differences persist for a larger number of speakers but disappear for less constrained semi-spontaneous speech. These findings reveal that broad linguistic categories are reflected in the temporal modulation features of different languages, although this may depend on speaking style.
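
To make the notion of an AM spectrum concrete, the following is a minimal sketch and not the analysis pipeline used in the study. It assumes a single broadband Hilbert envelope (no auditory filterbank), and it replaces the modulation index, which is based on a signal-to-noise ratio, with a simpler root-sum-square modulation depth in octave-wide modulation bands. The function names, the band centres, and the synthetic test signal are all hypothetical choices made for illustration.

```python
import numpy as np


def hilbert_envelope(x):
    """Temporal envelope: magnitude of the FFT-based analytic signal."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # zero out negative frequencies.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * h))


def am_spectrum(x, fs):
    """Envelope spectrum scaled so that a sinusoidal AM of depth m reads m."""
    env = hilbert_envelope(x)
    mag = np.abs(np.fft.rfft(env)) / len(env)
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    # Normalize by the mean envelope (DC term); factor 2 for one-sided spectrum.
    return freqs, 2.0 * mag / mag[0]


def octave_band_depth(freqs, depth, centres):
    """Root-sum-square modulation depth in octave-wide, log-spaced bands.

    Assumption: a simplified stand-in for a logarithmic modulation filterbank.
    """
    out = []
    for fc in centres:
        mask = (freqs >= fc / np.sqrt(2)) & (freqs < fc * np.sqrt(2))
        out.append(np.sqrt(np.sum(depth[mask] ** 2)))
    return np.array(out)


# Synthetic example (hypothetical): a 200 Hz tone with 4 Hz AM of depth 0.5.
fs = 1000.0
t = np.arange(int(fs * 4.0)) / fs
x = (1.0 + 0.5 * np.cos(2 * np.pi * 4.0 * t)) * np.sin(2 * np.pi * 200.0 * t)

freqs, depth = am_spectrum(x, fs)
centres = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
bands = octave_band_depth(freqs, depth, centres)
print("peak modulation band (Hz):", centres[np.argmax(bands)])
```

In this toy representation, the height of the band maximum and the frequency at which it occurs play the roles of the two quantities the abstract compares across languages. Real speech would additionally require a peripheral filterbank and a noise reference to approximate a modulation index.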
