This paper explores the relationship between the acoustic duration of phonemic sequences and their frequencies of occurrence. The data were obtained from large (sub)corpora of spontaneous speech in Dutch, English, German, and Italian. Acoustic duration of an n-phone is shown to codetermine the n-phone’s frequency of use, such that languages preferentially use diphones and triphones that are neither very long nor very short. The observed distributions are well approximated by a theoretical function that quantifies the concurrent action of the self-regulatory processes of minimization of articulatory effort and minimization of perception effort.
