The modulation statistics of natural sound ensembles were analyzed by calculating the probability distributions of the amplitude envelopes of the sounds and their time-frequency correlations, as given by the modulation spectra. These modulation spectra were obtained by calculating the two-dimensional Fourier transform of the autocorrelation matrix of each sound in its spectrographic representation. Because temporal bandwidth and spectral bandwidth are conjugate variables, the joint modulation spectrum of sound is shown to occupy a restricted space: a sound cannot have rapid temporal and rapid spectral modulations simultaneously. Within this restricted space, natural sounds are shown to have a characteristic signature. Natural sounds are, in general, low-pass, with most of their modulation energy at low temporal and spectral modulation frequencies. Animal vocalizations and human speech are further characterized by the fact that most of their spectral modulation power is found only at low temporal modulation frequencies. Similarly, the distributions of the amplitude envelopes exhibit characteristic shapes for natural sounds, reflecting the high probability of epochs with no sound, systematic differences across frequencies, and, for vocalizations, a relatively uniform distribution of the log amplitude. It is postulated that the auditory system, as well as engineering applications, may exploit these statistical properties to obtain an efficient representation of behaviorally relevant sounds. To test such a hypothesis, we show how to create synthetic sounds with first- and second-order envelope statistics identical to those found in natural sounds.
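The two statistics described above can be illustrated with a short sketch. It computes a log-amplitude spectrogram, then the modulation power spectrum as the squared magnitude of the 2-D Fourier transform of the mean-removed spectrogram. By the Wiener-Khinchin theorem this equals the 2-D Fourier transform of the spectrogram's (circular) autocorrelation, which is the quantity named in the text. It also computes the per-band probability distribution of log amplitude. The abstract does not specify the spectrogram parameters, so a Hann-windowed STFT with a 50 dB dynamic-range floor is assumed here. The function names and parameter values are hypothetical, and this is not presented as the authors' exact procedure.

```python
import numpy as np


def log_spectrogram(x, fs, win_len=256, hop=64, floor_db=50.0):
    """Log-amplitude (dB) spectrogram from a Hann-windowed STFT.

    Assumption: a plain STFT stands in for whatever filter bank the original
    analysis used; floor_db clips the dynamic range below the peak.
    """
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window for i in range(n_frames)])
    amp = np.abs(np.fft.rfft(frames, axis=1)).T             # shape (n_freq, n_frames)
    db = 20.0 * np.log10(amp + 1e-12)
    db = np.maximum(db, db.max() - floor_db)                # fixed dynamic range
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)            # band centers, Hz
    times = (np.arange(n_frames) * hop + win_len / 2) / fs  # frame centers, s
    return db, freqs, times


def modulation_power_spectrum(spec, df_hz, dt_s):
    """|2-D FFT|^2 of the mean-removed spectrogram.

    Equals the 2-D DFT of the spectrogram's circular autocorrelation
    (zero-pad the spectrogram first if a linear autocorrelation is wanted).
    """
    s = spec - spec.mean()
    mps = np.abs(np.fft.fftshift(np.fft.fft2(s))) ** 2
    # Axis 0: spectral modulation in cycles/kHz; axis 1: temporal modulation in Hz.
    spectral_mod = np.fft.fftshift(np.fft.fftfreq(s.shape[0], d=df_hz / 1000.0))
    temporal_mod = np.fft.fftshift(np.fft.fftfreq(s.shape[1], d=dt_s))
    return mps, spectral_mod, temporal_mod


def amplitude_distribution(spec, n_bins=50):
    """Probability density of log amplitude (dB) in each frequency band."""
    edges = np.linspace(spec.min(), spec.max(), n_bins + 1)
    dens = np.stack([np.histogram(row, bins=edges, density=True)[0] for row in spec])
    return dens, edges


if __name__ == "__main__":
    # Usage: a 1 kHz tone with 4 Hz sinusoidal amplitude modulation.
    fs = 16000
    t = np.arange(int(2.0 * fs)) / fs
    x = (1.0 + 0.5 * np.sin(2 * np.pi * 4.0 * t)) * np.sin(2 * np.pi * 1000.0 * t)
    spec, freqs, times = log_spectrogram(x, fs)
    mps, wf, wt = modulation_power_spectrum(spec, freqs[1] - freqs[0], times[1] - times[0])
    profile = mps.sum(axis=0)  # temporal modulation power, summed over spectral modulation
    pos = wt > 0
    print("peak temporal modulation (Hz):", wt[pos][np.argmax(profile[pos])])
```

Run as a script, the demo should report a temporal modulation peak close to 4 Hz. Increasing `win_len` sharpens frequency resolution, extending the range of representable spectral modulations, but smooths the temporal envelope and lowers the range of representable temporal modulations. This is one concrete view of the conjugate-variable trade-off described above.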
