A controversial issue in neurolinguistics is whether basic neural auditory representations found in many animals can account for human perception of speech. This question was addressed by examining how a population of neurons in the primary auditory cortex (A1) of the naïve awake ferret encodes phonemes and whether this representation could account for the human ability to discriminate them. When neural responses were characterized and ordered by spectral tuning and dynamics, perceptually significant features, including formant patterns in vowels and place and manner of articulation in consonants, were readily visualized by activity in distinct neural subpopulations. Furthermore, these responses faithfully encoded the similarity between the acoustic features of these phonemes. A simple classifier trained on the neural representation was able to simulate human phoneme confusion when tested with novel exemplars. These results suggest that A1 responses are sufficiently rich to encode and discriminate phoneme classes, and that humans and animals may build upon the same general acoustic representations to learn boundaries for categorical and robust sound classification.

31. We emphasize that this response pattern is unlikely to be due to nonuniform sampling of the scale and frequency variables, since no such bias in the joint scale-frequency distribution is evident in Fig. 6(A). Furthermore, note that high-scale neurons can be driven well by spectra with low frequencies, as in the phoneme /o/. The opposite is true for the vowel /e/, where low-scale units are driven well by high-frequency energy.