To understand the mechanisms of speech perception in everyday listening environments, it is important to elucidate the relative contributions of different acoustic cues in transmitting phonetic content. Previous studies suggest that the envelope of speech in different frequency bands conveys most speech content, while the temporal fine structure (TFS) can aid in segregating target speech from background noise. However, the role of TFS in conveying phonetic content beyond what envelopes convey for intact speech in complex acoustic scenes is poorly understood. The present study addressed this question using online psychophysical experiments to measure the identification of consonants in multi-talker babble for intelligibility-matched intact and 64-channel envelope-vocoded stimuli. Consonant confusion patterns revealed that listeners were more strongly biased toward reporting an unvoiced consonant in the vocoded condition than in the intact condition, even though envelope and place cues were largely preserved. This result was replicated when babble instances were varied across independent experiments, suggesting that TFS conveys voicing information beyond what is conveyed by envelopes for intact speech in babble. Given that multi-talker babble is a masker that is ubiquitous in everyday environments, this finding has implications for the design of assistive listening devices such as cochlear implants.
