In this study, we investigated the effect of specific noise realizations on the discrimination of two consonants, /b/ and /d/. For this purpose, we collected data from twelve participants, who listened to /aba/ or /ada/ embedded in one of three background noises. All noises had the same long-term spectrum but differed in the amount of random envelope fluctuations. The data were analyzed on a trial-by-trial basis using the reverse-correlation method. The results revealed that it is possible to predict the categorical responses with better-than-chance accuracy purely based on the spectro-temporal distribution of the random envelope fluctuations of the corresponding noises, without taking into account the actual targets or the signal-to-noise ratios used in the trials. The effect of the noise fluctuations explained on average 8.1% of the participants' responses in white noise, a proportion that increased up to 13.3% for noises with a larger amount of fluctuations. The estimated time-frequency weights revealed that the measured effect originated from confusions between noise fluctuations and relevant acoustic cues from the target sounds. Similar conclusions were obtained from simulations using an artificial listener.
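The trial-by-trial logic described above can be illustrated with a minimal sketch. It uses the classic difference-of-means form of reverse correlation on simulated data, which is not necessarily the estimator used in the study. The time-frequency grid size, the internal-noise level, the two-pixel template, and all function names are hypothetical choices made for illustration only. A simulated listener weights the random fluctuations of each noise with a hidden template; the weights are then estimated from the noises and responses alone and used to predict held-out responses.

```python
import numpy as np


def simulate_trials(template, n_trials, internal_noise_sd, rng):
    """Simulate a hypothetical listener answering /b/ (0) or /d/ (1)."""
    n_freq, n_time = template.shape
    # Random fluctuations of each noise on a time-frequency grid (assumed Gaussian).
    noise = rng.standard_normal((n_trials, n_freq, n_time))
    # Decision variable: template-weighted sum of fluctuations plus internal noise.
    dv = np.tensordot(noise, template, axes=([1, 2], [0, 1]))
    dv += internal_noise_sd * rng.standard_normal(n_trials)
    return noise, (dv > 0).astype(int)


def classification_image(noise, responses):
    """Difference of mean noise between the two response categories."""
    return noise[responses == 1].mean(axis=0) - noise[responses == 0].mean(axis=0)


def predict(noise, weights):
    """Predict the response from the noise fluctuations alone."""
    dv = np.tensordot(noise, weights, axes=([1, 2], [0, 1]))
    return (dv > 0).astype(int)


rng = np.random.default_rng(0)

# Hypothetical template: one region favouring /d/, one favouring /b/.
template = np.zeros((8, 10))
template[2, 4] = 1.0
template[5, 4] = -1.0

noise, responses = simulate_trials(template, n_trials=4000,
                                   internal_noise_sd=2.0, rng=rng)

# Estimate the weights on one part of the trials, test on the rest.
train, test = slice(0, 3000), slice(3000, 4000)
weights = classification_image(noise[train], responses[train])

accuracy = np.mean(predict(noise[test], weights) == responses[test])
similarity = np.corrcoef(weights.ravel(), template.ravel())[0, 1]
print(f"held-out prediction accuracy: {accuracy:.3f}")
print(f"correlation between estimated weights and template: {similarity:.3f}")
```

Because the simulated internal noise is unrelated to the stimulus, the held-out accuracy stays well below 100% even when the weights are recovered well. In that sense the noise fluctuations can account for only part of the responses.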
