Human auditory perception and speech intelligibility have been successfully described using two concepts: spectral masking and amplitude-modulation (AM) masking. The power-spectrum model (PSM) [Patterson and Moore (1986). Frequency Selectivity in Hearing, pp. 123–177] accounts for effects of spectral masking and critical bandwidth, while the envelope power-spectrum model (EPSM) [Ewert and Dau (2000). J. Acoust. Soc. Am. 108, 1181–1196] has been applied to AM masking and discrimination. Both models calculate signal-to-noise ratios (SNRs) from long-term power: the PSM from the power of the stimulus and the EPSM from the power of its envelope. Recently, the EPSM has been applied to speech intelligibility (SI) by considering the short-term envelope SNR on various time scales (multi-resolution speech-based envelope power-spectrum model; mr-sEPSM) to account for SI in fluctuating noise [Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134, 436–446]. Here, a generalized auditory model is suggested that combines the classical PSM and the mr-sEPSM to jointly account for psychoacoustics and speech intelligibility. The model is extended to consider the local AM depth in conditions with slowly varying signal levels, and the relative role of long-term and short-term SNR is assessed. The suggested generalized power-spectrum model is shown to account for a large variety of psychoacoustic data and to predict speech intelligibility in various types of background noise.
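
The two SNR quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the envelope has already been extracted, it omits the auditory and modulation filterbanks that the PSM and EPSM apply, and it assumes the envelope SNR takes the commonly used form (P_env,S+N − P_env,N) / P_env,N with a lower limit on the envelope power. The function names, the `floor` parameter and its value, and the helper `sam_envelope` are hypothetical and chosen only for illustration.

```python
import math


def power_snr_db(signal, noise):
    """Long-term power SNR in dB, as a PSM-style model would evaluate it
    at the output of one auditory filter (filtering is omitted here)."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


def envelope_power(env):
    """AC-coupled envelope power, normalized by the squared mean (DC) envelope."""
    mean = sum(env) / len(env)
    ac_power = sum((e - mean) ** 2 for e in env) / len(env)
    return ac_power / (mean * mean)


def envelope_snr_db(env_mix, env_noise, floor=1e-3):
    """Envelope-power SNR in dB (assumed form, see lead-in).

    env_mix   : envelope of signal plus noise
    env_noise : envelope of the noise alone
    floor     : hypothetical lower limit on envelope power
    """
    p_noise = max(envelope_power(env_noise), floor)
    p_signal = max(envelope_power(env_mix) - p_noise, floor)
    return 10.0 * math.log10(p_signal / p_noise)


def sam_envelope(depth, cycles=4, n=1000):
    """Hypothetical test envelope: 1 + depth * sin(...) over whole cycles."""
    return [1.0 + depth * math.sin(2.0 * math.pi * cycles * i / n) for i in range(n)]


# Example: strongly modulated mixture versus weakly modulated noise.
snr_env = envelope_snr_db(sam_envelope(0.5), sam_envelope(0.1))
```

A sinusoidal envelope with modulation depth m has a normalized envelope power of m²/2, which gives a quick sanity check for `envelope_power`. A short-term, multi-resolution variant would apply the same calculation to windowed segments of the envelope rather than to the whole signal.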

1. ANSI (1969). S3.5, Methods for the Calculation of the Articulation Index (Standards Secretariat, Acoustical Society of America, New York).
2. ANSI (1997). S3.5, Methods for Calculation of the Speech Intelligibility Index (Standards Secretariat, Acoustical Society of America, New York).
3. Beutelmann, R., Brand, T., and Kollmeier, B. (2010). “Revision, extension and evaluation of a binaural speech intelligibility model,” J. Acoust. Soc. Am. 127, 2479–2497.
4. Buus, S. (1985). “Release from masking caused by envelope fluctuations,” J. Acoust. Soc. Am. 78, 1958–1965.
5. Buus, S. (1990). “Level discrimination of frozen and random noise,” J. Acoust. Soc. Am. 87, 2643–2654.
6. Dau, T., Kollmeier, B., and Kohlrausch, A. (1997a). “Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers,” J. Acoust. Soc. Am. 102, 2892–2905.
7. Dau, T., Kollmeier, B., and Kohlrausch, A. (1997b). “Modeling auditory processing of amplitude modulation. II. Spectral and temporal integration,” J. Acoust. Soc. Am. 102, 2906–2919.
8. Dau, T., Püschel, D., and Kohlrausch, A. (1996). “A quantitative model of the effective signal processing in the auditory system. I. Model structure,” J. Acoust. Soc. Am. 99, 3615–3622.
9. Dau, T., Verhey, J., and Kohlrausch, A. (1999). “Intrinsic envelope fluctuations and modulation-detection thresholds for narrow-band noise carriers,” J. Acoust. Soc. Am. 106, 2752–2760.
10. Dubbelboer, F., and Houtgast, T. (2008). “The concept of signal-to-noise ratio in the modulation domain and speech intelligibility,” J. Acoust. Soc. Am. 124, 3937–3946.
11. Ewert, S. D. (2013). “AFC - A modular framework for running psychoacoustical experiments and computational perception models,” in Proceedings 39th Conference on Acoustics - AIA-DAGA, Meran, Italy.
12. Ewert, S. D., and Dau, T. (2000). “Characterizing frequency selectivity for envelope fluctuations,” J. Acoust. Soc. Am. 108, 1181–1196.
13. Ewert, S. D., and Dau, T. (2004). “External and internal limitations in amplitude-modulation processing,” J. Acoust. Soc. Am. 116, 478–490.
14. Ewert, S. D., Verhey, J. L., and Dau, T. (2002). “Spectro-temporal processing in the envelope-frequency domain,” J. Acoust. Soc. Am. 112, 2921–2931.
15. Festen, J. M., and Plomp, R. (1990). “Effects of fluctuating noise and interfering speech on the speech-reception threshold for impaired and normal hearing,” J. Acoust. Soc. Am. 88, 1725–1736.
16. Fletcher, H. (1940). “Auditory patterns,” Rev. Mod. Phys. 12, 47–65.
17. George, E. L. J., Festen, J. M., and Houtgast, T. (2008). “The combined effects of reverberation and nonstationary noise on sentence intelligibility,” J. Acoust. Soc. Am. 124, 1269–1277.
18. Goossens, T., van de Par, S., and Kohlrausch, A. (2008). “On the ability to discriminate Gaussian-noise tokens or random tone-burst complexes,” J. Acoust. Soc. Am. 124, 2251–2262.
19. Gustafsson, H. Å., and Arlinger, S. D. (1994). “Masking of speech by amplitude-modulated noise,” J. Acoust. Soc. Am. 95, 518–529.
20. Harlander, N., Huber, R., and Ewert, S. D. (2014). “Sound quality assessment using auditory models,” J. Audio Eng. Soc. 62, 324–336.
21. Holube, I., Fredelake, S., Vlaming, M., and Kollmeier, B. (2010). “Development and analysis of an international speech test signal (ISTS),” Int. J. Audiol. 49, 891–903.
22. Houtgast, T., Steeneken, H. J. M., and Plomp, R. (1980). “Predicting speech intelligibility in rooms from the modulation transfer function I. General room acoustics,” Acta Acust. Acust. 46, 60–72.
23. Houtsma, A. J. M., Durlach, N. I., and Braida, L. D. (1980). “Intensity perception. XI. Experimental results on the relation of intensity resolution to loudness matching,” J. Acoust. Soc. Am. 68, 807–813.
24. Howard-Jones, P. A., and Rosen, S. (1993). “The perception of speech in fluctuating noise,” Acustica 78, 258–272.
25. ISO 389-7 (2005). “Acoustics - Reference Zero for the Calibration of Audiometric Equipment. Part 7: Reference Threshold of hearing under free-field and diffuse-field listening conditions” (International Organization for Standardization, Geneva, Switzerland).
26. Jepsen, M. L., Ewert, S. D., and Dau, T. (2008). “A computational model of human auditory signal processing and perception,” J. Acoust. Soc. Am. 124, 422–438.
27. Jørgensen, S., and Dau, T. (2011). “Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing,” J. Acoust. Soc. Am. 130, 1475–1487.
28. Jørgensen, S., Ewert, S. D., and Dau, T. (2013). “A multi-resolution envelope-power based model for speech intelligibility,” J. Acoust. Soc. Am. 134, 436–446.
29. Jürgens, T., and Brand, T. (2009). “Microscopic prediction of speech recognition for listeners with normal hearing in noise using an auditory model,” J. Acoust. Soc. Am. 126, 2635–2648.
30. Jürgens, T., Ewert, S. D., Kollmeier, B., and Brand, T. (2014). “Prediction of consonant recognition in quiet for listeners with normal and impaired hearing using an auditory model,” J. Acoust. Soc. Am. 135, 1506–1517.
31. Kohlrausch, A. (1988). “Masking patterns of harmonic complex tone maskers and the role of the inner ear transfer function,” in Basic Issues in Hearing (Academic, New York), pp. 339–346.
32. Kohlrausch, A., Fassel, R., and Dau, T. (2000). “The influence of carrier level and frequency on modulation and beat-detection thresholds for sinusoidal carriers,” J. Acoust. Soc. Am. 108, 723–734.
33. Langhans, A., and Kohlrausch, A. (1992). “Differences in auditory performance between monaural and diotic conditions. I: Masked thresholds in frozen noise,” J. Acoust. Soc. Am. 91, 3456–3470.
34. Levitt, H. (1971). “Transformed up-down procedures in psychoacoustics,” J. Acoust. Soc. Am. 49, 467–477.
35. Ludvigsen, C. (1985). “Relations among some psychoacoustic parameters in normal and cochlearly impaired listeners,” J. Acoust. Soc. Am. 78, 1271–1280.
36. Miller, G. A., and Licklider, J. C. R. (1950). “The intelligibility of interrupted speech,” J. Acoust. Soc. Am. 22, 167–173.
37. Moore, B. C. J., Alcántara, J. I., and Dau, T. (1998). “Masking patterns for sinusoidal and narrow-band noise maskers,” J. Acoust. Soc. Am. 104, 1023–1038.
38. Moore, B. C. J., and Glasberg, B. R. (1983). “Suggested formulae for calculating auditory filter bandwidth and excitation patterns,” J. Acoust. Soc. Am. 74, 750–753.
39. Moore, B. C. J., and Glasberg, B. R. (1987). “Formulae describing frequency selectivity as a function of frequency and level and their use in calculating excitation patterns,” Hear. Res. 28, 209–225.
40. Nielsen, J. B., and Dau, T. (2009). “Development of a Danish speech intelligibility test,” Int. J. Audiol. 48, 729–741.
41. Patterson, R. D., and Henning, G. B. (1977). “Stimulus variability and auditory filter shape,” J. Acoust. Soc. Am. 62, 649–664.
42. Patterson, R. D., and Moore, B. C. J. (1986). “Auditory filters and excitation patterns as representations of frequency resolution,” in Frequency Selectivity in Hearing (Academic, New York), pp. 123–177.
43. Rhebergen, K. S., and Versfeld, N. J. (2005). “A Speech Intelligibility Index-based approach to predict the speech reception threshold for sentences in fluctuating noise for normal-hearing listeners,” J. Acoust. Soc. Am. 117, 2181–2192.
44. Rhebergen, K. S., Versfeld, N. J., and Dreschler, W. A. (2006). “Extended speech intelligibility index for the prediction of the speech reception threshold in fluctuating noise,” J. Acoust. Soc. Am. 120, 3988–3997.
45. Strickland, E. A., and Viemeister, N. F. (1996). “Cues for discrimination of envelopes,” J. Acoust. Soc. Am. 99, 3638–3646.
46. Tanner, W. P., and Sorkin, R. D. (1972). “The theory of signal detectability,” in Foundations of Modern Auditory Theory (Academic, New York), pp. 63–97.
47. Verhey, J. L., Dau, T., and Kollmeier, B. (1999). “Within-channel cues in comodulation masking release (CMR): Experiments and model predictions using a modulation-filterbank model,” J. Acoust. Soc. Am. 106, 2733–2745.
48. Viemeister, N. F. (1979). “Temporal modulation transfer functions based upon modulation thresholds,” J. Acoust. Soc. Am. 66, 1364–1380.
49. Viemeister, N. F., Stellmack, M. A., and Byrne, A. J. (2004). “The role of temporal structure in envelope processing,” in Auditory Signal Processing: Physiology, Psychoacoustics, and Models, edited by D. Pressnitzer, A. de Cheveigne, S. McAdams, and L. Collet (Springer-Verlag, New York), pp. 67–74.
50. Zwicker, E., and Schorn, K. (1982). “Temporal resolution in hard-of-hearing patients,” Audiology 21, 474–492.