The generalized power spectrum model [GPSM; Biberger and Ewert (2016). J. Acoust. Soc. Am. 140, 1023–1038], which combines the “classical” power spectrum model (PSM) with the envelope power spectrum model (EPSM), was demonstrated to account for several psychoacoustic and speech intelligibility (SI) experiments. The PSM path of the model uses long-time power signal-to-noise ratios (SNRs), while the EPSM path uses short-time envelope power SNRs. A systematic comparison of existing SI models for several spectro-temporal manipulations of speech maskers and gender combinations of target and masker speakers [Schubotz et al. (2016). J. Acoust. Soc. Am. 140, 524–540] showed the importance of short-time power features. Conversely, Jørgensen et al. [(2013). J. Acoust. Soc. Am. 134, 436–446] demonstrated a higher predictive power of short-time envelope power SNRs than of power SNRs for conditions with reverberation and spectral subtraction. Here, the GPSM was extended to utilize short-time power SNRs and was shown to account for all psychoacoustic and SI data of the three mentioned studies. The best processing strategy was to use exclusively either power SNRs or envelope-power SNRs, depending on the experimental task. By analyzing both domains, the suggested model might provide a useful tool for clarifying the contributions of amplitude modulation masking and energetic masking.
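
The distinction between long-time and short-time power SNRs can be illustrated with a minimal sketch. This is not the GPSM implementation: it omits auditory filtering, the envelope-power path, and the decision stage, and the function names, the window length, the `EPS` floor, and the toy signals are all assumptions made for illustration only.

```python
import math

EPS = 1e-12  # assumed floor to avoid division by zero in silent segments


def mean_power(x):
    """Mean power (mean of squared samples) of a sequence."""
    return sum(v * v for v in x) / len(x)


def power_snr_db(target, masker):
    """Power SNR in dB computed over the full length of both signals."""
    return 10.0 * math.log10(
        max(mean_power(target), EPS) / max(mean_power(masker), EPS)
    )


def short_time_power_snr_db(target, masker, win):
    """Power SNR in dB for consecutive non-overlapping windows of `win` samples."""
    n = min(len(target), len(masker))
    return [
        power_snr_db(target[i:i + win], masker[i:i + win])
        for i in range(0, n - win + 1, win)
    ]


# Toy example: a steady target and a masker whose level drops in the second half.
target = [1.0] * 8
masker = [1.0] * 4 + [0.1] * 4

long_snr = power_snr_db(target, masker)
short_snrs = short_time_power_snr_db(target, masker, win=4)
```

In this toy case the long-time SNR is about 3 dB, whereas the short-time SNRs are 0 dB in the first window and 20 dB in the second, so only the short-time analysis exposes the favorable segment in the masker dip.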

1. ANSI (1969). S3.5, Methods for the Calculation of the Articulation Index (Acoustical Society of America, New York).
2. ANSI (1997). S3.5, Methods for Calculation of the Speech Intelligibility Index (Acoustical Society of America, New York).
3. Barker, J., and Cooke, M. (2007). “Modelling speaker intelligibility in noise,” Speech Commun. 49, 402–417.
4. Beutelmann, R., Brand, T., and Kollmeier, B. (2010). “Revision, extension and evaluation of a binaural speech intelligibility model,” J. Acoust. Soc. Am. 127, 2479–2497.
5. Biberger, T., and Ewert, S. D. (2016). “Envelope and intensity based prediction of psychoacoustic masking and speech intelligibility,” J. Acoust. Soc. Am. 140, 1023–1038.
6. Chabot-Leclerc, A., Jørgensen, S., and Dau, T. (2014). “The role of auditory spectro-temporal modulation filtering and the decision metric for speech intelligibility prediction,” J. Acoust. Soc. Am. 135, 3502–3512.
7. Dau, T., Kollmeier, B., and Kohlrausch, A. (1997a). “Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers,” J. Acoust. Soc. Am. 102, 2892–2905.
8. Dau, T., Kollmeier, B., and Kohlrausch, A. (1997b). “Modeling auditory processing of amplitude modulation. II. Spectral and temporal integration,” J. Acoust. Soc. Am. 102, 2906–2919.
9. Dubbelboer, F., and Houtgast, T. (2008). “The concept of signal-to-noise ratio in the modulation domain and speech intelligibility,” J. Acoust. Soc. Am. 124, 3937–3946.
10. Eddins, D. A., Hall, J. W., and Grose, J. H. (1992). “Detection of temporal gaps as a function of frequency region and absolute noise bandwidth,” J. Acoust. Soc. Am. 91, 1069–1077.
11. Elhilali, M., Chi, T., and Shamma, S. A. (2003). “A spectro-temporal modulation index (STMI) for assessment of speech intelligibility,” Speech Commun. 41, 331–348.
12. Ewert, S. D., and Dau, T. (2000). “Characterizing frequency selectivity for envelope fluctuations,” J. Acoust. Soc. Am. 108, 1181–1196.
13. Ewert, S. D., and Dau, T. (2004). “External and internal limitations in amplitude-modulation processing,” J. Acoust. Soc. Am. 116, 478–490.
14. Ewert, S. D., Verhey, J. L., and Dau, T. (2002). “Spectro-temporal processing in the envelope-frequency domain,” J. Acoust. Soc. Am. 112, 2921–2931.
15. Festen, J. M. (1993). “Contributions of comodulation masking release and temporal resolution to the speech-reception threshold masked by an interfering voice,” J. Acoust. Soc. Am. 94, 1295–1300.
16. Fraunhofer IDMT (2013). “SIP-Toolbox: Sound Quality and Speech Intelligibility Prediction Toolbox,” Fraunhofer IDMT, Oldenburg, Germany, http://www.idmt.fraunhofer.de/de/institute/projects_products/q_t/sip-toolbox.html (Last viewed July 20, 2017).
17. George, E. L. J., Festen, J. M., and Houtgast, T. (2008). “The combined effects of reverberation and nonstationary noise on sentence intelligibility,” J. Acoust. Soc. Am. 124, 1269–1277.
18. Hall, J. W., Haggard, M. P., and Fernandes, M. A. (1984). “Detection in noise by spectro-temporal pattern analysis,” J. Acoust. Soc. Am. 76, 50–56.
19. Hochmuth, S., Jürgens, T., Brand, T., and Kollmeier, B. (2014). “Multilingualer Cocktailparty-Einfluss von sprecher- und sprachspezifischen Faktoren auf die Sprachverständlichkeit im Störschall” (“Multilingual effect of speaker- and speech-specific factors on speech intelligibility in noise in cocktail party situations”), in Proceedings of the 17th Jahrestagung der Deutschen Gesellschaft für Audiologie, Oldenburg, Germany.
20. Holube, I., Fredelake, S., Vlaming, M., and Kollmeier, B. (2010). “Development and analysis of an International Speech Test Signal (ISTS),” Int. J. Audiol. 49, 891–903.
21. Houtsma, A. J. M., Durlach, N. I., and Braida, L. D. (1980). “Intensity perception. XI. Experimental results on the relation of intensity resolution to loudness matching,” J. Acoust. Soc. Am. 68, 807–813.
22. ISO (2005). 389-7, Acoustics-Reference Zero for the Calibration of Audiometric Equipment. Part 7: Reference Threshold of Hearing Under Free-Field and Diffuse-Field Listening Conditions (International Organization for Standardization, Geneva, Switzerland).
23. Jepsen, M. L., Ewert, S. D., and Dau, T. (2008). “A computational model of human auditory signal processing and perception,” J. Acoust. Soc. Am. 124, 422–438.
24. Jørgensen, S., and Dau, T. (2011). “Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing,” J. Acoust. Soc. Am. 130, 1475–1487.
25. Jørgensen, S., Ewert, S. D., and Dau, T. (2013). “A multi-resolution envelope-power based model for speech intelligibility,” J. Acoust. Soc. Am. 134, 436–446.
26. Kohlrausch, A., Fassel, R., and Dau, T. (2000). “The influence of carrier level and frequency on modulation and beat-detection thresholds for sinusoidal carriers,” J. Acoust. Soc. Am. 108, 723–734.
27. Kollmeier, B., Rennies, J., and Brand, T. (2011). “Tools to predict binaural speech intelligibility in complex listening environment for normal and hearing-impaired listeners,” J. Acoust. Soc. Am. 129, 2669.
28. Lorenzi, C., Husson, M., Ardoint, M., and Debruille, X. (2006). “Speech masking release in listeners with flat hearing loss: Effects of masker fluctuation rate on identification scores and phonetic feature reception,” Int. J. Audiol. 45, 487–495.
29. Ludvigsen, C. (1985). “Relations among some psychoacoustic parameters in normal and cochlearly impaired listeners,” J. Acoust. Soc. Am. 78, 1271–1280.
30. Meyer, R. M., and Brand, T. (2013). “Comparison of different short-term speech intelligibility index procedures in fluctuating noise for listeners with normal and impaired hearing,” Acta Acust. Acust. 99, 442–456.
31. Moore, B. C. J. (1997). An Introduction to the Psychology of Hearing, 4th ed. (Academic, London, UK).
32. Moore, B. C. J., Alcántara, J. I., and Dau, T. (1998). “Masking patterns for sinusoidal and narrow-band noise maskers,” J. Acoust. Soc. Am. 104, 1023–1038.
33. Moore, B. C. J., and Glasberg, B. R. (1983). “Suggested formulae for calculating auditory filter bandwidth and excitation patterns,” J. Acoust. Soc. Am. 74, 750–753.
34. Nielsen, J. B., and Dau, T. (2009). “Development of a Danish speech intelligibility test,” Int. J. Audiol. 48, 729–741.
35. Relaño-Iborra, H., May, T., Zaar, J., Scheidiger, C., and Dau, T. (2016). “Predicting speech intelligibility based on a correlation metric in the envelope power spectrum domain,” J. Acoust. Soc. Am. 140, 2670–2679.
36. Rennies, J., Warzybok, A., Brand, T., and Kollmeier, B. (2014). “Modeling the effects of a single reflection on binaural speech intelligibility,” J. Acoust. Soc. Am. 135, 1556–1567.
37. Rhebergen, K. S., and Versfeld, N. J. (2005). “A Speech Intelligibility Index-based approach to predict the speech reception threshold for sentences in fluctuating noise for normal-hearing listeners,” J. Acoust. Soc. Am. 117, 2181–2192.
38. Rhebergen, K. S., Versfeld, N. J., and Dreschler, W. A. (2006). “Extended speech intelligibility index for the prediction of the speech reception threshold in fluctuating noise,” J. Acoust. Soc. Am. 120, 3988–3997.
39. Schubotz, W., Brand, T., Kollmeier, B., and Ewert, S. D. (2016). “Monaural speech intelligibility and detection in maskers with varying amounts of spectro-temporal speech features,” J. Acoust. Soc. Am. 140, 524–540.
40. Shailer, M. J., and Moore, B. C. J. (1983). “Gap detection as a function of frequency bandwidth and level,” J. Acoust. Soc. Am. 74, 467–473.
41. Steeneken, H. J. M., and Houtgast, T. (1980). “A physical method for measuring speech-transmission quality,” J. Acoust. Soc. Am. 67, 318–326.
42. Stone, M. A., Füllgrabe, C., and Moore, B. C. J. (2012). “Notionally steady background noise acts primarily as a modulation masker of speech,” J. Acoust. Soc. Am. 132, 317–326.
43. Taal, C. H., Hendriks, R. C., Heusdens, R., and Jensen, J. (2011). “An algorithm for intelligibility prediction of time-frequency weighted noisy speech,” IEEE Trans. Audio Speech Lang. Process. 19, 2125–2136.
44. Tanner, W. P., and Sorkin, R. D. (1972). “The theory of signal detectability,” in Foundation of Modern Auditory Function (Academic, New York), pp. 63–97.
45. Verhey, J. L., Dau, T., and Kollmeier, B. (1999). “Within-channel cues in comodulation masking release (CMR): Experiments and model predictions using a modulation-filterbank model,” J. Acoust. Soc. Am. 106, 2733–2745.
46. Viemeister, N. F. (1979). “Temporal modulation transfer functions based upon modulation thresholds,” J. Acoust. Soc. Am. 66, 1364–1380.
47. Wagener, K., Brand, T., and Kollmeier, B. (1999). “Entwicklung und Evaluation eines Satztests für die deutsche Sprache III: Design, Optimierung und Evaluation des Oldenburger Satztests” (“Development and evaluation of a sentence test for German language III: Design, optimization and evaluation of the Oldenburg sentence test”), Z. Audiol. 38, 86–95.
48. Wagener, K., Hochmuth, S., Ahrlich, M., Zokoll, M., and Kollmeier, B. (2014). “Der weibliche Oldenburger Satztest” (“The female version of the Oldenburg sentence test”), in Proceedings of the 17th Jahrestagung der Deutschen Gesellschaft für Audiologie, Oldenburg, Germany.