The generalized power spectrum model [GPSM; Biberger and Ewert (2016). J. Acoust. Soc. Am. 140, 1023–1038], combining the “classical” concept of the power-spectrum model (PSM) and the envelope power spectrum model (EPSM), was demonstrated to account for several psychoacoustic and speech intelligibility (SI) experiments. The PSM path of the model uses long-time power signal-to-noise ratios (SNRs), while the EPSM path uses short-time envelope power SNRs. A systematic comparison of existing SI models for several spectro-temporal manipulations of speech maskers and gender combinations of target and masker speakers [Schubotz et al. (2016). J. Acoust. Soc. Am. 140, 524–540] showed the importance of short-time power features. Conversely, Jørgensen et al. [(2013). J. Acoust. Soc. Am. 134, 436–446] demonstrated a higher predictive power of short-time envelope power SNRs than power SNRs using reverberation and spectral subtraction. Here, the GPSM was extended to utilize short-time power SNRs and was shown to account for all psychoacoustic and SI data of the three mentioned studies. The best processing strategy was to exclusively use either power or envelope-power SNRs, depending on the experimental task. By analyzing both domains, the suggested model might provide a useful tool for clarifying the contribution of amplitude modulation masking and energetic masking.
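The difference between a long-time and a short-time power SNR can be illustrated with a minimal sketch. This is not the GPSM and omits any auditory front end; the function names, the toy signals, and the 250-sample frame length are assumptions chosen only for illustration. It shows how a masker that is attenuated in some frames yields short-time SNRs well above the single long-time value.

```python
import math

def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def long_time_snr_db(target, masker):
    """Power SNR in dB computed once over the whole signal."""
    return 10.0 * math.log10(power(target) / power(masker))

def short_time_snr_db(target, masker, frame_len):
    """Power SNR in dB for each non-overlapping frame of frame_len samples."""
    n_frames = min(len(target), len(masker)) // frame_len
    snrs = []
    for i in range(n_frames):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        snrs.append(10.0 * math.log10(power(target[seg]) / power(masker[seg])))
    return snrs

# Toy signals (hypothetical): a steady 50-Hz target tone and a 120-Hz masker
# whose amplitude alternates between 1.0 and 0.1 every 250 samples.
fs = 1000  # sampling rate in Hz
n = 1000   # number of samples
target = [math.sin(2 * math.pi * 50 * t / fs) for t in range(n)]
masker = [
    (1.0 if (t // 250) % 2 == 0 else 0.1) * math.sin(2 * math.pi * 120 * t / fs)
    for t in range(n)
]

long_snr = long_time_snr_db(target, masker)
short_snrs = short_time_snr_db(target, masker, 250)

print(f"long-time SNR: {long_snr:.2f} dB")
print("short-time SNRs:", [round(s, 2) for s in short_snrs])
```

In this toy case the frames with the attenuated masker have a much higher SNR than the long-time value suggests, which is the kind of information a short-time analysis preserves and a long-time analysis averages away.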
August 2017
August 23 2017
The role of short-time intensity and envelope power for speech intelligibility and psychoacoustic masking
Thomas Biberger
a)
Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
Stephan D. Ewert
Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
a) Electronic mail: [email protected]
J. Acoust. Soc. Am. 142, 1098–1111 (2017)
Article history
Received: March 07 2017
Accepted: July 31 2017
Citation
Thomas Biberger, Stephan D. Ewert; The role of short-time intensity and envelope power for speech intelligibility and psychoacoustic masking. J. Acoust. Soc. Am. 1 August 2017; 142 (2): 1098–1111. https://doi.org/10.1121/1.4999059