Human auditory perception and speech intelligibility have been successfully described based on the two concepts of spectral masking and amplitude modulation (AM) masking. The power-spectrum model (PSM) [Patterson and Moore (1986). Frequency Selectivity in Hearing, pp. 123–177] accounts for effects of spectral masking and critical bandwidth, while the envelope power-spectrum model (EPSM) [Ewert and Dau (2000). J. Acoust. Soc. Am. 108, 1181–1196] has been successfully applied to AM masking and discrimination. Both models extract the long-term (envelope) power to calculate signal-to-noise ratios (SNR). Recently, the EPSM has been applied to speech intelligibility (SI) considering the short-term envelope SNR on various time scales (multi-resolution speech-based envelope power-spectrum model; mr-sEPSM) to account for SI in fluctuating noise [Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134, 436–446]. Here, a generalized auditory model is suggested combining the classical PSM and the mr-sEPSM to jointly account for psychoacoustics and speech intelligibility. The model was extended to consider the local AM depth in conditions with slowly varying signal levels, and the relative role of long-term and short-term SNR was assessed. The suggested generalized power-spectrum model is shown to account for a large variety of psychoacoustic data and to predict speech intelligibility in various types of background noise.
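As an illustration of the envelope-power SNR idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the envelope power is the AC-coupled envelope variance normalized by the squared DC value, and that the envelope SNR is formed from the excess envelope power of the mixture over that of the noise alone, one common formulation in the sEPSM literature. The function names, the `floor` parameter, and the sinusoidally modulated test envelopes are hypothetical choices for this example. The auditory and modulation filterbanks, the multi-resolution segmentation, and the intensity-based (PSM) path are omitted.

```python
import math

def normalized_envelope_power(envelope):
    """AC envelope power divided by the squared DC (mean) of the envelope."""
    n = len(envelope)
    dc = sum(envelope) / n
    ac_power = sum((e - dc) ** 2 for e in envelope) / n
    return ac_power / dc ** 2

def envelope_snr_db(env_power_mix, env_power_noise, floor=1e-6):
    """Envelope SNR in dB from the envelope power of mixture and noise alone.

    `floor` is an assumed lower limit that keeps the logarithm finite.
    """
    excess = max(env_power_mix - env_power_noise, floor)
    return 10.0 * math.log10(excess / max(env_power_noise, floor))

def sam_envelope(depth, mod_freq_hz, fs_hz=1000, duration_s=1.0):
    """Hypothetical test envelope: 1 + depth * sin(2*pi*f*t)."""
    n = int(fs_hz * duration_s)
    return [1.0 + depth * math.sin(2 * math.pi * mod_freq_hz * i / fs_hz)
            for i in range(n)]

# Mixture envelope with modulation depth 0.5, noise envelope with depth 0.1,
# both at a 4 Hz modulation rate over an integer number of cycles.
target_plus_noise = normalized_envelope_power(sam_envelope(0.5, 4))
noise_alone = normalized_envelope_power(sam_envelope(0.1, 4))
snr_env_db = envelope_snr_db(target_plus_noise, noise_alone)
```

For a sinusoidal envelope with modulation depth m, the normalized envelope power equals m²/2, so the two example envelopes give 0.125 and 0.005. A short-term variant would apply the same computation to successive time segments rather than to the whole signal.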
August 2016
August 15 2016
Envelope and intensity based prediction of psychoacoustic masking and speech intelligibility
Thomas Biberger a)
Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
Stephan D. Ewert
Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
a) Electronic mail: thomas.biberger@uni-oldenburg.de
J. Acoust. Soc. Am. 140, 1023–1038 (2016)
Article history
Received: August 23 2015
Accepted: July 19 2016
Citation
Thomas Biberger, Stephan D. Ewert; Envelope and intensity based prediction of psychoacoustic masking and speech intelligibility. J. Acoust. Soc. Am. 1 August 2016; 140 (2): 1023–1038. https://doi.org/10.1121/1.4960574