Minimum mean-square error (MMSE) approaches to speech enhancement are widely used in the literature. The quality of enhanced speech produced by an MMSE approach is directly impacted by the accuracy of the employed a priori signal-to-noise ratio (SNR) estimator. In this paper, the a priori SNR estimate spectral distortion (SD) level that results in a just-noticeable difference (JND) in the perceived quality of MMSE approach enhanced speech is found. The JND SD level is indicative of the accuracy that an a priori SNR estimator must exceed to have no impact on the perceived quality of MMSE approach enhanced speech. To measure the JND SD level, listening tests are conducted across five SNR levels, five noise sources, and two MMSE approaches [the MMSE short-time spectral amplitude (MMSE-STSA) estimator and the Wiener filter]. A statistical analysis of the results indicates that the JND SD level increases with the SNR level, is higher for the MMSE-STSA estimator, and is not impacted by the type of background noise. Following the literature, a significant improvement in a priori SNR estimation accuracy is required to reach the JND SD level.
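To make the quantities in the abstract concrete, the following is a minimal sketch of how an SD level between an a priori SNR estimate and its instantaneous case might be computed. The abstract does not give the SD formula, so the definition used here (per-frame root-mean-square difference in dB across frequency bins, averaged over frames) is an assumption, as are the function names `wiener_gain` and `spectral_distortion_db`. The Wiener filter gain is the standard form, xi / (1 + xi), for a linear-scale a priori SNR xi.

```python
import math


def wiener_gain(xi):
    """Standard Wiener filter gain for a linear-scale a priori SNR xi."""
    return xi / (1.0 + xi)


def spectral_distortion_db(xi_est, xi_inst, floor=1e-12):
    """Assumed SD definition (not taken from the paper): the per-frame RMS
    difference in dB between two a priori SNR sequences, averaged over frames.

    xi_est, xi_inst: lists of frames, each frame a list of linear-scale
    a priori SNR values, one per frequency bin.
    """
    per_frame = []
    for est_frame, inst_frame in zip(xi_est, xi_inst):
        squared_diffs = [
            (10.0 * math.log10(max(e, floor))
             - 10.0 * math.log10(max(i, floor))) ** 2
            for e, i in zip(est_frame, inst_frame)
        ]
        per_frame.append(math.sqrt(sum(squared_diffs) / len(squared_diffs)))
    return sum(per_frame) / len(per_frame)


# Hypothetical toy data: two frames, two frequency bins each.
xi_inst = [[1.0, 10.0], [0.1, 100.0]]
# An estimate that is a factor of two too high in every bin.
xi_est = [[2.0 * x for x in frame] for frame in xi_inst]

sd = spectral_distortion_db(xi_est, xi_inst)
gains = [[wiener_gain(x) for x in frame] for frame in xi_est]
```

In this toy case every bin is off by the same factor, so the SD equals that factor expressed in dB. A listening test of the kind described in the abstract would then ask at what SD level the resulting change in the gain function becomes audible.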
October 2020
October 07 2020
Spectral distortion level resulting in a just-noticeable difference between an a priori signal-to-noise ratio estimate and its instantaneous case
Aaron Nicolson a)
Signal Processing Laboratory, Griffith University, Brisbane, Queensland 4111, Australia
Kuldip K. Paliwal b)
Signal Processing Laboratory, Griffith University, Brisbane, Queensland 4111, Australia
a) Author to whom correspondence should be addressed: [email protected], ORCID: 0000-0002-7163-1809.
b) ORCID: 0000-0002-3553-3662.
J. Acoust. Soc. Am. 148, 1879–1889 (2020)
Article history
Received: March 02 2020
Accepted: September 15 2020
Citation
Aaron Nicolson, Kuldip K. Paliwal; Spectral distortion level resulting in a just-noticeable difference between an a priori signal-to-noise ratio estimate and its instantaneous case. J. Acoust. Soc. Am. 1 October 2020; 148 (4): 1879–1889. https://doi.org/10.1121/10.0002113