Minimum mean-square error (MMSE) approaches to speech enhancement are widely used in the literature. The quality of speech enhanced by an MMSE approach depends directly on the accuracy of the a priori signal-to-noise ratio (SNR) estimator employed. This paper finds the spectral distortion (SD) level of the a priori SNR estimate that produces a just-noticeable difference (JND) in the perceived quality of MMSE-enhanced speech. The JND SD level indicates the accuracy that an a priori SNR estimator must exceed for its estimation error to have no impact on the perceived quality of MMSE-enhanced speech. To measure the JND SD level, listening tests are conducted across five SNR levels, five noise sources, and two MMSE approaches: the MMSE short-time spectral amplitude (MMSE-STSA) estimator and the Wiener filter. A statistical analysis of the results indicates that the JND SD level increases with the SNR level, is higher for the MMSE-STSA estimator than for the Wiener filter, and is not affected by the type of background noise. Judging by the accuracies reported in the literature, a significant improvement in a priori SNR estimation accuracy is required to reach the JND SD level.
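As a rough illustration of two quantities named in the abstract, the sketch below computes the Wiener filter gain from a linear-scale a priori SNR and an SD level between a reference a priori SNR and its estimate. The SD definition used here (root-mean-square difference in dB across frequency bins, averaged over frames) is an assumption, since the abstract does not state the formula, and the function names are hypothetical.

```python
import numpy as np


def wiener_gain(xi):
    """Wiener filter gain xi / (1 + xi) for a linear-scale a priori SNR."""
    xi = np.asarray(xi, dtype=float)
    return xi / (1.0 + xi)


def spectral_distortion_db(xi_ref, xi_est, eps=1e-12):
    """Assumed SD definition: per-frame RMS difference (in dB) between two
    a priori SNR arrays of shape (frames, bins), averaged over frames."""
    # Convert both SNRs to dB, flooring at eps to avoid log10(0).
    ref_db = 10.0 * np.log10(np.maximum(np.asarray(xi_ref, dtype=float), eps))
    est_db = 10.0 * np.log10(np.maximum(np.asarray(xi_est, dtype=float), eps))
    # RMS over frequency bins (last axis), then mean over frames.
    per_frame = np.sqrt(np.mean((ref_db - est_db) ** 2, axis=-1))
    return float(np.mean(per_frame))


# Toy input: an estimate that is a factor of two too large in every bin.
xi_ref = np.ones((4, 8))
xi_est = 2.0 * xi_ref
sd = spectral_distortion_db(xi_ref, xi_est)
gain = wiener_gain(xi_ref)
```

In this toy case every bin is off by the same factor, so the SD equals 10·log10(2) dB. In a JND experiment the estimate would instead be perturbed until listeners can just detect a change in the enhanced speech.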
