The effects on speech intelligibility and sound quality of two noise-reduction algorithms were compared: a deep recurrent neural network (RNN) and spectral subtraction (SS). The RNN was trained using sentences spoken by a large number of talkers with a variety of accents, presented in babble. Different talkers were used for testing. Participants with mild-to-moderate hearing loss were tested. Stimuli were given frequency-dependent linear amplification to compensate for the individual hearing losses. A paired-comparison procedure was used to compare all possible combinations of three conditions. The conditions were: speech in babble with no processing (NP) or processed using the RNN or SS. In each trial, the same sentence was played twice using two different conditions. The participants indicated which one was better and by how much in terms of speech intelligibility and (in separate blocks) sound quality. Processing using the RNN was significantly preferred over NP and over SS processing for both subjective intelligibility and sound quality, although the magnitude of the preferences was small. SS processing was not significantly preferred over NP for either subjective intelligibility or sound quality. Objective computational measures of speech intelligibility predicted better intelligibility for RNN than for SS or NP.
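The paired-comparison design described above can be illustrated with a minimal sketch. It enumerates every pairing of the three conditions (NP, RNN, SS) for each sentence, so that each trial presents the same sentence under two different conditions. The function name `build_trials`, the randomized presentation order, the shuffling of trials, and the seed are assumptions made for illustration; the abstract does not specify how trials were ordered or how many sentences were used per pair.

```python
from itertools import combinations
import random

# Condition labels taken from the abstract.
CONDITIONS = ("NP", "RNN", "SS")


def build_trials(sentences, seed=0):
    """Build one trial per (sentence, unordered condition pair).

    Each trial is a tuple (sentence, first_condition, second_condition).
    Presentation order within a pair and trial order are randomized;
    both choices are assumptions, not details from the study.
    """
    rng = random.Random(seed)
    trials = []
    for sentence in sentences:
        # Three conditions give three unordered pairs:
        # (NP, RNN), (NP, SS), (RNN, SS).
        for first, second in combinations(CONDITIONS, 2):
            if rng.random() < 0.5:
                first, second = second, first
            trials.append((sentence, first, second))
    rng.shuffle(trials)
    return trials


if __name__ == "__main__":
    # Hypothetical sentence identifiers, for illustration only.
    for trial in build_trials(["sentence_01", "sentence_02"]):
        print(trial)
```

With three conditions there are three unordered pairs, so each sentence contributes three trials in this sketch. Separate blocks for intelligibility and sound-quality judgments could reuse the same trial list.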
