Hearing aid users are challenged in noisy listening situations, and especially in speech-on-speech situations with two or more competing voices. A small sub-experiment showed that, unlike normal-hearing listeners, hearing-impaired listeners find the task of attending to and segregating two competing voices particularly hard. In the main experiment, the competing-voices benefit of a deep neural network (DNN) based stream segregation algorithm was tested on hearing-impaired listeners. A mixture of two voices was separated by the DNN, the two output streams were presented dichotically, one to each ear, and word scores were measured. Compared to the unseparated mixture, the separation yielded a 13 percentage-point benefit when listeners attended to both voices. When only one output was selected, as in a traditional target-masker scenario, a larger benefit of 37 percentage points was found. The results agreed well with objective metrics and show that, for hearing-impaired listeners, DNNs have a large potential for improving stream segregation and speech intelligibility in difficult scenarios with two equally important targets, without any prior selection of a primary target stream. An even higher benefit can be obtained if the user can select the preferred target via remote control.
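
To make the processing chain concrete, below is a minimal Python sketch of the mixture, separation, and dichotic-presentation pipeline described above. It is not the authors' network: the trained DNN is replaced by oracle ideal-ratio masks computed from the clean sources, purely to illustrate the signal flow, and the sample rate, STFT settings, and function names (e.g., separate_and_route) are illustrative assumptions only.

    # Minimal sketch of mask-based two-voice separation with dichotic
    # presentation. The trained DNN is not available here, so the masks it
    # would estimate are replaced by oracle ideal-ratio masks computed from
    # the clean sources -- purely for illustration of the signal flow.
    import numpy as np
    from scipy.signal import stft, istft

    FS = 16000          # sample rate (Hz), assumed
    NPERSEG = 512       # STFT window length, assumed
    EPS = 1e-8          # avoids division by zero in the mask

    def separate_and_route(voice_a, voice_b):
        """Mix two voices, separate them with ratio masks, and return a
        stereo signal with one recovered stream per ear."""
        mixture = voice_a + voice_b

        # Analysis: complex spectrogram of the mixture.
        _, _, mix_spec = stft(mixture, fs=FS, nperseg=NPERSEG)

        # A DNN would map mixture features to two masks; here we compute
        # oracle ideal-ratio masks from the clean sources instead.
        _, _, spec_a = stft(voice_a, fs=FS, nperseg=NPERSEG)
        _, _, spec_b = stft(voice_b, fs=FS, nperseg=NPERSEG)
        mag_a, mag_b = np.abs(spec_a), np.abs(spec_b)
        mask_a = mag_a / (mag_a + mag_b + EPS)
        mask_b = 1.0 - mask_a

        # Apply each mask to the mixture and resynthesize the two streams.
        _, stream_a = istft(mask_a * mix_spec, fs=FS, nperseg=NPERSEG)
        _, stream_b = istft(mask_b * mix_spec, fs=FS, nperseg=NPERSEG)

        # Dichotic presentation: one separated voice to each ear.
        n = min(len(stream_a), len(stream_b))
        return np.stack([stream_a[:n], stream_b[:n]], axis=1)

    if __name__ == "__main__":
        t = np.arange(FS) / FS
        # Two synthetic tones stand in for real speech recordings.
        demo_a = np.sin(2 * np.pi * 220 * t)
        demo_b = np.sin(2 * np.pi * 440 * t)
        stereo = separate_and_route(demo_a, demo_b)
        print(stereo.shape)  # (n_samples, 2): one stream per ear

In the target-masker variant reported above, only one of the two output streams would be selected and presented to both ears instead of stacking both streams.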
