The effects on speech intelligibility and sound quality of two noise-reduction algorithms were compared: a deep recurrent neural network (RNN) and spectral subtraction (SS). The RNN was trained using sentences spoken by a large number of talkers with a variety of accents, presented in babble. Different talkers were used for testing. Participants with mild-to-moderate hearing loss were tested. Stimuli were given frequency-dependent linear amplification to compensate for the individual hearing losses. A paired-comparison procedure was used to compare all possible combinations of three conditions. The conditions were: speech in babble with no processing (NP) or processed using the RNN or SS. In each trial, the same sentence was played twice using two different conditions. The participants indicated which one was better and by how much in terms of speech intelligibility and (in separate blocks) sound quality. Processing using the RNN was significantly preferred over NP and over SS processing for both subjective intelligibility and sound quality, although the magnitude of the preferences was small. SS processing was not significantly preferred over NP for either subjective intelligibility or sound quality. Objective computational measures of speech intelligibility predicted better intelligibility for RNN than for SS or NP.
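The paired-comparison design described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `build_trials` and `mean_preference`, the randomized presentation order, and the signed rating convention (positive favors the second interval) are all assumptions made for the example.

```python
from itertools import combinations
import random

# The three conditions named in the abstract.
CONDITIONS = ("NP", "RNN", "SS")


def build_trials(sentences, seed=0):
    """Build one trial per sentence and per unordered pair of conditions.

    Presentation order within each pair and the overall trial order are
    randomized; this is an assumption, not a detail stated in the abstract.
    """
    rng = random.Random(seed)
    trials = []
    for sentence in sentences:
        for pair in combinations(CONDITIONS, 2):
            first, second = rng.sample(pair, 2)  # random order of the two conditions
            trials.append({"sentence": sentence, "first": first, "second": second})
    rng.shuffle(trials)
    return trials


def mean_preference(responses, a, b):
    """Mean signed preference for condition `a` over condition `b`.

    `responses` is a list of (first, second, rating) tuples, where a positive
    rating favors the second interval and a negative rating favors the first
    (hypothetical scale). A positive result means `a` was preferred.
    """
    scores = []
    for first, second, rating in responses:
        if {first, second} != {a, b}:
            continue  # trial compared a different pair of conditions
        scores.append(rating if second == a else -rating)
    return sum(scores) / len(scores) if scores else 0.0
```

With three conditions there are three unordered pairs, so each sentence contributes three trials per rating block (intelligibility or sound quality).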
March 2019
March 25 2019
Comparison of effects on subjective intelligibility and quality of speech in babble for two algorithms: A deep recurrent neural network and spectral subtraction
Mahmoud Keshavarzi,a) Department of Psychology, University of Cambridge, Cambridge, United Kingdom
Tobias Goehring, MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
Richard E. Turner, Department of Engineering, University of Cambridge, Cambridge, United Kingdom
Brian C. J. Moore, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
a) Electronic mail: mahmoud.keshavarzi.ir@ieee.org
J. Acoust. Soc. Am. 145, 1493–1503 (2019)
Article history
Received: November 15 2018
Accepted: March 01 2019
Citation
Mahmoud Keshavarzi, Tobias Goehring, Richard E. Turner, Brian C. J. Moore; Comparison of effects on subjective intelligibility and quality of speech in babble for two algorithms: A deep recurrent neural network and spectral subtraction. J. Acoust. Soc. Am. 1 March 2019; 145 (3): 1493–1503. https://doi.org/10.1121/1.5094765