The modulation filtering approach to robust automatic speech recognition (ASR) is based on enhancing perceptually relevant regions of the modulation spectrum while suppressing the regions susceptible to noise. In this paper, a data-driven unsupervised modulation filter learning scheme is proposed using a convolutional restricted Boltzmann machine. The initial filter is learned from the speech spectrogram, while subsequent filters are learned from residual spectrograms. The modulation filtered spectrograms are used for ASR experiments on noisy and reverberant speech, where these features provide significant improvements over other robust features. Furthermore, the application of the proposed method to semi-supervised learning is investigated.
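As a rough illustration of the iterative scheme described in the abstract, the sketch below learns temporal modulation filters with a 1-D convolutional restricted Boltzmann machine trained by contrastive divergence, applies each learned filter to the spectrogram, and trains the next filter on the residual. The function names, filter length, learning rate, and CD-1 training loop are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of iterative unsupervised modulation filter learning with a
# 1-D convolutional RBM (CRBM). All names, shapes, and hyperparameters are
# illustrative assumptions, not the configuration used in the paper.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def learn_crbm_filter(trajectories, filt_len=11, epochs=20, lr=1e-3, rng=None):
    """Learn one temporal modulation filter via CD-1 on a CRBM with
    Gaussian visible units and a single Bernoulli hidden feature map."""
    rng = rng or np.random.default_rng(0)
    w = 0.01 * rng.standard_normal(filt_len)   # shared convolutional weight
    b = 0.0                                    # shared hidden bias
    for _ in range(epochs):
        for v in trajectories:                 # v: one band's time trajectory
            # positive phase: hidden probabilities from the data
            h_pos = sigmoid(np.convolve(v, w[::-1], mode='valid') + b)
            h_samp = (rng.random(h_pos.shape) < h_pos).astype(float)
            # negative phase: reconstruct visibles, then hidden again
            v_neg = np.convolve(h_samp, w, mode='full')
            h_neg = sigmoid(np.convolve(v_neg, w[::-1], mode='valid') + b)
            # CD-1 gradient for the shared filter (cross-correlation of v and h)
            w += lr * (np.correlate(v, h_pos, mode='valid')
                       - np.correlate(v_neg, h_neg, mode='valid'))
            b += lr * (h_pos.mean() - h_neg.mean())
    return w / (np.linalg.norm(w) + 1e-12)     # normalised modulation filter

def modulation_filter_bank(spectrogram, n_filters=3, filt_len=11):
    """Learn filters iteratively: each new filter is trained on the residual
    left after removing the previous filter's output from the spectrogram."""
    residual = spectrogram.copy()              # (n_bands, n_frames), e.g. log-mel
    filters, filtered = [], []
    for _ in range(n_filters):
        w = learn_crbm_filter(list(residual), filt_len=filt_len)
        out = np.stack([np.convolve(band, w, mode='same') for band in residual])
        filters.append(w)
        filtered.append(out)                   # one modulation-filtered stream
        residual = residual - out              # next filter sees what is left
    return filters, filtered

if __name__ == "__main__":
    # toy example: random stand-in for a 40-band x 300-frame spectrogram
    spec = np.random.default_rng(1).standard_normal((40, 300))
    filters, streams = modulation_filter_bank(spec, n_filters=2)
    print([f.shape for f in filters], [s.shape for s in streams])
```

In this sketch the learned filters act only along time (temporal modulations), which is one plausible reading of the abstract; the filtered streams would then feed a standard ASR front end.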
Published: September 27 2017
Unsupervised modulation filter learning for noise-robust speech recognition
Purvi Agrawal,a) Indian Institute of Science, Bangalore, India
Sriram Ganapathy, Indian Institute of Science, Bangalore, India
a) Electronic mail: purvi_agrawal@ee.iisc.ernet.in
J. Acoust. Soc. Am. 142, 1686–1692 (2017)
Article history
Received: February 16 2017
Accepted: August 24 2017
Citation
Purvi Agrawal, Sriram Ganapathy; Unsupervised modulation filter learning for noise-robust speech recognition. J. Acoust. Soc. Am. 1 September 2017; 142 (3): 1686–1692. https://doi.org/10.1121/1.5001926