In this paper, we modify the Gabor feature extraction process by applying the Gabor filters to the power-normalized spectrum and concatenating the result with power-normalized cepstral coefficients (PNCC), for noise-robust large-vocabulary continuous speech recognition. In Chang et al., ICASSP (2013), a similar Gabor filter bank (GBFB) feature set, with multi-layer perceptron (MLP) processing to reduce the feature dimension, was used with mel-frequency cepstral coefficients and showed improvements on the Aurora-2 and renoised Wall Street Journal corpora. On a subset of the Aurora-4 database (male speakers only), our method has shown promising results when using PCA, performing 7.9% better than 39-dimensional PNCC features. However, the GBFB features are a rich representation of the speech spectrogram (an overcomplete basis), so an appropriate dimension reduction or manifold learning technique is key to generalizing these features to the large-vocabulary task. Hence, we propose using Laplacian Eigenmaps to obtain a 13-dimensional manifold (from a 564-dimensional GBFB feature set) for the training data, with an MLP learning the mapping so that it can be applied to out-of-sample points, i.e., the test data. The reduced GBFB features are then concatenated with the 26-dimensional PNCC plus acceleration coefficients. We expect this technique to yield better accuracy, since speech lies on a non-linear manifold rather than in a linear feature space. [This project was supported in part by DARPA.]
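The reduction step described above (Laplacian Eigenmaps on the training frames, an MLP that learns the mapping for out-of-sample frames, then concatenation with PNCC) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' configuration: the k-nearest-neighbour graph with heat-kernel weights (k = 10, median bandwidth), the one-hidden-layer tanh MLP with 64 units trained by plain gradient descent, and the names `laplacian_eigenmaps` and `TinyMLP` are all hypothetical choices, and random arrays stand in for real GBFB and PNCC features.

```python
import numpy as np


def laplacian_eigenmaps(X, n_components=13, n_neighbors=10):
    """Embed rows of X with Laplacian Eigenmaps (kNN graph, heat-kernel weights)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between training frames.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude self-matches from the neighbour search
    # Indices of the k nearest neighbours of every frame.
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nbrs.ravel()
    # Heat-kernel bandwidth: median squared neighbour distance (an assumption).
    t = np.median(d2[rows, cols])
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / t)
    W = np.maximum(W, W.T)  # symmetrise: connect i and j if either is a neighbour
    deg = W.sum(axis=1)
    # Solve L y = lambda D y via the symmetric normalised Laplacian.
    inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(n) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Drop the trivial eigenvector (eigenvalue 0) and map back with y = D^-1/2 z.
    return inv_sqrt[:, None] * eigvecs[:, 1:n_components + 1]


class TinyMLP:
    """Hypothetical one-hidden-layer tanh MLP trained by full-batch gradient descent."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def predict(self, X):
        return np.tanh(X @ self.W1 + self.b1) @ self.W2 + self.b2

    def fit(self, X, Y, lr=0.1, epochs=500):
        losses = []
        for _ in range(epochs):
            H = np.tanh(X @ self.W1 + self.b1)
            err = H @ self.W2 + self.b2 - Y
            losses.append(float(np.mean(err**2)))
            # Gradient of the mean squared error, back-propagated through tanh.
            g_out = 2.0 * err / err.size
            g_hid = (g_out @ self.W2.T) * (1.0 - H**2)
            self.W2 -= lr * (H.T @ g_out)
            self.b2 -= lr * g_out.sum(axis=0)
            self.W1 -= lr * (X.T @ g_hid)
            self.b1 -= lr * g_hid.sum(axis=0)
        return losses


rng = np.random.default_rng(0)
# Placeholder data: random stand-ins for 564-dim GBFB frames and 26-dim PNCC frames.
gbfb_train = rng.normal(size=(300, 564))
gbfb_test = rng.normal(size=(50, 564))
pncc_test = rng.normal(size=(50, 26))

# Standardise GBFB features using training-set statistics only.
mu, sd = gbfb_train.mean(axis=0), gbfb_train.std(axis=0) + 1e-8
Xtr, Xte = (gbfb_train - mu) / sd, (gbfb_test - mu) / sd

# 1) Laplacian Eigenmaps on the training frames only (564 -> 13 dimensions).
Y_train = laplacian_eigenmaps(Xtr, n_components=13)
Y_train = (Y_train - Y_train.mean(axis=0)) / Y_train.std(axis=0)

# 2) Learn an explicit GBFB -> manifold-coordinate mapping with the MLP.
mlp = TinyMLP(n_in=564, n_hidden=64, n_out=13)
losses = mlp.fit(Xtr, Y_train)

# 3) Out-of-sample extension: map test frames, then concatenate with PNCC.
reduced_test = mlp.predict(Xte)
features_test = np.hstack([reduced_test, pncc_test])
print(features_test.shape)  # (50, 39)
```

Laplacian Eigenmaps yields coordinates only for the frames used to build the graph, so in this sketch the MLP is what lets the reduction reach unseen test frames; the 13 reduced dimensions plus the 26 PNCC dimensions give a 39-dimensional vector per frame.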
November 2013
Meeting abstract.
November 01 2013
The effect of non-linear dimension reduction on Gabor filter bank feature space
Hitesh A. Gupta, Anirudh Raju, and Abeer Alwan
Elec. Eng., Univ. of California Los Angeles, 550 Veteran Ave., Apt. 102, Los Angeles, CA 90024
J. Acoust. Soc. Am. 134, 4069 (2013)
Citation
Hitesh A. Gupta, Anirudh Raju, Abeer Alwan; The effect of non-linear dimension reduction on Gabor filter bank feature space. J. Acoust. Soc. Am. 1 November 2013; 134 (5_Supplement): 4069. https://doi.org/10.1121/1.4830855