To test whether simultaneous spectral and temporal processing is required to extract robust features for automatic speech recognition (ASR), the robust spectro-temporal two-dimensional Gabor filter bank (GBFB) front-end from Schädler, Meyer, and Kollmeier [J. Acoust. Soc. Am. 131, 4134–4151 (2012)] was decomposed into a spectral one-dimensional Gabor filter bank and a temporal one-dimensional Gabor filter bank. The feature set extracted with these separate spectral and temporal modulation filter banks, the separate Gabor filter bank (SGBFB) features, was introduced and evaluated on the CHiME (Computational Hearing in Multisource Environments) keywords-in-noise recognition task. From the perspective of robust ASR, the results showed that spectral and temporal processing can be performed independently and do not need to interact with each other. Using SGBFB features permitted the signal-to-noise ratio (SNR) to be lowered by 1.2 dB while still performing as well as the GBFB-based reference system, which corresponds to a relative improvement of the word error rate by 12.8%. Additionally, the real-time factor of the spectro-temporal processing could be reduced by more than an order of magnitude. Compared to human listeners, the SNR needed to be 13 dB higher when using Mel-frequency cepstral coefficient features, 11 dB higher when using GBFB features, and 9 dB higher when using SGBFB features to achieve the same recognition performance.
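The decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Hann envelope, the modulation frequencies, the filter lengths, and the random stand-in for a log Mel-spectrogram are all assumptions chosen for illustration. The sketch only demonstrates the underlying identity that filtering with a two-dimensional kernel formed as the outer product of a spectral and a temporal one-dimensional Gabor function gives the same result as applying the two one-dimensional filters one after the other, at a lower cost per output sample.

```python
import numpy as np

def gabor_1d(omega, num_periods=3.5, max_len=41):
    """Complex 1D Gabor function: Hann envelope times a complex carrier.

    omega is the modulation frequency in rad/sample. The envelope width
    rule and all default values are illustrative assumptions.
    """
    half = int(round(num_periods * np.pi / omega))
    width = min(max_len, 2 * half + 1)        # odd length, capped at max_len
    n = np.arange(width) - width // 2         # sample index centred on zero
    envelope = np.hanning(width + 2)[1:-1]    # drop the zero end points
    return envelope * np.exp(1j * omega * n)

def conv_along(x, kernel, axis):
    """Full 1D convolution of every slice of x along the given axis."""
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="full"), axis, x
    )

def conv2d_full(x, kernel2d):
    """Full 2D linear convolution computed via zero-padded FFTs."""
    shape = (x.shape[0] + kernel2d.shape[0] - 1,
             x.shape[1] + kernel2d.shape[1] - 1)
    return np.fft.ifft2(np.fft.fft2(x, s=shape) * np.fft.fft2(kernel2d, s=shape))

# Random stand-in for a log Mel-spectrogram (31 bands x 200 frames, assumed).
rng = np.random.default_rng(0)
log_mel = rng.standard_normal((31, 200))

g_spec = gabor_1d(omega=0.8, max_len=23)   # spectral modulation filter
g_temp = gabor_1d(omega=0.3, max_len=41)   # temporal modulation filter

# Separate processing: spectral filtering (axis 0), then temporal (axis 1).
sep_out = conv_along(conv_along(log_mel, g_spec, axis=0), g_temp, axis=1)

# Joint processing: one 2D kernel built as the outer product of the 1D filters.
joint_out = conv2d_full(log_mel, np.outer(g_spec, g_temp))

# Multiplications per output sample for direct (non-FFT) filtering.
ops_joint = g_spec.size * g_temp.size
ops_sep = g_spec.size + g_temp.size

print("outputs match:", np.allclose(sep_out, joint_out))
print("multiplications per output sample:", ops_joint, "vs", ops_sep)
```

In this sketch the cost per output sample falls from the product of the two filter lengths to their sum. The identity holds for the complex-valued product kernel; how the real and imaginary parts are turned into features is outside the scope of this illustration.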
April 2015
April 01 2015
Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition
Marc René Schädler a) and Birger Kollmeier
Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, D-26111 Oldenburg, Germany
a) Author to whom correspondence should be addressed. Electronic mail: [email protected]
J. Acoust. Soc. Am. 137, 2047–2059 (2015)
Article history
Received: September 26 2014
Accepted: March 05 2015
Citation
Marc René Schädler, Birger Kollmeier; Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition. J. Acoust. Soc. Am. 1 April 2015; 137 (4): 2047–2059. https://doi.org/10.1121/1.4916618