Numerous attempts have been made to find low-dimensional, formant-related representations of speech signals that are suitable for automatic speech recognition. However, it is often not known how these features behave in comparison with true formants. The purpose of this study was to compare two sets of automatically extracted formant-like features, i.e., robust formants and HMM2 features, to hand-labeled formants. The robust formant features were derived by means of the split Levinson algorithm, while the HMM2 features correspond to the frequency segmentation of speech signals obtained by two-dimensional hidden Markov models. Mel-frequency cepstral coefficients (MFCCs) were also included in the investigation as an example of state-of-the-art automatic speech recognition features. The feature sets were compared in terms of their performance on a vowel classification task. The speech data and hand-labeled formants used in this study are a subset of the American English vowels database presented in Hillenbrand et al. [J. Acoust. Soc. Am. 97, 3099–3111 (1995)]. Classification performance was measured on the original, clean data and in noisy acoustic conditions. With clean data, the classification performance of the formant-like features compared very well with that of the hand-labeled formants in a gender-dependent experiment, but was inferior to it in a gender-independent experiment. The results obtained in noisy acoustic conditions indicated that the formant-like features used in this study are not inherently noise robust. For clean and noisy data, and in both the gender-dependent and gender-independent experiments, the MFCCs achieved results equal or superior to those of the formant features, but at the price of a much higher feature dimensionality.
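To make the idea of automatically extracted formant-like features concrete, the following is a minimal sketch of conventional LPC root-solving formant estimation. It is not the split Levinson algorithm or the HMM2 segmentation evaluated in the study; it is a simpler, widely used stand-in. The function names, the default LPC order, and the frequency and bandwidth thresholds are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns [1, a_1, ..., a_p] for the prediction-error filter
    A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    if r[0] == 0.0:
        raise ValueError("frame has zero energy")
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def estimate_formants(frame, sample_rate, order=10,
                      min_hz=90.0, max_bandwidth_hz=400.0):
    """Return candidate formant frequencies (Hz), sorted ascending.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    a = lpc_coefficients(windowed, order)
    roots = np.roots(a)
    # Keep one root from each complex-conjugate pair.
    roots = roots[np.imag(roots) > 0]
    freqs = np.angle(roots) * sample_rate / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * sample_rate / np.pi
    keep = (freqs > min_hz) & (bandwidths < max_bandwidth_hz)
    return np.sort(freqs[keep])
```

Root-solving of this kind can return a different number of candidates from frame to frame, so a classifier that expects a fixed-length feature vector would need an additional selection rule on top of it.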
September 2004
September 07 2004
Evaluation of formant-like features on an automatic vowel classification task
Febe de Wet
Department of Language and Speech, University of Nijmegen, Nijmegen, The Netherlands
Katrin Weber
IDIAP—Dalle Molle Institute for Perceptual Artificial Intelligence, Martigny, Switzerland
EPFL—Swiss Federal Institute of Technology, Lausanne, Switzerland
Louis Boves
Department of Language and Speech, University of Nijmegen, Nijmegen, The Netherlands
Bert Cranen
Department of Language and Speech, University of Nijmegen, Nijmegen, The Netherlands
Samy Bengio
IDIAP—Dalle Molle Institute for Perceptual Artificial Intelligence, Martigny, Switzerland
Hervé Bourlard
IDIAP—Dalle Molle Institute for Perceptual Artificial Intelligence, Martigny, Switzerland
EPFL—Swiss Federal Institute of Technology, Lausanne, Switzerland
J. Acoust. Soc. Am. 116, 1781–1792 (2004)
Article history
Received: December 20 2002
Accepted: April 23 2004
Citation
Febe de Wet, Katrin Weber, Louis Boves, Bert Cranen, Samy Bengio, Hervé Bourlard; Evaluation of formant-like features on an automatic vowel classification task. J. Acoust. Soc. Am. 1 September 2004; 116 (3): 1781–1792. https://doi.org/10.1121/1.1781620