This paper addresses the problem of automatically identifying vowels uttered in isolation by female and child speakers. Because the fundamental frequency (F0) of these speakers is high, the magnitude spectrum of voiced vowels is sparsely sampled: only frequencies at integer multiples of F0 are significant. This degrades the performance of vowel identification techniques that either ignore pitch or rely on global spectral shape models. A new pitch-dependent approach to vowel identification is proposed. It emerges from the concept of timbre and defines perceptual spectral clusters (PSC) of harmonic partials. A representative set of static PSC-related features is estimated, and the features' performance is evaluated in automatic classification tests using the Mahalanobis distance. Linear prediction features and Mel-frequency cepstral coefficients (MFCC) are used as a reference. A database of five natural Portuguese vowel sounds uttered by 44 speakers (including 27 child speakers) is used to train and test the Gaussian models. Results indicate that PSC features perform better than plain linear prediction features but slightly worse than MFCC features. However, PSC features have the potential to take full advantage of the pitch structure of voiced vowels, for example in the analysis of concurrent voices or by using pitch as a normalization parameter.
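The classification step mentioned above can be illustrated as a minimum-Mahalanobis-distance classifier with one Gaussian model per vowel. This is a minimal pure-Python sketch under stated assumptions, not the author's implementation. Each class is assumed to get its own mean vector and full covariance matrix; the paper may instead pool covariances or use a different decision rule. The two-dimensional feature vectors are hypothetical toy values, not PSC, linear prediction or MFCC features. All function names (`train`, `classify`, `mahalanobis_sq`, `invert`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Model = Tuple[Vector, Matrix]  # (mean vector, inverse covariance matrix)


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a list of feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def covariance(vectors: Sequence[Vector], mu: Vector) -> Matrix:
    """Unbiased sample covariance (needs more vectors than dimensions)."""
    n, d = len(vectors), len(mu)
    return [[sum((v[i] - mu[i]) * (v[j] - mu[j]) for v in vectors) / (n - 1)
             for j in range(d)] for i in range(d)]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    d = len(m)
    # Augment each row with the corresponding row of the identity matrix.
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(d)]
         for i, row in enumerate(m)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(d):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[d:] for row in a]


def mahalanobis_sq(x: Vector, mu: Vector, inv_cov: Matrix) -> float:
    """Squared Mahalanobis distance (x - mu)^T C^-1 (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    d = len(diff)
    return sum(diff[i] * inv_cov[i][j] * diff[j]
               for i in range(d) for j in range(d))


def train(data: Dict[str, Sequence[Vector]]) -> Dict[str, Model]:
    """Fit one Gaussian (mean, inverse covariance) per vowel class.

    Assumption: per-class covariances, not a pooled covariance.
    """
    models = {}
    for label, vectors in data.items():
        mu = mean(vectors)
        models[label] = (mu, invert(covariance(vectors, mu)))
    return models


def classify(x: Vector, models: Dict[str, Model]) -> str:
    """Return the class whose Gaussian is nearest in Mahalanobis distance."""
    return min(models, key=lambda k: mahalanobis_sq(x, *models[k]))


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors for two vowel classes (toy data).
    training = {
        "a": [[1.0, 2.0], [1.2, 2.1], [0.9, 1.8], [1.1, 2.3]],
        "i": [[3.0, 0.5], [3.2, 0.7], [2.9, 0.4], [3.1, 0.8]],
    }
    models = train(training)
    print(classify([1.05, 2.0], models))  # prints "a"
    print(classify([3.0, 0.6], models))  # prints "i"
```

A per-class covariance lets each vowel model capture how its own features co-vary, which plain Euclidean distance ignores. The hand-written inversion is only meant for small feature dimensions; for realistic feature sets a numerical library would be the natural choice.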
October 2007 (published October 1, 2007)
Static features in real-time recognition of isolated vowels at high pitch
Aníbal J. S. Ferreira a)
Department of Electrical and Computer Engineering, University of Porto, Rua Dr. Roberto Frias s/n, 4200-465 Porto, Portugal
a) Electronic mail: [email protected]
J. Acoust. Soc. Am. 122, 2389–2404 (2007)
Article history: received September 16, 2006; accepted July 23, 2007.
Citation: Aníbal J. S. Ferreira; Static features in real-time recognition of isolated vowels at high pitch. J. Acoust. Soc. Am. 1 October 2007; 122 (4): 2389–2404. https://doi.org/10.1121/1.2772228