The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).
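As an illustration of the transmitted-information analysis mentioned in the abstract, the following minimal sketch computes the transmitted information of one articulatory feature from a confusion matrix. The abstract does not give the formula, so the mutual-information definition used here (in the style commonly applied to consonant confusion data) is an assumption, as are the function name `transmitted_information` and the example counts.

```python
import math


def transmitted_information(confusions):
    """Return (T, T_rel) for a confusion-count matrix.

    confusions[i][j] is the number of trials in which feature class i was
    presented and feature class j was reported. T is the transmitted
    information in bits; T_rel is T divided by the stimulus entropy.
    Assumption: this mutual-information definition is a common choice and
    is not taken from the paper itself.
    """
    total = sum(sum(row) for row in confusions)
    if total == 0:
        raise ValueError("confusion matrix contains no trials")

    # Marginal probabilities of presented and reported classes.
    p_stim = [sum(row) / total for row in confusions]
    p_resp = [sum(col) / total for col in zip(*confusions)]

    # Mutual information between stimulus and response.
    t = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count:
                p_ij = count / total
                t += p_ij * math.log2(p_ij / (p_stim[i] * p_resp[j]))

    # Normalize by the stimulus entropy to obtain a value between 0 and 1.
    h_stim = -sum(p * math.log2(p) for p in p_stim if p > 0)
    t_rel = t / h_stim if h_stim > 0 else 0.0
    return t, t_rel


# Hypothetical counts for a binary feature (e.g., voiced vs. unvoiced).
voicing_counts = [[45, 5], [5, 45]]
t_bits, t_relative = transmitted_information(voicing_counts)
print(f"T = {t_bits:.3f} bit, relative T = {t_relative:.3f}")
```

A relative value near 1 would indicate that the feature is transmitted almost perfectly, while a value near 0 would indicate that responses carry almost no information about the presented feature class.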
November 2010
November 24 2010
Human phoneme recognition depending on speech-intrinsic variability a)
Bernd T. Meyer c)
Medizinische Physik, Carl-von-Ossietzky Universität Oldenburg, D-26111 Oldenburg, Germany
Tim Jürgens
Medizinische Physik, Carl-von-Ossietzky Universität Oldenburg, D-26111 Oldenburg, Germany
Thorsten Wesker
Medizinische Physik, Carl-von-Ossietzky Universität Oldenburg, D-26111 Oldenburg, Germany
Thomas Brand
Medizinische Physik, Carl-von-Ossietzky Universität Oldenburg, D-26111 Oldenburg, Germany
Birger Kollmeier
Medizinische Physik, Carl-von-Ossietzky Universität Oldenburg, D-26111 Oldenburg, Germany
c) Author to whom correspondence should be addressed. Electronic mail: bernd.meyer@uni-oldenburg.de
a) Parts of this work were presented at the Eighth Annual Conference of the International Speech Communication Association (Interspeech 2007, Antwerp).
J. Acoust. Soc. Am. 128, 3126–3141 (2010)
Article history
Received: December 29 2008
Accepted: September 03 2010
Citation
Bernd T. Meyer, Tim Jürgens, Thorsten Wesker, Thomas Brand, Birger Kollmeier; Human phoneme recognition depending on speech-intrinsic variability. J. Acoust. Soc. Am. 1 November 2010; 128 (5): 3126–3141. https://doi.org/10.1121/1.3493450