This paper shows that machine learning techniques are very successful at classifying the Russian voiceless non-palatalized fricatives [f], [s], and [ʃ] using a small set of acoustic cues. From a data sample of 6320 tokens of read sentences produced by 40 participants, temporal and spectral measurements are extracted from the full sound, from the frication noise duration, and from a middle 30 ms window. In addition, 13 mel-frequency cepstral coefficients (MFCCs) are computed from the middle 30 ms window. Classifiers based on single decision trees, random forests, support vector machines, and neural networks are trained and tested to distinguish between these three fricatives. The results demonstrate that, first, the three acoustic cue extraction techniques yield similar classification accuracy (between 93% and 99%), with the spectral measurements extracted from the full frication noise duration giving slightly better accuracy. Second, the center of gravity and the spectral spread alone are sufficient for the classification of [f], [s], and [ʃ], irrespective of contextual and speaker variation. Third, MFCCs show only marginally higher predictive power than the spectral cues (<2%). This suggests that both sets of measures provide sufficient information for the classification of these fricatives, and that the choice between them depends on the particular research question or application.
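The two spectral moments highlighted above, the center of gravity (the amplitude-weighted mean frequency of the spectrum) and the spectral spread (the corresponding weighted standard deviation), are straightforward to compute from a magnitude spectrum. The following is a minimal numpy-only sketch, not the authors' code: it uses synthetic band-limited noise as a stand-in for fricative frication (the band edges are illustrative assumptions, not the paper's measurements) to show why an [s]-like noise with high-frequency energy gets a higher center of gravity than a lower-band [ʃ]-like noise.

```python
import numpy as np

def spectral_moments(signal, sr):
    """Return (center of gravity, spectral spread) in Hz.

    The center of gravity is the first moment of the magnitude
    spectrum; the spread is the square root of its second central
    moment (an amplitude-weighted standard deviation).
    """
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    weights = spectrum / spectrum.sum()
    cog = np.sum(freqs * weights)
    spread = np.sqrt(np.sum((freqs - cog) ** 2 * weights))
    return cog, spread

# Illustrative stand-ins only: band-limited noise mimicking an [s]-like
# (higher-frequency) vs. a [sh]-like (lower-frequency) frication noise.
rng = np.random.default_rng(0)
sr = 44100
n = int(0.030 * sr)          # a 30 ms analysis window, as in the paper
white_spec = np.fft.rfft(rng.standard_normal(n))
freqs = np.fft.rfftfreq(n, d=1.0 / sr)

s_like = np.fft.irfft(white_spec * ((freqs > 5000) & (freqs < 10000)), n)
sh_like = np.fft.irfft(white_spec * ((freqs > 2000) & (freqs < 5000)), n)

cog_s, spread_s = spectral_moments(s_like, sr)
cog_sh, spread_sh = spectral_moments(sh_like, sr)
print(cog_s > cog_sh)  # the [s]-like noise has the higher center of gravity
```

In the paper these two values (per token) feed the classifiers directly; in practice one would extract them from measured frication intervals rather than synthetic noise.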
Published: September 13, 2021
Identifying the Russian voiceless non-palatalized fricatives /f/, /s/, and /ʃ/ from acoustic cues using machine learning a)
Special Collection:
Machine Learning in Acoustics
Natalja Ulrich, b) Marc Allassonnière-Tang, c) François Pellegrino, Dan Dediu
Laboratoire Dynamique Du Langage (DDL) UMR 5596, CNRS/Université Lyon 2, Lyon, France
a) This paper is part of a special issue on Machine Learning in Acoustics.
b) Electronic mail: natalja.ulrich@univ-lyon2.fr
c) ORCID: 0000-0002-9057-642X.
J. Acoust. Soc. Am. 150, 1806–1820 (2021)
Article history: Received February 1, 2021; Accepted August 5, 2021.
Citation
Natalja Ulrich, Marc Allassonnière-Tang, François Pellegrino, Dan Dediu; Identifying the Russian voiceless non-palatalized fricatives /f/, /s/, and /ʃ/ from acoustic cues using machine learning. J. Acoust. Soc. Am. 1 September 2021; 150 (3): 1806–1820. https://doi.org/10.1121/10.0005950