This paper shows that machine learning techniques are highly successful at classifying the Russian voiceless non-palatalized fricatives [f], [s], and [ʃ] using a small set of acoustic cues. From a data sample of 6320 tokens of read sentences produced by 40 participants, temporal and spectral measurements are extracted from the full sound, the frication noise portion, and the middle 30 ms window. In addition, 13 mel-frequency cepstral coefficients (MFCCs) are computed from the middle 30 ms window. Classifiers based on single decision trees, random forests, support vector machines, and neural networks are trained and tested to distinguish between these three fricatives. The results demonstrate, first, that the three acoustic cue extraction techniques yield similar classification accuracy (between 93% and 99%), with the spectral measurements extracted from the full frication noise duration performing slightly better. Second, the center of gravity and the spectral spread are sufficient for the classification of [f], [s], and [ʃ] irrespective of contextual and speaker variation. Third, MFCCs show marginally higher predictive power than spectral cues (a difference of less than 2%). This suggests that both sets of measures provide sufficient information for the classification of these fricatives and that the choice between them depends on the particular research question or application.
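
The two spectral measures singled out above, the center of gravity and the spectral spread, are the first moment and the square root of the second central moment of the spectrum. The following is a minimal sketch of one common way to compute them, not the authors' extraction pipeline. It assumes power-spectrum weighting, a rectangular window, and a 16 kHz sampling rate; the function names `power_spectrum` and `cog_and_spread` are hypothetical, and the input is a synthetic two-tone signal rather than real frication noise.

```python
import cmath
import math


def power_spectrum(samples, sample_rate):
    """Return (frequencies, powers) for the non-negative DFT bins of `samples`."""
    n = len(samples)
    freqs, powers = [], []
    for k in range(n // 2 + 1):
        # Naive DFT: adequate for a short window, too slow for long signals.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        freqs.append(k * sample_rate / n)
        powers.append(abs(coeff) ** 2)
    return freqs, powers


def cog_and_spread(freqs, powers):
    """Power-weighted mean frequency (center of gravity) and its standard deviation (spread)."""
    total = sum(powers)
    cog = sum(f * p for f, p in zip(freqs, powers)) / total
    variance = sum((f - cog) ** 2 * p for f, p in zip(freqs, powers)) / total
    return cog, math.sqrt(variance)


# Assumed setup: 16 kHz sampling, one 30 ms window (480 samples).
SAMPLE_RATE = 16000
N = int(0.030 * SAMPLE_RATE)

# Synthetic stand-in for a fricative window: equal-amplitude tones at 2 kHz and 6 kHz.
signal = [math.sin(2 * math.pi * 2000 * t / SAMPLE_RATE)
          + math.sin(2 * math.pi * 6000 * t / SAMPLE_RATE)
          for t in range(N)]

freqs, powers = power_spectrum(signal, SAMPLE_RATE)
cog, spread = cog_and_spread(freqs, powers)
print(round(cog), round(spread))  # 4000 2000
```

Both tones fall exactly on DFT bins and carry equal power, so the center of gravity sits midway between them at 4000 Hz and the spread is 2000 Hz. With real frication noise, the window function and the choice of amplitude or power weighting change the values, so they should be reported together with the measurements.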

