This study examines the role of frequencies above 8 kHz in the classification of conversational speech fricatives [f, v, θ, ð, s, z, ʃ, ʒ, h] using random forest modeling. Prior research has mostly focused on spectral measures for fricative categorization using frequency information below 8 kHz. The contribution of higher frequencies has received only limited attention, especially for non-laboratory speech. In the present study, we use a corpus of sociolinguistic interview recordings from Western Canadian English sampled at 44.1 and 16 kHz. For both sampling rates, we analyze spectral measures obtained using Fourier analysis and the multitaper method, and we also compare models with and without amplitudinal measures. Results show that while frequency information above 8 kHz does not improve classification accuracy in random forest analyses, inclusion of such frequencies can affect the relative importance of specific measures. This includes a decreased contribution of center of gravity and an increased contribution of spectral standard deviation at the higher sampling rate. We also find no major differences in classification accuracy between Fourier and multitaper measures. The inclusion of power measures improves model accuracy but does not change the overall importance of spectral measures.
