The high-frequency region of the speech spectrum (above 4–5 kHz) has received substantial research attention over the past decade, with numerous studies documenting useful information in this region. The purpose of the current experiment was to compare the indexical and segmental information present in the low- and high-frequency regions of speech (below and above 4 kHz) and to determine the extent to which information from these regions can be used in a machine learning framework to correctly classify indexical and segmental aspects of the speech signal. Naturally produced vowel segments from ten male and ten female talkers were used as input to a temporal dictionary ensemble classification model in unfiltered, low-pass filtered (below 4 kHz), and high-pass filtered (above 4 kHz) conditions. Classification performance in the unfiltered and low-pass filtered conditions was approximately 90% or better for the vowel categorization, talker sex, and individual talker identity tasks. Classification performance for high-pass filtered signals composed of energy above 4 kHz was well above chance for the same tasks. For two of the tasks (talker sex and talker identity), high-pass filtering had minimal effect on classification performance, suggesting that indexical information is preserved above 4 kHz.
