The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). “Comparing measurement errors for formants in synthetic and natural vowels,” J. Acoust. Soc. Am. 139(2), 713–727]. To date, validating its accuracy has depended on formant synthesis to provide ground-truth values for these resonances. Synthesis is easily controlled, but it rests on many intrinsic assumptions and does not necessarily reproduce the acoustics that physical resonances would produce. Here, we show that physical models of the vocal tract, whose resonance values can be derived, offer a separate route to ground truth, with a different set of limitations. Our three-dimensional (3D)-printed vocal tract models were excited by white noise, allowing an accurate determination of their resonance frequencies. Sources with a range of fundamental frequencies were then applied, allowing a direct assessment of whether RS avoided the systematic bias toward the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, its accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances.
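As a rough illustration of what reassignment does, the following Python/NumPy sketch computes a reassigned spectrogram using the derivative-window formulation: each short-time Fourier transform (STFT) value X computed with a Hann window h is moved in frequency to f_k − (f_s/2π)·Im(X_dh/X), using a second STFT with the window derivative dh/dt, and in time to the frame center plus Re(X_th/X), using a third STFT with the time-weighted window t·h. The function name `reassigned_spectrogram` and its parameters (`n_fft`, `hop`) are hypothetical. This is a minimal sketch, not necessarily the cross-spectral algorithm of Fulop and Fitz (2006) or the implementation used in this study, and it omits pruning steps such as the higher-order phase-derivative thresholding used to separate components from impulses (Fulop and Fitz, 2007).

```python
import numpy as np


def reassigned_spectrogram(x, fs, n_fft=512, hop=64):
    """Hypothetical helper: reassigned spectrogram of a real signal x.

    Returns (t_hat, f_hat, mag), each of shape (n_frames, n_fft // 2 + 1):
    reassigned times (s), reassigned frequencies (Hz), and STFT magnitudes.
    Assumes len(x) >= n_fft.
    """
    m = np.arange(n_fft)
    c = n_fft / 2                                          # nominal frame center (samples)
    h = 0.5 * (1.0 - np.cos(2 * np.pi * m / n_fft))        # periodic Hann window
    dh = (np.pi / n_fft) * np.sin(2 * np.pi * m / n_fft)   # analytic dh/dm (per sample)
    th = (m - c) * h                                       # time-weighted window
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)             # bin center frequencies (Hz)
    n_frames = 1 + (len(x) - n_fft) // hop
    t_hat = np.empty((n_frames, freqs.size))
    f_hat = np.empty_like(t_hat)
    mag = np.empty_like(t_hat)
    eps = 1e-12                                            # avoids division by zero in silent bins
    for i in range(n_frames):
        seg = x[i * hop : i * hop + n_fft]
        X = np.fft.rfft(seg * h)
        X_dh = np.fft.rfft(seg * dh)
        X_th = np.fft.rfft(seg * th)
        power = np.abs(X) ** 2 + eps
        # Instantaneous-frequency correction: f_k - (fs / 2pi) * Im(X_dh / X)
        f_hat[i] = freqs - fs / (2 * np.pi) * np.imag(X_dh * np.conj(X)) / power
        # Group-delay correction: frame center + Re(X_th / X), converted to seconds
        t_hat[i] = (i * hop + c + np.real(X_th * np.conj(X)) / power) / fs
        mag[i] = np.abs(X)
    return t_hat, f_hat, mag


if __name__ == "__main__":
    fs = 16000
    n = np.arange(fs // 2)
    x = np.cos(2 * np.pi * 440.0 * n / fs)   # steady tone lying between two bin centers
    t_hat, f_hat, mag = reassigned_spectrogram(x, fs)
    i = mag.shape[0] // 2
    k = int(np.argmax(mag[i]))
    print(f"bin center {k * fs / 512:.2f} Hz -> reassigned {f_hat[i, k]:.2f} Hz")
```

For a steady 440 Hz cosine sampled at 16 kHz with `n_fft=512`, the nearest bin center is 437.5 Hz, yet the reassigned frequency at the peak bin should fall within a fraction of a hertz of 440 Hz; a single impulse is likewise relocated to its true sample time. Reassignment sharpens where energy lies; deciding which reassigned points reflect a resonance rather than a harmonic is a separate step.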

1. Arai, T. (2006). “Sliding three-tube model as a simple educational tool for vowel production,” Acoust. Sci. Technol. 27(6), 384–388.
2. Arai, T. (2007). “Education system in acoustics of speech production using physical models of the human vocal tract,” Acoust. Sci. Technol. 28(3), 190–201.
3. Arai, T. (2012). “Education in acoustics and speech science using vocal-tract models,” J. Acoust. Soc. Am. 131(3), 2444–2454.
4. Arai, T. (2016). “Vocal-tract models and their applications in education for intuitive understanding of speech production,” Acoust. Sci. Technol. 37(4), 148–156.
5. Atal, B. S., and Hanauer, S. L. (1971). “Speech analysis and synthesis by linear prediction of the speech wave,” J. Acoust. Soc. Am. 50(2B), 637–655.
6. Birkholz, P., Kürbis, S., Stone, S., Häsner, P., Blandin, R., and Fleischer, M. (2020). “Printable 3D vocal tract shapes from MRI data and their acoustic and aerodynamic properties,” Sci. Data 7(1), 255.
7. Birkholz, P., and Venus, E. (2018). “Considering lip geometry in one-dimensional tube models of the vocal tract,” in Studies on Speech Production, edited by Q. Fang, J. Dang, P. Perrier, J. Wei, L. Wang, and N. Yan (Springer Nature, Cham, Switzerland), pp. 78–86.
8. Boersma, P., and Weenink, D. (2024). “Praat: Doing phonetics by computer” (version 6.4.04) [computer program], http://www.praat.org/ (Last viewed January 6, 2024).
9. Burg, J. P. (1967). “Maximum entropy spectral analysis,” in 37th Annual International Meeting, Society of Exploration Geophysicists, Oklahoma City, OK.
10. Chen, W.-R., Whalen, D. H., and Shadle, C. H. (2019). “F0-induced formant measurement errors result in biased variabilities,” J. Acoust. Soc. Am. 145(5), EL360–EL366.
11. Cho, H., Kim, W. J., and Hong, W. (2019). “Underwater signal analysis in the modulation spectrogram with time-frequency reassignment technique,” IEICE Trans. Fundam. Electron. Comput. Sci. E102.A(11), 1542–1544.
12. Cox, S. R., Huang, T., Chen, W.-R., and Ng, M. L. (2021). “An acoustic study of vowels produced by Cantonese alaryngeal speakers using clear speech,” J. Acoust. Soc. Am. 150, A270.
13. Crandall, I. B. (1927). “Dynamical study of the vowel sounds part II,” Bell Syst. Tech. J. 6(1), 100–116.
14. Dang, J., Shadle, C. H., Kawanishi, Y., Honda, K., and Suzuki, H. (1998). “An experimental study of the open end correction coefficient for side branches within an acoustic tube,” J. Acoust. Soc. Am. 104(2), 1075–1084.
15. Davies, P., McGowan, R. S., and Shadle, C. H. (1993). Practical Flow Duct Acoustics Applied to the Vocal Tract (Singular Publishing Group, Inc., San Diego, CA), pp. 93–142.
16. Delvaux, B., and Howard, D. (2014). “A new method to explore the spectral impact of the piriform fossae on the singing voice: Benchmarking using MRI-based 3D-printed vocal tracts,” PLoS One 9(7), e102680.
17. Epps, J., Smith, J. R., and Wolfe, J. (1997). “A novel instrument to measure acoustic resonances of the vocal tract during phonation,” Meas. Sci. Technol. 8(10), 1112–1121.
18. Fant, G. (1970). Acoustic Theory of Speech Production (Mouton and Co., The Hague).
19. Flanagan, J. L. (1972). Speech Analysis Synthesis and Perception, 2nd ed. (Springer-Verlag, New York), Vol. 3.
20. Fleischer, M., Mainka, A., Kürbis, S., and Birkholz, P. (2018). “How to precisely measure the volume velocity transfer function of physical vocal tract models by external excitation,” PLoS One 13(3), e0193708.
21. Fulop, S. A. (2010). “Accuracy of formant measurement for synthesized vowels using the reassigned spectrogram and comparison with linear prediction,” J. Acoust. Soc. Am. 127(4), 2114–2117.
22. Fulop, S. A. (2011). Speech Spectrum Analysis (Springer, Berlin, Heidelberg).
23. Fulop, S. A., and Fitz, K. (2006). “Algorithms for computing the time-corrected instantaneous frequency (reassigned) spectrogram, with applications,” J. Acoust. Soc. Am. 119(1), 360–371.
24. Fulop, S. A., and Fitz, K. (2007). “Separation of components from impulses in reassigned spectrograms,” J. Acoust. Soc. Am. 121(3), 1510–1518.
25. Fulop, S. A., and Shadle, C. H. (2018). “Automated formant tracking using reassigned spectrograms,” J. Acoust. Soc. Am. 143(3), 1870.
26. Gabor, D. (1946). “Theory of communication. Part 1: The analysis of information,” J. Inst. Electr. Eng., Part 3 93(26), 429–441.
27. Guðnason, J., Mehta, D. D., and Quatieri, T. F. (2015). “Evaluation of speech inverse filtering techniques using a physiologically based synthesizer,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia, pp. 4245–4249.
28. Jeanneteau, M., Hanna, N., Almeida, A., Smith, J., and Wolfe, J. (2019). “Open-source software for estimating vocal tract resonances using broadband excitation at the lips,” in Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia, 2019 (Australasian Speech Science and Technology Association, Canberra, Australia), pp. 2971–2975.
29. Klatt, D. H. (1986). “Representation of the first formant in speech recognition and in models of the auditory periphery,” in Montreal Satellite Symposium on Speech Recognition, 12th International Congress on Acoustics, edited by P. Mermelstein (Canadian Acoustical Society, Montreal, Canada), pp. 5–7, available at https://jcaa.caa-aca.ca/index.php/jcaa/article/view/3495/3321.
30. Kodera, K., De Villedary, C., and Gendrin, R. (1976). “A new method for the numerical analysis of non-stationary signals,” Phys. Earth Planet. Inter. 12(2–3), 142–150.
31. Liao, J.-S. (2016). “An acoustic study of vowels produced by alaryngeal speakers in Taiwan,” Am. J. Speech Lang. Pathol. 25(4), 481–492.
32. Lloyd, R. J. (1890). “Speech sounds: Their nature and causation,” Phonetische Studien 3, 251–278.
33. McCandless, S. (1974). “An algorithm for automatic formant extraction using linear prediction spectra,” IEEE Trans. Acoust. Speech Signal Process. 22(2), 135–141.
34. Nelson, D. J. (2001). “Cross-spectral methods for processing speech,” J. Acoust. Soc. Am. 110(5), 2575–2592.
35. Nelson, D. J. (2002). “Instantaneous higher order phase derivatives,” Digit. Signal Process. 12(2), 416–428.
36. Plante, F., and Ainsworth, W. A. (1995). “Formant tracking using reassigned spectrum,” in Proceedings of Fourth European Conference on Speech Communication and Technology (ESCA, Grenoble, France), pp. 741–744.
37. Rayleigh, L. (1916). “The theory of the Helmholtz resonator,” Proc. R. Soc. London. Ser. A, Math. Phys. Charac. 92(638), 265–275.
38. Shadle, C. H., Chen, W.-R., Fulop, S. A., and Whalen, D. H. (2022). “Mechanical models as ground truth for vowel resonance analysis,” J. Acoust. Soc. Am. 151, A131.
39. Shadle, C. H., Nam, H., and Whalen, D. H. (2016). “Comparing measurement errors for formants in synthetic and natural vowels,” J. Acoust. Soc. Am. 139(2), 713–727.
40. Sun, K., Jin, T., and Yang, D. (2015). “A new reassigned spectrogram method in interference detection for GNSS receivers,” Sensors 15(9), 22167–22191.
41. Sundberg, J., Lindblom, B., and Liljencrants, J. (1992). “Formant frequency estimates for abruptly changing area functions: A comparison between calculations and measurements,” J. Acoust. Soc. Am. 91(6), 3478–3482.
42. Tze Wei Chu, D., Li, K., Epps, J., Smith, J., and Wolfe, J. (2013). “Experimental evaluation of inverse filtering using physical systems with known glottal flow and tract characteristics,” J. Acoust. Soc. Am. 133(5), EL358–EL362.
43. Vallabha, G. K., and Tuller, B. (2002). “Systematic errors in the formant analysis of steady-state vowels,” Speech Commun. 38(1), 141–160.
44. Whalen, D. H., Chen, W.-R., Shadle, C. H., and Fulop, S. A. (2022). “Formants are easy to measure; resonances, not so much: Lessons from Klatt (1986),” J. Acoust. Soc. Am. 152(2), 933–941.
45. Zhang, Z., Honda, K., and Wei, J. (2020). “Retrieving vocal-tract resonance and anti-resonance from high-pitched vowels using a rahmonic subtraction technique,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain (IEEE, Piscataway, NJ), pp. 7359–7363.
