This study investigated how the bandwidths of resonances simulated by transmission-line models of the vocal tract compare to bandwidths measured from physical three-dimensional printed vowel resonators. Three types of physical resonators were examined: models with realistic vocal tract shapes based on Magnetic Resonance Imaging (MRI) data, straight axisymmetric tubes with varying cross-sectional areas, and two-tube approximations of the vocal tract with notched lips. All physical models had hard walls and a closed glottis, so the main loss mechanisms contributing to the bandwidths were sound radiation, viscosity, and heat conduction. These losses were accordingly included in the simulations in two variants: a coarse approximation of the losses with frequency-independent lumped elements, and a detailed, theoretically more precise loss model. Across the examined frequency range from 0 to 5 kHz, the resonance bandwidths increased systematically from the simulations with the coarse loss model to the simulations with the detailed loss model, to the tube-shaped physical resonators, and to the MRI-based resonators. This indicates that the simulated losses, especially the commonly used approximations, underestimate the real losses in physical resonators. Hence, more realistic acoustic simulations of the vocal tract require improved models for viscous and radiation losses.
I. INTRODUCTION
Acoustic simulations of the vocal tract are widely used in speech research and for articulatory speech synthesis. There are one-dimensional (1D), two-dimensional (2D), and three-dimensional (3D) simulation methods. Although the 3D methods can be very accurate up to high frequencies, they are too computationally expensive for many applications (Arnela et al., 2019; Blandin et al., 2022; Speed et al., 2013; Takemoto et al., 2010; Vampola et al., 2008). Therefore, when fast simulations are needed, 1D methods based on the assumption of plane wave propagation in the vocal tract are the preferred choice. The plane wave assumption holds up to frequencies of about 5 kHz, which is sufficient for most applications. A widely used 1D simulation method is based on the transmission-line (TL) circuit model (Birkholz and Drechsel, 2021; Elie and Laprie, 2016; Flanagan et al., 1975; Maeda, 1982), which will also be used in this study. The main advantage compared to other 1D simulation models like the Kelly-Lochbaum model (Kelly and Lochbaum, 1962; Liljencrants, 1985) is that it poses no restrictions with regard to the lengths of the tube sections by which the vocal tract is represented.
For an acoustic simulation to be as realistic as possible, it is important that the power losses are adequately modeled because they determine the bandwidths of the acoustic resonances. The main losses in the vocal tract arise from viscous friction at the tube wall, heat conduction at the tube wall, sound radiation from the mouth and nostrils, sound absorption by the soft vocal tract walls, and imperfect closure at the glottis. The equations that are commonly used to model these losses in TL models of the vocal tract (see Sec. II C) are physically well-founded (Flanagan, 1965). Based on these equations, the contributions of the different loss mechanisms to the bandwidths of the resonances can be determined, as illustrated in Fig. 1. However, the equations to model the losses rely on certain simplifying assumptions. For example, the heat conduction loss and the viscous loss were derived for a tube with smooth and hard walls, which does not apply to the vocal tract. Furthermore, the radiation loss is commonly modeled in terms of the radiation impedance of a piston (mouth opening) in a sphere (the head), which approaches that of a piston in an infinite, plane baffle, when the radius of the piston becomes small compared with that of the sphere (Birkholz, 2005; Flanagan, 1965). Again, given the horn-like shape of the human lips, this is also a significant simplification.
Variation of formant bandwidth with formant frequency as simulated by a transmission-line model of the vocal tract according to Flanagan et al. (1975). The diagram shows the relative contributions of different loss mechanisms to the total formant bandwidth.
Another difficulty is the frequency-dependence of the equations for the losses due to heat conduction, viscosity, and radiation. In time-domain simulations of the TL model, the frequency-dependence cannot be modeled by the lumped circuit elements, so further approximations are required. Here, a common strategy is to model the viscous loss using the (frequency-independent) Hagen–Poiseuille equation (Birkholz, 2005; Elie and Laprie, 2016; Maeda, 1982). However, this equation only holds for a stationary laminar flow, and significantly underestimates the viscous losses for sound waves. For the radiation loss, Flanagan (1965) proposed an approximation of the radiation impedance of a piston in an infinite plane baffle in terms of a parallel circuit of two frequency-independent circuit elements. However, as will be shown in Sec. II C, this approximation also slightly underestimates the loss.
The present study investigated the extent to which simulated losses based on the aforementioned assumptions and approximations match the losses of physical models of the vocal tract with respect to the bandwidths of the resonances. The physical models were made of hard plastic, and their resonances were measured with a closed glottis. Thus, the effective loss mechanisms in these models were viscous, heat-conduction, and radiation losses. The same types of losses were included in TL simulations of the vocal tract, both using the detailed, frequency-dependent equations (“detailed loss model”), and using the approximations with frequency-independent circuit elements (“coarse loss model”).
The TL simulations with both loss models were performed for 16 different vowels. These were compared with two types of physical models with differently realistic geometries: a set of 16 axisymmetric tube models, and a set of 2 × 16 realistically shaped models based on Magnetic Resonance Imaging (MRI) data. The axisymmetric tubes closely match the situation modeled with the TL circuit model. Hence, any bandwidth differences between the TL simulations and the tube-shaped physical models should be mainly related to simplifying assumptions in the models for the viscous and heat-conduction losses. On the other hand, any bandwidth differences between the MRI-based physical models and the tube-shaped physical models should be related to the more complex geometry of the MRI-based models, especially in the region of the lips. To study the effect of lip geometry on the bandwidths of the resonances in more detail, a third set of physical resonators was used, which consisted of concatenated cylindrical tubes with notches of different depths to model the “lip horn.” Hence, the study was designed to identify potential shortcomings of the loss models used in TL simulations of the vocal tract and to distinguish the effects of viscous and heat-conduction losses from radiation losses.
II. METHODS
A. Creation of physical resonators
Three types of physical resonators were used. The first type [Fig. 2(a)] represents the vocal tract as a straight axisymmetric tube with circular cross-sections, where the cross-sectional area varies along the tube axis. This type of resonator is the most direct physical equivalent of the TL simulation method described in Sec. II C. Of this type, 16 resonators were made, namely, for the eight tense German vowels /aː, eː, iː, oː, uː, ɛː, øː, yː/ and the eight lax German vowels /ɛ, ɪ, ɔ, ʊ, ʏ, œ, ə, ɐ/. The area functions for these resonators were obtained from the vocal tract shapes defined in the articulatory speech synthesizer VocalTractLab 2.3 (VTL) (see www.vocaltractlab.de). The epilaryngeal tube of the physical models was slightly widened compared to the models defined in VTL. In future studies, this will allow them to be more easily excited with silicone vocal fold models, similar to Birkholz (2019), but this was not done in the present study. For the widening, the radius of the epilaryngeal tube was changed linearly from 8.5 mm at the glottal end to 3.5 mm at a position 1.8 cm above the “glottis.” All models were designed with a wall thickness of 3 mm and supplemented with flanges of 57 mm diameter at both ends. All 16 models were 3D-printed with an Ultimaker 3 printer (Dynamism, Chicago, IL) using the hard plastic material polylactic acid (PLA) with an infill ratio of 100%. Nine of these resonators were previously used by Birkholz (2022).
(Color online) Examples of the 3D-printed vowel resonators used in this study. (a) From left to right: tube resonators for the vowels /aː, iː, uː, ə/. (b) MRI-based resonators for /iː/ (left) and /aː/ (right). (c) Two-tube models for /e/ (front row) and /a/ (back row) including teeth with lip notch depths of 0 (left), 2 (middle), and 4 cm (right).
The second type of resonators were models with realistic 3D vocal tract shapes that were obtained from volumetric MRI data of two subjects [Fig. 2(b)]. These models were previously published as the Dresden Vocal Tract Dataset in terms of 3D-printable geometry files (Birkholz et al., 2020). Here, we used the models of the eight tense German vowels /aː, eː, iː, oː, uː, ɛː, øː, yː/ and the eight lax German vowels /a, ɛ, ɪ, ɔ, ʊ, ʏ, œ, ə/ of both subjects (32 resonators in total). All resonators were 3D-printed with PLA (100% infill ratio) on an Ultimaker 3 printer.
The third type of resonators are two-tube approximations of the vocal tract with notched lips, as shown in Fig. 2(c). They consist of two cylindrical, cascaded tubes of different cross-sectional areas. Their dimensions represent the four vowels /a, ae, e, ə/, inspired by Flanagan (1965), and are detailed in Fig. 3. Each of these vowels was made in five variants, with lip notches of 0 (no notch), 1, 2, 3, and 4 cm. This allows the specific effect of the lip notch on the bandwidths of the resonances to be studied. All 20 models were designed with a wall thickness of 3 mm and 3D-printed on an Ultimaker 3 printer with PLA (100% infill ratio). In addition, all models were equipped with a row of “teeth,” as shown in Fig. 2(c). The teeth reduce the radiating area of the mouth to a more realistic level. More details about these resonators are given in Birkholz and Venus (2018).
Geometries of the four two-tube resonators with notched lips. The notch depths were 0 (no notch), 1, 2, 3, and 4 cm.
B. Measurement of transfer functions of the physical resonators
For all physical resonators, the volume velocity transfer function (VVTF), i.e., the complex frequency-dependent ratio of the volume velocity at the lips to the volume velocity through the glottis, was measured with the method described in Fleischer et al. (2018) and Birkholz et al. (2020). This method is based on the principle of reciprocity and has the advantage that it does not require a broadband volume velocity source to excite the resonators at the glottis. Instead, a standard loudspeaker is used, placed about 30 cm in front of the resonator to be measured and aimed at its mouth opening. In a first step, the loudspeaker emits a broadband sine sweep signal, while the sweep response is measured with a microphone inside the resonator at the closed glottal end (closed-glottis case). In a second step, a reference measurement is performed, where the mouth opening of the resonator is closed (using a 3 mm thick circular plate for the flanged resonators, and a layer of modeling clay for the others). For this measurement, the loudspeaker emits the same excitation signal as before, while a microphone measures the sweep response immediately in front of the closed mouth. According to the theory outlined in Fleischer et al. (2018), the VVTF can then be calculated from the ratio of these two sweep responses. The video https://youtu.be/9AoRS9X2BNY illustrates this procedure.
All resonators were measured in the large anechoic chamber of the TU Dresden at a temperature of 20.5 °C. For the first measurement (inside the resonator at the glottal end), a 1/4-in. measurement microphone (MK301E capsule with MV301 preamplifier by Microtech Gefell, Berlin, Germany) was used. This microphone was inserted through a hole in a glottal adaptor plate. For the reference measurement in front of the closed mouth, the probe microphone G.R.A.S. 40SC was used (except for the MRI-based resonators, where the G.R.A.S. 46BL was used). Both microphones were connected to the audio interface Terratec Aureon XFire 8.0 HD (Terratec, Alsdorf, Germany), which in turn was connected to a laptop computer (MSI GT72-2QE, MSI, New Taipei City, Taiwan) with the operating system Windows 8.1, 64 Bit.
The measurements were made with the open-source software MeasureTransferFunction (Birkholz, 2019), which implements the method by Farina (2000) and allows one to obtain the linear frequency response of the resonators despite potential harmonic distortions generated by the loudspeaker. The transfer functions were measured from 100 to 10 000 Hz with a spectral resolution of 0.96 Hz. Figure 4 shows examples of obtained transfer functions for the two-tube resonators for the vowel /ae/ without a lip notch (gray curve) and with a lip notch of 3 cm depth (black). Here, the lip notch causes a shift of the resonances towards higher frequencies but also an increase in their bandwidths (clearly visible for the fourth and fifth resonances). The complete set of 3D-printable resonators and their transfer functions is contained in the supplemental material (also at www.vocaltractlab.de/index.php?page=birkholz-supplements).
Measured transfer functions for the two-tube resonators for the vowel /ae/ without a lip notch (gray curve) and with a lip notch of 3 cm depth (black).
C. Transmission line model of the vocal tract
Acoustic network of the vocal tract (without side branches) driven by a glottal volume velocity source U1. The network includes losses due to viscosity (Ri), heat conduction (Gi), and radiation.
Comparison of the (precise) radiation impedance of a piston in an infinite baffle (solid curves) and its approximation by a parallel R–L circuit according to Eq. (7). The black curves are the real parts of the impedance (resistance), and the gray curves are the imaginary parts (reactance). The values are normalized. The upper scale shows the impedance values as a function of ka, where k is the wave number and a is the piston radius. The bottom scale shows the frequency for the case of a large piston area with a = 2 cm (about 12.6 cm2).
Given the previously noted considerations, we studied two variants of the transmission-line model:
- one variant with a “detailed loss model” that applies the theoretically precise losses for Ri, Gi, and the radiation impedance according to Eqs. (3), (4), and (5);
- one variant with a “coarse loss model,” which is frequently used for time-domain simulations, omits the heat conduction losses (Gi = 0), and uses the approximations for the viscous resistance and the radiation impedance according to Eqs. (6) and (7).
Since the Struve function in the equation for the radiation impedance is not readily available in MATLAB and many other programs and programming languages, we implemented the approximation provided by Aarts and Janssen (2003).
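As a rough illustration of the two radiation-loss variants, the following sketch evaluates the normalized radiation impedance of a piston in an infinite baffle and a parallel R–L approximation of it. The formulas are the standard textbook forms and are only assumed to correspond to the detailed and coarse models used here. The Bessel functions are obtained by simple numerical integration, the Struve function uses the closed-form approximation of Aarts and Janssen (2003), and the speed of sound (350 m/s), the normalization, and all function names are illustrative choices.

```python
import math

def bessel_j(n, z, steps=2000):
    """J_n(z) from Bessel's integral (midpoint rule); adequate for a sketch."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.cos(n * t - z * math.sin(t))
    return total * h / math.pi

def struve_h1(z):
    """Closed-form approximation of the Struve function H1 (Aarts and Janssen, 2003)."""
    if z == 0.0:
        return 0.0
    return (2.0 / math.pi - bessel_j(0, z)
            + (16.0 / math.pi - 5.0) * math.sin(z) / z
            + (12.0 - 36.0 / math.pi) * (1.0 - math.cos(z)) / z ** 2)

def piston_impedance(ka):
    """Impedance of a piston in an infinite baffle, normalized by rho*c/A (ka > 0)."""
    return complex(1.0 - bessel_j(1, 2.0 * ka) / ka, struve_h1(2.0 * ka) / ka)

def lumped_impedance(ka):
    """Parallel R-L circuit with assumed normalized values R = 128/(9 pi^2), X = 8 ka/(3 pi)."""
    r = 128.0 / (9.0 * math.pi ** 2)
    jx = 1j * 8.0 * ka / (3.0 * math.pi)
    return r * jx / (r + jx)

C_SOUND = 350.0  # speed of sound in m/s (assumed)
RADIUS = 0.02    # piston radius in m (the a = 2 cm case)

def ka_at(freq_hz):
    return 2.0 * math.pi * freq_hz * RADIUS / C_SOUND

# Largest relative underestimation of the radiation resistance up to 5 kHz.
worst = 0.0
for freq in range(100, 5001, 100):
    exact = piston_impedance(ka_at(freq))
    approx = lumped_impedance(ka_at(freq))
    worst = max(worst, 1.0 - approx.real / exact.real)
print(f"largest underestimation of the resistance: {100.0 * worst:.1f} %")
```

Under these assumptions, the lumped circuit stays below the piston resistance over the whole range, with the gap growing towards higher frequencies.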
D. Calculation of transfer functions based on the transmission line model
To determine the effect of the loss model variants on the bandwidths, the transmission-line circuit in Fig. 5 was used to calculate the volume velocity transfer functions from the glottis to the lips for 16 tube shapes. We included the tube shapes for the eight tense German vowels /aː, eː, iː, oː, uː, ɛː, øː, yː/ and for the eight lax German vowels /ɛ, ɪ, ɔ, ʊ, ʏ, œ, ə, ɐ/, which were exported as discrete area functions with N = 40 tube sections from the articulatory speech synthesizer VocalTractLab 2.3. Therefore, the tube shapes used here largely correspond to the 3D-printed tube-shaped models.
The transfer functions for all tube shapes have been calculated with a spectral resolution of 1 Hz, which is similar to the spectral resolution of the transfer functions measured for the physical models, and allows a precise determination of the frequencies and bandwidths of the resonances.
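To make the calculation of a glottis-to-lips transfer function from a discrete area function more concrete, the following minimal sketch cascades the chain matrices of cylindrical tube sections. It is not the lumped circuit of Fig. 5: it uses the distributed chain-matrix form, ignores all losses, and assumes an ideal open end unless a radiation impedance is passed in. The constants, the uniform test tube, and all names are hypothetical.

```python
import cmath
import math

C_SOUND = 350.0  # speed of sound in m/s (assumed)
RHO = 1.14       # air density in kg/m^3 (assumed)

def section_matrix(area, length, freq):
    """Chain matrix (A, B, C, D) of one lossless cylindrical tube section."""
    k = 2.0 * math.pi * freq / C_SOUND
    z0 = RHO * C_SOUND / area
    cos_kl = cmath.cos(k * length)
    sin_kl = cmath.sin(k * length)
    return (cos_kl, 1j * z0 * sin_kl, 1j * sin_kl / z0, cos_kl)

def multiply(m, n):
    """Product of two 2x2 matrices stored as (A, B, C, D) tuples."""
    a1, b1, c1, d1 = m
    a2, b2, c2, d2 = n
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2)

def vvtf(areas, lengths, freq, z_rad=0.0):
    """U_lips / U_glottis for sections ordered from the glottis to the lips."""
    total = (1.0, 0.0, 0.0, 1.0)
    for area, length in zip(areas, lengths):
        total = multiply(total, section_matrix(area, length, freq))
    _, _, c, d = total
    # With P_lips = z_rad * U_lips, the input flow is U_glottis = (C * z_rad + D) * U_lips.
    return 1.0 / (c * z_rad + d)

# Hypothetical uniform tube: 40 sections, 17.5 cm total length, 3 cm^2 area.
n_sections = 40
areas = [3.0e-4] * n_sections
lengths = [0.175 / n_sections] * n_sections
print(abs(vvtf(areas, lengths, 250.0)))
```

For this lossless uniform tube the result reduces to 1/cos(kL), which gives a quick sanity check before losses and a realistic radiation load are added.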
E. Extraction of resonance frequencies and bandwidths
The transfer functions of the physical and simulated resonators were used to obtain the frequencies and bandwidths of the resonances in the frequency range from 0 to 5 kHz. For each resonance, the frequency was determined as that of the corresponding local maximum in the magnitude spectrum. Its bandwidth was calculated as the difference between the frequencies to the left and right of the resonance frequency at which the magnitude was 3 dB below the maximum. Due to the spectral resolution of about 1 Hz of the transfer functions, the error of the estimation was ±0.5 Hz (no interpolation was performed between the spectral points).
While the previously noted procedure was used for most resonances, there were two problem cases that received special consideration. The first problem occurred when the resonance of interest was so close to another resonance that the amplitude on the flank towards the adjacent resonance decreased by less than 3 dB before increasing again. This situation is shown in Fig. 7(b) for the fourth and fifth resonances. In this case, the bandwidth was estimated only on the basis of the –3 dB frequency on the flank opposite the nearby resonance.
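The extraction procedure can be sketched as follows. The sketch takes the first spectral sample at least 3 dB below the peak on each flank, without interpolation. For the one-sided case it assumes a symmetric peak and doubles the half-width on the usable flank, which is only one plausible way to implement that fallback. The synthetic spectra and all function names are hypothetical.

```python
import math

def find_edge(mag_db, peak, step, drop_db=3.0):
    """Walk from the peak in direction `step` (+1 or -1) and return the first
    index at least `drop_db` below the peak, or None if the level rises again
    (neighbouring resonance) or the data ends first."""
    i = peak + step
    while 0 <= i < len(mag_db):
        if mag_db[i] <= mag_db[peak] - drop_db:
            return i
        if mag_db[i] > mag_db[i - step]:
            return None
        i += step
    return None

def bandwidth(freqs, mag_db, peak):
    """3 dB bandwidth of the resonance at index `peak`, or None if undefined."""
    lo = find_edge(mag_db, peak, -1)
    hi = find_edge(mag_db, peak, +1)
    if lo is not None and hi is not None:
        return freqs[hi] - freqs[lo]
    # One-sided fallback (assumption: symmetric peak, so double the half-width).
    if hi is not None:
        return 2.0 * (freqs[hi] - freqs[peak])
    if lo is not None:
        return 2.0 * (freqs[peak] - freqs[lo])
    return None

def peak_shape(f, f0, half_width):
    """Magnitude of a simple synthetic resonance peak (not a vocal tract model)."""
    return 1.0 / math.sqrt(1.0 + ((f - f0) / half_width) ** 2)

freqs = [float(f) for f in range(800, 1301)]  # 1 Hz grid

# Isolated resonance at 1000 Hz with a nominal bandwidth of 50 Hz.
single = [20.0 * math.log10(peak_shape(f, 1000.0, 25.0)) for f in freqs]
peak_single = max(range(len(single)), key=single.__getitem__)
bw_single = bandwidth(freqs, single, peak_single)

# Two close resonances: the flank between them never drops by 3 dB.
double = [20.0 * math.log10(peak_shape(f, 1000.0, 25.0) + peak_shape(f, 1060.0, 25.0))
          for f in freqs]
peak_double = max(range(0, 231), key=double.__getitem__)  # search 800 to 1030 Hz only
bw_double = bandwidth(freqs, double, peak_double)

print(freqs[peak_single], bw_single)
print(freqs[peak_double], bw_double)
```

With measured data, small ripples on a flank could trigger the "rising again" test prematurely, so some smoothing or a tolerance would probably be needed in practice.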
Transfer functions of (a) the vowel /aː/ of speaker s1, and (b) the vowel /oe/ of speaker s2 (bottom) of the Dresden Vocal Tract Dataset (Birkholz et al., 2020). The two vertical lines around the resonances indicate their 3 dB-bandwidths. Resonances marked with an asterisk were excluded from the bandwidth analysis, because they are located next to an antiresonance.
The second problem was when there was a spectral zero (antiresonance) next to the resonance of interest. Zeros were mainly found in the transfer functions of the MRI-based physical models, where they are caused by side cavities, such as the piriform fossae, and by transverse modes. A nearby zero may strongly affect the steepness of both flanks of a resonance and hence the frequencies of the –3 dB points on these flanks. Therefore, the strategy of estimating the bandwidth only from the opposite flank does not work reliably here. Hence, resonances with a neighboring zero were completely excluded from the analysis. Figure 7 shows two examples of measured transfer functions, where the omitted resonances/peaks are marked with an asterisk.
F. Estimation of nonlinear losses
The energy losses discussed so far are all linear acoustic losses. However, sharp edges, such as those at the “teeth” of the physical resonators, could cause turbulence and hence nonlinear losses (Buick et al., 2011). In contrast to linear losses, the contribution of nonlinear losses to the bandwidths of the resonances would depend on the amplitude of the acoustic waves. To estimate the extent of nonlinear losses, the transfer functions of two physical resonators were measured using different sound pressure levels (SPLs) for the excitation signal (Sec. II B). The two models were the axisymmetric tube for /a/ and the two-tube model for /a/ with a 2 cm notch. For both models, the VVTF was obtained with excitation signals of 74, 80, 86, and 92 dB SPL (measured during the sweep at about 30 cm from the loudspeaker using the acoustic signal analyser Acoustilyzer AL1, NTi Audio, Schaan, Liechtenstein), five times for each SPL. From each of the 20 transfer functions per model, the bandwidths were determined according to Sec. II E. The results are shown in Fig. 8, where each error bar represents ±1 standard deviation estimated from the five repetitions. The results indicate that the estimated bandwidths are largely independent of the SPL. The strongest (but still small) effect of the SPL was observed for one of the bandwidths of the notched tube model, which increased by 7 Hz between the SPLs of 74 and 92 dB. Since 7 Hz is small compared to the average bandwidth of 236 Hz (caused mainly by linear losses), we conclude that nonlinear losses can be neglected in our analysis.
Variation of the measured bandwidths of two physical resonators using different SPLs of the sweep signal generated by the external sound source. The error bars show the range of the bandwidths from five measurements per model and SPL.
III. RESULTS AND DISCUSSION
A. Comparison between physical models and simulations
Figure 9 plots the bandwidths of all considered resonances of all analyzed vowels as a function of resonance frequency. The left plot shows the data from the simulations with the coarse loss model (black squares) and the detailed loss model (white circles), and the right plot shows the data for the physical tube-shaped resonators (black squares) and the MRI-based resonators (white circles). For all four model types, the bandwidths generally increase with frequency. This differs from human bandwidth data, where (closed-glottis) bandwidths decrease up to about 500 Hz, before increasing towards higher frequencies (Fujimura and Lindqvist, 1971). This difference is due to the soft walls of the real vocal tract, which were not present in either the simulations or the physical models. The data in Fig. 9 also show an increasing scattering of the bandwidths with increasing frequency. This indicates that towards higher resonance frequencies, the bandwidths become increasingly dependent on the vocal tract shape.
Left, bandwidths over frequencies of the resonances of the simulated vocal tract transfer functions with a coarse loss model (squares) and a detailed loss model (circles). Right, bandwidths over frequencies of the resonances of the measured transfer functions for the physical tube-shaped resonators (squares) and the MRI-based resonators (circles).
Due to this scattering, the data cannot be accurately represented in terms of regression lines beyond a frequency of 2 or 3 kHz. Instead, for a quantitative comparison between the four models, the data points were split into five 1 kHz-wide frequency bands. Figure 10 shows the bandwidth distributions of all four model types in each of the five frequency bands. The mean values and standard deviations of these distributions are summarized in Table I. In all frequency bands, the average bandwidths increase from the simulations with the coarse loss model, to the simulations with the detailed loss model, to the physical tube-shaped resonators, and to the physical MRI-based resonators. Significant bandwidth differences between the model types in the individual frequency bands, as determined by two-sample t-tests, are indicated by the asterisks in Fig. 10. Bonferroni correction was applied to the significance level of these tests to account for the six pairwise comparisons in each frequency band. Note that the bandwidth differences in the 4–5 kHz band did not reach significance because of the strong scattering. Furthermore, there are fewer data points for the MRI-based models in the 4–5 kHz band due to the relatively high number of antiresonances in this band.
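The six comparisons per band follow from pairing the four model types. A minimal sketch of the Bonferroni bookkeeping is given below; the base significance level of 0.05 is a hypothetical placeholder chosen only for illustration, and the labels are informal.

```python
from itertools import combinations

model_types = [
    "coarse simulation",
    "detailed simulation",
    "tube-shaped resonators",
    "MRI-based resonators",
]

# All unordered pairs of the four model types: 4 * 3 / 2 = 6 comparisons.
pairs = list(combinations(model_types, 2))

alpha = 0.05                         # hypothetical base significance level
alpha_per_test = alpha / len(pairs)  # Bonferroni-corrected level per t-test

for first, second in pairs:
    print(f"{first} vs. {second}")
print(f"corrected level per comparison: {alpha_per_test:.4f}")
```

Each two-sample t-test within a frequency band would then be judged against the corrected level rather than the base level.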
Distributions of the bandwidths of the resonances in five 1 kHz bands for four types of resonators: simulated resonators with a coarse loss model, simulated resonators with a detailed loss model, physical tube-shaped resonators, and physical MRI-based resonators. The black diamonds indicate mean values. The asterisks indicate pairs of resonator types for which the means differ significantly (after Bonferroni correction).
Average bandwidths and standard deviations (both in Hz, standard deviations in brackets) of the resonances in different frequency bands for the four types of examined resonators, and for humans (as reference).
Resonator type | 0–1 kHz | 1–2 kHz | 2–3 kHz | 3–4 kHz | 4–5 kHz |
---|---|---|---|---|---|
Simulated resonators with coarse loss model | 4 (2) | 16 (8) | 13 (13) | 12 (11) | 62 (74) |
Simulated resonators with detailed loss model | 14 (4) | 34 (9) | 37 (15) | 52 (12) | 94 (84) |
Physical tube-shaped resonators | 15 (4) | 40 (10) | 59 (18) | 66 (33) | 111 (98) |
Physical MRI-based resonators | 25 (9) | 49 (21) | 66 (38) | 92 (52) | 131 (56) |
Human data (Fant, 1972) | 52 (15) | 54 (12) | 79 (39) | | |
Looking at the two types of simulations, the lower bandwidths with the coarse loss model are due to both the approximations for the viscous and radiation losses and the neglect of the heat-conduction loss. As Fig. 6 shows, the resistive component of the radiation impedance is somewhat underestimated by the lumped-element approximation, but mainly at higher frequencies and never by more than 20%. For the viscous loss, the difference between Eqs. (3) and (6) is more significant. For example, for a tube with a circular cross section of 1 cm2 and at a frequency of 1000 Hz, the resistance according to Eq. (3) is 19.6 times that of the frequency-independent approximation in Eq. (6).
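The quoted factor for the viscous resistance can be approximately reproduced with a short calculation. The sketch below assumes the commonly used forms of the two resistances per unit length, namely a frequency-dependent boundary-layer resistance and the Hagen–Poiseuille resistance, and assumes an air density of 1.14 kg/m^3 and a viscosity of 1.86e-5 Pa s. The formulas, constants, and names are illustrative assumptions and are not taken from Eqs. (3) and (6).

```python
import math

RHO = 1.14      # air density in kg/m^3 (assumed)
MU = 1.86e-5    # dynamic viscosity of air in Pa s (assumed)

def viscous_resistance_boundary_layer(area, freq):
    """Frequency-dependent viscous resistance per unit length of a circular tube."""
    radius = math.sqrt(area / math.pi)
    perimeter = 2.0 * math.pi * radius
    omega = 2.0 * math.pi * freq
    return perimeter / area ** 2 * math.sqrt(omega * RHO * MU / 2.0)

def viscous_resistance_poiseuille(area):
    """Hagen-Poiseuille resistance per unit length (stationary laminar flow)."""
    return 8.0 * math.pi * MU / area ** 2

area = 1.0e-4   # 1 cm^2 in m^2
freq = 1000.0   # Hz
ratio = (viscous_resistance_boundary_layer(area, freq)
         / viscous_resistance_poiseuille(area))
print(f"ratio at {freq:.0f} Hz: {ratio:.1f}")
```

Because the boundary-layer resistance grows with the square root of frequency while the Hagen–Poiseuille value is constant, the ratio grows further towards higher frequencies.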
The bandwidth differences between the simulations with the detailed loss model and the physical tube-shaped resonators show that the detailed loss model underestimates the real losses, although the tube shapes were virtually identical for the simulated and physical models. The biggest difference is in the 2–3 kHz band, where the bandwidths of the physical models are 59% higher than in the simulations. The differences are likely caused by the approximations inherent to the equations for the visco-thermal losses. While Eqs. (3) and (4) both assume smooth walls, the walls of the 3D-printed resonators have a slightly rough surface, which results from the layer-wise printing process of the 3D-printer. For our models, the layer thickness was 0.1 mm. Hence, there are small grooves with a spacing of 0.1 mm on the surface of all 3D-printed models. This is illustrated with a high-resolution microscopic image of a section of the surface in the supplemental material.
The comparison between the two types of physical resonators shows that the realistically shaped MRI-based vocal tract models cause higher losses than the axisymmetric tubes. The average bandwidth increases by between 12% (in the 2–3 kHz band) and 67% (in the 0–1 kHz band). The differences can have two causes. On the one hand, due to their more complex geometries, the MRI-based models have larger inner surfaces than the axisymmetric tubes. This likely increases the viscous and heat-conduction losses. On the other hand, the horn-shaped lips of the MRI-based models likely increase the radiation losses compared to the tube-shaped models with their circular radiating apertures. The contributions of these two differences to the wider bandwidths of the MRI-based models are difficult to assess. However, the results for the notched resonators can provide an estimate of the individual effect of the lip geometry on the bandwidths.
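The percentages quoted for the step from the detailed simulations to the tube-shaped resonators, and from the tube-shaped to the MRI-based resonators, follow directly from the mean values in Table I. A small sketch of that arithmetic, with informal dictionary keys:

```python
bands = ["0-1", "1-2", "2-3", "3-4", "4-5"]  # frequency bands in kHz

# Mean bandwidths in Hz, copied from Table I.
mean_bw = {
    "detailed": [14, 34, 37, 52, 94],
    "tube": [15, 40, 59, 66, 111],
    "mri": [25, 49, 66, 92, 131],
}

def relative_increase(lower, upper):
    """Percentage increase from `lower` to `upper`, band by band."""
    return [100.0 * (u - l) / l for l, u in zip(lower, upper)]

sim_to_tube = relative_increase(mean_bw["detailed"], mean_bw["tube"])
tube_to_mri = relative_increase(mean_bw["tube"], mean_bw["mri"])

for band, a, b in zip(bands, sim_to_tube, tube_to_mri):
    print(f"{band} kHz: detailed -> tube {a:5.1f} %, tube -> MRI {b:5.1f} %")
```

The largest step from the simulations to the tubes falls in the 2–3 kHz band, whereas the step from the tubes to the MRI-based models is smallest in that band and largest in the 0–1 kHz band.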
B. Specific effect of notched lips
The resonance frequencies and bandwidths of the two-tube resonators with the notched lips are shown in Fig. 11. For each of the four vowels, the results for the five notch depths (0 cm, 1 cm, 2 cm, 3 cm, 4 cm) are displayed in the same plot. This shows that an increasing notch depth (corresponding to increased lip spreading) generally increases both the resonance frequencies and bandwidths. The increase in resonance frequencies is caused by an effective reduction of the vocal tract length by the notches and has been discussed before (Birkholz and Venus, 2018; Lindblom et al., 2007). The increase in bandwidths is probably related to the increased radiating area, which takes the form of a curved surface in notched tube openings.
Bandwidths of the resonances of the four notched two-tube resonators as a function of the resonance frequency for different notch depths.
An interesting observation is that changing the notch depth affects the bandwidths differently depending on the resonator shape and resonance index. For example, the bandwidths of the first two resonances of /e/ are hardly affected by notch depth, while those of higher-order resonances increase somewhat uniformly with increasing notch depth. In contrast, for /a/ the 1st, 2nd, 3rd, and 5th resonances are hardly affected by the notch depth, while the bandwidth of the 4th resonance increases strongly with increasing notch depth. The likely reason is that the 4th resonance is effectively the 3/4-λ resonance of the anterior (oral) part of the tube model, which is hence most strongly affected by the lip shape. Since the anterior tube section has a length of 8 cm, the 3/4-λ resonance has a wavelength of 10.7 cm and thus a frequency of about 3270 Hz, which corresponds to the affected resonance.
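The estimate for the 3/4-λ resonance can be checked with a few lines, assuming a speed of sound of 350 m/s (an assumption; the exact value depends on temperature) and treating the anterior section as an isolated tube that is closed at one end and open at the other.

```python
C_SOUND = 350.0       # speed of sound in m/s (assumed)
TUBE_LENGTH = 0.08    # length of the anterior tube section in m

def closed_open_resonance(n, length, c=C_SOUND):
    """n-th resonance (n = 1, 2, ...) of an ideal closed-open tube:
    the tube holds (2n - 1) quarter wavelengths."""
    return (2 * n - 1) * c / (4.0 * length)

wavelength = 4.0 * TUBE_LENGTH / 3.0          # 3/4 of a wavelength fits the tube
frequency = closed_open_resonance(2, TUBE_LENGTH)
print(f"wavelength: {100.0 * wavelength:.1f} cm, frequency: {frequency:.0f} Hz")
```

With the unrounded wavelength the result is slightly above 3270 Hz; rounding the wavelength to 10.7 cm first gives the quoted value.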
In general, the two-tube approximations of the vocal tract with the notched lips seem to overestimate the bandwidths somewhat, as there are several cases where they exceed 300 Hz, which is rare in the axisymmetric and MRI-based models (Fig. 9). This is probably due to the relatively large cross-sections of the anterior tube segments, which result in a correspondingly high radiation loss.
C. Comparison between physical models and humans
Since the MRI-based models have the most realistic shape of all three types of physical resonators, their bandwidths should most closely match those of humans. To verify this, their bandwidths (the same as in Fig. 9) were plotted along with human bandwidth data by Fant (1972) in Fig. 12. Table I shows their mean values and standard deviations in 1-kHz frequency bands. For resonances with frequencies above 800 Hz, the human data points (black triangles and squares) are all within the dashed bounding trapezoid around the data points of the MRI-based models (white circles) and look similarly distributed. Below 800 Hz, the human bandwidths increase towards lower frequencies. This phenomenon, which was also found for human bandwidth data by Fujimura and Lindqvist (1971), can be attributed to the soft walls of the vocal tract (Stevens, 1998). The effect of the soft walls was also recently demonstrated for physical models with hard and soft walls (Birkholz et al., 2022).
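The per-band statistics mentioned above can be computed as in the following minimal sketch. The function name and the input values are hypothetical placeholders and not measurements from this study. The use of the sample standard deviation is also an assumption, since the text does not say which estimator was used for Table I.

```python
from collections import defaultdict
from statistics import mean, stdev


def band_statistics(resonances, band_width_hz=1000.0):
    """Group (frequency_hz, bandwidth_hz) pairs into frequency bands.

    Returns {band_index: (mean_bandwidth_hz, sample_std_hz)}, where band 0
    covers 0 to 1 kHz, band 1 covers 1 to 2 kHz, and so on. The standard
    deviation is None for bands that contain a single resonance.
    """
    bands = defaultdict(list)
    for frequency_hz, bandwidth_hz in resonances:
        bands[int(frequency_hz // band_width_hz)].append(bandwidth_hz)
    return {
        index: (mean(values), stdev(values) if len(values) > 1 else None)
        for index, values in sorted(bands.items())
    }


# Made-up placeholder pairs (frequency in Hz, bandwidth in Hz), NOT real data.
placeholder_resonances = [(300.0, 40.0), (700.0, 60.0), (1500.0, 80.0)]
for index, (mean_bw, std_bw) in band_statistics(placeholder_resonances).items():
    print(f"{index}-{index + 1} kHz: mean = {mean_bw}, std = {std_bw}")
```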
Scatter plot of the measured resonance bandwidths over resonance frequencies of the physical MRI-based resonators (white circles) compared to measurements from male (black squares) and female (black triangles) subjects. The human data are those provided by Fant (1972). The dashed lines form the bounding trapezoid around the data points of the MRI-based models. The gray curve is the formant frequency-bandwidth relation proposed by Hawks and Miller (1995).
Finally, the gray curve in Fig. 12 is an approximation of the frequency-bandwidth relation proposed by Hawks and Miller (1995). This relation is based on a regression analysis using the human bandwidth data by Fujimura and Lindqvist (1971) and Fant (1972). Obviously, this function cannot account for the increased scattering of bandwidth values towards higher frequencies and seems to overestimate the bandwidths of the MRI-based vocal tract models for frequencies above 3 kHz.
We can conclude that (a) the MRI-based models of the vocal tract with hard walls represent the losses of the human vocal tract well, except for the losses due to soft walls, and (b) the bandwidth depends not only on the resonance frequency but also on the vowel (especially at higher frequencies).
IV. GENERAL DISCUSSION AND CONCLUSIONS
The results show that the usual equations for transmission-line simulations of the vocal tract underestimate the viscous, heat-conduction, and radiation losses with both the detailed and the coarse loss models. This underestimation was already suspected by Flanagan (1965), who derived Eqs. (3) and (4) for the viscous and heat-conduction losses based on the simplifying assumptions of a smooth and hard-walled tube. However, previous direct comparisons between losses (in terms of bandwidths) from simulations and measurements are rare, even though multiple studies provide human bandwidth data as a reference (Dunn, 1961; Fant, 1962, 1972; Fujimura and Lindqvist, 1971; Kent and Vorperian, 2018). One exception is the study by Hanna et al. (2016), who found that the simulated resonances of a rigid cylindrical tube had significantly lower bandwidths than those measured on human subjects for the vowel /ɜ:/. They reported that the attenuation coefficient of the complex wave number in the simulations (which accounts for visco-thermal losses) had to be increased by a factor of 5 to achieve the measured bandwidths.
To our knowledge, the present study is the first to make comprehensive measurements of the bandwidths of physical tube models and compare them to simulations. Physical models have the advantage that their shape and boundary conditions (state of the glottis, material of the walls) can be precisely controlled and measurements can be performed very accurately. Therefore, the bandwidth data presented here can serve as a benchmark for future simulation models and are available in the supplementary material.¹
How could the results of this study help to improve TL simulations of the vocal tract? With regard to viscous and heat-conduction losses, we saw that the detailed loss model only slightly underestimated the bandwidths compared to the equivalent physical tube-shaped resonators. Hence, if the TL model is simulated in the frequency domain, a correction factor on the right-hand side of Eqs. (3) and (4) might compensate for the bandwidth differences. In additional simulations, we found that a correction factor of 1.51 applied to both R_i and G_i (i.e., when both quantities are increased by 51%) minimizes the average bandwidth differences between the simulations and the tube-shaped physical models in the 1 kHz bands of Fig. 10. However, the coarse loss model for time-domain simulations underestimates the bandwidths much more, because the viscous loss has been replaced by the (frequency-independent) flow resistance of Eq. (6), and the heat-conduction loss has been omitted completely. Here, a correction factor for Eq. (6) could certainly reduce the overall bandwidth differences, but the frequency-independence of this equation would then likely overestimate the losses at low frequencies and underestimate the losses at higher frequencies. An alternative (with the same problem) would be to model the viscous resistance with Eq. (3) using a fixed frequency, e.g., 1000 Hz, as proposed by Titze (2014) and Wakita and Fant (1978). Another approach could be to model the frequency-dependence of the viscous resistance based on a discrete representation of the velocity profile in the boundary layer of the tube sections, as proposed by Birkholz and Jackèl (2004).
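The frequency-domain correction factor and the fixed-frequency alternative discussed above can be illustrated with a minimal sketch. Eqs. (3) and (4) are not reproduced in this section, so the code assumes that they correspond to the classical expressions attributed to Flanagan for a smooth, hard-walled tube section, in which both the viscous resistance and the heat-conduction conductance grow with the square root of frequency. The air constants, the function names, and the example tube section are likewise assumptions made for illustration only.

```python
import math

# Approximate properties of air in SI units (assumed values).
RHO = 1.14       # density in kg/m^3
MU = 1.86e-5     # dynamic viscosity in Pa*s
C = 350.0        # speed of sound in m/s
ETA = 1.4        # adiabatic index
LAMBDA = 0.023   # coefficient of heat conduction in W/(m*K)
CP = 1004.0      # specific heat at constant pressure in J/(kg*K)


def viscous_resistance(freq_hz, area_m2, perimeter_m, length_m, correction=1.0):
    """Series viscous resistance of one tube section (assumed form of Eq. (3))."""
    omega = 2.0 * math.pi * freq_hz
    return (correction * length_m * perimeter_m / area_m2 ** 2
            * math.sqrt(omega * RHO * MU / 2.0))


def heat_conductance(freq_hz, perimeter_m, length_m, correction=1.0):
    """Shunt heat-conduction conductance of one section (assumed form of Eq. (4))."""
    omega = 2.0 * math.pi * freq_hz
    return (correction * length_m * perimeter_m * (ETA - 1.0) / (RHO * C ** 2)
            * math.sqrt(LAMBDA * omega / (2.0 * CP * RHO)))


# Hypothetical circular tube section: area 4 cm^2, length 1 cm.
area = 4.0e-4
perimeter = 2.0 * math.sqrt(math.pi * area)
length = 0.01

# Resistance evaluated once at a fixed frequency of 1000 Hz.
r_fixed = viscous_resistance(1000.0, area, perimeter, length)
for f in (250.0, 1000.0, 4000.0):
    r = viscous_resistance(f, area, perimeter, length)
    r_corrected = viscous_resistance(f, area, perimeter, length, correction=1.51)
    g_corrected = heat_conductance(f, perimeter, length, correction=1.51)
    print(f"{f:6.0f} Hz: R = {r:.3e}, 1.51*R = {r_corrected:.3e}, "
          f"1.51*G = {g_corrected:.3e}, R_fixed/R = {r_fixed / r:.2f}")
```

Under these assumed formulas, the resistance fixed at 1000 Hz is twice the frequency-dependent value at 250 Hz and half of it at 4000 Hz, which is the low-frequency overestimation and high-frequency underestimation described in the text.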
With regard to the radiation loss, the results indicate that the radiation model based on a piston in a plane baffle underestimates the loss compared to physical models with more realistic lip shapes. To compensate for these differences in the simulation, a “correction” could be applied to the radiating area in Eqs. (8) and (9). However, the exact nature of the correction is a topic for future study.
Finally, future work should further investigate the impact of bandwidths on the perception of simulated speech. Early studies on this topic indicated that bandwidths affect the perception of vowels relatively little. For example, Carlson et al. (1979) found that bandwidth variations as large as 40% yield little change in subjects' judgments on a psychophysical scale. However, the effect of bandwidths on the perception of continuous speech remains to be explored.
ACKNOWLEDGMENTS
This research was partly supported by grant BI 1639/7-1 by the German Research Foundation (DFG).
¹See supplemental material at https://doi.org/10.1121/10.0019682 for the 3D-printable resonator models, their transfer functions, and MATLAB scripts for the simulations.