Students in acoustic phonetics and speech science classes often do not have much technical background; an intuitive means to teach acoustic phenomena to them would, thus, be useful. Regarding speech production, physical demonstrations using vocal-tract models have been shown to be an intuitive way to teach acoustic phenomena. In particular, a series of models for different purposes has been developed by Arai over the last 20+ years, including lung models, sound sources, and vocal-tract models, e.g., see Arai [J. Acoust. Soc. Am. 131(3), 2444–2454 (2012)]. Different combinations of these models are helpful for teaching a variety of related topics in the classroom. However, there are still barriers to understanding certain concepts. This study examined ways of minimizing technical explanations and mathematical formulations and maximizing intuitive understanding of seven topics. Its findings were incorporated into an education program that was used in an actual lecture conducted online. A comparison of scores of questionnaires filled out by the audience before and after the lecture showed the program's effectiveness, especially in relating how a set of harmonic waves excites a multiple-resonance system and how the vowel /a/ is produced.

An understanding of acoustics is necessary for students learning acoustic phonetics and speech science. However, students sometimes have problems understanding topics in acoustic phonetics or speech science because of a lack of basic acoustics knowledge. To help students intuitively understand phenomena in human speech production, the author has developed physical models of the human vocal tract and examined their effectiveness (Arai, 2007, 2012a, 2016). These vocal-tract models were found to be especially effective when we discuss the source and filter aspects of human speech production (Fant, 1960) and the relationship between vocal-tract shape and vowel quality (Stevens, 1998).

Arai (2007) describes a set of five acoustic tubes with different configurations that were based on the work of Chiba and Kajiyama (1941). Later, the set was named the “VTM-C10 models” (Table I) and sold by a company from 2002 through 2008. The outer shape of each model is columnar, and there is a hole through the center axis of the column. Cross sections of the hole perpendicular to the axis are circles whose areas vary along the axis such that the inner cavities form rounded bottle-like shapes. The cross-sectional area function is different for each vowel and is based on the measurements and simplifications first presented in Chiba and Kajiyama (1941). Recently, the author distributed STL files for three-dimensional (3D) printing of another set of models, VTM-N20 (Fig. 1), which have the same cross-sectional area functions of the inner cavities as the VTM-C10 models but different outer shapes. The VTM-N20 models are no longer columnar; their outer shapes follow the inner cavities and are 2 mm thicker. We developed the VTM-N20 models from the VTM-C10 models because we wanted the inner shape to be clearly visible even when the models are made with non-transparent materials. Furthermore, the inner shape is tangible, so that people with visual impairment can feel each shape by touch. Finally, the manufacturing costs can be reduced by omitting the portions that are not related to sound production.

TABLE I.

Explanations of model abbreviations.

Abbreviation | APD index no. | Explanation
VTM-C10 | G200 | Cylindrical-type vocal-tract models based on the measurements by Chiba and Kajiyama (1941)
VTM-N20 | I001 | Vocal-tract models based on numerical control machining. (The inner cavities have the same cross-sectional areas as VTM-C10, but the outer shapes of the VTM-N20 models reflect the shapes of their inner cavities.)
VTM-T20 | I010 | Vocal-tract models designed by connecting tubes
VTM-S20 | G300 | Sliding vocal-tract model
VTM-FT | I900 | Vocal-tract model with flexible tongue
VTM-AP | O500 | Vocal-tract model for approximants
VTM-BR | O500 | Vocal-tract model for bunched /r/
FIG. 1.

(Color online) 3D-printed version of the VTM-N20 for five Japanese vowels /i/, /e/, /a/, /o/, and /u/ from left to right (cf. https://splab.net/APD/I001/).

The author presented another set of five acoustic tubes, called VTM-T20, in Arai (2012a), which are a simplification of VTM-N20. In Fant (1960), vowel production is explained in terms of resonances of acoustic tubes that are combinations of a few uniform tubes. The VTM-T20 models were designed in line with this concept. As shown in Fig. 2, each model in VTM-T20 consists of parts that are basically uniform straight tubes with different inner areas and lengths. Different combinations of wide/narrow or long/short tubes can be used to approximate various vocal-tract configurations. Vowels /i/ and /e/ of VTM-T20 are based on the concept of the three-tube model with additional narrow laryngeal cavities. Vowel /a/ of VTM-T20 is exactly a two-tube model. Vowels /o/ and /u/ of VTM-T20 have additional lip roundings. Figure 2 shows a 3D-printed version of the VTM-T20 models. The STL files for 3D-printing of the VTM-T20 models are distributed along with the ones for VTM-N20 from the Acoustic-Phonetics Demonstrations (APD) website: https://splab.net/APD/V100/.

FIG. 2.

(Color online) 3D-printed version of the VTM-T20 for five Japanese vowels /i/, /e/, /a/, /o/, and /u/ from left to right (cf. https://splab.net/APD/I010/).

The above-mentioned physical models of the human vocal tract are very useful because they provide an intuitive way to teach source-filter theory and/or the relationship between vocal-tract shape and vowel quality. This holds for all types of people, including children (Arai, 2012a). Another advantage of the VTM-T20 models is their simple shapes. For vowel /a/, the configuration is approximated with a two-tube model composed of two connected uniform tubes, one for the narrow pharyngeal cavity and the other for the wide oral cavity. The other vowels can be approximated with three-tube models. In particular, the sliding three-tube (S3T) model (later called VTM-S20) is useful for classroom teaching and science workshops (Arai, 2012a). However, if we want to teach students what causes formants, for example, we need to teach the concept of “resonance.” To explain resonance in speech production plainly, students need to understand how a multiple-resonance system is excited by a source signal. Unfortunately, many textbooks do not fill the gap between basic acoustics and introductory acoustic phonetics and speech science. In addition, there are barriers to learning in basic acoustics itself. One barrier is that it is harder to imagine longitudinal waves than it is to imagine transverse waves. The concepts of standing waves and resonance are also needed in order to teach in a systematic way. Overall, how to minimize technical explanations containing mathematical formulations and maximize intuitive understanding remains a great challenge. This paper examines several issues that require special care when educating students and introduces solutions to each.

Table II shows a sample program of an introductory acoustics course in acoustic phonetics and speech science. Because non-technical students might need a basic introduction to waves in physics, this program contains such an introduction in a step-by-step manner. Lectures 2.1–2.4 cover the content directly related to speech production, while lectures 1.1–1.8 are on the basic acoustics needed for fully understanding lectures 2.1–2.4. The highlight of lectures 1.1–1.8 might be lecture 1.7, “Resonance,” and lectures 1.1–1.6 are all needed to understand the concept of resonance.

TABLE II.

Sample program of a lecture. This table also shows the index numbers of the APD websites under https://splab.net/APD/ (e.g., if the index number is A100, the website is https://splab.net/APD/A100/).

Lecture | Title | APD index numbers
Lecture 1.1 | What is a wave? | A100, A110, A200
Lecture 1.2 | Circular motion, simple harmonic motion, and sinusoidal waves | A300, A310, A320
Lecture 1.3 | Longitudinal waves vs transverse waves | A400, A410
Lecture 1.4 | Principle of superposition | A420
Lecture 1.5 | Fixed-end reflection/free-end reflection | A430, A440
Lecture 1.6 | Standing waves | A500, A510
Lecture 1.7 | Resonance (string/pendulum/acoustic tube) | A516, A520, A521, A525, A526, A530, A540, A600, A610, A615, A620, A700
Lecture 1.8 | Fourier series expansion | A900
Lecture 2.1 | Producing the neutral vowel “schwa” with a uniform tube | G150
Lecture 2.2 | Source and filter in speech production | G500
Lecture 2.3 | Producing the vowel /a/ with the S2T model | G250
Lecture 2.4 | Producing many different vowels with vocal-tract models | G200, G210, G300, G400, I001, I010, I900

Based on my teaching experience, the concepts involved in the following topics pose problems for many students in acoustic phonetics and speech and hearing sciences. Therefore, the seven topics below are introduced at appropriate spots in the education program:

  • Topic 1: Longitudinal standing waves;

  • Topic 2: Reflection of an acoustic wave at the open end of a tube;

  • Topic 3: How can we swing two pendulums of different lengths by applying one periodic external force?

  • Topic 4: Exciting multiple resonances of an acoustic tube;

  • Topic 5: How do we produce the vowel /a/?

  • Topic 6: What determines vowel quality?

  • Topic 7: Why are similar sounds not always produced by one type of articulation?

This section describes the issues raised by these topics and provides solutions to each one.

Lectures 1.1–1.7 would be parts of any regular introductory course in acoustics. However, the topics above are not covered thoroughly in some textbooks. For example, topic 1 is emphasized in lectures 1.3 and 1.6, topic 2 is emphasized in lectures 1.5 and 1.7, topic 3 is emphasized in lecture 1.7, and topic 4 is emphasized in lecture 2.1.

In the rest of this section, we will discuss them.

Lectures 1.1–1.8 cover basic acoustics and are important for understanding the concepts of speech production. Speech communication usually takes place in air, and acoustic waves, including those in the vocal tract made during speech production, are longitudinal waves. If the frequency range of interest is up to 4 kHz, the wavelength is 8.5 cm or longer at room temperature. Because the diameter of the vocal tract is half that size or less, we can approximate the sound propagation inside the vocal tract as a plane wave traveling along its length. In this case, the sound wave is reflected at the ends of the vocal tract or where the cross-sectional area of the tract changes. The reflected waves are superimposed on each other to make a standing wave. The explanations of standing wave phenomena given in many textbooks and online demonstrations are mostly in terms of transverse waves. However, an acoustic wave propagating in the air is a longitudinal wave, not a transverse one, and some students have difficulty transferring their conception of standing waves from the transverse case to the longitudinal case by only looking at figures in a textbook. Animations and videos are more useful for understanding these concepts. Here, the situation can be simplified by thinking of a plane wave propagating in an acoustic tube along its length. We can imagine that the tube is filled with “air particles” that have individual masses and are elastically connected. The excellent online demonstrations on longitudinal standing waves created by Russell (2020) embody this picture (https://www.acs.psu.edu/drussell/Demos/StandingWaves/StandingWaves.html) (e.g., Krishnamurti, 2010). These animations show how air particles behave and form longitudinal standing waves when a piston moves sinusoidally at one end. A plane wave travels from left to right along the length of the tube. When a rigid wall seals the right end of the tube, the plane wave reflects off the wall with inversion (fixed-end reflection). The plane wave is reflected even when the right end of the tube is open, but without inversion (free-end reflection).
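
The wavelength figure used for the plane-wave approximation can be checked with a few lines of Python. This is only a minimal sketch: the function name is illustrative, and the speed of sound of 340 m/s is an assumed room-temperature value.

```python
# Assumed speed of sound in air at room temperature (m/s).
SPEED_OF_SOUND = 340.0


def wavelength_cm(frequency_hz, c=SPEED_OF_SOUND):
    """Return the wavelength in centimetres for a frequency in hertz."""
    return 100.0 * c / frequency_hz


# The wavelength shrinks as the frequency rises, so the shortest
# wavelength in a band up to 4 kHz occurs at 4 kHz itself.
for f in (500, 1000, 2000, 4000):
    print(f"{f} Hz -> {wavelength_cm(f):.1f} cm")
```

At 4 kHz the wavelength is 8.5 cm, and every lower frequency in the band has a longer wavelength, which is why a tube whose diameter is about half that size or less can be treated as carrying plane waves.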

An alternative way to learn such resonances is to experience the phenomenon physically. The wave machine shown in Fig. 3 provides students with an intuitive understanding when it sends a pulse as a traveling wave, and they can observe reflections at a free end or fixed end. Furthermore, they can experience both quarter- and half-wavelength resonances with this machine (cf. https://splab.net/APD/A500/).

FIG. 3.

(Color online) A wave machine: the fixed end on the left and the free end on the right (cf. https://splab.net/APD/A500/).

One thing that we should be aware of as teachers is that a wave machine like the one shown in Fig. 3 only makes transverse waves, not longitudinal ones like acoustic waves in air. A solution to this visualization problem is to demonstrate a “Slinky” instead of a wave machine. The problem with the Slinky is that it is a tension spring, and we need to hold both ends of the spring to show any demonstrations horizontally. The drawbacks of these physical models motivated the author to seek an alternative way of physically demonstrating longitudinal waves in the classroom. A plane wave in an idealized lossless tube yields one-dimensional (1D) wave propagation. Because the air has mass and elasticity, we modeled the 1D wave propagation with springs and masses. A real system consisting of a sequence of alternating springs and masses, as shown in Fig. 4, was eventually settled on. This model can send a single pulse, and students can observe reflections from both fixed and free ends as well as longitudinal standing waves; they can even experience quarter- and half-wavelength resonances of longitudinal waves with it. The related video clips can be found on the APD website with index numbers of A430, A440, A516, and A526.

FIG. 4.

(Color online) A physical system of multiple springs and masses for demonstrating longitudinal waves: the fixed end on the left and the free end on the right (cf. https://splab.net/APD/A526/).

Lecture 2.1 is about the neutral vowel produced from a tube with one end closed and the other end open. An acoustic wave in such a closed-open tube is reflected at both ends. As a result, resonance occurs under a certain condition, and when a glottal sound is fed in, a neutral vowel is output from the tube.

Students sometimes think that an acoustic wave reflects only at the closed end of a tube; in fact, it also reflects at the open end. I therefore use a Kundt tube experiment to teach this concept, and I often show students a video of a 1D Kundt experiment (cf. https://splab.net/APD/A600/). As shown in Fig. 5, when both ends are closed, the antinode is visible because the cork dust dances violently in the middle of the tube. This demonstration helps to explain the concept of resonance. A Kundt experiment with an open end would also be useful, but cork dust would flow out of the open end of a 1D tube, so I designed a two-dimensional (2D) Kundt experiment (Sakamoto et al., 2004). The design shown in Fig. 6 was originally for teaching students about Helmholtz resonance, but the neck part can also be viewed as a tube with closed or open ends. Because a sound wave is reflected at the open end, resonance can still occur, and cork dust dances at the antinode (cf. https://splab.net/APD/A615/).

FIG. 5.

(Color online) 1D Kundt experiment with both ends closed (cf. https://splab.net/APD/A600/).

FIG. 6.

(Color online) 2D Kundt experiment originally designed for demonstrating Helmholtz resonance (cf. https://splab.net/APD/A615/). The neck part is used for showing the resonance of a tube with closed-open ends.

As mentioned earlier, lecture 1.7, “Resonance,” may be the highlight of the first half of the lecture series. Many textbooks cover resonance as an important phenomenon in acoustics. However, not many explain the relation between resonance in physics and vocal-tract resonances in a simple acoustic tube excited by a glottal source. One missing point is that the vocal tract can be viewed as a multi-resonance system, such as a set of multiple pendulums with different lengths sharing a single pivot. To fill this gap, lecture 1.7 includes topic 3.

Kundt's experiments are very useful illustrations because they visualize sounds that usually cannot be observed. However, when the external force is not a simple harmonic one, the visualized pattern tends to become too complicated. Instead, multiple pendulums with different lengths can be used; as shown in Fig. 7, the natural frequencies of the pendulums are 1, 1.5, and 2 Hz from left to right. If we sinusoidally rotate the horizontal pivot bar back and forth around the rotation axis within a small degree range, say ±15°, as an external force on the system, only one pendulum will resonate. In other words, when the external force is a 2-Hz sinusoid, only the 2-Hz pendulum swings. Now, we can apply a periodic complex input to the system, such as an impulse train of 1 Hz. Then both 1- and 2-Hz pendulums will resonate because their resonance frequencies are integer multiples of 1 Hz (cf. https://splab.net/APD/A530/).
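
The selection rule in the pendulum demonstration can be sketched in Python. This is an idealized illustration rather than a model of the actual apparatus: damping is ignored, a pendulum is counted as resonating only when its natural frequency coincides with a frequency component of the drive, and the function names and the ideal simple-pendulum length formula are assumptions added here.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed standard value)


def pendulum_length_m(natural_freq_hz):
    """Length of an ideal simple pendulum with the given natural frequency."""
    return G / (2.0 * math.pi * natural_freq_hz) ** 2


def is_harmonic(freq_hz, f0_hz, tol=1e-9):
    """True if freq_hz is a positive integer multiple of f0_hz."""
    ratio = freq_hz / f0_hz
    return round(ratio) >= 1 and abs(ratio - round(ratio)) < tol


def resonating(natural_freqs_hz, f0_hz, impulse_train):
    """Natural frequencies excited by a sinusoid or an impulse train at f0."""
    if impulse_train:
        # An impulse train contains every integer multiple of f0.
        return [f for f in natural_freqs_hz if is_harmonic(f, f0_hz)]
    # A sinusoid contains the single frequency f0 only.
    return [f for f in natural_freqs_hz if abs(f - f0_hz) < 1e-9]


pendulums = [1.0, 1.5, 2.0]  # natural frequencies in Hz, as in Fig. 7
print(resonating(pendulums, 2.0, impulse_train=False))  # [2.0]
print(resonating(pendulums, 1.0, impulse_train=True))   # [1.0, 2.0]
for f in pendulums:
    print(f"{f} Hz -> about {pendulum_length_m(f):.3f} m")
```

The 1.5-Hz pendulum is skipped by the 1-Hz impulse train because 1.5 is not an integer multiple of 1.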

FIG. 7.

(Color online) Multiple pendulums with different lengths. The natural frequencies are 1, 1.5, and 2 Hz from left to right (cf. https://splab.net/APD/A530/).

In topic 3, we saw multiple pendulums with different lengths resonate with an external impulse train. When this concept is mapped onto the phenomenon in speech production, we can view the vocal tract as a multiple-resonance system and an acoustic impulse train as an external force. In addition, an impulse train is a periodic complex wave, and Fourier analysis shows that it is a set of sinusoidal waves whose frequencies are integer multiples of the fundamental frequency. Topic 4 compares an acoustic tube excited by a single sinusoid with the same tube excited by an impulse train.

When we input a sinusoidal signal into a 17-cm-long acoustic tube with different frequencies and measure the gain of the output sound, we obtain a graph like that in Fig. 8 (Arai, 2019). The setup of this experiment is very similar to the one in Fig. 5. The driver unit of a horn speaker with a small hole in its neck part is connected to the glottis end (the left closed end of Fig. 5). Unlike in Fig. 5, the mouth end is open. When a sinusoidal signal of a certain frequency is fed into the driver unit from an amplifier, one can measure the sound pressure levels with and without the tube. We conducted this measurement and calculated the gain at the frequency by dividing the level measured with the tube by the one obtained without it. The dots in this figure denote values measured every 25 Hz, and the peaks at approximately 500, 1500, and 2500 Hz, etc., correspond to the resonances of a closed-open tube with a length of 17 cm. In other words, acoustic tubes can be viewed as a multiple-resonance system, as described in Sec. II C. That is, when we input an impulse train into the same acoustic tube, we can excite multiple resonances simultaneously, as in the demonstration above using swinging multiple pendulums. The impulse train, in this case, was created on a computer by setting x[n] to unity every fs/100 samples, where fs is the sampling frequency, and setting the rest of the samples of x[n] to zero. This yielded an impulse train with a fundamental frequency of 100 Hz. In so doing, we obtain the graph in Fig. 9 (Arai, 2019). In this graph, the harmonic structure is shown as a set of vertical bars. We can see that the peaks in the spectral envelope correspond to the resonance modes. In speech science, these peaks are called “formants.”
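
The construction of the impulse train described above can be sketched in Python, together with a direct check of its harmonic structure. The sampling frequency of 2000 Hz, the signal length, and the function names are illustrative assumptions; the sampling frequency used in the actual experiment is not stated. The discrete Fourier transform is evaluated from its definition so that only the standard library is needed.

```python
import cmath


def impulse_train(fs, f0, n_periods):
    """x[n] = 1 every fs/f0 samples and 0 elsewhere (fs a multiple of f0)."""
    period = fs // f0
    return [1.0 if n % period == 0 else 0.0 for n in range(period * n_periods)]


def dft_magnitude(x, k):
    """Magnitude of the k-th DFT bin, computed directly from the definition."""
    n_samples = len(x)
    total = sum(v * cmath.exp(-2j * cmath.pi * k * n / n_samples)
                for n, v in enumerate(x))
    return abs(total)


fs, f0 = 2000, 100                      # fs is an assumed value; f0 as in the text
x = impulse_train(fs, f0, n_periods=10)  # 200 samples, so bins are 10 Hz apart
for freq in (50, 100, 150, 200, 300):
    k = freq * len(x) // fs              # DFT bin index for this frequency
    print(freq, "Hz:", round(dft_magnitude(x, k), 6))
```

In this sketch, the bins at 100, 200, and 300 Hz all have the same magnitude (the number of impulses), while the bins in between are zero up to rounding error. This is the flat harmonic structure that the tube then shapes into a spectral envelope with peaks.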

FIG. 8.

(Color online) For an acoustic tube, the output level relative to the input level was measured in dB. Adapted from Arai, “Vocal-tract models in phonetic teaching and research,” in Routledge Handbook of Phonetics, Copyright 2019 Routledge (Arai, 2019).

FIG. 9.

Amplitude spectrum of the output signal when the impulse train was input into the tube. Adapted from Arai, “Vocal-tract models in phonetic teaching and research,” in Routledge Handbook of Phonetics, Copyright 2019 Routledge (Arai, 2019).

If an acoustic tube has a uniform area function along its length, the output sounds like the British English vowel schwa, which is like the vowel in “bird.” How, then, can we produce the vowel /a/? To answer this question, the sliding two-tube (S2T) model was developed (Arai, 2019). Before introducing the S2T model, let us try a simple experiment. For this experiment, we use a 185-mm-long tube and a reed-type sound source (SS-R30). The reed-type sound source consists of a 32-mm-long reed placed on the retainer and a cylinder with an outer diameter of 33.5 mm. Because the inner diameter of the tube is 34 mm, the sound source slides snugly inside the tube. Figure 10 shows a spectrogram of a sound recorded at the open end of the tube when the sound source is slid from one end to the other. The left side of the figure is when the sound source is located at the far end of the tube. The resonance frequencies were calculated with the following equation for the quarter-wavelength resonance: f_n = (2n − 1)c/(4L), where n = 1, 2, 3, …, c = 340 m/s is the speed of sound, and L is the length of the front part of the tube. Figure 10 shows that the resonance frequencies increase as the sound source slides toward the open end and the front part becomes shorter (cf. https://splab.net/APD/A620/).
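
The quarter-wavelength formula (2n − 1)c/(4L) is easy to evaluate numerically. The following is a minimal sketch with an illustrative function name; the front-part lengths in the loop are arbitrary sample positions of the sliding source, not measured values.

```python
def quarter_wave_resonances(length_m, n_modes=3, c=340.0):
    """Resonance frequencies (2n - 1)c / (4L) of a closed-open tube, in Hz."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_modes + 1)]


# A 17-cm closed-open tube resonates near 500, 1500, and 2500 Hz.
print([round(f) for f in quarter_wave_resonances(0.17)])  # [500, 1500, 2500]

# Shortening the front part of a 185-mm tube raises every resonance.
for front_mm in (185, 150, 100, 50):
    freqs = quarter_wave_resonances(front_mm / 1000.0)
    print(front_mm, "mm:", [round(f) for f in freqs])
```

Because L appears in the denominator, halving the front part doubles every resonance frequency, which matches the rising curves described for the sliding sound source.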

FIG. 10.

Sliding sound source in an acoustic tube. The solid rectangle denotes the location of the sound source sliding along the length of the tube (cf. https://splab.net/APD/A620/).

To understand the output sounds produced by the S2T model, we can apply the basic idea that we have just illustrated in Fig. 10. As shown in Fig. 11, a black bar is inserted into the outer tube, so that the wide cavity and the narrow cavity have variable lengths, i.e., L1 and L2, under the condition of a constant total length, L1 + L2. Because the ratio of the areas of A1 and A2 is 1:10, it is large enough to decouple the two tubes, so that we can think of each set of resonances independently from the two tubes. Figure 12 shows a sound spectrogram in which the horizontal axis is time, and the vertical axis is frequency. This figure is similar to Fig. 10 in the sense that it is Fig. 10 overlaid with another Fig. 10 whose horizontal axis is flipped. This spectrogram was obtained as the black bar was gradually inserted into the outer tube, so the horizontal axis corresponds to L1 becoming longer from left to right. The declining curves are the results of the quarter-wavelength resonances from the back narrow cavity, while the rising curves are the results of the quarter-wavelength resonances from the front wide cavity.

FIG. 11.

(Color online) S2T model. Adapted from Arai, “Vocal-tract models in phonetic teaching and research,” in Routledge Handbook of Phonetics, Copyright 2019 Routledge (Arai, 2019) (cf. https://splab.net/APD/G250/).

FIG. 12.

(Color online) Spectrographic representation of an output signal from the S2T model. Because the black bar was gradually inserted over a period of time, the horizontal axis corresponds to the length of L1.

When L1 is close to L2, that is, at the time when the first resonance curves from the two tubes intersect, denoted by the red dashed lines in Fig. 12, the first formant frequency, F1, becomes highest, and the second formant frequency, F2, becomes lowest. This is exactly the configuration of the vowel /a/.
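
This behavior can be reproduced with a small Python sketch under the decoupled approximation, in which each cavity is treated as an independent quarter-wavelength resonator. The total length of 17 cm, the sample values of L1, and the function name are hypothetical choices for illustration; the actual dimensions of the S2T model are not assumed here, and the residual coupling between the two cavities is ignored.

```python
def s2t_formants(l1_m, total_m, n_modes=3, c=340.0):
    """Two lowest resonances (F1, F2) of two decoupled quarter-wave cavities."""
    l2_m = total_m - l1_m
    freqs = []
    for length in (l1_m, l2_m):
        # Quarter-wavelength resonances (2n - 1)c / (4L) of this cavity.
        freqs += [(2 * n - 1) * c / (4.0 * length)
                  for n in range(1, n_modes + 1)]
    freqs.sort()
    return freqs[0], freqs[1]


TOTAL = 0.17  # hypothetical total length L1 + L2 in metres
for l1_cm in (4, 6, 8.5, 11, 13):
    f1, f2 = s2t_formants(l1_cm / 100.0, TOTAL)
    print(f"L1 = {l1_cm} cm: F1 ~ {f1:.0f} Hz, F2 ~ {f2:.0f} Hz")
```

In this sketch, F1 is set by the longer cavity and is therefore highest when neither cavity is longer than the other, that is, when L1 = L2; at that point the two first-resonance curves cross and F2 is at its lowest.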

Each of us produces different vowels with our own single vocal tract. VTM-FT (see the Appendix) was developed because a vocal-tract model partially made of a flexible material can change its configuration (Arai, 2016). This model allows us to demonstrate that a single vocal tract can produce vowels of multiple qualities when the shape of the flexible part, including the tongue, is changed by hand. This model has a great advantage because users can touch the model, check the tongue position, and listen to the output sounds at the same time. On the other hand, skill and knowledge are needed to change its configuration, so it would be more appropriate for advanced users/students. Vocal-tract models with simple configurations and few degrees of freedom are easier to use in an introductory session.

Section I introduced VTM-N20 (Fig. 1) and VTM-T20 (Fig. 2). Each of the models in these sets corresponds to one particular vowel. VTM-S20 is an S3T model (Fig. 13) that enables different vowels to be produced by changing the location of the inner slider. When the inner slider is in the back position, back vowels, such as /ɑ/ and /o/, are produced. When the inner slider is in the front position, front vowels, such as /i/ and /e/, are produced. When the inner slider is in the middle, /u/ is produced. These simple straight tubes are suitable for quick and effective demonstrations of the fact that the vocal-tract configuration determines the quality of vowel production. As mentioned previously, the STL files for 3D printing of N20 and T20 are available through the APD website (https://splab.net/APD/V100/). Moreover, video clips on vowel production using VTM-N20, T20, and S20 are available on our YouTube channel, which is called “Acoustic-Phonetics Demonstrations.” English versions of the six main videos have recently been released (see the Appendix).

FIG. 13.

(Color online) S3T model (Arai, 2012a) (cf. https://splab.net/APD/G300/).

This topic is for more advanced students, but even so, it is not always correctly understood by them. To illustrate this idea, one can apply the S2T model in Sec. II E. Because the resonance curves in Fig. 12 are nearly symmetric, we hear the same sound with two different vocal-tract configurations. As another example, the American English /r/ sound can be produced with two different major articulations: retroflexion of the tongue or bunching of the tongue. Figure 14 shows the models designed to demonstrate these two articulations. Figure 14(a) is the approximant model, or the VTM-AP model (Arai, 2019), with which we are able to produce /r/ and /l/. This model has a lever, and if you rotate the lever, the front part of the tongue undergoes retroflexion. As a result, this model can produce an English utterance of /ara/. On the other hand, there is another way to produce the English /r/ sound. Figure 14(b) shows the VTM-BR model for English /r/. With this model, we can partially raise the tongue body approximately 5–6 cm from the lip end. This corresponds to the so-called “bunched /r/” sound. These models can be excited to produce sounds. Actual demonstrations can be viewed on the YouTube channel of APD (see the Appendix).

FIG. 14.

(Color online) Two models for /r/: (a) VTM-AP model and (b) VTM-BR model (cf. https://splab.net/APD/O500/).

This section describes an actual lecture delivered online in speech and hearing sciences. The results of questionnaires administered to participants in this lecture are also reported.

On October 3, 2021, a 4-h lecture was conducted online. About 100 people participated; while the majority were speech-language pathologists and students of speech-language pathology, there were also participants from other disciplines. They were asked to answer two questionnaires, one before and one after the lecture. Table III shows the questions asked. Questions Q1–Q13 were asked both before and after the lecture, while Q14–Q17 were asked only after the lecture. Q1–Q15 asked the participants to respond on a five-point scale: 1 = not applicable at all; 2 = not very applicable; 3 = somewhat applicable; 4 = moderately applicable; and 5 = well applicable. For Q1–Q13, participants chose from 1–5 on the basis of their current understanding of each topic.

TABLE III.

Questions asked before and after the online lecture on October 3, 2021.

Q1: I understand the difference between longitudinal and transverse waves. 
Q2: I understand the standing wave. 
Q3: I understand the resonance of a string fixed at the both ends. 
Q4: I understand a resonance phenomenon with the longitudinal wave, so that I can imagine it. 
Q5: I understand a wave reflects at the open end of an acoustic tube. 
Q6: I understand the principle of resonance of an acoustic tube with one open end and one closed end. 
Q7: I understand the principle of multiple pendulums that can be resonated simultaneously by applying a periodic external force. 
Q8: I understand the principle of the multiple resonances of an acoustic tube having multiple resonance frequencies. 
Q9: I understand that vowels can be explained by a combination of a source and a vocal-tract filter. 
Q10: I understand what type of acoustic tube produces the vowel /a/. 
Q11: I understand what determines the quality of a vowel. 
Q12: I understand that the vowel spectrum is constructed as a harmonic structure overlaid with peaks. 
Q13: I understand similar speech sounds can be produced from different articulations. 
The following items were asked only after the lecture: 
Q14: Lectures 1.1–1.8 helped my understanding of lectures 2.1–2.4. 
Q15: The lecture deepened my understanding of acoustic phonetics and made me more interested in the topics. 
Q16: Please describe any explanation that was easy to understand. 
Q17: Please describe any explanation that was hard to understand. 

The participants were also asked for additional responses. Before and after the lecture, those who were engaged in speech and language pathology were asked to respond to the following:

  • Q0-a: I can associate acoustics with clinical settings of speech and language therapy. (Respond on the five-point scale.)

  • Q0-b: Please specify any concrete examples of Q0-a.

Most of the participants would already have taken an introductory acoustics course before they started studying speech-language pathology. However, the questionnaire conducted before the lecture showed that many of them did not have a sufficient understanding, which coincides with the author's impressions over many years of teaching. In other words, the author has noticed that many speech-language pathologists in Japan find acoustics a difficult subject. The education program suggested in Table II is designed for exactly this circumstance.

Table IV shows how the five-point scale scores for each question changed between before and after the lecture. The “total” column indicates the total number of participants who answered the question. The correlation between the scores before and after the lecture was evaluated by Spearman's correlation coefficient. Furthermore, it was checked whether the score after the lecture showed a statistically significant change compared with before the lecture. The statistical method used in this case was the Wilcoxon signed-rank test, which is a nonparametric paired test.
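As a rough illustration of the two statistics named above, the following sketch computes Spearman's correlation coefficient (as the Pearson correlation of tie-averaged ranks) and the signed-rank sums on which the Wilcoxon test is based. The paired responses are hypothetical and are not the study's data, the function names are our own, and the p-value computation is omitted.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def wilcoxon_w(before, after):
    """Return (W_plus, W_minus), the signed-rank sums; zero differences are dropped."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical five-point responses, one (before, after) pair per participant.
before = [1, 2, 2, 3, 1, 4, 2, 3]
after = [3, 4, 3, 4, 2, 4, 5, 3]
rho = spearman_rho(before, after)
w_plus, w_minus = wilcoxon_w(before, after)
print(f"rho = {rho:.3f}, W+ = {w_plus}, W- = {w_minus}")
```

A large imbalance between W+ and W− indicates that the scores shifted mostly in one direction, which is the pattern the paired test looks for.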

TABLE IV.

Changes in the five-point-scale responses to each question before and after the lecture. Also listed are Spearman's correlation coefficients between the before and after scores and the results of a Wilcoxon signed-rank test examining whether the score after the lecture showed a statistically significant change compared with before the lecture.

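| Question | Before/after | 1: Not applicable at all | 2: Not very applicable | 3: Somewhat applicable | 4: Moderately applicable | 5: Well applicable | Total | Spearman's ρ | Spearman p-value | Wilcoxon signed-rank p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| Q0 | Before | 1 (4.2%) | 3 (12.5%) | 5 (20.8%) | 6 (25.0%) | 9 (37.5%) | 24 (100.0%) | 0.573 | 0.003 | 0.149 |
| Q0 | After | 0 (0.0%) | 0 (0.0%) | 7 (29.2%) | 7 (29.2%) | 10 (41.7%) | 24 (100.0%) | | | |
| Q1 | Before | 7 (11.7%) | 15 (25.0%) | 16 (26.7%) | 12 (20.0%) | 10 (16.7%) | 60 (100.0%) | 0.334 | 0.009 | <0.001 |
| Q1 | After | 0 (0.0%) | 4 (6.7%) | 5 (8.3%) | 28 (46.7%) | 23 (38.3%) | 60 (100.0%) | | | |
| Q2 | Before | 13 (21.3%) | 17 (27.9%) | 16 (26.2%) | 10 (16.4%) | 5 (8.2%) | 61 (100.0%) | 0.577 | <0.001 | <0.001 |
| Q2 | After | 0 (0.0%) | 3 (4.9%) | 13 (21.3%) | 30 (49.2%) | 15 (24.6%) | 61 (100.0%) | | | |
| Q3 | Before | 14 (23.0%) | 19 (31.1%) | 18 (29.5%) | 8 (13.1%) | 2 (3.3%) | 61 (100.0%) | 0.452 | <0.001 | <0.001 |
| Q3 | After | 1 (1.6%) | 1 (1.6%) | 13 (21.3%) | 25 (41.0%) | 21 (34.4%) | 61 (100.0%) | | | |
| Q4 | Before | 18 (29.5%) | 17 (27.9%) | 13 (21.3%) | 10 (16.4%) | 3 (4.9%) | 61 (100.0%) | 0.485 | <0.001 | <0.001 |
| Q4 | After | 0 (0.0%) | 5 (8.2%) | 18 (29.5%) | 28 (45.9%) | 10 (16.4%) | 61 (100.0%) | | | |
| Q5 | Before | 16 (26.2%) | 13 (21.3%) | 17 (27.9%) | 14 (23.0%) | 1 (1.6%) | 61 (100.0%) | 0.338 | 0.008 | <0.001 |
| Q5 | After | 0 (0.0%) | 7 (11.5%) | 8 (13.1%) | 26 (42.6%) | 20 (32.8%) | 61 (100.0%) | | | |
| Q6 | Before | 17 (27.9%) | 16 (26.2%) | 19 (31.1%) | 9 (14.8%) | 0 (0.0%) | 61 (100.0%) | 0.435 | <0.001 | <0.001 |
| Q6 | After | 0 (0.0%) | 3 (4.9%) | 17 (27.9%) | 28 (45.9%) | 13 (21.3%) | 61 (100.0%) | | | |
| Q7 | Before | 34 (55.7%) | 20 (32.8%) | 5 (8.2%) | 2 (3.3%) | 0 (0.0%) | 61 (100.0%) | 0.119 | 0.363 | <0.001 |
| Q7 | After | 1 (1.6%) | 8 (13.1%) | 14 (23.0%) | 26 (42.6%) | 12 (19.7%) | 61 (100.0%) | | | |
| Q8 | Before | 24 (39.3%) | 15 (24.6%) | 14 (23.0%) | 7 (11.5%) | 1 (1.6%) | 61 (100.0%) | 0.366 | 0.004 | <0.001 |
| Q8 | After | 2 (3.3%) | 6 (9.8%) | 19 (31.1%) | 26 (42.6%) | 8 (13.1%) | 61 (100.0%) | | | |
| Q9 | Before | 5 (8.2%) | 9 (14.8%) | 16 (26.2%) | 19 (31.1%) | 12 (19.7%) | 61 (100.0%) | 0.328 | 0.010 | <0.001 |
| Q9 | After | 0 (0.0%) | 2 (3.3%) | 9 (14.8%) | 22 (36.1%) | 28 (45.9%) | 61 (100.0%) | | | |
| Q10 | Before | 9 (14.8%) | 17 (27.9%) | 17 (27.9%) | 13 (21.3%) | 5 (8.2%) | 61 (100.0%) | 0.221 | 0.086 | <0.001 |
| Q10 | After | 1 (1.6%) | 4 (6.6%) | 10 (16.4%) | 19 (31.1%) | 27 (44.3%) | 61 (100.0%) | | | |
| Q11 | Before | 1 (1.7%) | 9 (15.0%) | 15 (25.0%) | 23 (38.3%) | 12 (20.0%) | 60 (100.0%) | 0.401 | 0.002 | <0.001 |
| Q11 | After | 0 (0.0%) | 2 (3.3%) | 9 (15.0%) | 22 (36.7%) | 27 (45.0%) | 60 (100.0%) | | | |
| Q12 | Before | 15 (24.6%) | 13 (21.3%) | 20 (32.8%) | 10 (16.4%) | 3 (4.9%) | 61 (100.0%) | 0.429 | 0.001 | <0.001 |
| Q12 | After | 1 (1.6%) | 4 (6.6%) | 12 (19.7%) | 30 (49.2%) | 14 (23.0%) | 61 (100.0%) | | | |
| Q13 | Before | 10 (16.7%) | 20 (33.3%) | 11 (18.3%) | 15 (25.0%) | 4 (6.7%) | 60 (100.0%) | 0.368 | 0.004 | <0.001 |
| Q13 | After | 1 (1.7%) | 5 (8.3%) | 5 (8.3%) | 27 (45.0%) | 22 (36.7%) | 60 (100.0%) | | | |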

Overall, the scores increased after the lecture, as expected. For each of Q1–Q13, the Wilcoxon signed-rank test showed a significant difference between the scores before and after the lecture (p < 0.001 in Table IV), which means that the self-reported degree of understanding increased significantly. The change for Q0 was not significant (p = 0.149). The result for Q1–Q13 is considered to indicate a learning effect.

Spearman's correlation coefficient showed significant positive correlations for Q2, Q3, Q4, and Q6 (ρ = 0.435–0.577, p < 0.001 in Table IV). These results suggest that understanding deepened both for the participants who already had a high degree of understanding before the lecture and for those who lacked an understanding before the lecture.

On the other hand, Spearman's correlation coefficient did not show significant positive correlations for Q7 and Q10 (p = 0.363 and p = 0.086, respectively). A close look at these results in Table IV shows that the participants gave low scores before the lecture and that the scores increased dramatically after it. As a result, the scores before and after the lecture were not correlated. This suggests that the lecture was most effective on the topics of Q7 and Q10.
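To make the size of the shift for Q7 and Q10 concrete, the short sketch below computes the mean score and the share of participants choosing “4” or “5” from the response counts in Table IV (each count equals its percentage times the row total of 61). Treating the five-point scale as numeric is a simplifying assumption; the means are only a rough summary of ordinal data, and the function names are our own.

```python
# Response counts for options 1..5, from the Q7 and Q10 rows of Table IV
# (counts correspond to percentage x total; totals are 61 in each row).
counts = {
    "Q7": {"before": [34, 20, 5, 2, 0], "after": [1, 8, 14, 26, 12]},
    "Q10": {"before": [9, 17, 17, 13, 5], "after": [1, 4, 10, 19, 27]},
}


def mean_score(option_counts):
    """Mean of the five-point scores, weighting each score by its count."""
    total = sum(option_counts)
    return sum(score * n for score, n in enumerate(option_counts, start=1)) / total


def share_high(option_counts):
    """Fraction of participants who chose option 4 or 5."""
    return sum(option_counts[3:]) / sum(option_counts)


for question, rows in counts.items():
    for timing, option_counts in rows.items():
        print(
            f"{question} {timing}: mean = {mean_score(option_counts):.2f}, "
            f"share of 4-5 = {share_high(option_counts):.1%}"
        )
```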

For Q14, the numbers of participants choosing each of the five options were as follows: none for “1”; 3 for “2”; 17 for “3”; 28 for “4”; and 25 for “5”. For Q15, the numbers were as follows: none for “1”; 4 for “2”; 12 for “3”; 29 for “4”; and 28 for “5”.

The participants were asked to describe any explanations that were easy to understand (Q16). Many of them found the animations and video clips provided for each topic on the APD website (https://splab.net/APD/) useful. The pages in the following categories seemed to be popular: lectures 1.3, 1.5, 1.7, 2.2, 2.3, and 2.4. The video clips on longitudinal waves generated by the real spring-mass model were also well received. In addition, they gave positive comments on the real-time demonstrations using the vocal-tract models.

Some of the participants gave detailed descriptions of the explanations they found to be hard to understand (Q17). It seemed that the concept of resonance (lecture 1.7) was one of the most difficult topics in this lecture. However, many of the participants felt that the lecture helped their understanding at the same time. In general, there are always a few participants who are not able to keep up with all of the content. Participants gave us some comments as follows. In reply to Q16, one participant said that she understood the idea behind the resonances of the fixed-fixed ends and the fixed-free ends. However, she got lost when the topic switched from resonance of the fixed-free ends to resonance in the vocal tract (Q17). Another student seemed to have difficulty understanding the concept of a spectral envelope in a vowel spectrum. These topics might be more carefully covered in a future lecture.

Table IV shows, on the basis of questionnaires, the learning effect of the online lecture on acoustic phonetics and speech science that used the set of physical models. However, this study did not include quizzes as an objective measure of effectiveness. A previous study (Arai, 2007) did examine the learning effect of a subset of the physical models by asking the same set of quiz questions before and after a lecture conducted with the models. Twenty students aiming to become speech and language pathologists in Japan participated in that lecture. The quiz questions were grouped into two subsets, G1 and G2. The G1 questions were not directly related to the lecture with the models, whereas the G2 questions were directly related to it. According to the statistical analysis in Arai (2016), which used the same data as Arai (2007), the answers to the G2 questions showed a significant improvement in score compared with G1. The G2 questions concerned, for example, source-filter theory, the relationship between vocal-tract shape and vowel quality, velum height and nasalized vowels, and breathing. Arai (2016) showed that the demonstrations are effective in terms of objective measurements, and the author is confident that this is also the case for the lectures proposed in the current paper.

The present study deals with seven topics in which students often encounter barriers to understanding the basic physics of acoustic phonetics and speech science. In this section, we discuss the key issues and the effect of the lecture.

An acoustic wave in air is a longitudinal wave. However, longitudinal waves are often difficult to explain with drawings, and many explanations resort to illustrating transverse waves instead. Although this is not a bad solution, some students still have difficulty imagining what a longitudinal standing wave is. The problem with the “Slinky” described in Sec. II is that it is a tension spring, which is not suitable for demonstrating the effect of a free end. One might argue that the Slinky can be used vertically. However, the degree of deflection then varies from one position to another.

The alternative is a multiple spring-mass model with compression springs and masses, as shown in Fig. 4. With this model, not only fixed-end reflections but also free-end reflections can be demonstrated, because the compression springs can both compress and stretch. This helps students better understand longitudinal waves after transverse-wave phenomena have been presented.

A set of animations was prepared to demonstrate in lecture 1.7 that a string with free and/or fixed ends has multiple resonance modes. In particular, the animations illustrate multiple resonances in an acoustic tube. If the acoustic tube has one end open and one end closed and its length is 17 cm, its resonance (natural) frequencies are odd-numbered multiples of approximately 500 Hz. To enable the students to see why they are odd-numbered multiples, slow-motion animations of the transverse waves of a string with one end free and one end fixed, as shown in Fig. 15 (https://splab.net/APD/A525/), were also prepared.
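The odd-numbered multiples mentioned above follow from the standard quarter-wavelength relation for an ideal tube closed at one end and open at the other, f_n = (2n − 1)c/(4L). The sketch below evaluates it for a 17-cm tube, assuming a speed of sound of 340 m/s and ignoring the open-end correction and losses; the function name is our own.

```python
def closed_open_resonances(length_m, n_modes=3, speed_of_sound=340.0):
    """Resonance frequencies (Hz) of an ideal closed-open tube: (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, n_modes + 1)
    ]


# A 17-cm tube gives approximately 500, 1500, and 2500 Hz.
for mode, freq in enumerate(closed_open_resonances(0.17), start=1):
    print(f"mode {mode}: {freq:.0f} Hz")
```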

FIG. 15.

(Color online) Snapshot of an animation of a transverse wave on a string with a fixed end (left) and a free end (right), demonstrating resonance modes (taken from https://splab.net/APD/A525/). The vertical axis is the displacement in arbitrary units, and the horizontal axis is the position along the string. When a narrow pulse is sent from the fixed end, it is reflected at the free end. When the reflected pulse travels back and reaches the starting point, it is reflected again at the fixed end. When subsequent pulses are sent at certain timings, they are overlaid and eventually resonate. This snapshot was taken with pulses sent at intervals of two-thirds of the reciprocation time. (Details are in the main text.)

In the animation, a narrow pulse is sent from the fixed (left) end and reaches the free (right) end. The pulse is then reflected at the free end, and the displacement due to the pulse keeps the same direction (free-end reflection). The reflected pulse travels back and is reflected again at the fixed end; this time, the direction of the displacement is inverted (fixed-end reflection). Let us assume that it takes 1 ms for a pulse to reciprocate, i.e., to travel to the free end and come back to the starting point. If identical pulses are sent every 1 ms, they cancel because the displacements of the reflected first pulse and the second pulse are in opposite directions. Likewise, when pulses are sent every 1/n ms, where n is a natural number, they cancel because the displacements of the reflected kth pulse and the (k + n)th pulse are in opposite directions.

When pulses are sent every 2 ms (two reciprocations), they are overlaid and eventually resonate, because the displacements of the first pulse after two reciprocations and the second pulse are in the same direction. This is the first resonance mode. It is called a quarter-wavelength resonance because the wavelength of the first resonance mode is 68 cm, which is the distance covered in two reciprocations, or four times the tube length of 17 cm. Assuming the speed of sound in air is 340 m/s, the first resonance frequency is 500 Hz. When pulses are sent every 2/3 ms, they are also overlaid and resonate for the same reason. This is the second resonance mode, and its resonance frequency is 1500 Hz. In general, when pulses are sent every 2/(2n − 1) ms, they are overlaid and resonate. This is the nth resonance mode, and its resonance frequency is 500 × (2n − 1) Hz. Thus, this acoustic tube is a multiple-resonance system with resonance frequencies at odd-numbered multiples of 500 Hz. Because this is one of the most difficult parts of the lecture series, more careful explanations might be needed. The animations on the APD website are very helpful for understanding why only odd-numbered multiples of the first resonance frequency occur.
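The pulse-timing argument can be checked with a small bookkeeping sketch. It assumes an idealized, lossless string with a 1-ms reciprocation time, and it counts only the pulses that are back at the driven (fixed) end at the moment the newest pulse is sent; each completed reciprocation inverts the sign once. The function name and the choice of 12 pulses are our own.

```python
from fractions import Fraction


def net_displacement(interval_ms, n_pulses, round_trip_ms=1):
    """Signed sum of the newest pulse and all earlier pulses leaving the driven end with it.

    One reciprocation contains a free-end reflection (sign kept) and a fixed-end
    reflection (sign inverted), so a pulse carries the sign (-1)**m after m
    reciprocations. Losses are ignored.
    """
    interval = Fraction(interval_ms)
    round_trip = Fraction(round_trip_ms)
    total = 0
    for age in range(n_pulses):  # age = number of intervals since the pulse was sent
        trips = age * interval / round_trip
        if trips.denominator == 1:  # the pulse is back at the driven end right now
            total += -1 if trips.numerator % 2 else 1
    return total


# Intervals of 2, 2/3, and 2/5 ms build up; intervals of 1 and 1/2 ms cancel.
for interval in (Fraction(2), Fraction(1), Fraction(2, 3), Fraction(1, 2), Fraction(2, 5)):
    print(f"interval {interval} ms -> net displacement {net_displacement(interval, 12)}")
```

With exact fractions, the sum grows with the number of pulses only for intervals of 2/(2n − 1) ms, in line with the resonance frequencies of 500 × (2n − 1) Hz.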

A system with multiple pendulums of different lengths and a common pivot was used to show whether an external force applied to the pivot can excite multiple resonances simultaneously (https://splab.net/APD/A530/). If the external force is sinusoidal and its frequency matches one of the natural frequencies, the matched pendulum will swing. If the external force is a periodic but complex signal, multiple pendulums may resonate. This concept is very important in speech science because the glottal source produced by vocal-fold vibration is a complex signal. Thus, as in this example, it is possible to show that a multiple-resonance system resonates at several frequencies simultaneously in response to a complex input signal, in accordance with the principle of superposition.
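A minimal numerical sketch of this idea treats each pendulum as a lightly damped oscillator under small-angle motion and sums its steady-state gains over the sinusoidal components of the drive. The pendulum frequencies (1, 2, and 3 Hz), the drive components (1 and 3 Hz), and the damping rate are hypothetical values chosen for illustration, not parameters of the demonstration on the APD website.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def natural_frequency_hz(length_m):
    """Small-angle natural frequency of a simple pendulum."""
    return math.sqrt(G / length_m) / (2 * math.pi)


def length_for(frequency_hz):
    """Pendulum length (m) whose small-angle natural frequency is frequency_hz."""
    return G / (2 * math.pi * frequency_hz) ** 2


def steady_state_gain(drive_hz, natural_hz, damping=0.05):
    """Amplitude gain of a damped oscillator for a unit sinusoidal force per unit mass.

    `damping` is a hypothetical damping rate in 1/s.
    """
    w, w0 = 2 * math.pi * drive_hz, 2 * math.pi * natural_hz
    return 1.0 / math.hypot(w0 ** 2 - w ** 2, damping * w)


def response(length_m, drive_components_hz):
    """Sum of the gains over all drive components (superposition of the responses)."""
    f0 = natural_frequency_hz(length_m)
    return sum(steady_state_gain(f, f0) for f in drive_components_hz)


drive = [1.0, 3.0]  # a periodic, complex force with components at 1 and 3 Hz
for target_hz in (1.0, 2.0, 3.0):
    length = length_for(target_hz)
    print(f"{target_hz:.0f}-Hz pendulum ({length:.3f} m): response {response(length, drive):.4f}")
```

Under these assumptions, the pendulums tuned to 1 and 3 Hz respond far more strongly than the 2-Hz pendulum, which matches neither component of the drive.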

Regarding the online lecture in Sec. III, the participants scored significantly higher after the lecture in their understanding of the principle of multiple pendulums that can be resonated simultaneously by applying a periodic external force (Q7 in Table IV). This indicates that the video clips on the multiple-resonance system consisting of pendulums of different lengths, together with the explanations, were educationally effective.

The same arguments regarding the system with multiple pendulums of different lengths apply to an acoustic tube with one open end and one closed end. If the length of the tube is 17 cm, which is about the length of the vocal tract of an adult male, the output sounds like a male voice when the input signal has a fundamental frequency of, say, 100 Hz. Because this tube has natural frequencies that are odd-numbered multiples of approximately 500 Hz, the formant frequencies match the 5th, 15th, 25th, etc., harmonics in the spectral domain.
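The harmonic numbers quoted above can be reproduced with a few lines of arithmetic. The sketch below finds, for each tube resonance, the harmonic of the fundamental frequency closest to it; the 130-Hz case is a hypothetical second example showing that the resonances need not coincide exactly with harmonics.

```python
def nearest_harmonics(f0_hz, resonances_hz):
    """For each resonance, return (harmonic number, harmonic frequency) of the closest harmonic."""
    result = []
    for resonance in resonances_hz:
        k = max(1, round(resonance / f0_hz))
        result.append((k, k * f0_hz))
    return result


resonances = [500, 1500, 2500]  # closed-open tube of 17 cm
print(nearest_harmonics(100, resonances))  # harmonics 5, 15, and 25 fall on the resonances
print(nearest_harmonics(130, resonances))  # nearest harmonics lie slightly off the resonances
```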

The key point of vowel production is the source-filter dualism. The glottal source signal is periodic and made up of a set of sinusoidal waves whose frequencies are integer multiples of the fundamental frequency. On the other hand, the vocal-tract filter is a multiple-resonance system. When a glottal source signal is applied to a vocal-tract filter, the frequency components with the harmonic structure matching the multiple resonances are emphasized and form multiple peaks in the spectral envelope.

Figure 9 shows a spectrographic representation of an output signal produced by blowing into a reed-type sound source attached to the S2T model. The horizontal axis corresponds to the length of the narrow, back cavity, L1, while the vertical axis is the frequency. The resonances of the entire model are approximated by those of the individual cavities. The longer the narrow cavity is, the lower the resonances become. Likewise, the shorter the wide cavity is, the higher the resonances become, and they intersect each other. At the midpoint, the first formant, F1, is the highest, and the second formant, F2, is the lowest. That is exactly when the vowel /a/ is produced. When we measure F1 and F2 frequencies around this intersection, those frequencies are more or less stable and less sensitive to L1. Stevens (1972, 1989) pointed out that this is an example of the quantal theory of speech production.
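The crossing of the two cavity resonances can be sketched by treating the back and front cavities as uncoupled quarter-wavelength resonators. The total length of 17 cm and the speed of sound of 340 m/s are assumptions made for this illustration, and acoustic coupling between the cavities, which keeps the measured F1 and F2 from actually touching, is ignored; the function name is our own.

```python
def two_tube_resonances(back_len_cm, total_len_cm=17.0, speed_cm_per_s=34000.0):
    """Estimate (F1, F2) in Hz from the lowest resonance of each uncoupled cavity."""
    front_len_cm = total_len_cm - back_len_cm
    back = speed_cm_per_s / (4.0 * back_len_cm)    # narrow back cavity, length L1
    front = speed_cm_per_s / (4.0 * front_len_cm)  # wide front cavity
    return min(back, front), max(back, front)


for back_len in (5.0, 7.0, 8.5, 10.0, 12.0):
    f1, f2 = two_tube_resonances(back_len)
    print(f"L1 = {back_len:4.1f} cm: F1 = {f1:6.0f} Hz, F2 = {f2:6.0f} Hz")
```

In this approximation, F1 reaches its maximum and F2 its minimum when the two cavities have equal length, and back-cavity lengths placed symmetrically about the midpoint give the same pair of frequencies.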

Regarding the online lecture in Sec. III, the participants also scored significantly higher after the lecture in their understanding of what type of acoustic tube produces the vowel /a/ (Q10 in Table IV). This indicates that the explanation using the S2T model was educationally effective.

The vocal-tract model set, including VTM-T20 and VTM-S20, was shown to be effective in the online lecture. The demonstrations were well received by the audience and helped them to understand that vowel quality is determined by the vocal-tract configuration. The S2T model was also used to demonstrate that similar sounds are produced from two different articulations. This demonstration concretely illustrates the one-to-many mapping between sounds and articulations in speech production.

There are more advanced topics that we can deal with along the line of this educational program. These advanced topics can be treated in a graduate program in acoustic phonetics and speech science. The following are examples.

  • Topic A1: Why are certain combinations, [long tube] + [low-pitch] or [short tube] + [high-pitch], more intelligible than the other combinations (Arai, 2012b)?

  • Topic A2: The VTM-S20 (S3T) model was originally designed for vowel production. However, we can demonstrate that the same model becomes a slide whistle simply by switching the sound source from a plastic reed to an air reed. Why is this so?

  • Topic A3: Both straight and bent vocal tracts can produce the same vowel, as long as their cross-sectional area functions match. Then what is the advantage of a bent vocal tract in terms of vowel production?

  • Topic A4: The relation between articulation and acoustics is not linear.

  • Topic A5: The relation between sounds and phonemes is not a one-to-one mapping.

  • Topic A6: In a strict sense, there is an interaction between the source and the vocal-tract filter; that is, they are not always independent (Titze, 2008; Titze et al., 2008).

  • Topic A7: Speech perception can be affected by visual cues.

There are a couple of issues to discuss further. When we teach wave propagation, for example, animations of particle displacements are great for giving students an intuitive understanding. However, more illustrative animations are needed as well. Here, the animations by Russell (e.g., Krishnamurti, 2010) showing how particle velocity and air pressure are related are representative and should be presented in an education program for acoustic phonetics and speech science. Another issue that we face when demonstrating standing waves with either a traditional wave machine or the spring-mass model described above is that the end to which the external force is applied is a free end, whereas the ideal explanation of speech production would have the external force applied to a fixed end. This is not a serious problem as long as we explain it clearly.

The present paper described seven topics that are important in acoustic phonetics and speech science but that learners often find difficult to understand. Solutions to facilitate understanding were suggested and incorporated into a model lecture. One of the most difficult concepts that learners often misunderstand is how a set of harmonic waves excites a multi-resonance system for vowel production. Questionnaires on this concept were administered to attendees before and after an online 4-h lecture, and the responses revealed the effectiveness of the suggested solutions and shed light on a new educational program intended as an introductory course on acoustic phonetics and speech science with physical models. Seven more advanced topics that can be discussed in an advanced course were also suggested. The principles embodied in most of the topics can be demonstrated by using the set of physical models prepared by the author.

The education program suggested in the present paper is for an introductory course in acoustic phonetics and speech science. More physical models and demonstrations will be suggested in the future.

The present paper is based on the author's invited paper at the Acoustical Society of America (ASA) Meeting, 2020 (Arai, 2020). The author would like to thank the members of the Committee of Education in Acoustics, ASA. This work was partially supported by Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Nos. 18K02988/21K02889. The author would also like to thank the members of the KAKENHI project (Grant No. 20K03074) for their help.

The URLs of the six video clips on the YouTube channel “Acoustic-Phonetics Demonstrations” are listed below:

  1. Lung model and head-shaped model: https://www.youtube.com/watch?v=vT4dT8bd5ow;

  2. Vocal-tract models VTM-N20: https://www.youtube.com/watch?v=DyQ96oerZEs;

  3. Vocal-tract models VTM-T20: https://www.youtube.com/watch?v=DMmPfl209t8;

  4. Vocal-tract models VTM-S20: https://www.youtube.com/watch?v=9c7ZrSqnIM0;

  5. Vocal-tract model with flexible tongue (VTM-FT model): https://www.youtube.com/watch?v=pwfdLYuC0n0;

  6. Vocal-tract model for approximants (VTM-AP model and VTM-BR model): https://www.youtube.com/watch?v=saohtEPPtW8.

1. Arai, T. (2007). “Education system in acoustics of speech production using physical models of the human vocal tract,” Acoust. Sci. Tech. 28(3), 190–201.
2. Arai, T. (2012a). “Education in acoustics and speech science using vocal-tract models,” J. Acoust. Soc. Am. 131(3), 2444–2454.
3. Arai, T. (2012b). “Vowels produced by sliding three-tube model with different lengths,” in Proceedings of INTERSPEECH 2012, September 9–13, Portland, OR, pp. 2190–2193.
4. Arai, T. (2016). “Vocal-tract models and their applications in education for intuitive understanding of speech production,” Acoust. Sci. Tech. 37(4), 148–156.
5. Arai, T. (2019). “Vocal-tract models in phonetic teaching and research,” in Routledge Handbook of Phonetics, edited by W. F. Katz and P. F. Assmann (Routledge, London), pp. 570–598.
6. Arai, T. (2020). “Acoustic-phonetics demonstrations for classroom teaching,” J. Acoust. Soc. Am. 148, 2609.
7. Chiba, T., and Kajiyama, M. (1941). The Vowel: Its Nature and Structure (Tokyo-Kaiseikan, Tokyo).
8. Fant, G. (1960). Acoustic Theory of Speech Production (Mouton, Hague, Netherlands), pp. 15–90.
9. Krishnamurti, S. (2010). “Acoustics and vibration animations,” Ear Hear. 31(4), 585–586.
10. Russell, D. (2020). “Acoustics and vibration animations,” https://www.acs.psu.edu/drussell/demos.html (Last viewed November 1, 2022).
11. Sakamoto, S., Ueno, K., and Tachibana, H. (2004). “Visualization of resonance phenomena for acoustic education,” in Proceedings of the 18th International Congress on Acoustics, April 4–9, Kyoto, Japan, pp. 2311–2312.
12. Stevens, K. N. (1972). “The quantal nature of speech: Evidence from articulatory-acoustic data,” in Human Communication: A Unified View, edited by P. B. Denes and E. E. David, Jr. (McGraw-Hill, New York), pp. 51–66.
13. Stevens, K. N. (1989). “On the quantal nature of speech,” J. Phon. 17, 3–46.
14. Stevens, K. N. (1998). Acoustic Phonetics (MIT, Cambridge, MA).
15. Titze, I. R. (2008). “Nonlinear source–filter coupling in phonation: Theory,” J. Acoust. Soc. Am. 123(5), 2733–2749.
16. Titze, I. R., Riede, T., and Popolo, P. (2008). “Nonlinear source-filter coupling in phonation: Vocal exercises,” J. Acoust. Soc. Am. 123(4), 1902–1915.