Directivity of speech and singing is determined primarily by the morphology of a person, i.e., head size, torso dimensions, posture, and vocal tract. Previous works have suggested from measurements that voice directivity in singing is controlled unintentionally by spectral emphasis in the range of 2–4 kHz. This study attempts to identify the extent to which voice directivity is affected by the mouth configuration and the torso. To this end, simulations and measurements that investigate voice directivity in more detail are presented. Simulations are presented for a piston in an infinite baffle, a radiating spherical cap, and an extended spherical cap model that takes transverse propagation modes into account. Measurements of a classical singer, an amateur singer, and a head and torso simulator are undertaken simultaneously in the horizontal and vertical planes. To assess differences in voice directivity, common metrics, e.g., horizontal and vertical directivity indexes, are discussed and compared to improved alternatives. The measurements and simulations reveal that voice directivity in singing is affected if the mouth opening is changed significantly. The measurements show that the torso generates side lobes due to diffraction and reflections at frequencies related to the torso's dimensions.
I. INTRODUCTION
Voice directivity describes how sound is radiated into space from the mouth and/or nasal orifices of a person. Voice radiation is characterized by its directionality and the efficiency with which the sound produced by the vocal tract is transmitted outside. Voice directivity patterns can be calculated from simulated or measured sound pressure levels at fixed distances from a defined center position along a circle or sphere around a subject. The most prominent influences on voice directivity (Blandin et al., 2015; Blandin and Brandner, 2019; Blandin et al., 2018; Cabrera et al., 2011; Chu and Warnock, 2002; Flanagan, 1960; Katz and D'Alessandro, 2007; Marshall and Meyer, 1985) can be summarized as follows:
shape and size of head and torso (fixed),
posture, head inclination (variable),
vocal tract geometry and mouth opening (variable), and
spectral emphasis due to vocal technique (variable).
A recent review of research on voice directivity can be found in Abe (2019). While there are several studies in the literature that investigate the directivity patterns of human speech, only a few reveal to what extent a person can control or at least incidentally change these patterns. In the studies of Cabrera (2004), Cabrera et al. (2011), and Chu and Warnock (2002), general directivity properties of speakers and singers have been investigated. The data are calculated over longer time frames of running speech or sung phrases. The pioneering work of Marshall and Meyer (1985) is also noteworthy. They analyze three vowels for three singers but present the results solely in octave bands and not in full detail. Detailed changes of directivity (third-octave bands or higher resolution) for different phonemes of speakers have recently been presented in Kocon and Monson (2018) and, to the authors' knowledge, for a classical singer only in Katz and D'Alessandro (2007). In the latter, the singer's directivity has been investigated in detail by calculating long-term averaged spectra (LTAS) of specific vowels rather than sung phrases, but the corresponding mouth openings have not been discussed. The directivity patterns show subtle differences between different vowels and performance styles for one test subject. Such detailed results of steady conditions cannot easily be obtained by LTAS from running speech because coarticulation can affect the mouth opening for a specific phoneme and, therefore, smear the results.
None of these studies considers that higher order modes (transverse propagation modes) can occur in the vocal tract geometry and even in simpler cylindrical geometries. Transverse propagation modes generate a nonuniform velocity field on the mouth exit plane, which departs from the plane piston assumption (Savkar, 1975; Snakowska et al., 1996). Recently published data in Blandin et al. (2018) reveal that the geometry of the vocal tract can influence the sound radiation above 6 kHz in speech production.
In general, directivity is expected to increase with larger aperture diameters, which correspond to larger mouth openings. A significant increase or decrease of voice directivity would change the perceived source distance at the listener's position (Wendt et al., 2017).
The influence of a professional singing technique on the vocal tract configuration has been shown by the use of ultrasound equipment in Nair et al. (2016). In this study, it is assumed that professional singers use a larger mouth opening as a strategy. If changing the oral configuration in singing is related to optimizing the vocal output, then measured differences in directivity for different mouth openings are expected to relate to this optimization process.
An initial objective of this work is to identify whether simple models can serve as good approximations of human voice directivity. A second objective is to determine the effect of the torso on voice directivity and the frequency region in which this effect is most prominent. The main objective, however, is to determine whether different mouth openings while vocalizing the German vowel /a/ affect voice radiation in singing. The paper is structured as follows. First, we present the three simulation models used, our measurement setup, data visualizations, and metrics in Sec. II. Second, we compare the results for the three sound radiation models in Sec. III A. Third, we present the measurement results to investigate the effect of the torso and different mouth openings in Sec. III B. Finally, on the basis of our results, we discuss how the head, torso, and mouth configuration influence voice directivity in Sec. IV. For an objective interpretation and comparison of the results, we use common metrics and introduce a new measure, the so-called energy vector, as an enhanced descriptor.
II. METHODS
In this section, we present the simulation models, measurement setup, metrics, and visualizations used to obtain an objective analysis and reproducible measurement results for singing voice radiation and directivity.
A. Simulations
We investigate currently known simulation models for their applicability to provide insights into the influence of diffraction or higher order modes on voice directivity. This should allow us to better understand specific effects seen in voice directivity measurement data. We therefore investigate three models: the piston model, which accounts solely for different mouth opening areas with a uniform particle velocity distribution; the spherical cap model, which additionally includes head diffraction, again with a uniform particle velocity distribution; and the extended spherical cap model, which also accounts for the propagation of transverse modes, i.e., the superposition of uniform and nonuniform particle velocity distributions.
1. Piston model
As described in Flanagan (1960), the mouth opening can be approximated in the simplest case by a spherical source on a spherical baffle, where the latter corresponds to the human head. Furthermore, he proposes a small piston instead of a spherical source. The far-field pressure for a circular baffled piston with a radius rp at a radial distance r is given as follows:
$$p(r,\vartheta) = \frac{j k \rho_0 c \, r_p^2 \, v}{2r} \left[ \frac{2 J_1(k r_p \sin\vartheta)}{k r_p \sin\vartheta} \right] e^{-jkr}, \tag{1}$$
with J1 denoting the first-order Bessel function, the angular wave number k, the density of air ρ0, the speed of sound c, and the membrane velocity v. The sound pressure depends on the distance r and, due to axial symmetry, only on the angle ϑ. Therefore, a simple calculation of the influence of mouth openings corresponding to different piston radii is feasible. Directionality increases noticeably for k rp > 1, whereas the frequency of the transition from a nondirectional to a directional radiation pattern for a piston, which will be called the transition frequency within the rest of the paper, can be calculated with ft = c/(2π rp) (Zwicker and Zollner, 1993).
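As a rough numerical check of the piston model, the following sketch evaluates the transition frequency and the relative far-field level of a baffled piston. It assumes that the transition is taken at k rp = 1, i.e., ft = c/(2π rp), and a speed of sound of 343 m/s; both are assumptions of this sketch, although they reproduce the values of about 3.6 kHz and 1.7 kHz reported in Sec. III A for d = 3 cm and d = 6.6 cm. The function names are hypothetical, and the Bessel function is evaluated from Bessel's integral to keep the example free of external libraries.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius


def transition_frequency(radius_m, c=SPEED_OF_SOUND):
    """Frequency at which k * r_p = 1 for a piston of the given radius (assumed criterion)."""
    return c / (2.0 * math.pi * radius_m)


def bessel_j1(x, steps=2000):
    """J1(x) from Bessel's integral (1/pi) * int_0^pi cos(t - x sin t) dt, trapezoidal rule."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0  # end points count half
        total += weight * math.cos(t - x * math.sin(t))
    return total * h / math.pi


def piston_directivity_db(freq_hz, radius_m, angle_deg, c=SPEED_OF_SOUND):
    """Far-field level relative to the on-axis level: 20 log10 |2 J1(x) / x|."""
    k = 2.0 * math.pi * freq_hz / c
    x = k * radius_m * math.sin(math.radians(angle_deg))
    if abs(x) < 1e-9:
        return 0.0  # 2 J1(x) / x tends to 1 for x -> 0
    return 20.0 * math.log10(abs(2.0 * bessel_j1(x) / x))


for diameter_cm in (3.0, 6.6):
    radius = diameter_cm / 200.0  # diameter in cm -> radius in m
    f_t = transition_frequency(radius)
    level_90 = piston_directivity_db(f_t, radius, 90.0)
    print(f"d = {diameter_cm} cm: f_t = {f_t:.0f} Hz, level at 90 deg = {level_90:.2f} dB")
```

Under these assumptions the level at 90° is only about 1 dB below the on-axis level at the transition frequency, which illustrates why the pattern is still nearly nondirectional there and narrows only above it.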
2. Spherical cap model
An even more realistic approach to simulate the sound radiation from a person's mouth is given by the spherical cap model described in Zotter and Frank (2019). In contrast to the piston model, the sound is radiated by a spherical cap located on a rigid sphere with a radius r0 and an aperture angle α. The model allows one to simulate sound radiation for angles larger than ±90°. The pressure distribution in the far-field for a single spherical cap can be calculated by the use of the Legendre polynomials Pn of order n as follows:
where r is the radius in space, $h_n^{(2)}$ is the spherical Hankel function of the second kind, and $h_n'^{(2)}$ is its first derivative. The aperture opening is accounted for by the use of a weighting function wn (Zotter and Frank, 2019),
3. Extended spherical cap model
The plane wave assumption inside the vocal tract does not hold for frequencies higher than about 4 kHz because, given the dimensions of the vocal tract, the transverse propagation modes can potentially propagate from this frequency on (Blandin et al., 2015). Thus, the particle velocity on the mouth exit may become nonuniform, and the plane piston and the spherical cap models are no longer valid. Here, the spherical cap model is extended by accounting for the variations of the particle velocity. The particle velocity field over the mouth exit is discretized into Np point sources of acoustic flow velocity amplitudes νl. The acoustic pressure is obtained as the summation of the contributions of each point source,
where γl is the angular distance between the coordinates of the source point θl and the coordinates of the reception point θ, and each point source enters with its equivalent surface area. The particle velocity field at the mouth exit is calculated by using the multimodal method, which relies on the projection of the acoustic field on the propagation modes of a locally uniform waveguide. As this approach includes the plane mode, it can be considered as an extension of the plane wave theory with higher order modes. The multimodal theory used in this work is described in Blandin et al. (2015) and Blandin et al. (2016). Similar results may be obtained by the use of any other three-dimensional (3D) acoustic simulation method like the finite element or finite difference method.
In Fig. 1, the vocal tract geometries used for the simulations are shown. The vocal tract configurations are labeled “normal,” “long,” “wide,” and “open” for ease of reading and correspond to the mouth openings shown in Fig. 2. However, we recognize that the mouth openings are not independent of the vocal tract configurations. Furthermore, the mouth opening and its corresponding vocal tract configuration will affect the formant characteristics, i.e., their frequency, bandwidth, and strength. This will receive no further attention here as our focus lies on the sound radiation patterns. The approximations of the real vocal tract of the singer used here are adaptations of a 3D geometry extracted from a magnetic resonance image for the vowel /a/ of a male speaker (Aalto et al., 2014; Arnela et al., 2016). The length and the volume of the oral and pharyngeal cavity of the original geometry were adapted to female dimensions by utilizing the data of 120 male and female subjects (Xue and Hao, 2006). An elliptical mouth opening is used in which the height and width of the mouth openings have been taken from the female classical singer participating in this study (cf. Table II). For a more detailed explanation of the geometry design, see Blandin and Brandner (2019).
B. Measurements
1. Test subjects and conditions
One professional classical singer, who holds a master's degree in classical voice and has international singing experience, and one amateur singer are investigated in the study. Both singers were instructed to keep their heads oriented in the forward direction at all times. The mean head inclination angle, acquired with a tracking system, confirms that only slight movements took place; see Table I.
| | Normal | Long | Wide | Open |
|---|---|---|---|---|
| Mean head inclination angle (°) | 1.7 | 5.8 | 1.3 | 6.6 |
As the main goal of the study is the analysis of the influence of the mouth opening on voice directivity, four mouth openings are investigated (Table II; Fig. 2).
| | Normal | Long | Wide | Open |
|---|---|---|---|---|
| Width (cm) | 4.8 | 4.2 | 6.6 | 5.9 |
| Height (cm) | 2.7 | 6.6 | 1.4 | 5.2 |
As a target vowel, we define the German vowel /a/ even though slight deviations are expected to occur due to a change of mouth opening. The classical singer is asked to maintain the same vocal effort for all conditions and merely change the mouth opening.
To investigate the influence of the upper body on the radiated sound pressure field, measurements are made with a Brüel and Kjaer 4128 head and torso simulator (HATS; Nærum, Denmark). As an excitation signal, we use a logarithmic sweep. The torso influence on the horizontal and vertical directivities can be assessed for the HATS because the head can be removed from the torso and measured separately. The mouth opening has a width of 3 cm and a height of 1 cm.
2. Room conditions
Measurements were carried out in a sound-treated measurement room with absorptive material on the walls and floor at the Institute of Electronic Music and Acoustics. The mean reverberation time of the room is below 75 ms between 400 Hz and 1 kHz and below 50 ms above 1 kHz. The volume of the room is approximately 50 m³ with a floor area of 22.5 m².
3. Double circle microphone array (DCMA)
The measurements of source radiation patterns are undertaken using a microphone array with a radius of 1 m consisting of two circular rings, one placed in the horizontal plane and the other one placed in the vertical plane, described in Brandner et al. (2018). Each of the rings can hold up to 32 microphones (NTI m2230, Schaan, Liechtenstein)—which means an angular spacing of 11.25°—resulting in a maximum number of 62 microphones as both rings intersect in the front and back of the array. In addition, a reference microphone is used and is located at the exact center of the microphone array.
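The array geometry stated above can be checked with a few lines of arithmetic. The sketch assumes equally spaced positions on each ring and that exactly two positions (front and back) are shared between the rings; the variable names are hypothetical.

```python
MICS_PER_RING = 32     # positions per ring, as stated in the text
RINGS = 2              # horizontal and vertical ring
SHARED_POSITIONS = 2   # front and back, where both rings intersect

# Angular spacing between neighboring positions on one ring
spacing_deg = 360.0 / MICS_PER_RING

# Shared positions are counted once, which reduces the total number of microphones
total_mics = RINGS * MICS_PER_RING - SHARED_POSITIONS

# Angles of one ring, starting at the frontal direction (assumed to be 0 degrees)
ring_angles_deg = [i * spacing_deg for i in range(MICS_PER_RING)]

print(spacing_deg, total_mics)
```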
4. Measurement procedure
To facilitate reproducible and comparable data, the head of the performer is equipped with reflective markers for optical tracking. The singer is asked to sit within the measurement setup as close as possible to a centered reference microphone. Visual feedback of the position is provided, whereby the mouth is defined as the acoustical center. The performer is asked to sing a glissando, starting at a low pitch (G4, 392 Hz) and ending one octave above the starting pitch. The glissando ensures that a wide range of frequencies is captured; see Kob and Jers (1999). With a reference signal, impulse responses for vocalized phonemes can be acquired from the directivity measurements, an approach similar to the exponential swept-sine method in Farina (2000) that further improves the signal-to-noise ratio. The impulse responses are cut to a length of 1024 samples at a sampling frequency of 44.1 kHz, which results in a frequency resolution of 43 Hz. Simultaneous array measurements as outlined in Sec. II B 3 reduce both measurement time and positioning drifts.
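The quoted frequency resolution follows directly from the impulse response length and the sampling frequency. The short sketch below reproduces the number and adds the corresponding impulse response duration and the end pitch of the glissando; the variable names are hypothetical.

```python
SAMPLE_RATE_HZ = 44_100     # sampling frequency stated in the text
IR_LENGTH_SAMPLES = 1024    # impulse response length stated in the text
START_PITCH_HZ = 392.0      # G4, start of the glissando

# Spacing of the frequency bins of a 1024-point transform
bin_spacing_hz = SAMPLE_RATE_HZ / IR_LENGTH_SAMPLES

# Duration of the truncated impulse response
ir_duration_ms = 1000.0 * IR_LENGTH_SAMPLES / SAMPLE_RATE_HZ

# One octave above the starting pitch
end_pitch_hz = 2.0 * START_PITCH_HZ

print(f"{bin_spacing_hz:.1f} Hz, {ir_duration_ms:.1f} ms, {end_pitch_hz:.0f} Hz")
```

The truncation to roughly 23 ms is what fixes the resolution at about 43 Hz; a longer window would give finer resolution at the cost of including more of the room response.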
5. Measurement uncertainty
The variability of the measurement results due to positioning errors has been investigated with the HATS. The HATS is positioned with various radial offsets from the center position of the measurement setup (up to 11 cm off-center). For the largest radial positioning error of 11 cm, we expect a maximum magnitude error of ±1 dB. The standard deviations of the horizontal directivity index (HDI) and the vertical directivity index (VDI) for all tested positioning offsets lie below 1 dB in the vicinity of 1.5 kHz and below 0.6 dB elsewhere. The positioning error has minimal influence on the calculated directivity indexes and can be considered negligible if it stays below 5 cm.
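One plausible reading of the expected ±1 dB magnitude error is the level change caused by spherical spreading alone when the source moves toward or away from a microphone on the 1 m ring. The sketch below works through that arithmetic; attributing the figure to the 1/r law is an assumption of this example, and the function name is hypothetical.

```python
import math

ARRAY_RADIUS_M = 1.0  # ring radius stated for the microphone array


def level_change_db(offset_m, nominal_m=ARRAY_RADIUS_M):
    """Level change at a microphone when the source moves toward it by offset_m.

    Assumes free-field spherical spreading (pressure proportional to 1/r).
    A negative offset moves the source away from the microphone.
    """
    return 20.0 * math.log10(nominal_m / (nominal_m - offset_m))


print(f"{level_change_db(0.11):+.2f} dB (11 cm closer)")
print(f"{level_change_db(-0.11):+.2f} dB (11 cm farther)")
print(f"{level_change_db(0.05):+.2f} dB (5 cm closer)")
```

Under this assumption an 11 cm offset changes the level by roughly 1 dB in either direction, while a 5 cm offset stays below half a decibel.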
C. Metrics
In addition to the visualization of voice directivity in polar plots and normalized acoustic pressure maps, the following metrics will be used to investigate differences. The metrics are calculated from third-octave smoothed data.
1. Directivity index for a single plane
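In accordance with the literature (Cabrera, 2004; Tylka et al., 2015), we define the directivity factor for the horizontal and vertical planes. It is defined by the ratio of the on-axis power to the average power Pmean of all sampling positions on the respective plane. The HDI and VDI evaluated at an angular frequency ω for each plane are then defined in dB as follows:
$$\mathrm{HDI}(\omega),\ \mathrm{VDI}(\omega) = 10 \log_{10} \frac{P_{\text{on-axis}}(\omega)}{P_{\mathrm{mean}}(\omega)}, \tag{5}$$
where the on-axis power and Pmean are taken from the horizontal plane for the HDI and from the vertical plane for the VDI.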
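The single-plane directivity index described above can be sketched as follows. The example assumes that the levels of one ring are available in dB per measurement position and that the on-axis position is the first entry; the function name and the data layout are hypothetical.

```python
import math


def directivity_index_db(levels_db, on_axis_index=0):
    """Single-plane directivity index: on-axis power over mean power of all positions, in dB."""
    powers = [10.0 ** (level / 10.0) for level in levels_db]  # dB -> linear power
    p_mean = sum(powers) / len(powers)
    return 10.0 * math.log10(powers[on_axis_index] / p_mean)


# Omnidirectional pattern: every position has the same level
omni = [0.0] * 32

# Frontal beam: on-axis position 10 dB above all other positions
beam = [0.0] + [-10.0] * 31

# Side-lobe case: on-axis position 5 dB below all other positions
notch = [-5.0] + [0.0] * 31

for name, pattern in (("omni", omni), ("beam", beam), ("notch", notch)):
    print(name, round(directivity_index_db(pattern), 2))
```

The last case shows how the index turns negative as soon as the on-axis level falls below the plane average, which is the situation described for strong side lobes.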
2. Front-to-back ratio
The HDI and VDI tend to decrease if side lobes with higher levels than the on-axis level occur, which can lead to the false conclusion that the radiation pattern becomes more omnidirectional at these frequencies. To complement these indexes, we define the front-to-back ratio (FBR) in dB as the ratio of the average power radiated to the front Pfront and to the back Pback [Eq. (6)]:
$$\mathrm{FBR}(\omega) = 10 \log_{10} \frac{P_{\mathrm{front}}(\omega)}{P_{\mathrm{back}}(\omega)}. \tag{6}$$
3. Upward-to-downward ratio
This metric is used to investigate whether most of the energy in the vertical plane is radiated upward or rather downward to the floor. The upward-to-downward ratio (UDR) denotes the ratio of the power radiated to the upper half space Pupper to the power radiated to the lower half space Plower in dB:
$$\mathrm{UDR}(\omega) = 10 \log_{10} \frac{P_{\mathrm{upper}}(\omega)}{P_{\mathrm{lower}}(\omega)}. \tag{7}$$
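Both half-space ratios, the FBR and the UDR, can be sketched with one helper that compares the mean power within 90° of a reference direction to the mean power beyond 90°. The angle convention (0° to the front, 90° upward on the vertical ring), the exclusion of positions exactly on the boundary, and the function name are assumptions of this sketch.

```python
import math


def half_space_ratio_db(angles_deg, levels_db, center_deg):
    """10 log10 of the mean power within 90 deg of center_deg over the mean power beyond it."""
    near, far = [], []
    for angle, level in zip(angles_deg, levels_db):
        # Smallest absolute angular distance to the center direction, in [0, 180]
        diff = abs((angle - center_deg + 180.0) % 360.0 - 180.0)
        power = 10.0 ** (level / 10.0)
        if diff < 90.0:
            near.append(power)
        elif diff > 90.0:
            far.append(power)
        # Positions exactly on the boundary are left out (assumption of this sketch)
    return 10.0 * math.log10((sum(near) / len(near)) / (sum(far) / len(far)))


angles = [i * 11.25 for i in range(32)]  # one ring with 32 equally spaced positions

# Example pattern: frontal half 10 dB above the rear half
levels = [0.0 if (a <= 90.0 or a >= 270.0) else -10.0 for a in angles]

fbr = half_space_ratio_db(angles, levels, center_deg=0.0)   # front versus back
udr = half_space_ratio_db(angles, levels, center_deg=90.0)  # up versus down (assumed convention)
print(round(fbr, 2), round(udr, 2))
```

For this front-weighted but vertically symmetric example the first ratio is 10 dB and the second is 0 dB, so the two metrics separate front-back focus from up-down tilt.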
4. Energy vector
The energy vector in Eq. (8) can be utilized to describe the direction and the width of the main lobe of an acoustic source (Fig. 3). This measure, first introduced by Gerzon (1992), is commonly used in the context of 3D loudspeaker setups and their evaluation (Frank, 2013) but is useful in the description of properties of any arbitrary sound source radiation:
The frequency-dependent magnitudes are multiplied by the direction vectors of each measurement position i in the respective plane and normalized by the sum of the energy, so that the length of the vector lies between 0 (omnidirectional) and 1 (maximum focus in one direction). The following two metrics will be used: (i) the main beam width θw in each plane and (ii) the main direction in the vertical plane θs.
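A minimal sketch of an energy vector evaluation for one plane and one frequency is given below. It assumes that squared magnitudes (energies) weight the unit vectors of the measurement positions and that the beam width is derived from the vector length as 2·arccos of that length, which is one common mapping and not necessarily the one used for the results in this paper. Function names and data layout are hypothetical.

```python
import math


def energy_vector(angles_deg, levels_db):
    """Return (length, direction in degrees) of the energy vector in one plane."""
    energies = [10.0 ** (level / 10.0) for level in levels_db]  # dB -> linear energy
    total = sum(energies)
    x = sum(e * math.cos(math.radians(a)) for e, a in zip(energies, angles_deg)) / total
    y = sum(e * math.sin(math.radians(a)) for e, a in zip(energies, angles_deg)) / total
    return math.hypot(x, y), math.degrees(math.atan2(y, x))


def beam_width_deg(length):
    """Assumed mapping from vector length to width: 2 * arccos(length)."""
    clipped = min(1.0, max(0.0, length))  # guard against rounding slightly outside [0, 1]
    return 2.0 * math.degrees(math.acos(clipped))


angles = [i * 11.25 for i in range(32)]

# Omnidirectional pattern: the vector length vanishes
omni_length, _ = energy_vector(angles, [0.0] * 32)

# Nearly all energy at the 45 degree position: length close to 1, direction 45 degrees
focused = [-200.0] * 32
focused[4] = 0.0
focused_length, focused_direction = energy_vector(angles, focused)

print(round(omni_length, 6), round(beam_width_deg(omni_length), 1))
print(round(focused_length, 6), round(focused_direction, 1))
```

Unlike the directivity index, the vector does not depend on which position is declared on-axis, so a tilted main lobe shows up as a changed direction rather than as a reduced index.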
D. Visualization
1. Polar patterns
A quasi-continuous representation of arbitrary radiation directions for the azimuth angle and elevation angle ϑ can be rendered from given discrete measurement positions by applying the circular or spherical harmonics (SH) transform (Zotter, 2009). The polar patterns are displayed logarithmically and cover a dynamic range of 25 dB in each plane. The data presented in Figs. 9 and 11 are third-octave smoothed. The visualization and analysis tools are freely available within the DirPat project, which is in the spirit of open data and reproducible research, and are discussed in Brandner et al. (2018).
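The interpolation on a single ring can be sketched with a circular harmonics (Fourier series) expansion of equally spaced samples. This is a generic illustration and not the DirPat implementation; the function names, the choice of maximum order, and the test pattern are assumptions.

```python
import cmath
import math


def circular_harmonic_coeffs(samples, max_order):
    """Coefficients c_m for m = -max_order..max_order from equally spaced ring samples."""
    n = len(samples)
    if 2 * max_order >= n:
        raise ValueError("max_order must stay below half the number of samples")
    coeffs = {}
    for m in range(-max_order, max_order + 1):
        # Discrete analysis: project the samples onto exp(j m phi)
        coeffs[m] = sum(
            s * cmath.exp(-1j * m * 2.0 * math.pi * i / n) for i, s in enumerate(samples)
        ) / n
    return coeffs


def evaluate_pattern(coeffs, angle_deg):
    """Synthesis at an arbitrary angle from the circular harmonic coefficients."""
    phi = math.radians(angle_deg)
    return sum(c * cmath.exp(1j * m * phi) for m, c in coeffs.items())


# Test pattern sampled at 32 positions: 1 + 0.5 cos(phi)
ring = [1.0 + 0.5 * math.cos(2.0 * math.pi * i / 32) for i in range(32)]
coeffs = circular_harmonic_coeffs(ring, max_order=15)

# Evaluate between two measurement positions (spacing is 11.25 degrees)
value = evaluate_pattern(coeffs, 5.0)
print(round(value.real, 6))
```

For a pattern that contains only orders below half the number of positions, the interpolated values between the microphones are exact; finer angular detail than that cannot be recovered from 32 positions.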
2. Acoustic pressure maps
The acoustic pressure maps show the amplitude of the acoustic pressure normalized by the maximum over the angular position for each frequency, limited to a dynamic range of 20 dB. We use a frequency resolution of 10 Hz for the simulation data and 43 Hz for the measurement data; for visualization, semitone (twelfth-octave) smoothing is applied.
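The normalization of the pressure maps can be sketched as follows, assuming linear pressure amplitudes stored per frequency bin and angular position. The function name and the data layout are hypothetical, and smoothing is not included.

```python
import math


def normalize_map_db(magnitudes, dynamic_range_db=20.0):
    """Normalize each frequency row by its maximum over angle and clip to the dynamic range.

    magnitudes[f][a] is the linear pressure amplitude at frequency bin f and angle index a.
    """
    floor_db = -dynamic_range_db
    result = []
    for row in magnitudes:
        peak = max(row)
        if peak <= 0.0:
            # No signal in this frequency bin: show the floor everywhere
            result.append([floor_db] * len(row))
            continue
        normalized = []
        for value in row:
            level = 20.0 * math.log10(value / peak) if value > 0.0 else floor_db
            normalized.append(max(level, floor_db))
        result.append(normalized)
    return result


# Two frequency bins, three angular positions each
example = [[1.0, 0.5, 0.001], [0.2, 0.4, 0.1]]
for row in normalize_map_db(example):
    print([round(level, 2) for level in row])
```

Because each frequency row is normalized separately, the maps show the shape of the pattern at each frequency and hide the spectrum of the source.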
III. RESULTS
A. Simulation results
1. Piston model
Normalized acoustic pressure maps are shown in Figs. 4(a) and 4(b) for a piston diameter of d = 3 cm and d = 6.6 cm, respectively, corresponding to the width of the mouth opening of the HATS and the classical singer using the wide mouth opening configuration. An increase of directionality for d = 3 cm and d = 6.6 cm is visible at around 3.6 kHz and 1.7 kHz, respectively, and increases further with frequency, which is indicated by a steady decrease of the amplitudes toward the side. Due to the axial symmetry of the piston, only half of the acoustic pressure maps are shown. As the piston is located in an infinite baffle, the sound pressure levels are only calculated in the half space (i.e., only in front of the baffle).
2. Spherical cap model
The radius of the sphere is set to the average size of a human head (r0 = 8.5 cm). The model allows one to calculate the directivity in the full space. In contrast to the piston model in the infinite baffle, the spherical cap model shows a narrower directivity pattern toward the front already at lower frequencies [see Figs. 4(c) and 4(d)]. The amplitudes at 90° for 3.6 and 1.7 kHz are around 3.5 dB lower than for the piston model. The plots also show larger amplitudes at 180°. This is due to constructive interference for frequencies with a wavelength of the same order of size as the sphere radius.
3. Extended spherical cap model
The directivity maps for the extended spherical cap model, which accounts for transverse propagation modes, are presented for the long and wide mouth opening configurations in the horizontal and vertical planes in Fig. 5. The effect of the transverse propagation modes is visible as vertical streaks, which can already propagate from 3.6 kHz on; see Fig. 5(c). The cut-on frequency of the first transverse propagation mode increases to 3.8 kHz as the mouth width is decreased [Fig. 5(a)]. In the horizontal plane, the long configuration shows a broader main beam width between 2.5 and 6 kHz with almost the same amplitude between 270° and 90° [Fig. 5(a)] in comparison to the wide configuration in Fig. 5(c), in which the main beam width is narrower. Furthermore, the transverse propagation modes introduce asymmetries around their cut-on frequencies, which can be seen around 3.6 and 6 kHz. Above 6 kHz, fewer transverse propagation modes occur for the long configuration. In contrast, the vertical plots in Figs. 5(b) and 5(d) show much larger differences. In general, we see that the wide configuration has a much broader main beam width. In Fig. 5(b), we see that at and above 3.8 kHz, the first transverse propagation mode affects the directivity of the long configuration by creating a local maximum around 90°, whereas for the wide configuration the first transverse propagation mode occurs at 6 kHz with almost no impact; see Fig. 5(d). The transverse propagation modes affect the main beam direction and the sound is, in general, directed downward above 3.8 kHz for the long configuration. For all configurations, sound radiation gets more complex in comparison to the piston and spherical cap models, showing significant variations in frequency intervals on the order of 100 Hz. As the frequency and size of the mouth opening increase, the number of transverse propagation modes increases. Further data on the normal and open mouth configurations can be found in the supplementary material.
B. Measurement results
1. Torso influence (HATS)
The normalized acoustic pressure maps for the HATS with and without torso are shown in Figs. 6(a) and 6(b), respectively. The normalized acoustic pressure map of the HATS without torso shows that the amplitudes below 2 kHz decrease by 1.5 dB already for angles larger than 45° (or lower than 315°) except at around 1.1 kHz. Furthermore, sound is also radiated toward the back for frequencies below 2 kHz with lower amplitudes between 90° and 135° and higher amplitudes at around 180° for frequencies below 1 kHz. The main beam width toward the front decreases as the frequency increases; see Fig. 6(a). However, it broadens in specific frequency bands around 3.3, 5.3, 7.3, and 9.3 kHz. It can be seen that sound radiation looks very similar—except for the broadening of the main beam width at the mentioned frequencies—if compared to the results of the spherical cap model for an aperture size of 3 cm.
We turn now to the measurement results for the HATS including the torso. Smaller amplitudes are visible toward the sides and the back already at lower frequencies in the range of 0.5–1 kHz. The most striking result to emerge from the data is that the torso provokes distinct side lobes within the ranges of 1–2 kHz and 3–5 kHz and, less pronounced, within 6–7 kHz. A decrease of amplitudes toward the front of about 5 dB is visible at around 1.1 kHz. We now use the HDI and VDI metrics to compare the measurements. The HDI and VDI only quantify the directionality in the corresponding plane. Figure 7 shows that the strongest deviations, in the range of 3 dB and 2 dB, occur around 1.1 kHz and 2 kHz, respectively. The differences are smaller than 1 dB above 2 kHz for the horizontal plane and above 2.5 kHz for the vertical plane. Furthermore, the ripples visible in the acoustic pressure maps are also visible as drops of the HDI and VDI values at the same frequencies. The directivity index decreases to negative values around 1 kHz due to strong side lobes provoked by the torso.
C. Comparison of test subjects and HATS
In this section, we give a qualitative comparison of the directivity of the HATS with torso, the classical singer, and the amateur singer. For the comparison, we use the measurements of the test subjects with the normal mouth opening configuration.
In Fig. 8(a), the HDI values show a general tendency to increase over frequency for the test subjects and the HATS. However, the curves also show drops at 1.1, 3.3, and 5.1 kHz for the HATS; at 750 Hz, 1.7 kHz, 3 kHz, and 8 kHz for the amateur singer; and at 1, 2.2, and 5.2 kHz for the classical singer. These drops also occur for the VDI values at similar frequencies in Fig. 8(b). The VDI curves show a slight increase over frequency for the HATS and the classical singer and a decrease for the amateur singer above 4 kHz.
Polar patterns for the test subjects and the HATS agree better if they are evaluated at the frequencies of corresponding minima of the directivity indexes (DIs) than at identical frequencies. To illustrate the dependency of the side lobes on the torso dimensions, the directivity patterns of the HATS, the amateur singer, and the classical singer are plotted in Fig. 9 at the frequencies of the first minimum of their respective DI curves, that is, 1.1 kHz, 800 Hz, and 950 Hz, respectively.
The FBR and UDR values are presented in Figs. 8(c) and 8(d), respectively. The FBR generally increases with frequency for the test subjects and the HATS. Prominent drops occur within the region of 800 Hz–1 kHz. The directionality of the HATS increases for frequencies above 7 kHz. The FBR values above 7 kHz stay at 10 dB for the classical singer and decrease for the amateur singer. The largest deviations of the UDR values between the test subjects and the HATS, about 5 dB, occur around 1.5 kHz, around 2.5 kHz, and above 8 kHz.
D. Influence of the mouth configuration on radiation
The HDI, VDI, FBR, and UDR values obtained with the four mouth configurations of the classical singer are presented in Figs. 10(a) and 10(b). The deviation of the HDI between the different configurations is up to 2.6 dB in a broad frequency range of 1.5–3 kHz and within the narrower range of 7–8 kHz. The VDI values also differ in the frequency range of 1.5–3 kHz and within the range of 4–7 kHz. The FBR values in Fig. 10(b) show more prominent differences, with deviations of up to 5 dB within the range of 1.5–3 kHz, where the wide configuration shows the highest FBR. The UDR values show that along the vertical axis up to 1.8 kHz, the sound is focused toward the same direction for all mouth openings. Differences on the order of 3 dB occur within the region of 2–3 kHz and above 4 kHz. The UDR values decrease to −5 dB for the long configuration, indicating that most of the energy is radiated downward from 4 kHz on. A similar decrease is shown for the open configuration but in the smaller range of 4–6 kHz.
In Fig. 11, the polar patterns show the directivity in more detail at 2.4 and 5 kHz. We see larger differences between mouth openings, on the order of 5 dB at 2.4 kHz in the horizontal plane and of almost 10 dB at 5 kHz in the vertical plane. At 5 kHz, the normal and wide mouth openings distribute the energy almost symmetrically in both planes. This agrees with the features highlighted by the FBR and UDR metrics.
Let us now turn to how we can gain the abovementioned information by using the angular information of the energy vector in our detailed analysis. In Figs. 12(a)–12(d), we present the beam width θw for each plane and the main direction of sound θs for the vertical plane in degrees for each proposed mouth opening. The wide and open conditions show the highest directionality in the horizontal plane within 1.5–4 kHz. In the vertical plane, the long and open conditions from 5 to 9 kHz are most directional. In addition, the open condition focuses sound between 1.5 and 4 kHz and between 5 and 9 kHz in both planes and tends, therefore, to be the most directional condition. The larger mouth openings long and open tend to radiate most of the energy toward the floor within 3–5 kHz. This focusing of sound toward the floor occurs also at higher frequencies for the long condition, but this tendency decreases as the frequency increases further. Finally, we present a comparison of the acoustic pressure maps for the classical singer for the long and wide conditions in Fig. 13 in the horizontal and vertical planes.
The acoustic pressure maps show similar side lobes as in the results for the HATS with torso [cf. Fig. 6(b)], although more complex patterns are visible in the higher frequency region. When the mouth opening changes from long to wide, the most affected frequency region is above 1.5 kHz. However, all conditions show quite complex sound radiation above 2 kHz. Several local minima can be seen for the long and wide configurations in both planes. In particular, we see local minima in both planes for the long configuration at 3.5 kHz and for the wide configuration at 4.5 and 8 kHz. In the horizontal plane, the patterns of the wide configuration are more complex above 4 kHz than the patterns of the long configuration. In general, the wide mouth configuration tends to show a narrower pattern from 1.5 kHz on.
IV. DISCUSSION
A. Simulations
All three of the models used show that, in general, an increase of mouth opening in width and height leads to an increase in directionality. The transition frequency for the piston model can be identified by visual inspection of the normalized acoustic pressure maps. Nevertheless, the piston model does not predict sound radiation of a human properly. The main beam width below the transition frequency is too broad as the model does not account for the diffraction around the head, and no side lobes due to torso reflections can be predicted.
The normalized acoustic pressure maps of the spherical cap model do not reveal a clear transition from an omnidirectional to directional sound radiation because the diffraction of the sphere decreases the amplitudes of the sound radiated toward the side already at very low frequencies. The size of the mouth opening affects the main beam width and how sound is diffracted around the head, which can be seen for frequencies below 2 kHz for the angles between 90° and 180°. If the size of the mouth opening is increased, amplitudes at 180° increase and decrease around 135°. The effect of increased amplitudes toward the back below 1 kHz is often referred to as the acoustic bright spot (Hecht, 2002). The spherical cap model predicts the sound radiation of a human quite well, but experimental accuracy is still far out of reach as it becomes inaccurate at frequencies above roughly 4 kHz.
The results of the extended spherical cap model show that transverse propagation modes occur already around 3.6 kHz for the simulated vocal tract models. The transverse propagation modes create complex directivity patterns with strong local maxima and minima dependent on which mouth configuration is used. For larger mouth openings, the number of radiating transverse propagation modes increases significantly and, therefore, the complexity of the radiation pattern also increases significantly. The results show that the transverse propagation modes affect the main beam direction in the vertical plane increasing the downward deflection if the mouth opening is large (long and open configuration).
However, none of the three presented models takes into account the effect of the torso. Another influence on sound radiation that is not included is the effect of the lips, which can be quite large depending on the phoneme, as discussed in Yoshinaga et al. (2018).
B. Torso influence (HATS)
The directivity patterns measured on the HATS without torso in Fig. 6(a) agree very well with the simulation results for the spherical cap model in Fig. 4(a). Deviations occur above 2 kHz, where the measurement shows a broadening of the main beam width at several frequencies, which is not visible in the simulation. However, the head of the HATS has been only approximated by a sphere in the simulations. The on-axis frequency response reveals notches at the corresponding frequencies of the broadenings. These deviations are most likely edge diffraction effects of the head. Such diffraction effects can be partly simulated (Svensson, 2017; Vanderkooy, 1991) but have been omitted in the current study due to the complexity of the geometry of the HATS. The comparison of DI values in Fig. 7 shows a maximum deviation of around 1.6 dB between simulation and measurement. The negative DI values in Fig. 7 also confirm the limitation of the index as an optimal indicator for the characterization of speech or singing directivity.
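Why a directivity index can become negative is illustrated by the following sketch. It assumes a common planar definition, i.e., the ratio of the on-axis energy to the mean energy over equally spaced angles in one plane, which may differ in detail from the HDI and VDI used in this study. The level values are hypothetical and the function name is illustrative.

```python
import math

def planar_directivity_index_db(levels_db, on_axis_index=0):
    """On-axis energy relative to the mean energy over a full circle, in dB.

    levels_db: levels at equally spaced angles in one plane (assumed sampling).
    """
    energies = [10.0 ** (level / 10.0) for level in levels_db]
    mean_energy = sum(energies) / len(energies)
    return 10.0 * math.log10(energies[on_axis_index] / mean_energy)

if __name__ == "__main__":
    # Hypothetical patterns sampled every 45 degrees, starting on-axis.
    omni = [0.0] * 8
    frontal = [0.0, -3.0, -6.0, -9.0, -12.0, -9.0, -6.0, -3.0]
    notched = [-5.0, 0.0, 0.0, -3.0, -6.0, -3.0, 0.0, 0.0]  # on-axis dip, side lobes
    for name, pattern in (("omni", omni), ("frontal", frontal), ("notched", notched)):
        print(name, round(planar_directivity_index_db(pattern), 2))
```

A pattern with an on-axis dip and strong side lobes yields a negative value although the radiation is far from omnidirectional, which is the limitation of the index noted above.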
Most striking as a result is the influence of the torso on the directivity for frequencies around 1 kHz. The torso diffraction is visible as large side lobes, which also provoke a decrease of amplitude of about 5 dB on-axis at 1.1 kHz. This explains a similar observation reported for singer directivity measurements in Katz and D'Alessandro (2007, Fig. 6).
C. Comparison of test subjects and HATS
The most obvious finding to emerge from the measurement data is that, depending on the body dimensions, a decrease of amplitude toward the front around 1 kHz occurs for both test subjects and the HATS. As the test subject's dimensions increase, the decrease of the amplitude toward the front and, therefore, the first valley for the HDI, VDI, and FBR values is shifted toward a lower frequency (Fig. 8). This can be attributed to the fact that sound is diffracted by the torso, which we discuss in Sec. IV B. The torso diffraction provokes distinct side lobes at and above 1 kHz (Fig. 6).
By comparing polar patterns at 1.1 kHz, 800 Hz, and 950 Hz for the HATS, the amateur singer, and the classical singer, respectively, we can observe a similarity of voice directivity. The similarity of these polar patterns underlines the link between torso dimensions and torso diffraction and its effect on voice directivity in the frequency range around 1 kHz. As the frequency increases, and especially above 3.6 kHz, this similarity will most likely decrease as parameters such as the morphology of the vocal tract outweigh the influence of the torso diffraction. Nevertheless, it is obvious that diffraction of the head also plays a role above 1 kHz.
D. Influence of the mouth configuration on radiation
We see from the measurement data that differences in sound radiation occur if the mouth configuration is changed to a larger extent. It is interesting to note that the results for the classical singer clearly show asymmetries along the vertical and horizontal planes. These changes of voice directivity and the asymmetries occur in both planes and were expected from the simulation results. All of the presented metrics reveal distinct differences between conditions. The change of mouth configuration provokes an increase or decrease of the main beam width and a shift of the main direction, which is seen in the analysis with the energy vector (Fig. 12). Two strong effects occur if the mouth is opened widely. A decrease of the main beam width is seen in the frequency region from 1.5 to 3 kHz for the wide and open mouth in the horizontal plane. This is also visible as an increase of the FBR values in the same frequency region. Furthermore, a change of main direction toward the floor is provoked at higher frequencies (>3 kHz) on the order of 20° by lowering the jaw (long and open conditions), which is seen for θs of the energy vector in the vertical plane and the UDR values. This downward focusing occurs even though we recognized a slight upward inclination of the head for the long and open conditions (cf. Table I). The downward focusing effect is most likely explained by the influence of transverse propagation modes, which can be seen in the results for the long and open conditions in the vertical plane in Fig. 5.
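How an energy vector summarizes a polar pattern can be sketched as follows. The sketch assumes the usual energy-weighted average of unit direction vectors over equally spaced measurement angles in one plane; its angle indicates the main direction and its length indicates how concentrated the radiation is. The sampling, the sign convention (negative angles pointing downward), the level values, and the function name are assumptions for illustration.

```python
import math

def energy_vector(angles_deg, levels_db):
    """Return (direction in degrees, length) of the energy vector in one plane."""
    weights = [10.0 ** (level / 10.0) for level in levels_db]
    total = sum(weights)
    x = sum(w * math.cos(math.radians(a)) for w, a in zip(weights, angles_deg)) / total
    y = sum(w * math.sin(math.radians(a)) for w, a in zip(weights, angles_deg)) / total
    return math.degrees(math.atan2(y, x)), math.hypot(x, y)

if __name__ == "__main__":
    angles = [0.0, 45.0, 90.0, 135.0, 180.0, -135.0, -90.0, -45.0]
    # Hypothetical vertical-plane patterns: one symmetric, one tilted downward.
    symmetric = [0.0, -3.0, -6.0, -9.0, -12.0, -9.0, -6.0, -3.0]
    tilted = [-3.0, -9.0, -12.0, -15.0, -12.0, -9.0, -3.0, 0.0]
    for name, pattern in (("symmetric", symmetric), ("tilted", tilted)):
        direction, length = energy_vector(angles, pattern)
        print(f"{name}: direction {direction:.1f} deg, length {length:.2f}")
```

In contrast to a single index, the two numbers separate a shift of the main direction from a change of the beam width.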
The acoustic pressure maps show highly complex patterns at higher frequencies (>3 kHz). The deviations from the simulations are most likely explained by the facts that (i) no MRI data were available to model the vocal tract of the classical singer precisely, (ii) the head is only approximated by a sphere, and (iii) the torso is not taken into account.
Still, the results for the singers agree with those of Kocon and Monson (2018), Blandin et al. (2018), and Katz and D'Alessandro (2007), which show that different mouth configurations (vowel and singing technique) provoke changes in the voice directivity. This study has been able to demonstrate that this effect even occurs for the German vowel /a/ if different mouth openings are used. Although the change of mouth opening introduces a deviation in a phonetic sense, it was used in this study as it is common in singing.
V. CONCLUSION
As very little was found in the literature on the question of how well simple models approximate human voice directivity, directivity models with different levels of complexity were compared with measurements of singers and a HATS. The simulations were also used to better understand detailed singing voice directivity measurement data. We tried to predict sound radiation for the classical singer with simulations, including the transverse propagation modes. We adapted the mouth opening dimensions in the simulations according to the extracted mouth opening height and width from the video used by the singer during the measurements. Prior studies that have noted deviations in voice directivity between test subjects have not investigated the specific effects that occur due to differences of torso dimensions and mouth openings for one vowel.
All our simulation models predict a higher directivity if the size of the mouth opening is increased. The extended spherical cap model predicts a strong influence of the transverse propagation modes on the voice directivity from 3.6 kHz on. In accordance with expectations, the study did not show that the simulation results fully resemble the measurements and vice versa. However, the simulation results allow one to identify the combined effects seen in the voice directivity measurements of the singers.
It should be noted that a detailed cause-and-effect analysis of voice directivity is quite difficult due to the many influences, such as posture, vocal tract geometry, and subject differences. We minimized these effects in our study. The results show that a larger mouth opening reduces the main beam width by roughly 20° within the frequency ranges of 1–4 kHz and 5–9 kHz. This effect can be explained by a higher directivity due to a larger mouth opening, which also reduces diffraction at the torso. A second prominent effect is that a larger mouth opening provokes a shift of the main direction toward the floor for frequencies above 3 kHz.
With our analysis of voice directivity by simulations and measurements for different mouth openings, we give closer insight into singing voice directivity. The results indicate that sound can be focused to some extent toward the front by altering the mouth opening. We also show how the energy vector can be used as a more intuitive tool to analyze complex radiation patterns compared to the commonly used metrics.
Although the current study is based on a small number of participants, our findings agree well with results in the literature, which allows us to generalize our findings to some extent. However, investigations on voice directivity in speech (Frank and Brandner, 2019) show that large changes in beam width are necessary for a listener to perceive an effect, which indicates that the observed effects may have little perceptual relevance. Therefore, the specific mouth configuration used in singing may be aimed more at increasing vocal production efficiency than at influencing directionality.
ACKNOWLEDGMENTS
This work is partly supported by the project Augmented Practice-Room (1023), which is funded by the local government of Styria, and a part of this research was funded by the German Research Foundation (DFG), Grant No. BI 1639/7-1.
See supplementary material at https://doi.org/10.1121/10.0001736 for the normalized acoustic pressure maps for the classical and amateur singers for all mouth opening configurations (SupMat.pdf). Furthermore, we include the results of the energy vector for the amateur singer and the measured mouth openings.