Human voice directivity shows horizontal asymmetries caused by the shape of the lips or the position of the teeth and tongue during vocalization. This study presents and analyzes the asymmetries of voice directivity datasets of 23 different phonemes. The asymmetries were determined from datasets obtained in previous measurements with 13 subjects in a surrounding spherical microphone array. The results show that asymmetries are inherent to human voice production and that they differ between the phoneme groups, with the strongest effect on the [s], the [l], and the nasals [m], [n], and [ŋ]. The weakest asymmetries were found for the plosives.
1. Introduction
Voice directivity and its relevance for human voice communication and room acoustic design have been investigated for more than 200 years (Henry, 1857; Saunders, 1790; Wyatt, 1813). The first microphone-based measurements were carried out by Trendelenburg (1929), who studied the directivity patterns in the horizontal plane for several vowels and fricatives. Dunn and Farnsworth (1939) determined the spherical sound radiation for a spoken sentence. Since then, numerous follow-up studies have analyzed specific aspects of human voice directivity, such as phoneme dependencies (Katz and D'Alessandro, 2007; Kocon and Monson, 2018; Monson et al., 2012; Pörschmann and Arend, 2021, 2023), voice level (Marshall and Meyer, 1985), differences between speaking and singing (Monson et al., 2012), or gender differences (Monson et al., 2012). For a general overview of studies measuring voice directivity, please refer to Pörschmann (2023).
Human voice production is often considered to be symmetrical along the vertical plane, and in some studies, only the left or right hemisphere has been analyzed, e.g., in Kocon and Monson (2018) and Monson et al. (2012). Moreover, because asymmetries are considered to be individual, they often diminish when measurement data are averaged over subjects, as was the case in our recent studies (Pörschmann and Arend, 2021, 2023). As a result, horizontal asymmetries in human voice directivity have not yet been investigated in detail.
Even though many people lack dental symmetry, in many cases, this does not hinder chewing and swallowing, nor does it affect speech production in general (Abduo, 2016). Tooth and tongue position or the shape of the lips during articulation can affect voice radiation by diffraction and reflection, and thus become relevant when the dimensions causing the asymmetries are larger than or in the same range as the wavelength. Accordingly, asymmetries in voice radiation are expected to be small at low frequencies and less relevant for phonemes with small mouth openings.
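As a rough orientation for this wavelength argument, the following sketch lists the wavelength at a few frequencies. The speed of sound of 343 m/s (air at about 20 °C) and the function name are assumptions made for illustration and are not taken from the study.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 °C


def wavelength_m(frequency_hz: float) -> float:
    """Return the wavelength in metres for a frequency in Hz."""
    return SPEED_OF_SOUND_M_S / frequency_hz


for f in (250, 1000, 4000, 8000):
    print(f"{f:>5} Hz -> {100 * wavelength_m(f):6.1f} cm")
```

Under this assumption, the wavelength shrinks from more than a metre at 250 Hz to a few centimetres at 8 kHz, so small geometric details of the articulators can only matter toward the upper end of this range.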
Another factor that most likely causes asymmetries in voice directivity is the position of the tongue during vocalization. In this context, Hamlet (1986) studied the tongue contact patterns for an [s] and an [l] using dynamic palatography and found both temporal and spatial asymmetries that even occurred during the transition of the phonemes into the vowel. Applying similar methods, Marchal and Espesser (1989) showed that the tongue moves asymmetrically during vocalization. More recently, Verhoeven (2019) compared the results of several previous studies and found that 83% of palatograms show an asymmetric pattern. Although several studies have demonstrated asymmetries in the human speech apparatus, none have examined how these asymmetries might affect human voice directivity. To the best of our knowledge, the only study that has identified and briefly mentioned asymmetries in voice directivity is that of Brandner (2020), who described directional asymmetries in the horizontal plane for a singer articulating an [a]. However, the asymmetries were neither quantified nor compared with those determined in the corresponding simulations of the same study.
In recent studies (Pörschmann and Arend, 2021, 2023), we determined full-spherical voice directivity patterns of various phonemes using a surrounding spherical microphone array. In these studies, however, the directivity patterns were averaged across subjects to examine differences between phonemes, so that details of the individual asymmetries were lost. In the present study, we aim to better understand these asymmetries by determining both their dimensions and their phoneme dependencies in more detail, thus gaining a deeper insight into human voice directivity. To this end, we analyze asymmetries between the left and right hemispheres within and between phoneme groups and the extent to which they can be explained by articulation-dependent properties, such as the size of the mouth opening or the position of the tongue.
2. Materials
We used the publicly available voice directivity datasets (see Data Availability) from our previous studies, which were measured with a surrounding spherical microphone array with a diameter of 2 m (Pörschmann and Arend, 2021, 2023). Please refer to these studies for a detailed description of the measurement methods. The datasets comprise two repeated measurements for 13 subjects (two female and 11 male) and include the following articulations:
- Vowels: [a], [e], [i], [o], [u],
- Plosives: [p], [t], [k], [b], [d], [g],
- Unvoiced fricatives: [f], [s], [ʃ], [x], [h],
- Voiced fricatives: [z], [v],
- Nasals: [m], [n], [ŋ],
- Voiced alveolars: [l], [r].
For further analysis, the datasets for each subject and phoneme were averaged over both repetitions. To visualize the voice directivity patterns, the upsampled datasets were interpolated to 360 directions in 1° steps along the horizontal plane, i.e., over the full range of azimuth angles at the elevation of the horizontal plane, with positive azimuth angles pointing to the left. Technically, this interpolation was done by transforming the upsampled dataset to the spherical harmonic (SH) domain with an SH order of N = 35 and then resampling the datasets to the desired positions using the inverse SH transform.
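The SH-based resampling can be sketched as follows. This is a minimal illustration under stated assumptions and not the processing chain of the study: a least-squares fit stands in for the forward SH transform, the sampling grid (a Fibonacci spiral with 100 points), the low SH order of 3, the real-valued SH convention, and the synthetic test pattern are all chosen for illustration only, and all function names are hypothetical.

```python
import math


def real_sh_row(order, azimuth, colatitude):
    """Real-valued spherical harmonics up to `order` for one direction (radians)."""
    x, s = math.cos(colatitude), math.sin(colatitude)
    # Associated Legendre functions P[n][m], built with the standard recurrences.
    P = [[0.0] * (order + 1) for _ in range(order + 1)]
    P[0][0] = 1.0
    for m in range(1, order + 1):
        P[m][m] = -(2 * m - 1) * s * P[m - 1][m - 1]
    for m in range(order):
        P[m + 1][m] = (2 * m + 1) * x * P[m][m]
        for n in range(m + 2, order + 1):
            P[n][m] = ((2 * n - 1) * x * P[n - 1][m]
                       - (n + m - 1) * P[n - 2][m]) / (n - m)
    row = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            a = abs(m)
            norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                             * math.factorial(n - a) / math.factorial(n + a))
            if m == 0:
                row.append(norm * P[n][0])
            elif m > 0:
                row.append(math.sqrt(2) * norm * P[n][a] * math.cos(a * azimuth))
            else:
                row.append(math.sqrt(2) * norm * P[n][a] * math.sin(a * azimuth))
    return row


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def sh_fit(order, directions, values):
    """Least-squares SH coefficients (stand-in for the forward SH transform)."""
    rows = [real_sh_row(order, az, col) for az, col in directions]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(k)]
    return solve(ata, atb)


def sh_eval(coeffs, order, azimuth, colatitude):
    """Inverse SH transform evaluated at a single direction."""
    return sum(c * y for c, y in zip(coeffs, real_sh_row(order, azimuth, colatitude)))


def pattern(az, col):
    """Synthetic, slightly left/right asymmetric test pattern (made up)."""
    return 1.0 + 0.5 * math.sin(col) * math.cos(az) + 0.1 * math.sin(col) * math.sin(az)


ORDER = 3  # the text uses N = 35; a low order keeps this sketch fast
golden = math.pi * (3 - math.sqrt(5))
grid = [(i * golden % (2 * math.pi), math.acos(1 - (2 * i + 1) / 100))
        for i in range(100)]
coeffs = sh_fit(ORDER, grid, [pattern(az, col) for az, col in grid])

# Resample to 360 azimuths in 1° steps in the horizontal plane (colatitude 90°).
horizontal = [sh_eval(coeffs, ORDER, math.radians(d), math.pi / 2) for d in range(360)]
```

Because the synthetic pattern is band-limited to order 1, the resampled horizontal-plane values reproduce it up to numerical precision. Measured data are not strictly band-limited, which is one reason a much higher order such as N = 35 is used in practice.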
3. Results
3.1 Directivity difference
Figures 1–3 show the patterns of the directivity difference over frequency in the horizontal plane in a frequency range between 250 Hz and 8 kHz. The plots reveal a similar structure of the directivity difference for most phonemes, which, by definition, reaches zero for radiation exactly forward and backward. In general, the asymmetries increase toward higher frequencies because the voice directivity is affected by diffraction and reflection, which become relevant when the dimensions of the corresponding sound-emitting body are in the same range as the wavelength. In addition, the asymmetries increase for lateral and backward directions as the diffraction and shadowing effects of the head become stronger.
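The exact definition of the directivity difference is not reproduced in this text. The following sketch therefore assumes one plausible form, the absolute level difference in dB between each azimuth and its mirror direction, and indexes azimuth from 0° to 359° (0° = front, positive angles to the left). The function name and the synthetic pattern are hypothetical.

```python
import math


def directivity_difference(levels_db):
    """Absolute left/right level difference per azimuth (assumed definition).

    levels_db[i] is the level in dB at azimuth i degrees (0 = front,
    positive angles to the left); index (-i) % n is the mirrored direction.
    """
    n = len(levels_db)
    return [abs(levels_db[i] - levels_db[(-i) % n]) for i in range(n)]


# Made-up pattern: symmetric roll-off toward the back plus a small left/right skew.
levels = [-3.0 * (1 - math.cos(math.radians(a))) + 1.0 * math.sin(math.radians(a))
          for a in range(360)]
diff = directivity_difference(levels)
print(diff[0], diff[180], round(diff[90], 3))
```

With this definition, the front (0°) and back (180°) directions are their own mirror images, so the difference vanishes there regardless of the data, which matches the behavior described for the plots.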
Fig. 2. Directivity difference of the plosives [p], [t], [k], [b], [d], and [g].
Fig. 3. Directivity difference of the unvoiced fricatives [f], [s], [ʃ], [x], and [h], voiced fricatives [z] and [v], nasals [m], [n], and [ŋ], and voiced alveolars [l] and [r].
Among the vowels, as shown in Fig. 1, the strongest asymmetries can be observed for the [a], and the smallest for the [o] and the [u]. This reduction of the asymmetries goes along with the mouth opening size, which is largest for the [a] and decreases toward the [o] and the [u]. Consistent with this, the plosives (Fig. 2), which also have small mouth opening sizes, show quite weak asymmetries compared to most other phoneme groups. Within the group of plosives, the asymmetries appear to have comparable strength. The asymmetries of the fricatives, nasals, and voiced alveolars are shown in Fig. 3. Among the unvoiced fricatives, the [s] has the strongest asymmetries, with a maximum area centered at 2 kHz. For the other unvoiced fricatives ([f], [ʃ], [x]), the asymmetries are much weaker. The same can be observed for the [h], which, however, has a peak at about 7 kHz. Comparing the [s] and the [z], which are unvoiced and voiced counterparts, shows relevant differences and much lower asymmetries for the voiced [z]. For the [v], which is the other voiced fricative, the asymmetries are smaller than those of most of the other phonemes. The nasals are a general exception, as their asymmetries are already evident for slightly lateralized radiation directions. Moreover, the asymmetries are more pronounced for nasals than for any other phoneme group. The asymmetries in the nasals probably have a different origin, which depends less on diffraction and shadowing and more on the unequal radiation of sound through the nostrils. Finally, for the voiced alveolars, the asymmetries are much weaker for the [r] than for the [l], which is affected by asymmetries even at small lateral angles and frequencies above 5 kHz.
In the supplementary material, we further show polar plots in octave bands, which are commonly used to visualize directivity patterns. However, as they do not resolve the fine structure over frequency, they are not discussed further in the paper.
3.2 Directivity index difference
We determined the mean ΔDI by averaging over all subjects for each phoneme. Figure 4 shows the means and standard deviations of ΔDI for the different phonemes (plots of the signed ΔDI are provided in the supplementary material). The plots indicate some clear differences in asymmetry depending on phoneme and frequency, which was statistically supported by a Greenhouse–Geisser (GG) corrected (Greenhouse and Geisser, 1959) two-way repeated measures analysis of variance (ANOVA) with the within-subject factors phoneme (23 phonemes) and frequency (nineteen 1/3-octave frequency bands from 125 Hz to 8 kHz). The ANOVA revealed a significant main effect of phoneme [F(22, 264) = 16.57, pGG < 0.001, η_p² = 0.58, ϵ = 0.17] and frequency [F(18, 216) = 61.50, pGG < 0.001, η_p² = 0.84, ϵ = 0.19], as well as a significant phoneme × frequency interaction effect [F(396, 4752) = 7.68, pGG < 0.001, η_p² = 0.39, ϵ = 0.02]. Comparing the asymmetries across phonemes indicates that (a) the nasals generally have the highest and (b) the plosives generally have the lowest ΔDI. For the most part, both observations are supported by statistical analysis. To this end, we performed pairwise comparisons between the frequency-dependent means of each phoneme group (i.e., averaged across the respective phonemes of each group) using nested GG-corrected two-way repeated measures ANOVAs with the within-subject factors frequency and phoneme group. The initial significance level of 0.05 was further corrected according to Hochberg (1988) to prevent α-error accumulation. The ANOVAs yielded significant main effects of phoneme group as well as significant frequency × phoneme group interaction effects for all five comparisons involving the nasals, and a significant main effect of phoneme group for four out of five comparisons involving the plosives (the only exception being plosives vs vowels), all significant at the 0.001 level (for the sake of conciseness, the statistical results of the nested ANOVAs are not reported in greater detail throughout the manuscript).
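The reported degrees of freedom and effect sizes of the main ANOVA can be cross-checked with a few lines of arithmetic. The sketch below assumes that the frequency bands are third-octave bands (the count of 19 between 125 Hz and 8 kHz is consistent with that) and that the effect size is partial eta squared, recovered as F·df1 / (F·df1 + df2). Both are assumptions inferred from the reported numbers, and the function name is hypothetical.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return f_value * df_effect / (f_value * df_effect + df_error)


N_SUBJECTS, N_PHONEMES = 13, 23
# Third-octave bands from 125 Hz to 8 kHz: 6 octaves with 3 bands each,
# plus the starting band.
n_bands = round(3 * math.log2(8000 / 125)) + 1

df_phoneme = N_PHONEMES - 1
df_frequency = n_bands - 1
df_interaction = df_phoneme * df_frequency
df_subjects = N_SUBJECTS - 1

# F values as reported in the text; dfs follow from the within-subject design.
effects = {
    "phoneme": (16.57, df_phoneme, df_phoneme * df_subjects),
    "frequency": (61.50, df_frequency, df_frequency * df_subjects),
    "phoneme x frequency": (7.68, df_interaction, df_interaction * df_subjects),
}
for name, (f_value, df1, df2) in effects.items():
    eta = partial_eta_squared(f_value, df1, df2)
    print(f"{name}: F({df1}, {df2}) = {f_value}, partial eta^2 = {eta:.2f}")
```

The uncorrected degrees of freedom are shown here. The Greenhouse–Geisser correction scales both by the reported ϵ before the p-value is determined, which leaves the F value and the effect size unchanged.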
Fig. 4. Mean values and standard deviations of ΔDI (fractional-octave smoothed) for the phoneme groups. For comparison, the curve averaged over all phonemes is shown in each plot.
Comparing the asymmetries within the phoneme groups leads to further observations, which we analyzed statistically by pairwise comparisons between the different phonemes in their groups using nested GG-corrected two-way repeated measures ANOVAs with the within-subject factors frequency and phoneme. Again, Hochberg correction was applied to correct for multiple hypothesis testing. Within the vowel group, the curves for [e] and [i] as well as for [o] and [u] are quite similar, and in general, the differences in ΔDI occur mainly above 4 kHz. The ANOVAs revealed significant main effects of phoneme for [a] vs [o], [a] vs [u], [e] vs [u], and [i] vs [u] as well as significant frequency × phoneme interaction effects for [a] vs [o] and [a] vs [u], indicating clear frequency-dependent differences between the asymmetries for the vowels. In the group of the unvoiced fricatives, the [s] stands out with a ΔDI that ranges from 0.5 to 2 dB above the average over all phonemes for f < 4 kHz. The statistical analysis supported the rather strong deviation of the [s] from the other unvoiced fricatives, with significant main effects of phoneme for [s] vs [ʃ] and [s] vs [h]. The voiced alveolar [l] has a ΔDI that is between 0.5 and 2 dB above the average over all phonemes for f > 4 kHz and furthermore differs considerably from the [r], which is confirmed by a significant main effect of phoneme and a significant frequency × phoneme interaction effect for the pairwise comparison [l] vs [r]. In contrast, within the groups of plosives, voiced fricatives, and nasals, the respective phonemes show similar asymmetry behavior, which is further supported by the absence of statistically significant differences between the various phonemes within their groups.
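The Hochberg (1988) step-up procedure used for the multiple comparisons can be sketched as follows. The p-values in the example are made up for illustration and are not results from the study, and the function name is hypothetical.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Hochberg (1988) step-up procedure; returns one reject flag per p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    n_reject = 0
    for rank in range(m, 0, -1):  # start at the largest p-value
        if p_values[order[rank - 1]] <= alpha / (m - rank + 1):
            n_reject = rank  # this and all smaller p-values are rejected
            break
    reject = [False] * m
    for i in order[:n_reject]:
        reject[i] = True
    return reject


# Made-up p-values for five pairwise comparisons (not from the study).
example = [0.030, 0.001, 0.200, 0.012, 0.004]
print(hochberg_reject(example))
```

The procedure compares the largest p-value with α, the second largest with α/2, and so on, and rejects all hypotheses up to the first one that passes its threshold. It is therefore never less powerful than the Holm step-down procedure, at the cost of requiring an assumption about the dependence between the tests.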
4. Discussion
From the measured datasets presented in Pörschmann and Arend (2021, 2023), we determined asymmetries in voice directivity, which vary between the phoneme groups and to some extent within the groups. While they are significantly smaller for the plosives than for the other phoneme groups, we found the strongest asymmetries for the [s], the [l], and the nasals.
As already discussed in Brandner (2020) and Pörschmann and Arend (2021), vowels vary in their mouth opening size, with the largest mouth opening for an [a] and the smallest for an [o] and a [u]. Similar to how the mouth opening size affects the directivity index (DI), it also affects the ΔDI. The simulations of Brandner (2020) could be extended by analyzing asymmetric mouth opening shapes to investigate to what extent the shape of the mouth opening affects asymmetries in voice radiation. For the plosives, the results are consistent with our findings from Pörschmann and Arend (2023) to the extent that both the directivity patterns and the asymmetries only show small deviations within this phoneme group. It is reasonable to assume that, due to the small mouth opening size, the plosives are generally not susceptible to changes in directivity, whether due to individual differences, phoneme differences, or asymmetries, such as those investigated in the present study.
Using dynamic palatography, Hamlet (1986) analyzed the asymmetries of the tongue during articulation of an [s] and an [l]. The authors determined asymmetries by the maximum contact pattern and found both spatial and temporal asymmetries. While spatial asymmetries refer to different contact patterns for the left and right side, temporal ones describe that tongue contacts occurred systematically earlier or remained longer on one side. Since both can contribute to asymmetries in voice emission, it is plausible that these two phonemes also show strong asymmetries in voice radiation in the present study. For the [s], the ΔDI is increased relative to the average in a wide frequency range from 200 Hz to 4 kHz. For the [l], the increase is limited to a range between 4 and 8 kHz. Surprisingly, the [s] and the [z], which are unvoiced and voiced counterparts, do not show comparable asymmetries. While the [s] shows one of the strongest asymmetries of the examined phonemes, the values for the [z] are close to the average values.
In the literature, there are no studies on the asymmetries in voice directivity of the nasals, for which we found the highest absolute values of ΔDI. These asymmetries could be caused by non-uniform sound radiation from the nostrils, e.g., in the case of a cold. Furthermore, even for healthy persons, the nasal mucosa performs spontaneous, reciprocal congestion and decongestion during the day, which also leads to a varying nasal airflow (Kayser, 1895; Pendolino et al., 2018). As a result, the nasal sound radiation could be dominated by one of the nostrils, resulting in asymmetries in voice directivity toward higher frequencies. Based on the simulation of how nasal cavities affect articulation (Vampola et al., 2020), radiation patterns could be determined that take into account horizontal asymmetries of the nasals.
The perceptual relevance of asymmetries in human voice directivity needs to be investigated in more detail in future research. Based on these results, the representation of the asymmetries in virtual reality and spatial audio systems could be addressed. Furthermore, it needs to be investigated whether the asymmetries occur in a similar way in fluent speech and to what extent they have time-varying properties. In addition, the causes of the asymmetries could be studied in more detail. This could be done using model-based approaches that take into account slightly asymmetric characteristics of the mouth opening, tongue position, or teeth position. Alternatively, approaches based on ultrasound measurements or electropalatography could be used.
Supplementary material
See the supplementary material for an analysis of the difference in the asymmetries between the repeated measurements; for plots showing horizontal-plane DIs for the left and right hemisphere as well as the corresponding horizontal-plane ΔDIs; and for additional plots for the signed ΔDIs.
Acknowledgments
The research presented in this paper has been partly funded by the Federal Ministry of Education and Research in Germany, support code BMBF 03FH014IX5NarDasS, and by the German Research Foundation under Grant No. DFG WE 4057/21-1.
Author Declarations
Conflict of Interest
The authors have no conflicts of interest to disclose.
Data Availability
The datasets of all human speakers can be accessed at https://doi.org/10.5281/zenodo.7452117.
Other studies, e.g., Brandner (2020), analyzed the directivity index for directions in the horizontal plane only. Please refer to the supplementary material for plots showing horizontal-plane DIs for the left and right hemisphere as well as the corresponding horizontal-plane ΔDIs.
In the supplementary material, we show additional plots for the signed ΔDIs. The signed ΔDIs are much smaller than the unsigned ones, indicating that there is no strong systematic trend of the asymmetries toward one side.
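Why signed differences shrink under averaging while unsigned ones do not can be illustrated with a small numerical sketch. The per-subject values below are made up for illustration and are not data from the study.

```python
from statistics import mean

# Made-up signed left-minus-right differences in dB, one value per subject.
signed = [1.2, -0.9, 0.7, -1.1, 0.4, -0.6, 0.9, -0.8]

mean_signed = mean(signed)
mean_unsigned = mean(abs(v) for v in signed)
print(f"signed mean: {mean_signed:+.2f} dB, unsigned mean: {mean_unsigned:.2f} dB")
```

When the direction of the asymmetry varies between subjects, positive and negative values largely cancel in the signed mean, whereas the unsigned mean retains the typical magnitude of the individual asymmetries.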