Human voice directivity shows horizontal asymmetries caused by the shape of the lips or the position of the teeth and tongue during vocalization. This study presents and analyzes the asymmetries of voice directivity datasets of 23 different phonemes. The asymmetries were determined from datasets obtained in previous measurements with 13 subjects in a surrounding spherical microphone array. The results show that asymmetries are inherent to human voice production and that they differ between the phoneme groups, with the strongest effects for the [s], the [l], and the nasals [m], [n], and [ŋ]. The smallest asymmetries were found for the plosives.

Voice directivity and its relevance for human voice communication and room acoustic design have been investigated for more than 200 years (Henry, 1857; Saunders, 1790; Wyatt, 1813). The first microphone-based measurements were carried out by Trendelenburg (1929), who studied the directivity patterns in the horizontal plane for several vowels and fricatives. Dunn and Farnsworth (1939) determined the spherical sound radiation for a spoken sentence. Since then, numerous follow-up studies have analyzed specific aspects of human voice directivity, such as phoneme dependencies (Katz and D'Alessandro, 2007; Kocon and Monson, 2018; Monson et al., 2012; Pörschmann and Arend, 2021, 2023), voice level (Marshall and Meyer, 1985), differences between speaking and singing (Monson et al., 2012), or gender differences (Monson et al., 2012). For a general overview of studies measuring voice directivity, please refer to Pörschmann (2023).

Human voice production is often considered to be symmetric about the vertical (median) plane, and in some studies, only the left or right hemisphere has been analyzed, e.g., in Kocon and Monson (2018) and Monson et al. (2012). Moreover, because asymmetries are considered to be individual, they often diminish when measurement data are averaged over subjects, for example, in our recent studies (Pörschmann and Arend, 2021, 2023). As a result, horizontal asymmetries in human voice directivity have not yet been investigated in detail.

Even though many people lack dental symmetry, in many cases, this does not hinder chewing and swallowing, nor does it affect speech production in general (Abduo, 2016). Tooth and tongue position or the shape of the lips during articulation can affect voice radiation by diffraction and reflection, and thus become relevant when the dimensions causing the asymmetries are larger than or in the same range as the wavelength. Accordingly, asymmetries in voice radiation are expected to be small at low frequencies and less relevant for phonemes with small mouth openings.
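To make the wavelength argument concrete, the following minimal sketch converts frequency to wavelength. The speed of sound (343 m/s, air at roughly 20 °C) and the function name are assumptions for illustration and are not taken from the measurements.

```python
# Assumed speed of sound in air at about 20 degrees Celsius (not stated in the text).
SPEED_OF_SOUND_M_S = 343.0


def wavelength_cm(frequency_hz: float) -> float:
    """Return the acoustic wavelength in centimetres for a frequency in Hz."""
    return 100.0 * SPEED_OF_SOUND_M_S / frequency_hz


if __name__ == "__main__":
    # Frequencies spanning the range analyzed in the horizontal-plane plots.
    for f in (250.0, 1000.0, 4000.0, 8000.0):
        print(f"{f:6.0f} Hz -> {wavelength_cm(f):6.1f} cm")
```

Under this assumption, the wavelength is about 137 cm at 250 Hz and about 4.3 cm at 8 kHz, so only toward the upper end of the range does it approach the dimensions of the mouth opening, lips, and teeth.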

Another factor that most likely causes asymmetries in voice directivity is the position of the tongue during vocalization. In this context, Hamlet et al. (1986) studied the tongue contact patterns for an [s] and an [l] using dynamic palatography and found both temporal and spatial asymmetries that even occurred during the transition of the phonemes into the vowel. Applying similar methods, Marchal and Espesser (1989) showed that the tongue moves asymmetrically during vocalization. More recently, Verhoeven et al. (2019) compared the results of several previous studies and found that 83% of palatograms show an asymmetric pattern. Although several studies have demonstrated asymmetries in the human speech apparatus, none have examined how these asymmetries might affect human voice directivity. To the best of our knowledge, the only study that has identified and briefly mentioned asymmetries in voice directivity is that of Brandner et al. (2020), who described directional asymmetries in the horizontal plane for a singer articulating an [a]. However, the asymmetries were neither quantified nor compared with those determined in the corresponding simulations of the same study.

In recent studies (Pörschmann and Arend, 2021, 2023), we determined full-spherical voice directivity patterns of various phonemes using a surrounding spherical microphone array. In these studies, however, the directivity patterns were averaged across subjects to examine differences between phonemes, so that details of individual asymmetries were lost. Here, we aim to better understand these asymmetries by determining both their extent and their phoneme dependencies in more detail, thus gaining deeper insight into human voice directivity. To this end, we analyze asymmetries between the left and right hemispheres within and between phoneme groups and the extent to which they can be explained by articulation-dependent properties, such as the size of the mouth opening or the position of the tongue.

We used the publicly available voice directivity datasets (see Data Availability) from our previous studies, which were measured with a surrounding spherical microphone array with a diameter of 2 m (Pörschmann and Arend, 2021, 2023). Please refer to these studies for a detailed description of the measurement methods. The datasets comprise two repeated measurements for 13 subjects (two female and 11 male) and the following articulations:

  • Vowels: [a], [e], [i], [o], [u],

  • Plosives: [p], [t], [k], [b], [d], [g],

  • Unvoiced fricatives: [f], [s], [ʃ], [x], [h],

  • Voiced fricatives: [z], [v],

  • Nasals: [m], [n], [ŋ],

  • Voiced alveolars: [l], [r].

For further analysis, the datasets for each subject and phoneme were averaged over both repetitions. To visualize the voice directivity patterns, the upsampled datasets were interpolated to 360 directions in 1° steps along the horizontal plane (ϕ = −180° to 180°, where positive angles point to the left; θ = 0°), with ϕ denoting the azimuth and θ the elevation. Technically, this interpolation was done by transforming the upsampled dataset to the spherical harmonic (SH) domain with an SH order of N = 35 and then resampling the datasets to the desired positions using the inverse SH transform.
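The resampling step can be sketched as follows. This is an illustration under stated assumptions and not the processing code of the study: it uses real-valued SHs, a least-squares fit in place of whatever forward SH transform was actually applied, real-valued magnitudes for a single frequency bin, and a Fibonacci lattice as a hypothetical stand-in for the upsampled measurement grid. All function names are invented for this example, and NumPy is assumed to be available.

```python
from math import factorial, pi, sqrt

import numpy as np


def assoc_legendre(n, m, x):
    """Associated Legendre function P_n^m(x) for 0 <= m <= n (standard recurrence)."""
    pmm = np.ones_like(x)
    if m > 0:
        root = np.sqrt((1.0 - x) * (1.0 + x))
        for k in range(1, m + 1):
            pmm = -pmm * (2 * k - 1) * root      # builds P_m^m
    if n == m:
        return pmm
    pm1 = x * (2 * m + 1) * pmm                  # P_{m+1}^m
    for l in range(m + 2, n + 1):
        pmm, pm1 = pm1, ((2 * l - 1) * x * pm1 - (l + m - 1) * pmm) / (l - m)
    return pm1


def real_sh_matrix(order, azimuth_rad, elevation_rad):
    """Real SH basis: one row per direction, (order + 1)**2 columns."""
    x = np.sin(elevation_rad)                    # equals cos(colatitude)
    columns = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            am = abs(m)
            norm = sqrt((2 * n + 1) / (4 * pi) * factorial(n - am) / factorial(n + am))
            legendre = norm * assoc_legendre(n, am, x)
            if m == 0:
                columns.append(legendre)
            elif m > 0:
                columns.append(sqrt(2) * legendre * np.cos(am * azimuth_rad))
            else:
                columns.append(sqrt(2) * legendre * np.sin(am * azimuth_rad))
    return np.stack(columns, axis=1)


def fibonacci_grid(num_points):
    """Quasi-uniform grid on the sphere (hypothetical stand-in for the measured grid)."""
    k = np.arange(num_points)
    elevation = np.arcsin(1.0 - (2.0 * k + 1.0) / num_points)
    azimuth = (k * pi * (3.0 - sqrt(5.0))) % (2.0 * pi)
    return azimuth, elevation


def resample_horizontal(values, azimuth_rad, elevation_rad, order, step_deg=1.0):
    """Least-squares SH fit, then evaluation in the horizontal plane (elevation 0)."""
    basis = real_sh_matrix(order, azimuth_rad, elevation_rad)
    coeffs, *_ = np.linalg.lstsq(basis, values, rcond=None)
    target_az_deg = np.arange(-180.0, 180.0, step_deg)   # 360 directions for 1 degree steps
    target = real_sh_matrix(order, np.radians(target_az_deg), np.zeros_like(target_az_deg))
    return target_az_deg, target @ coeffs
```

A call such as `resample_horizontal(values, az, el, order=4)` returns the azimuths from −180° to 179° and the interpolated values. The low order is chosen only to keep the example small; the text above states N = 35, which requires a correspondingly dense input grid with at least (N + 1)² = 1296 well-distributed directions.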

First, we give a descriptive overview of the differences in directivity between the left and right hemispheres. For this, we calculated the directivity patterns D in the horizontal plane and determined the absolute difference in dB between the corresponding directions of the left and right hemispheres:
(1)
with the frontal direction denoted by (ϕ₀, θ₀). From this, we obtained $\overline{\Delta D}$ by averaging ΔD for each phoneme over all 13 subjects.
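The left/right comparison can be sketched in a few lines. The sketch assumes that D is the level in dB relative to the frontal direction and that ΔD at azimuth ϕ is the absolute difference between D(ϕ) and D(−ϕ). The exact form used in the study may differ, and the function and variable names are hypothetical.

```python
import math


def wrap_deg(azimuth_deg):
    """Map an integer azimuth in degrees to the range -180..179."""
    return (azimuth_deg + 180) % 360 - 180


def left_right_difference_db(magnitude_by_azimuth):
    """Absolute left/right level difference in dB for azimuths 0..180 degrees.

    magnitude_by_azimuth maps integer azimuths (-180..179, positive = left)
    to linear sound-pressure magnitudes in the horizontal plane at one frequency.
    """
    front = magnitude_by_azimuth[0]
    # Assumed definition: directivity in dB, normalized to the frontal direction.
    level_db = {az: 20.0 * math.log10(mag / front)
                for az, mag in magnitude_by_azimuth.items()}
    return {az: abs(level_db[wrap_deg(az)] - level_db[wrap_deg(-az)])
            for az in range(0, 181)}


def mean_difference_db(per_subject_magnitudes):
    """Average the per-subject differences over all subjects."""
    per_subject = [left_right_difference_db(m) for m in per_subject_magnitudes]
    return {az: sum(d[az] for d in per_subject) / len(per_subject)
            for az in range(0, 181)}
```

Under this assumed definition, the frontal normalization cancels in the difference, and the result is zero at 0° and 180° because each of these directions is mirrored onto itself.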

Figures 1–3 show $\overline{\Delta D}$ in the horizontal plane as a function of frequency between 250 Hz and 8 kHz. The plots reveal a similar structure of $\overline{\Delta D}$ for most phonemes, which, by definition, is zero for radiation exactly forward and backward. In general, the asymmetries increase toward higher frequencies as the voice directivity is affected by diffraction and reflection, which become relevant when the dimensions of the corresponding sound-emitting body are in the same range as the wavelength. In addition, the asymmetries increase for lateral and backward directions as the diffraction and shadowing effects of the head become stronger.

Fig. 1. Directivity difference $\overline{\Delta D}$ of the vowels [a], [e], [i], [o], and [u].

Fig. 2. Directivity difference $\overline{\Delta D}$ of the plosives [p], [t], [k], [b], [d], and [g].

Fig. 3. Directivity difference $\overline{\Delta D}$ of the unvoiced fricatives [f], [s], [ʃ], [x], and [h], voiced fricatives [z] and [v], nasals [m], [n], and [ŋ], and voiced alveolars [l] and [r].

Among the vowels (Fig. 1), the strongest asymmetries can be observed for the [a] and the smallest for the [o] and the [u]. This reduction of the asymmetries goes along with the mouth opening size, which is largest for the [a] and decreases toward the [o] and the [u]. Consistent with this, the plosives (Fig. 2), which also have small mouth openings, show quite weak asymmetries compared to most other phoneme groups. Within the group of plosives, the asymmetries appear to have comparable strength. The asymmetries of the remaining phoneme groups are shown in Fig. 3. Among the voiceless fricatives, the [s] has the strongest asymmetries, with a maximum centered at ϕ = 135° and 2 kHz. For the other voiceless fricatives ([f], [ʃ], [x]), the asymmetries are much weaker. The same can be observed for the [h], which, however, has a peak at about 7 kHz for ϕ ≈ 90°. Comparing the [s] and the [z], which are voiceless and voiced counterparts, reveals marked differences, with much lower asymmetries for the voiced [z]. For the [v], the other voiced fricative, the asymmetries are smaller than those of most other phonemes. The nasals are an exception to the general pattern: their asymmetries are already evident for slightly lateralized radiation directions at ϕ ≈ 30°. Moreover, the asymmetries are more pronounced for nasals than for any other phoneme group. The asymmetries in the nasals probably have a different origin, which depends less on diffraction and shadowing and more on the unequal radiation of sound through the nostrils. Finally, for the voiced alveolars, the asymmetries are much weaker for the [r] than for the [l], which is affected by asymmetries even at small lateral angles of ϕ ≈ 30° and frequencies above 5 kHz.

In the supplementary material, we further show polar plots, which are commonly used to visualize directivity patterns, depicting $\overline{\Delta D}$ in octave bands. However, as they do not resolve the fine structure over frequency, they are not discussed further in this paper.

In a subsequent step, we both descriptively and statistically analyzed the asymmetries based on the directivity index (DI), which is a measure describing the spherical directivity by a single frequency-dependent value. We calculated the DI according to Long (2014), based on a spherical grid to be consistent with our previous studies.¹ To examine the asymmetries, we determined separate DIs for the left hemisphere
(2)
and the right hemisphere
(3)
and then described the asymmetries by the difference between the DIs of the hemispheres as
(4)
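As an illustration of the hemisphere-wise DI, the sketch below assumes the common definition of the DI as the ratio (in dB) of the frontal energy to the area-weighted mean energy, restricted here to the directions of one hemisphere, and takes ΔDI as the absolute difference between the two hemisphere DIs. How directions exactly on the median plane were handled is not stated. This sketch excludes them, which is an assumption, as are all names.

```python
import math


def hemisphere_di(directions, front_energy, side):
    """DI in dB computed from the directions of one hemisphere only.

    directions: iterable of (azimuth_deg, weight, energy), with azimuth in
    (-180, 180] and positive angles to the left; weight is the quadrature
    (area) weight of the grid point and energy is the squared magnitude.
    side: "left" (azimuth > 0) or "right" (azimuth < 0).
    Points on the median plane (0 or 180 degrees) are excluded (assumption).
    """
    sign = 1 if side == "left" else -1
    selected = [(w, e) for az, w, e in directions
                if sign * az > 0 and abs(az) < 180]
    mean_energy = sum(w * e for w, e in selected) / sum(w for w, _ in selected)
    return 10.0 * math.log10(front_energy / mean_energy)


def delta_di(directions, front_energy):
    """Unsigned difference between the left and right hemisphere DIs in dB."""
    left = hemisphere_di(directions, front_energy, "left")
    right = hemisphere_di(directions, front_energy, "right")
    return abs(left - right)
```

For example, if the right hemisphere radiates half the energy of the left one in every direction, `delta_di` returns 10·log10(2), which is about 3 dB.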

We determined $\overline{\Delta DI}$ by averaging ΔDI over all subjects for each phoneme. Figure 4 shows the means and standard deviations of $\overline{\Delta DI}$ for the different phonemes.² The plots indicate some clear differences in asymmetry depending on phoneme and frequency, which was statistically supported by a Greenhouse–Geisser (GG) corrected (Greenhouse and Geisser, 1959) two-way repeated measures analysis of variance (ANOVA) with the within-subject factors phoneme (23 phonemes) and frequency (nineteen 1/3-octave frequency bands from 125 Hz to 8 kHz). The ANOVA revealed a significant main effect of phoneme [F(22, 264) = 16.57, $p_{GG} < 0.001$, $\eta_p^2 = 0.58$, $\epsilon = 0.17$] and frequency [F(18, 216) = 61.50, $p_{GG} < 0.001$, $\eta_p^2 = 0.84$, $\epsilon = 0.19$], as well as a significant phoneme × frequency interaction effect [F(396, 4752) = 7.68, $p_{GG} < 0.001$, $\eta_p^2 = 0.39$, $\epsilon = 0.02$]. Comparing the asymmetries across phonemes indicates that (a) the nasals generally have the highest and (b) the plosives generally have the lowest $\overline{\Delta DI}$. For the most part, both observations are supported by statistical analysis. To this end, we performed pairwise comparisons between the frequency-dependent means of each phoneme group (i.e., $\overline{\Delta DI}$ averaged across the respective phonemes of each group) using nested GG-corrected two-way repeated measures ANOVAs with the within-subject factors frequency and phoneme group. The initial significance level of 0.05 was further corrected according to Hochberg (1988) to prevent α-error accumulation.

The ANOVAs yielded significant main effects of phoneme group as well as significant frequency × phoneme group interaction effects for all five comparisons involving the nasals, and a significant main effect of phoneme group for four out of five comparisons involving the plosives (the only exception being plosives vs vowels), all with $p_{GG} \leq 0.001$ (for the sake of conciseness, the statistical results of the nested ANOVAs are not reported in greater detail throughout the manuscript).
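The correction according to Hochberg (1988) is a step-up procedure: with the m p-values sorted in ascending order, it finds the largest rank k for which the k-th p-value does not exceed α/(m − k + 1) and rejects the hypotheses belonging to the k smallest p-values. A minimal sketch follows. The function name and the example p-values are made up for illustration and are not results of the study.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Implements the Hochberg step-up procedure for a family of m tests.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices, ascending p
    reject = [False] * m
    for rank in range(m, 0, -1):                          # start at the largest p-value
        index = order[rank - 1]
        if p_values[index] <= alpha / (m - rank + 1):
            for j in order[:rank]:                        # reject this and all smaller p-values
                reject[j] = True
            break
    return reject


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    print(hochberg_reject([0.001, 0.04, 0.03, 0.2]))
    print(hochberg_reject([0.01, 0.02, 0.03, 0.04]))
```

In the second example, all four hypotheses are rejected because the largest p-value already lies below α, whereas a step-down procedure such as Holm's would stop after the first one.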

Fig. 4. Mean values and standard deviations of $\overline{\Delta DI}$ (1/3-octave smoothed) for the phoneme groups. For comparison, each plot also shows the curve averaged over all phonemes.

Comparing the asymmetries within the phoneme groups leads to further observations, which we analyzed statistically by pairwise comparisons between the phonemes of each group using nested GG-corrected two-way repeated measures ANOVAs with the within-subject factors frequency and phoneme. Again, the Hochberg correction was applied to account for multiple hypothesis testing. Within the vowel group, the curves for [e] and [i] as well as for [o] and [u] are quite similar, and in general, the differences in $\overline{\Delta DI}$ occur mainly above 4 kHz. The ANOVAs revealed significant main effects of phoneme for [a] vs [o], [a] vs [u], [e] vs [u], and [i] vs [u] as well as significant frequency × phoneme interaction effects for [a] vs [o] and [a] vs [u] (all $p_{GG} \leq 0.006$), indicating clear frequency-dependent differences between the asymmetries for the vowels. In the group of the unvoiced fricatives, the [s] stands out with a $\overline{\Delta DI}$ that ranges from 0.5 to 2 dB above the average over all phonemes for f < 4 kHz. The statistical analysis supported the rather strong deviation of the [s] from the other unvoiced fricatives, with significant main effects of phoneme for [s] vs [ʃ] and [s] vs [h] (all $p_{GG} \leq 0.003$). The voiced alveolar [l] has a $\overline{\Delta DI}$ that is between 0.5 and 2 dB above the average over all phonemes for f > 4 kHz and furthermore differs considerably from the [r], which is confirmed by a significant main effect of phoneme and a significant frequency × phoneme interaction effect for the pairwise comparison [l] vs [r] (all $p_{GG} \leq 0.007$). In contrast, within the groups of plosives, voiced fricatives, and nasals, the respective phonemes show similar asymmetry behavior, which is further supported by the absence of statistically significant differences between the various phonemes within their groups.

From the measured datasets presented in Pörschmann and Arend (2021, 2023), we determined asymmetries in voice directivity, which vary between the phoneme groups and to some extent within the groups. While they are significantly smaller for the plosives than for most other phoneme groups, we found the strongest asymmetries for the [s], the [l], and the nasals.

As already discussed in Brandner et al. (2020) and Pörschmann and Arend (2021), vowels vary in their mouth opening size, with the largest mouth opening for an [a] and the smallest for an [o] and a [u]. Just as the mouth opening size affects the DI, it also affects $\overline{\Delta DI}$. The simulations of Brandner et al. (2020) could be extended by analyzing asymmetric mouth opening shapes to investigate to what extent the shape of the mouth opening affects asymmetries in voice radiation. For the plosives, the results are consistent with our findings from Pörschmann and Arend (2023) to the extent that both the directivity patterns and the asymmetries show only small deviations within this phoneme group. It is reasonable to assume that, due to the small mouth opening size, the plosives are generally not susceptible to changes in directivity, whether due to individual differences, phoneme differences, or asymmetries, such as those investigated in the present study.

Using dynamic palatography, Hamlet et al. (1986) analyzed the asymmetries of the tongue during articulation of an [s] and an [l]. The authors determined asymmetries from the maximum contact pattern and found both spatial and temporal asymmetries. While spatial asymmetries refer to different contact patterns for the left and right side, temporal asymmetries mean that tongue contacts occurred systematically earlier or remained longer on one side. Since both can contribute to asymmetries in voice emission, it is plausible that these two phonemes also show strong asymmetries in voice radiation in the present study. For the [s], $\overline{\Delta DI}$ is increased relative to the average in a wide frequency range from 200 Hz to 4 kHz. For the [l], the increase is limited to a range between 4 and 8 kHz. Surprisingly, the [s] and the [z], which are unvoiced and voiced counterparts, do not show comparable asymmetries. While the [s] shows one of the strongest asymmetries of the examined phonemes, the values for the [z] are close to the average values.

To our knowledge, the literature contains no studies on asymmetries in the voice directivity of the nasals, for which we found the highest values of $\overline{\Delta DI}$ overall. These asymmetries could be caused by non-uniform sound radiation from the nostrils, e.g., in the case of a cold. Furthermore, even in healthy persons, the nasal mucosa undergoes spontaneous, reciprocal congestion and decongestion during the day, which also leads to a varying nasal airflow (Kayser, 1895; Pendolino et al., 2018). As a result, the nasal sound radiation could be dominated by one of the nostrils, resulting in asymmetries in voice directivity toward higher frequencies. Based on the simulation of how nasal cavities affect articulation (Vampola et al., 2020), radiation patterns could be determined that take into account horizontal asymmetries of the nasals.

The perceptual relevance of asymmetries in human voice directivity needs to be investigated in more detail in future research. Based on these results, the representation of the asymmetries in virtual reality and spatial audio systems could be addressed. Furthermore, it needs to be investigated whether the asymmetries occur in a similar way in fluent speech and to what extent they have time-varying properties. In addition, the causes of the asymmetries could be studied in more detail. This could be done using model-based approaches that take into account slightly asymmetric characteristics of the mouth opening, tongue position, or teeth position. Alternatively, approaches based on ultrasound measurements or electropalatography could be used.

See the supplementary material for an analysis of the difference in the asymmetries between the repeated measurements; for plots showing horizontal-plane DIs for the left and right hemisphere as well as the corresponding horizontal-plane ΔDIs; and for additional plots for the signed ΔDIs.

The research presented in this paper has been partly funded by the Federal Ministry of Education and Research in Germany, support code BMBF 03FH014IX5NarDasS, and by the German Research Foundation under Grant No. DFG WE 4057/21-1.

The authors have no conflicts of interest to disclose.

The datasets of all human speakers can be accessed at https://doi.org/10.5281/zenodo.7452117.

¹ Other studies, e.g., Brandner et al. (2020), analyzed the directivity index for directions in the horizontal plane only. Please refer to the supplementary material for plots showing horizontal-plane DIs for the left and right hemisphere as well as the corresponding horizontal-plane ΔDIs.

² In the supplementary material, we show additional plots for the signed ΔDIs. The signed ΔDIs are much smaller than the unsigned ones, indicating that there is no strong systematic trend of ΔDI toward one side.

1. Abduo, J. (2016). “Morphological symmetry of maxillary anterior teeth before and after prosthodontic planning: Comparison between conventional and digital diagnostic wax-ups,” Med. Princ. Pract. 25(3), 276–281.
2. Brandner, M., Blandin, R., Frank, M., and Sontacchi, A. (2020). “A pilot study on the influence of mouth configuration and torso on singing voice directivity,” J. Acoust. Soc. Am. 148(3), 1169–1180.
3. Dunn, H. K., and Farnsworth, D. W. (1939). “Exploration of pressure field around the human head during speech,” J. Acoust. Soc. Am. 10, 184–199.
4. Greenhouse, S. W., and Geisser, S. (1959). “On methods in the analysis of profile data,” Psychometrika 24(2), 95–112.
5. Hamlet, S. L., Bunnell, H. T., and Struntz, B. (1986). “Articulatory asymmetries,” J. Acoust. Soc. Am. 79(4), 1164–1169.
6. Henry, J. (1857). Annual Report of the Board of Regents of the Smithsonian Institution Technical Report (A. G. F. Nicholson, Washington, DC).
7. Hochberg, Y. (1988). “A sharper Bonferroni procedure for multiple tests of significance,” Biometrika 75(4), 800–802.
8. Katz, B., and D'Alessandro, C. (2007). “Directivity measurements of the singing voice,” in Proceedings of the 19th International Congress on Acoustics, Madrid, Spain (Sociedad Espanola de Acustica, Madrid, Spain), pp. 1–6.
9. Kayser, R. (1895). “Die exakte Messung der Luftdurchgängigkeit der Nase” (“Precise measurement of the air patency of the nose”), in Archiv Für Laryngologie Und Rhinologie (Verlag August Hirschwald), Vol. 3, pp. 101–120.
10. Kocon, P., and Monson, B. B. (2018). “Horizontal directivity patterns differ between vowels extracted from running speech,” J. Acoust. Soc. Am. 144(1), EL7–EL12.
11. Long, M. (2014). Architectural Acoustics (Elsevier, Burlington MA).
12. Marchal, A., and Espesser, R. (1989). “L'asymétrie des appuis linguo-palatins” (“The asymmetry of the linguo-palatal support”), J. D'Acoustique 2, 53–57.
13. Marshall, A. H., and Meyer, J. (1985). “The directivity and auditory impressions of singers,” Acustica 58, 130–140.
14. Monson, B. B., Hunter, E. J., and Story, B. H. (2012). “Horizontal directivity of low- and high-frequency energy in speech and singing,” J. Acoust. Soc. Am. 132(1), 433–441.
15. Pendolino, A., Lund, V., Nardello, E., and Ottaviano, G. (2018). “The nasal cycle: A comprehensive review,” Rhinol. 1(1), 67–76.
16. Pörschmann, C. (2023). “A database for the comparison of measured datasets of human voice directivity,” in Proc. Forum Acusticum, Torino, Italy (European Acoustic Association), pp. 4131–4138.
17. Pörschmann, C., and Arend, J. M. (2021). “Investigating phoneme-dependencies of spherical voice directivity patterns,” J. Acoust. Soc. Am. 149(6), 4553–4564.
18. Pörschmann, C., and Arend, J. M. (2023). “Investigating phoneme-dependencies of spherical voice directivity patterns II: Various groups of phonemes,” J. Acoust. Soc. Am. 153(1), 179–190.
19. Saunders, G. (1790). Treatise on Theaters (I. and J. Taylor, London).
20. Trendelenburg, F. (1929). “Beitrag zur Frage der Stimmrichtwirkung” (“Contribution to the question of voice directivity”), Z. Tech. Phys. 11, 558–563.
21. Vampola, T., Horáček, J., Radolf, V., Švec, J. G., and Laukkanen, A.-M. (2020). “Influence of nasal cavities on voice quality: Computer simulations and experiments,” J. Acoust. Soc. Am. 148(5), 3218–3231.
22. Verhoeven, J., Marien, P., De Clerck, I., Daems, L., Reyes-Aldasoro, C. C., and Miller, N. (2019). “Asymmetries in speech articulation as reflected on palatograms: A meta-study,” in Proceedings of the 19th International Congress of Phonetic Sciences, ICPhS 2019, Melbourne, Australia, August 5–9, 2019, pp. 2821–2825.
23. Wyatt, B. (1813). Observation on the Design for the Theatre Royal, Drury Lane (J. Taylor, London).
