This exploratory study compared vowel space area (VSA) in a face-to-face situation and a video conference situation using the software Zoom. Twenty native German participants read word lists recorded before and after spontaneous conversation. The overall VSA in Zoom was significantly reduced by 11.9%, with a stronger reduction before and a weaker reduction after the spontaneous conversation. Of nine peripheral vowels in German, /aː iː yː/ showed a significantly reduced Euclidean distance to the centroid of the vowel space. The observed hypoarticulation is discussed in light of the experimental setup, situational differences, and lower involvement in Zoom than in face-to-face situations.

Since the COVID-19 pandemic, the usage of audio-visual means of communication has increased considerably. Concomitantly, the possibility of recording dialogue data in a face-to-face situation (F2F) has been severely restricted, and the use of audio-visual services for research purposes has become more relevant (Lupton, 2021). In this respect, recent research has focused on the question of whether these new means of communication technically alter the phonetic signal. It has been found that video conference applications, though differing in the transmission of the signal, “transmit formant information fairly accurately” (Freeman and De Decker, 2021) and that any discrepancies between the raw signal and the signal recorded by the applications can effectively be minimized by normalization (Calder et al., 2022). Apart from these technical issues, it is so far unclear whether the change from a F2F to a video conference-based situation has an impact on the phonetic realization of the speakers, that is, whether there is a register effect. Register refers to intra-individual language variation induced by different situational characteristics (e.g., medium of communication, relations among participants, setting, communicative purposes; cf. Biber and Conrad, 2009). For example, video conferences might cause speakers to increase their vowel space in order to be understood better when communicating remotely. However, it could also be that the situational characteristics of remote communication go along with a decreased feeling of connectedness towards the interlocutor, leading to less effort in articulation and, thus, to a decreased vowel space.

This exploratory study aims at closing the research gap on possible phonetic effects of the video conference situation by investigating the quality of German vowels in read speech in a F2F and a video conference situation; in both situations, speakers are recorded with device-independent, separately placed external microphones. Our research question is as follows: Do speakers alter their vowel space when they read word lists to their interlocutors via Zoom as opposed to in a F2F situation?

We use BeDiaCo-videocall, a subcorpus of BeDiaCo (Berlin Dialogue Corpus v.2) (Belz et al., 2021). It contains read word lists and spontaneous dialogues (approx. 45 min) in a F2F situation and a video conference situation via Zoom (Zoom Video Communications, Inc., San Jose, CA). The word lists investigated were collected at the beginning and at the end of the experiment in the presence of the other participant (either F2F or via Zoom). Between both lists, free and task-based spontaneous dialogues were elicited, which are not examined here. Target words were embedded in the carrier sentence “Sage X bitte” (“Say X please”) (cf. Table 1). All 15 full vowels of German (Kohler, 1999) were situated between two stops in the accented first syllable of disyllabic words. Each word occurred once per list. The corner vowels /aː iː uː/ were read four times per list, once in words ending with schwa [ə], and once in words ending with the near-open central vowel [ɐ]. Word lists were chosen for this study since they contain the same number of all vowels uttered by each participant.

Table 1.

Target words and analyzed vowels.

Word Vowel Word Vowel Word Vowel
piepe “beep.1.SG”  [iː]  Güte “kindness”  [yː]  Bube “boy”  [uː] 
Kippe “dump”  [ɪ]  bücke “bend down.1.SG”  [ʏ]  Puppe “doll”  [ʊ] 
Beete “flower bed.PL”  [eː]  böte “offer.3.SG.SBJV”  [øː]  Bote “messenger”  [oː] 
bäte “ask.3.SG.SBJV”  [ɛː]  Böcke “goat.PL”  [œ]  Pocke “pock.SG”  [ɔ] 
Kette “chain”  [ɛ]  Tage “day.PL”  [aː]  packe “pack.1.SG”  [a] 
Pieper “beeper”  [iː]  Kaper “caper”  [aː]  Puder “powder”  [uː] 

Twenty native German speakers who were familiar with each other (roommates, couples, siblings; 10 females, 10 males, mean age = 25.7 years, standard deviation = 3.8) and showed no strong characteristics of German dialects were recorded with separate microphones in both the F2F and the video conference situation. Regarding the use of videoconferencing, 13 participants reported using Zoom daily or weekly, and seven monthly or never; 15 participants reported feeling comfortable or very comfortable with it, and five neither comfortable nor uncomfortable. In the F2F situation [Fig. 1(C)], the participants were seated opposite each other at a distance of 1.5 m in a sound-attenuated booth and wore neckband headsets (beyerdynamic Opus 54, beyerdynamic GmbH & Co. KG, Heilbronn, Germany). In the Zoom situation, participants were seated in two adjacent rooms [one in the booth, one in a regular office; cf. Figs. 1(A) and 1(B)]. The participants in the Zoom situation could see each other in full screen mode on tablets [Lenovo, Lenovo (Germany) GmbH, Stuttgart, Germany; 10.1 in. screen diagonal], which were placed approximately 30 cm away from the table edge. Interlocutors wore circumaural headphones to prevent the output signal of the tablets from being recorded. The built-in microphone of the tablets served as the input for the Zoom transmission. Participants were recorded with omnidirectional (Sennheiser ME 62, Sennheiser electronic GmbH & Co. KG, Wedemark, Germany) and cardioid (Sennheiser ME 64) rod microphones placed in front of them. This way, the acoustic signal was not recorded via Zoom, but separately. Although the microphones differed slightly in the formant values they captured, the differences proved negligible when tested by one of the authors reading the same list into all microphones (see the mic-comparison analysis in the online data repository).

Fig. 1.

Experimental setup (Miriam Müller, 2023, CC BY 4.0). (A) and (B) In the video conference setting, the participants were seated in separate rooms and recorded with two rod microphones. Headphones (output) were used to listen to the speech of the other interlocutor, who was connected via Zoom on a tablet (input). (C) In the F2F setting, participants were recorded with head-mounted microphones and seated together in a booth at a distance of 1.5 m.


The recordings took place in two sessions, about a week apart, in the winter of 2020 under strict hygiene regulations due to the COVID-19 pandemic. Face masks were not required, as the speakers confirmed that they lived in the same household. The order of the two situations was counterbalanced across sessions, and the procedure was the same for both sessions. After both participants had signed the consent form, each of them individually read the first word list aloud while the interlocutor was already visible in Zoom, but muted. After that, a task-oriented conversation was followed by a free conversation, which was in turn followed by another task-oriented conversation (10 min each). Finally, the second word list was read aloud by both participants, one after the other. The order in which the speakers read the word lists was counterbalanced.

Vowels were labeled on the vowel tier in Praat (Boersma and Weenink, 2023). Boundaries were placed at the beginning and at the end of the periodic interval in the oscillogram at the zero crossing, taking visible F2 trajectories into account. The data were converted to an EMU database (Winkelmann et al., 2017) via emuR (Winkelmann et al., 2020) in R (R Core Team, 2023). Formants were extracted with the Praat formant tracker (after Burg; maximum number of formants = 5; formant ceiling = 5500 Hz for female and 5000 Hz for male speakers; window length = 0.025 s; pre-emphasis from 50 Hz) in R using the script PraatToFormants2AsspDataObj.R (Winkelmann, 2015) and added to the EMU database. The formants of each participant were subsequently inspected and corrected by two different authors in the EMU WebApp interface (http://ips-lmu.github.io/EMU-webApp/). If the calculated formant trajectories, which were overlaid on the spectrogram, deviated visibly from the formants visible in the spectrogram, the trajectories were corrected manually. Intensity was approximated via the slope of the power spectrum, measured as the difference between the amplitude of the first harmonic (H1) and the amplitude of the third formant (A3), i.e., H1-A3 (Mooshammer, 2010), in a spectral slice calculated over the total vowel duration of /aː/ in Praat for each participant per word list and situation. We chose this measure to be able to distinguish whether a reduced vowel space is due to lower intensity (see Koenig and Fuchs, 2019, for /aː/) or due to hypoarticulation. H1-A3 measures vocal effort as a proxy for intensity and is understood to be independent of microphone distance (cf. Mooshammer, 2010, for additional information). Additionally, the mean articulation rate of the disyllabic target words and the vowel duration were calculated for each participant per word list and situation. Mean articulation rate and vowel duration were included because a slower articulation rate and longer vowels are indicative of hyperarticulation (Lindblom, 1990).
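To make the tracker settings concrete, the following minimal sketch calls Praat's Burg formant tracker from R with the parameters listed above. It is not the PraatToFormants2AsspDataObj.R pipeline used in the study; the file names, the vowel midpoint, and the assumption that the praat binary is on the PATH are placeholders.

```r
# Sketch only: query F1/F2 at a vowel midpoint with Praat's Burg tracker,
# using the settings reported above (time step 0 = automatic, 5 formants,
# ceiling 5500/5000 Hz, 25 ms window, pre-emphasis from 50 Hz).
praat_script <- c(
  'form Formants at vowel midpoint',
  '  sentence Wav_file speaker01.wav',
  '  positive Midpoint 0.5',
  '  positive Max_formant 5500',
  'endform',
  'Read from file: wav_file$',
  'To Formant (burg): 0, 5, max_formant, 0.025, 50',
  'f1 = Get value at time: 1, midpoint, "hertz", "linear"',
  'f2 = Get value at time: 2, midpoint, "hertz", "linear"',
  'writeInfoLine: f1, tab$, f2'
)
writeLines(praat_script, "midpoint_formants.praat")

# Example call: vowel midpoint at 0.43 s, female speaker (ceiling 5500 Hz);
# assumes "praat" can be found on the PATH.
system2("praat", c("--run", "midpoint_formants.praat",
                   "speaker01.wav", "0.43", "5500"), stdout = TRUE)
```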

Due to pronunciation errors, 34 of the 2880 vowels were excluded. Vowels were measured at the steady-state midpoint in the EMU database, extracting five consecutive samples to avoid coarticulatory effects of the adjacent consonants. After transforming the data to the Bark scale to correct for perceptual differences between F1 and F2, the mean of these samples per vowel was calculated and normalized using a vowel-extrinsic, formant-intrinsic, speaker-intrinsic method (Lobanov, 1971). Then, we calculated a common centroid of the vowel space for all vowels in both word lists and situations (cf. Harrington, 2010), after which the Euclidean distance (ED) to the centroid was computed for each data point of every vowel (bas.uni-muenchen.de, 2023). To obtain the vowel space area (VSA), we calculated the polygon area spanned by the mean peripheral vowels per participant, situation, and list using the R package phonR (McCloy, 2016).
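As an illustration of these steps (not the authors' emuR/phonR code), the following R sketch combines a Bark transform (here Traunmüller's approximation), Lobanov normalization, the Euclidean distance to a common centroid, and a shoelace polygon area. The data frame vowels, its column names, and the vowel labels are hypothetical.

```r
library(dplyr)

# Bark transform (Traunmüller's approximation, used here as a stand-in)
bark <- function(f) 26.81 * f / (1960 + f) - 0.53

norm <- vowels %>%                             # one row per measured vowel token
  mutate(F1 = bark(F1), F2 = bark(F2)) %>%
  group_by(speaker) %>%                        # Lobanov: z-score per speaker and formant
  mutate(F1 = scale(F1)[, 1], F2 = scale(F2)[, 1]) %>%
  ungroup() %>%
  mutate(ED = sqrt((F1 - mean(F1))^2 +         # Euclidean distance to the common centroid
                   (F2 - mean(F2))^2))         # over all vowels, lists, and situations

# Shoelace formula as a stand-in for phonR's polygon area;
# vertices must be ordered around the perimeter of the vowel space
shoelace <- function(x, y) {
  j <- c(seq_along(x)[-1], 1)                  # index of the next vertex (wrapping around)
  abs(sum(x * y[j] - x[j] * y)) / 2
}

# Hypothetical labels for the nine peripheral vowels, in perimeter order
periphery <- c("i:", "e:", "E:", "a:", "O", "o:", "u:", "y:", "2:")

vsa <- norm %>%
  filter(vowel %in% periphery) %>%
  group_by(speaker, situation, list, vowel) %>%
  summarise(F1 = mean(F1), F2 = mean(F2), .groups = "drop") %>%
  mutate(vowel = factor(vowel, levels = periphery)) %>%
  arrange(speaker, situation, list, vowel) %>%
  group_by(speaker, situation, list) %>%
  summarise(area = shoelace(F2, F1), .groups = "drop")
```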

The research question was answered using two linear mixed-effects regression models (Bates et al., 2015). For each dependent variable, a full model was fit with all possible predictors. For vowel space, these included situation (F2F vs Zoom), list (first vs second), and their interaction, as well as Zoom comfort level and frequency of use. For the ED of the peripheral vowels to the centroid, the same predictors were used, with the additional predictors vowel label (in interaction) and vowel duration (without interaction, as the model would not converge otherwise). Participants were added as random intercepts, with situation, list, and H1-A3 as random slopes. Non-significant effects in both models were removed via backward elimination of random and fixed effects using the step() function of the lmerTest package (Kuznetsova et al., 2017), which was also used to obtain p-values. Pairwise post hoc comparisons per vowel for each situation per list were calculated via the R package emmeans (version 1.7.2) (Lenth, 2022); the alpha level was adjusted automatically for multiple comparisons. We calculated the marginal R² (Rm², the variance explained by the fixed effects only) as well as the conditional R² (Rc², the variance explained by both fixed and random factors) (Nakagawa and Schielzeth, 2013) using the package MuMIn version 1.46 (Barton, 2018). Cohen's d was calculated with the R package effsize (version 0.8.1) (Torchiano, 2020).
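The modelling steps can be sketched roughly as follows; the data frames and variable names are placeholders, the exact model formulas are assumptions based on the description above, and the code is not the authors' analysis script.

```r
library(lme4)
library(lmerTest)   # p-values and step() backward elimination for lmer models
library(emmeans)
library(MuMIn)
library(effsize)

# Full model for vowel space area (placeholder variable names)
m_vsa <- lmer(area ~ situation * list + comfort + frequency +
                (1 + situation + list + h1a3 | speaker),
              data = vsa_data)

# Backward elimination of non-significant random and fixed effects
m_vsa_final <- get_model(step(m_vsa))
summary(m_vsa_final)

# Marginal and conditional R^2 (Nakagawa and Schielzeth, 2013)
r.squaredGLMM(m_vsa_final)

# Effect size of situation on VSA
with(vsa_data, cohen.d(area[situation == "F2F"], area[situation == "Zoom"]))

# Full model for the Euclidean distance to the centroid: vowel in interaction
# with situation and list; duration and the Zoom-use variables as main effects
# only (an assumption about the exact formula)
m_ed <- lmer(ED ~ vowel * situation * list + comfort + frequency + duration +
               (1 + situation + list + h1a3 | speaker),
             data = ed_data)

# Pairwise post hoc comparisons of situation within each vowel;
# emmeans adjusts the contrasts for multiple comparisons by default
emmeans(m_ed, pairwise ~ situation | vowel)
```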

The best model of VSA includes the predictor situation and participants as random intercepts. Neither intensity, as measured by H1-A3, nor Zoom comfort level or frequency of use improved the model significantly. Figure 2 presents the normalized VSA of all nine peripheral vowels in the F2F vs Zoom situation. It is striking that the Zoom VSA (dashed line) does not stretch beyond the F2F VSA (solid line). VSA is significantly reduced in Zoom as opposed to F2F (β_intercept = 8.76, p < 0.001; β_Zoom = −1.03, p < 0.001), which can be seen in Fig. 4(A). The VSA for word list 2 (at the end of the experiment) is slightly (and non-significantly) enlarged in both Zoom and F2F. A total of 28% of the variance is explained by the fixed and random effects (Rm² = 0.19, Rc² = 0.28), with a large effect of situation (Cohen's d = 0.95). Overall, the VSA in Zoom is reduced by 11.9% (14.3% in the first and 9.5% in the second list).
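As a rough cross-check (and assuming that both coefficients are expressed on the same normalized-area scale), the relative reduction implied by these estimates is consistent with the reported overall figure:

```latex
\frac{|\beta_{\mathrm{Zoom}}|}{\beta_{\mathrm{intercept}}} = \frac{1.03}{8.76} \approx 0.118
```

that is, about 12%, close to the reported 11.9% overall reduction.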

Fig. 2.

Normalized vowel space of F2F (orange, solid lines) and Zoom conversations (blue, dashed lines) in word list 1 (left, before conversation) and 2 (right, after conversation). Boldfaced x represents the common centroid of the two situations and lists. Ellipses show 95% of all data points per vowel.


When assessing the individual differences per participant (cf. Fig. 3), the majority of participants reduce their VSA in Zoom in the first word list, as evidenced by the bars pointing in the negative direction. Eight participants extend their VSA in Zoom compared to F2F, four of them exclusively in the second list.
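The per-speaker values plotted in Fig. 3 follow directly from the VSA table sketched earlier ((Zoom − F2F)/F2F, in percent); a minimal continuation of that sketch, again with placeholder column and level names:

```r
library(tidyr)

# Relative change of each speaker's VSA in Zoom per word list, as in Fig. 3;
# 'vsa' as computed in the sketch above
vsa %>%
  pivot_wider(names_from = situation, values_from = area) %>%
  mutate(change_pct = 100 * (Zoom - F2F) / F2F)
```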

Fig. 3.

Individual reduction (negative values) or extension (positive values) of each speaker's VSA (in percent) in Zoom per session and word list, calculated by subtracting the F2F VSA from the Zoom VSA and dividing it by the F2F VSA (f = female; m = male participant).


For ED, the best model included vowel, situation, and vowel duration as main effects, as well as the interaction of vowel with situation and with vowel duration, with participants as random intercepts. As was the case for VSA, list had no significant main effect. Post hoc analyses of the interaction of vowel with situation show that /aː/ (β = 0.34, p < 0.001, d = 0.73, medium effect), /iː/ (β = 0.11, p < 0.05, d = 0.46, small effect), and /yː/ (β = 0.27, p < 0.001, d = 0.8, large effect) are significantly reduced in Zoom [cf. Fig. 4(B)]. The effect of vowel duration (β_duration = 0.0016, p < 0.01) means that for each additional millisecond of vowel duration, the ED from the centroid increases by 0.0016 standard deviations in the normalized vowel space [cf. Fig. 4(C)].

Fig. 4.

Model predictions of (A) vowel space area, (B) Euclidean distance in F2F (solid line) and Zoom (dashed line) with 95% confidence intervals, (C) vowel duration for all nine peripheral vowels.


What effect does communication via video conference have on speakers' vowel space? We found that interlocutors significantly reduce their vowel space in Zoom as opposed to F2F situations, at least for read speech. In the following, we will discuss both technical and situational factors that might explain this effect.

According to the theory of hyper- and hypoarticulation (Lindblom, 1990), articulation varies on a continuum from clear to less clear speech: speakers expend only as much effort as is necessary for listeners to understand them. While the speakers in the F2F situation are seated 1.5 m apart from each other, the speakers in the Zoom situation wear headphones and are connected via the internal microphone of the tablet, which is placed about 30 cm away from the edge of the table (although the distance to the speaker could be greater if they were leaning back). Given this setup, it is unclear whether the effort speakers make to be understood is comparable. Factors detrimental to understanding the speech signal in Zoom include variation in transmission speed, distortions or background noise, and possible lags in the internet connection. These factors would be expected to cause hyperarticulation in the Zoom situation, aimed at successful signal delivery. However, speakers' vowel space areas are reduced in Zoom, indicating that speakers did not invest additional effort to compensate for such potentially disturbing factors.

According to Koenig and Fuchs (2019), [aː] tends to be reduced in normal as opposed to loud speech (i.e., an effect of hypoarticulation), whereas they found no such effect for the high vowels [iː uː yː]. In our data, /aː/ as well as /iː/ and /yː/ are reduced. While a softer signal could explain the reduced /aː/, the reduced high vowels in Zoom would not be compatible with the results of Koenig and Fuchs (2019). However, our data do not show any significant effect of intensity (as approximated via H1-A3) between the F2F and the Zoom situation. As H1-A3 is considered to be independent of microphone distance, we can rule out intensity as a factor explaining the reduction in Zoom.

Connected to intensity is the fact that participants wore circumaural headphones while conversing over Zoom due to the recording setup. As these headphones transmit the signal directly to the hearer and shield from background noise, they possibly create an impression of proximity of the interlocutor, which, in turn, could lead to a reduced effort when speaking (hypoarticulation). To the best of our knowledge, there is no research indicating that headphone-wearing participants reduce their vowel space. Considering hyperarticulation induced by headphones, studies on occluded speech find that speakers raise their voices when their own speech monitoring is occluded (Silverman, 2006), leading to louder speech production with an enhanced VSA (Lombard, 1911).1 However, as discussed above, intensity in our study did not prove significant, which is why we argue that headphones do not induce the observed effects.

A sociophonetic factor that might explain the reduced VSA in the Zoom situation is the degree of involvement in a conversation, which is understood as “an internal, even emotional connection individuals feel which binds them to other people” (Tannen, 2007). Speakers might feel less involved in the Zoom conversation, as they are alone in a room and only connected to their interlocutor via screen and headphones (Koester, 2023), lacking other sensory input, such as smell or full-body gestures. The lack of human copresence and direct personal interaction may lead to less communicative effort and thus, hypoarticulation, resulting in a reduced vowel space. However, it could be argued that the visual cues of the face and upper body displayed on the tablets may have created an impression of proximity comparable to the F2F situation (albeit with the inherent limitation of the small tablet size). The participants' high comfort levels and frequent use of Zoom also suggest that the virtual setting does not necessarily lead to a lesser degree of involvement. Although the authors believe that F2F is still the most common mode of communication, it could be argued that, during the COVID-19 pandemic, remote communication has become a particularly prevalent mode of interaction, potentially surpassing F2F conversations. Considering this, participants may have seen the opportunity to communicate in a laboratory in person as a welcome departure from their lockdown routines, possibly resulting in a more enthusiastic and hyperarticulated speech in comparison to Zoom. Subsequent studies could include post-experiment questionnaires to more clearly assess involvement and determine its effect on VSA. As this information is not a part of the corpus, we avoid drawing any strong conclusions.

There are some limitations to this study. First, we use read word lists as a proxy to compare the two situations. It remains for future studies to show whether the effect is the same in unscripted spontaneous dialogues. Further, different microphones were used during data collection. Although the formant frequency differences between microphone types were negligible and the frequency response curves, as provided by the manufacturers, appear to be comparable at higher frequencies, certain formant frequencies might have been affected in non-uniform ways. Future studies should therefore aim to use the same microphone types for a more valid comparison. To make the situations even more comparable, either the F2F situation could be recorded using headphones (making it rather unnatural) or the Zoom situation could be recorded without headphones (which would interfere with the channel separation in post-processing). Finally, we cannot exclude the possibility that the participants hyperarticulated in the F2F situation, instead of hypoarticulating in the Zoom situation. In our study, we treat F2F as the baseline because we believe it more closely resembles natural in-person communication than the Zoom situation.

By analyzing vowels from read word lists as a proxy for natural speech, we observe that vowel space area is significantly decreased in a video conference compared to face-to-face conversations. The results suggest that speakers are aware of their situation and its functional affordances and use a different register, meaning that they articulate less clearly in a video conference than in a face-to-face situation. As video conferences will quite possibly remain a building block of communication, acoustic studies of these situations will further the understanding of human behavior in remote communication.

The creation of the subcorpus this research is based on was supported by the Deutsche Forschungsgemeinschaft (German Research Foundation, SFB 1412, 416591334). We thank the three anonymous reviewers for their critical reading of the manuscript and their helpful comments.

The authors have no conflicts to disclose.

This study did not involve new recordings of human participants but re-used data available for linguistic research. All recordings are part of the corpus BeDiaCo, for which ethical approval was obtained by the Collaborative Research Center 1412. Informed consent was obtained from all participants.

The annotated data are available for scientific research in linguistics in version 3 of the corpus BeDiaCo. The aggregated data that support the findings and the R scripts are openly available at https://doi.org/10.17605/OSF.IO/QSZ5K.

1

Since the signal produced by a speaker in the Zoom situation is not fed back into their own headphones, there should be no side tone amplification effect.

1. Barton, K. (2018). “MuMIn: Multi-model inference,” https://CRAN.R-project.org/package=MuMIn (Last viewed October 5, 2023).
2. bas.uni-muenchen.de (2023). See https://www.bas.uni-muenchen.de/~jmh/lehre/Rdf/EMU-SDMS/lesson11/Euclidean_and_Mahalanobis_Distances.html for information on calculating the Euclidean distance (Last viewed March 2023).
3. Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). “Fitting linear mixed-effects models using lme4,” J. Stat. Softw. 67(1), 1–48.
4. Belz, M., Mooshammer, C., Zöllner, A., and Adam, L.-S. (2021). “Berlin dialogue corpus (BeDiaCo) (Version 2),” https://rs.cms.hu-berlin.de/phon (Last viewed October 5, 2023).
5. Biber, D., and Conrad, S. (2009). Register, Genre, and Style (Cambridge University Press, Cambridge).
6. Boersma, P., and Weenink, D. (2023). “Praat: Doing phonetics by computer (version 6.2.12) [computer program],” http://www.praat.org (Last viewed October 5, 2023).
7. Calder, J., Wheeler, R., Adams, S., Amarelo, D., Arnold-Murray, K., Bai, J., Church, M., Daniels, J., Gomez, S., Henry, J., Jia, Y., Johnson-Morris, B., Lee, K., Miller, K., Powell, D., Ramsey-Smith, C., Rayl, S., Rosenau, S., and Salvador, N. (2022). “Is Zoom viable for sociophonetic research? A comparison of in-person and online recordings for vocalic analysis,” Linguist. Vanguard 2022, 20200148.
8. Freeman, V., and De Decker, P. (2021). “Remote sociophonetic data collection: Vowels and nasalization over video conferencing apps,” J. Acoust. Soc. Am. 149(2), 1211–1223.
9. Harrington, J. (2010). Phonetic Analysis of Speech Corpora (Wiley-Blackwell, Chichester, UK/Malden, MA).
10. Koenig, L. L., and Fuchs, S. (2019). “Vowel formants in normal and loud speech,” J. Speech Lang. Hear. Res. 62(5), 1278–1295.
11. Koester, A. (2023). “Why face-to-face communication matters: A comparison of face-to-face and computer-mediated communication,” in COVID-19, Communication and Culture: Beyond the Global Workplace, 1st ed., edited by F. Rossette-Crake and E. Buckwalter (Routledge, London), Chap. 7, pp. 115–134.
12. Kohler, K. J. (1999). “German,” in Handbook of the International Phonetic Association (Cambridge University Press, Cambridge), pp. 86–89.
13. Kuznetsova, A., Brockhoff, P. B., and Christensen, R. H. B. (2017). “lmerTest package: Tests in linear mixed effects models,” J. Stat. Softw. 82(13), 1–26.
14. Lenth, R. V. (2022). “emmeans: Estimated Marginal Means, aka Least-Squares Means [R package] (version 1.7.2),” https://CRAN.R-project.org/package=emmeans (Last viewed October 5, 2023).
15. Lindblom, B. (1990). “Explaining phonetic variation: A sketch of the H&H theory,” in Speech Production and Speech Modelling, NATO ASI Series, edited by W. J. Hardcastle and A. Marchal (Springer, Dordrecht), pp. 403–439.
16. Lobanov, B. M. (1971). “Classification of Russian vowels spoken by different speakers,” J. Acoust. Soc. Am. 49(2B), 606–608.
17. Lombard, E. (1911). “Le signe de l'élévation de la voix” (“The sign of the elevation of the voice”), Ann. des Maladies de l'Oreille, du Larynx, du Nez et du Pharynx 37, 101–119.
18. Lupton, D. (2021). “Doing fieldwork in a pandemic,” ssrn.com/abstract=4228791 (Last viewed October 5, 2023).
19. McCloy, D. R. (2016). “phonR: Tools for Phoneticians and Phonologists,” R package version 1.0-7, https://cran.r-project.org/package=phonR (Last viewed October 5, 2023).
20. Mooshammer, C. (2010). “Acoustic and laryngographic measures of the laryngeal reflexes of linguistic prominence and vocal effort in German,” J. Acoust. Soc. Am. 127(2), 1047–1058.
21. Nakagawa, S., and Schielzeth, H. (2013). “A general and simple method for obtaining R2 from generalized linear mixed-effects models,” Methods Ecol. Evol. 4(2), 133–142.
22. R Core Team (2023). R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Wien).
23. Silverman, D. (2006). A Critical Introduction to Phonology: Of Sound, Mind and Body (Continuum, London/New York), p. 191.
24. Tannen, D. (2007). Talking Voices: Repetition, Dialogue, and Imagery in Conversational Discourse (Cambridge University Press), p. 27.
25. Torchiano, M. (2020). “effsize: Efficient effect size computation,” R package version 0.8.1, https://doi.org/10.5281/zenodo.1480624.
26. Winkelmann, R. (2015). “PraatToFormants2AsspDataObj.R,” https://gist.github.com/raphywink/2512752a1efa56951f04 (Last viewed October 5, 2023).
27. Winkelmann, R., Harrington, J., and Jänsch, K. (2017). “EMU-SDMS: Advanced speech database management and analysis in R,” Comput. Speech Lang. 45, 392–410.
28. Winkelmann, R., Jaensch, K., Cassidy, S., and Harrington, J. (2020). “emuR: Main package of the EMU speech database management system,” R package version 2.3.1.