In an effort to mitigate the 2019 novel coronavirus disease pandemic, mask wearing and social distancing have become standard practices. While effective in fighting the spread of the virus, these protective measures have been shown to deteriorate speech perception and sound intensity, which necessitates speaking louder to compensate. The goal of this paper is to investigate via numerical simulations how compensating for mask wearing and social distancing affects measures associated with vocal health. A three-mass body-cover model of the vocal folds (VFs) coupled with the sub- and supraglottal acoustic tracts is modified to incorporate mask and distance dependent acoustic pressure models. The results indicate that sustaining target levels of intelligibility and/or sound intensity while using these protective measures may necessitate increased subglottal pressure, leading to higher VF collision and, thus, potentially inducing a state of vocal hyperfunction, a progenitor to voice pathologies.
I. INTRODUCTION
Prophylactic measures imposed to mitigate the spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus responsible for the coronavirus disease (COVID-19), have significantly changed the lifestyle of the world's population. Two primary protective measures have been prescribed to the public to minimize the transmission of SARS-CoV-2, namely, wearing a face mask and social distancing. Wearing a mask aims to suppress the spread of droplets and aerosols generated during sneezing, coughing, and breathing, which transport virions (Agrawal and Bhardwaj, 2020; Khosronejad et al., 2020; Mittal et al., 2020; Shah et al., 2021), while social distancing aims to maintain interindividual spacing beyond the distance that contaminated droplets and aerosols are thought to travel during common expiratory events (Xie et al., 2007). These protective measures have proven effective in lowering the transmission rate of SARS-CoV-2 (Eikenberry et al., 2020; Qian and Jiang, 2022). It should be noted that despite the quick development and availability of COVID-19 vaccines (Ndwandwe and Wiysonge, 2021), protective measures are likely to remain in place for a long period of time to mitigate the spread of the virus and its rapidly emerging variants (Koyama et al., 2020).
Although effective in mitigating the spread of SARS-CoV-2, both social distancing and masks negatively impact verbal communication, generally requiring individuals to speak louder to compensate for the undesired effects of these protective measures. Masks have been characterized as low-pass filters (Corey et al., 2020) that attenuate the high frequency content of speech signals, leading to reduced speech perception and intelligibility (Saunders et al., 2021). Intelligibility deteriorates further with opaque masks because they hide the lip movement cues that contribute to the audiovisual integration of speech and to nonverbal communication (Carbon, 2020; Mheidly et al., 2020).
Because there is negligible atmospheric absorption between typical speaker/listener pairs (Attenborough, 2014; Evans et al., 1972; ISO, 1993), social distancing more uniformly attenuates all frequencies of the speech signal with the amplitudes of all frequencies being inversely proportional to the distance between the speaker and receiver (Kinsler et al., 1999); thus, the speaker must speak louder to produce the same sound pressure level (SPL) at the extended distance. Compensating for these effects may result in an increased vocal effort to sustain effective communication, such as that reported by healthcare workers during the pandemic (McKenna et al., 2021). In the long run, increased vocal effort is a factor leading to hyperfunctional voice disorders (Hillman et al., 2020). Similarly, the additional effort to “project your voice” in performers and teachers in comparison with the general population is believed to be the primary factor driving the larger prevalence of voice disorders in these populations (Guss et al., 2014; Roy et al., 2004).
The underlying biomechanics of normal and pathological human phonation are often studied using numerical models (Galindo et al., 2017; Ishizaka and Flanagan, 1972; Jiang et al., 1998; Steinecke and Herzel, 1995; Story and Titze, 1995; Zañartu et al., 2014). Such analyses can provide useful insight into the underlying mechanisms of vocal disorders. For example, Sommer et al. (2012, 2013) and Steinecke and Herzel (1995) investigated vocal fold (VF) oscillations using asymmetric two-mass models and found that imbalances between the right and left VFs can lead to chaotic VF oscillations; similar chaotic patterns were found when modeling unilateral polyps (Zhang and Jiang, 2004). Galindo et al. (2017) investigated the influence of a posterior glottal opening on the quality of voice using a triangular body-cover model (BCM) and found that compensating for deterioration in some voice measures associated with a posterior glottal opening leads to an increase in the VF collision pressure, which can lead to phonotrauma. Dejonckere and Kob (2009), using a three-dimensional multi-mass VF model, deduced that curved VFs and incomplete glottal closure, which are relatively more common in female speakers, necessitate increased subglottal pressure to compensate, leading to higher localized mechanical stress during phonation. Clinically, these results are correlated with an increased prevalence of nodules in female speakers.
Although protective face masks and social distancing have become common as a result of COVID-19, the long term consequences of these protective measures on the vocal health of individuals are presently beyond the reach of clinical and experimental investigations due to the lack of long term data. However, combining numerical simulations with knowledge gleaned from functionally similar clinical investigations may offer insights into the long term effects of COVID-19 protective measures on vocal health.
The goal of this paper is to explore, by means of numerical phonation simulations, the relative and combined impacts of wearing masks and social distancing on vocal effort and VF collision forces. In particular, the focus of this study is to elucidate how the reduction in speech perception associated with these measures can potentially lead to vocal hyperfunction through compensatory mechanisms (Hillman et al., 1989, 2020). The organization of this paper is as follows: Section II introduces the phonation and acoustic models; Sec. III elaborates on the study design; the acoustic effects of wearing masks and social distancing on several voice measures are investigated in Secs. IV and V, respectively; compensatory measures and the resulting implications for VF collision forces are investigated in Sec. VI; practical suggestions for maintaining vocal health while wearing a mask and engaging in social distancing are given in Sec. VII; Sec. VIII describes the limitations of the study; and Sec. IX concludes the work.
II. NUMERICAL PHONATION MODEL
A. Body-cover VF model
This study employs the BCM of the VFs (Story and Titze, 1995; Titze and Story, 2002), shown in the center of the top schematic in Fig. 1. This model, which embeds the essential physiological components of the VFs used in modal voice, consists of two cover masses and a body mass, denoted by $m_l$, $m_u$, and $m_b$, respectively. For the remainder of the manuscript, the subscripts “l,” “u,” and “b” will denote “lower” (inferior), “upper” (superior), and “body,” respectively. The body and cover masses are interconnected and the body layer is connected to the fixed rigid larynx via springs and dampers to model the tissue viscoelasticity. Displacements of the masses from the medial plane are given by $x_l$, $x_u$, and $x_b$. It is assumed that both of the VFs have identical properties and their motions are symmetric about the medial plane. Collision between the opposing folds is modeled by applying additional nonlinear spring forces to the cover masses, which are proportional to the degree of overlap as the VFs cross the medial (collision) plane. Muscle activation rules are incorporated to control the primitive model variables (Titze and Story, 2002), wherein the muscle activation parameters $a_{CT}$, $a_{TA}$, and $a_{LC}$ account for the activation of the cricothyroid (CT), thyroarytenoid (TA), and lateral cricoarytenoid/posterior cricoarytenoid (LCA/PCA) muscles, respectively.
A schematic diagram of the phonation model consisting of the BCM coupled with subglottal and supraglottal tracts (top) and a simplified circuit analogy representation of the model (bottom).
B. Glottal flow and acoustics
The glottal flow and pressure forces on the VFs are computed according to a Bernoulli-based flow model, where flow separation is located at the junction between the lower and upper cover masses when the VF configuration is convergent and at the inlet of the glottis when the configuration is divergent (Story and Titze, 1995). The glottal flow is driven by the subglottal and supraglottal pressures that consist of static P and acoustic p components, such that the subglottal pressure is given by $P_{\mathrm{sub}} + p_{\mathrm{sub}}$ and the supraglottal pressure is given by $P_{\mathrm{sup}} + p_{\mathrm{sup}}$. The static components correspond to the equilibrium pressure conditions in the subglottal and supraglottal tracts when the VFs are at rest and fully adducted (i.e., $P_{\mathrm{sub}}$ and $P_{\mathrm{sup}}$ correspond to the lung and atmospheric pressures, respectively, in the case of equilibrium), whereas the acoustic components correspond to the perturbations in the pressure field due to travelling pressure waves. The coupling between the acoustic subglottal and supraglottal pressures and the glottal flow is performed using a modified (and sign corrected) version of the flow rate relation introduced by Titze (1984) (see Lucero and Schoentgen, 2015). The BCM equations are discretized in time using the Taylor series method (Galindo et al., 2014), which has been employed in several previous BCM studies (Galindo et al., 2017; Galindo et al., 2014; Hadwin et al., 2016; Serry et al., 2021).
The subglottal and supraglottal acoustic tracts are coupled with the BCM, and the acoustic wave propagation is modeled using the wave reflection analog (WRA) method (Kelly and Lochbaum, 1962; Liljencrants, 1985; Story, 1995, 2005). The losses are modeled using separate attenuation factors for the supraglottal and subglottal tracts (Titze and Alipour, 2006; Zañartu, 2006), each of which depends on the cross-sectional area A and length l of a given WRA tube section. The simulation time step is governed by the lengths of the WRA tube sections used in the tracts. Here, the supraglottal area functions corresponding to vowels /a/, /e/, /i/, /o/, and /u/ with section lengths of 0.25 cm are adopted from Takemoto et al. (2006) with a corresponding time step of 7.14 × 10⁻⁶ s, and the vowel /æ/ with a section length of 0.396825 cm is adopted from Story et al. (1998) with a resulting simulation time step of 1.13 × 10⁻⁵ s. Similar to Galindo et al. (2014) and Zañartu et al. (2014), the subglottal tract area function is adapted from the respiratory system measurements of human cadavers (Weibel, 1963), covering only the trachea and bronchi. This tract consists of two tubes, where the lung side tube is nominally 4 cm long with a cross-sectional area of 2.33 cm², and the glottis side tube is nominally 11 cm long with a cross-sectional area of 2.54 cm². For a section length of 0.25 cm, this results in a total subglottal tract length of 15.75 cm, whereas for a section length of 0.396825 cm, this results in a total length of 15.873 cm. At the lung and mouth ends of the subglottal and supraglottal tracts, respectively, the continuities of the pressure and flow with an acoustic circuit model are enforced as boundary conditions. The lung termination circuit is modeled by a single resistor to ensure a reflection coefficient of −0.8 based on the boundary condition used by Zañartu et al. (2007) and shown schematically in Fig. 1.
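As a minimal sketch of how the quoted time steps relate to the section lengths, the following assumes that the time step equals the travel time of sound across one tube section. The speed of sound of 350 m/s is an assumption: it is not stated above and is chosen only because it reproduces both quoted time steps. The function name `wra_time_step` is illustrative.

```python
# Assumed speed of sound (m/s). Not stated in the text; chosen because it
# reproduces the two quoted time steps.
SPEED_OF_SOUND = 350.0


def wra_time_step(section_length_cm, c=SPEED_OF_SOUND):
    """Travel time (s) of sound across one tube section of the given length."""
    section_length_m = section_length_cm * 1e-2
    return section_length_m / c


if __name__ == "__main__":
    for length_cm in (0.25, 0.396825):
        dt = wra_time_step(length_cm)
        print(f"section {length_cm} cm -> dt = {dt:.3e} s, fs = {1.0 / dt:.0f} Hz")
```

Under this assumption, the shorter sections used for the Takemoto et al. (2006) vowels imply a higher sampling rate than the section length used for /æ/.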
In the absence of a mask, radiation from the mouth is modeled using the well-known approximation of a cylinder in an infinite baffle, which results in a parallel resistor-inductor (R-L) circuit (Flanagan et al., 1975; Story, 1995).
The acoustic effects of the mask are modeled as a parallel R-L acoustic circuit. The adopted mask model is based on the models for textile materials, where the effects of the flow through the fabric and movement of the fabric due to the acoustic pressures are considered (Moholkar and Warmoeskerken, 2003; Pieren, 2012). The total flow rate and pressure across the mask are given by
$$U_m = U_R + U_L, \qquad \Delta p_m = R_m U_R = L_m \frac{dU_L}{dt}, \tag{1}$$
where $\Delta p_m$ is the pressure drop across the mask, $U_m$ is the total flow rate across the mask, $U_R$ is the flow rate through the mask when the mask mass is infinite (or the mask mass is at rest), $U_L$ is the flow rate due to the mask movement, $R_m$ is the mask resistance, and $L_m$ is the mask inductance. In addition, to specify the intrinsic properties of the mask fabric, we define the scaled mask resistance and inductance (area density) as $\bar{R}_m = R_m A_m$ and $\bar{L}_m = L_m A_m$, respectively, where $A_m$ is the terminal area at the mouth. These properties are independent of the size of the mask because they relate the pressure drop across the mask to the acoustic velocity through the mask rather than the flow rate.
The mask parameters $R_m$ and $L_m$ represent the resistance to the flow through the mask and inertia of the mask, respectively. Increasing the mask flow resistance would require increasing the pressure at the lungs to drive the same amount of air through and would, thus, be perceived as less breathable. Increasing the mask inertia would be felt as a heavier mask.
The mask circuit is connected in series with the mouth radiation circuit as the total flow rate through the mask is also the radiated flow rate. The acoustic termination conditions are implemented numerically following the approach of Story (1995, Chap. 2). Specifically, the termination circuit is discretized in time with the bilinear transform, and the continuity of the pressure and flow between the termination circuit and the tract are applied to solve for the reflected pressures.
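To illustrate the discretization step, the sketch below applies the bilinear transform to a single parallel R-L branch with impedance Z(s) = sRL/(R + sL). It is not the implementation of Story (1995) or of this study: it covers one branch only, leaves out the series connection with the radiation circuit and the coupling to the tract, and all names are hypothetical.

```python
def make_parallel_rl_branch(resistance, inductance, fs):
    """Return a step function mapping flow samples u[n] to pressure samples p[n].

    Bilinear transform: s -> 2*fs*(1 - z^-1)/(1 + z^-1) applied to
    Z(s) = s*R*L / (R + s*L), which gives
    p[n] = b0*(u[n] - u[n-1]) - a1*p[n-1].
    """
    k = 2.0 * fs
    denom = resistance + inductance * k
    b0 = resistance * inductance * k / denom
    a1 = (resistance - inductance * k) / denom
    state = {"u_prev": 0.0, "p_prev": 0.0}

    def step(u):
        p = b0 * (u - state["u_prev"]) - a1 * state["p_prev"]
        state["u_prev"] = u
        state["p_prev"] = p
        return p

    return step


if __name__ == "__main__":
    # Illustrative values only (not mask parameters from the study).
    branch = make_parallel_rl_branch(resistance=1.0, inductance=1e-3, fs=1e4)
    for n in range(5):
        print(n, branch(1.0))  # response to a unit step in flow decays toward zero
```

The two limits give a quick check of the discretized branch: a constant flow produces a pressure that decays to zero (the inductor acts as a short circuit), and a flow alternating at the Nyquist rate sees a purely resistive branch.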
C. Far-field distance dependent pressure
The influence of social distancing on acoustic signals is considered by adopting the far-field wave approximation for the piston-in-baffle source (Kinsler et al., 1999, Sec. 7.4).
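Let $U_r(t)$ be a $T$-periodic radiated flow rate signal, and let $U_r[k] = U_r(t_k)$, $k = 0, \dots, N-1$, be the flow rate signal sampled with N points such that $T = N/f_s$ and $f_s$ is the sampling frequency (N is even). The sampled flow rate can then be represented by one-sided discrete Fourier transform coefficients as
$$U_r[k] = \mathrm{Re}\left\{\sum_{n=0}^{N/2} \mathbf{U}_n e^{j\omega_n t_k}\right\}, \tag{2}$$
where $t_k = k/f_s$, and $\omega_n = 2\pi n/T$ and $\mathbf{U}_n$ are the associated modal frequencies and Fourier series coefficients, respectively. The bold font denotes a complex quantity. Then, the far-field acoustic pressure, $p_f$, in the frequency and time domains is given by
$$\mathbf{p}_{f,n}(d) = \frac{j\rho c k_n}{2\pi d}\,\mathbf{U}_n e^{-j k_n d}, \tag{3}$$
$$p_f(d,t) = \mathrm{Re}\left\{\sum_{n=0}^{N/2} \mathbf{p}_{f,n}(d)\, e^{j\omega_n t}\right\}, \tag{4}$$
where d is the distance from the mouth, t is the time, ρ is the air density, c is the speed of sound, and $k_n = \omega_n/c$ is the angular wave number. Lastly, for the numerical computation of the Fourier coefficients, we apply a Tukey window to $U_r[k]$ with a tapered region fraction of 0.2 to avoid spectral leakage, which causes spurious high frequency content.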
We note that Eq. (3) is limited because the frequency dependent energy-dissipative effects are present for sound propagation over long distances (Evans et al., 1972; ISO, 1993). For the small distances in social distancing observed here, however, these effects will be negligible. For example, at a mid-band frequency of 10 kHz, 20 °C, relative humidity of 15%, and ambient pressure of 1 atm, the sound attenuation due to the atmospheric absorption at a distance of 1 m is approximately 0.267 dB (ISO, 1993, Table 1) and even less at lower frequencies.
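The far-field relation can be sketched for a single harmonic as follows, assuming the standard on-axis, low-frequency result for a simple source in an infinite baffle, p = jρck U e^{-jkd} / (2πd). The air density and speed of sound below are assumed values, atmospheric absorption is ignored, and the function name is hypothetical.

```python
import cmath
import math

RHO = 1.2   # assumed air density (kg/m^3)
C = 343.0   # assumed speed of sound (m/s)


def far_field_harmonic(flow_coeff, freq_hz, distance_m, rho=RHO, c=C):
    """Complex far-field pressure coefficient (Pa) for one flow-rate harmonic.

    flow_coeff is the complex Fourier coefficient of the radiated flow (m^3/s).
    """
    k = 2.0 * math.pi * freq_hz / c  # angular wave number (rad/m)
    gain = 1j * rho * c * k / (2.0 * math.pi * distance_m)
    return gain * flow_coeff * cmath.exp(-1j * k * distance_m)


if __name__ == "__main__":
    # Illustrative harmonic: 1e-4 m^3/s at 200 Hz.
    for d in (1.0, 2.0):
        p = far_field_harmonic(1e-4, 200.0, d)
        print(f"d = {d} m: |p| = {abs(p):.4f} Pa")
```

In this form the amplitude reduces to ρ f |U| / d, so it grows with frequency and falls off as 1/d, while the distance only adds a phase delay of kd to each harmonic.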
III. STUDY DESIGN
Simulations are performed for vowel phonemes /a/, /i/, /e/, /o/, /u/, and /æ/. Each simulation spans 1 s of phonation with the last 0.5 s considered in the analysis so as to eliminate the transient effects. An “average vowel” is computed by performing a weighted average of the individual vowels according to the vowel frequency data reported by Hayden (1950) to roughly approximate running speech. Specifically, the average vowel comprises 15.7% /a/, 16.9% /e/, 14.5% /i/, 13.0% /o/, 13.2% /u/, and 26.9% /æ/. In all of the simulations, the laryngeal muscle activation parameters $a_{CT}$, $a_{TA}$, and $a_{LC}$ are held fixed. The value of 0.2 assigned for $a_{CT}$ and $a_{TA}$ corresponds to low/normal activation levels of the CT/TA muscles, whereas the value assigned for $a_{LC}$ corresponds to the fully adducted VFs with a zero neutral glottal gap, which is typically the case in modal phonation.
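The weighting can be sketched as below. The quoted percentages sum to 100.2, so this sketch normalizes by the sum of the weights; that normalization is an assumption. The quantity being averaged is also not specified above, so `measure_by_vowel` is a hypothetical per-vowel scalar.

```python
# Vowel weights in percent, as quoted from Hayden (1950) in the text.
VOWEL_WEIGHTS = {"a": 15.7, "e": 16.9, "i": 14.5, "o": 13.0, "u": 13.2, "ae": 26.9}


def average_vowel(measure_by_vowel, weights=VOWEL_WEIGHTS):
    """Weighted mean of a per-vowel measure, normalized by the total weight."""
    total = sum(weights[v] for v in measure_by_vowel)
    weighted = sum(weights[v] * x for v, x in measure_by_vowel.items())
    return weighted / total


if __name__ == "__main__":
    # Hypothetical per-vowel values, for illustration only.
    example = {"a": 70.0, "e": 68.0, "i": 65.0, "o": 69.0, "u": 64.0, "ae": 71.0}
    print(f"average vowel measure: {average_vowel(example):.2f}")
```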
The effect of the mask is investigated for a wide range of resistance and mass (inductance) values based on reported values of the scaled mask resistance and scaled mask inductance in the literature. The broad range of reported values is likely, in part, due to the array of mask materials studied as well as the differences in the testing methodologies. For example, Drewnick et al. (2021) reported values for the mask density ranging from 0.05 to 0.2 kg/m² and mask resistances of 100–1000 Pa s m⁻¹ for a selection of cotton and surgical masks (four surgical mask samples and cotton twill, cotton woven, and jersey cotton samples). Konda et al. (2020) reported consistent mask resistance values between 45 and 52 Pa s m⁻¹ for N95, surgical, and cotton masks, whereas Shah et al. (2021) reported values ranging from 500 to 800 Pa s m⁻¹ for KN95, R95, and surgical masks without any leakage flow. In this study, the smaller mask resistance measured by Konda et al. (2020), about 50 Pa s m⁻¹, was considered reasonable as the large values reported in other studies would predict unrealistic attenuation in our model. As a result, in our investigation of the influence of the protective measures, we consider the range 0–150 Pa s m⁻¹ for the scaled mask resistance $\bar{R}_m$ and the range 0–150 g/m² for the scaled mask inductance $\bar{L}_m$ to cover a range from zero to three times the nominal mask values of 50 Pa s m⁻¹ and 50 g/m², respectively. We consider these ranges to capture reasonable variations for the different mask types or the wearing of multiple mask layers. When varying one parameter, resistance or inductance, the other parameter is held fixed at its nominal value.
Because masks and social distancing result in reductions in some voice measures (e.g., intelligibility and sound intensity), we investigate compensating for such deficiencies by means of increasing the static subglottal pressure $P_{\mathrm{sub}}$. It has been found that increasing the subglottal pressure is an efficient mechanism capable of compensating for reductions in the SPL, harmonics-to-noise ratio, and fundamental frequency (Galindo et al., 2017). The range of subglottal pressures considered in this work is 500–6500 Pa in steps of 100 Pa and 6500–9000 Pa in steps of 500 Pa, whereas the static supraglottal pressure $P_{\mathrm{sup}}$ is set to be zero (atmospheric). The coarse range 6500–9000 Pa was required to compensate for a large attenuation observed for the vowel /u/ with a mask in combination with social distancing (about 20 dB). We note that such high subglottal pressures are not observed in normal voice clinically but have been observed in shouting (Lagier et al., 2017) and are only included here for completeness.
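The pressure sweep described above can be written out as follows. This is only a sketch of the grid; counting the shared endpoint of 6500 Pa once is an assumption, and the function name is hypothetical.

```python
def subglottal_pressure_grid():
    """Static subglottal pressures (Pa): fine steps up to 6500 Pa, coarse steps above."""
    fine = list(range(500, 6500, 100))     # 500, 600, ..., 6400
    coarse = list(range(6500, 9001, 500))  # 6500, 7000, ..., 9000
    return fine + coarse


if __name__ == "__main__":
    grid = subglottal_pressure_grid()
    print(len(grid), grid[0], grid[-1])
```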
The simulation outputs considered for analysis are the radiated flow rate $U_r$, the radiated pressure at different distances $p_f$, and the maximum of the collision forces on the lower and upper masses, all as time vectors. From the time vectors, we compute the means and maxima over time. The subscript “0” is used to denote values from the unmasked case. To explore the influence of wearing a mask on the acoustic characteristics of speech, we compute 1 octave band and 1/3 octave band spectra of $p_f$ (using the poctave function in MATLAB; MathWorks, 2021) from which the attenuation behaviors are estimated. From the radiated pressure $p_f$, we estimate the SPL, which is a measure of the sound intensity, and the speech intelligibility index (SII), which correlates with the intelligibility of speech (ANSI, 1997; Hornsby, 2004). Specifically, we compute the SII according to (ANSI, 1997)
$$\mathrm{SII} = \sum_i I_i \,\min\!\left(\max\!\left(\frac{E_i - N_i + 15}{30},\, 0\right),\, 1\right), \tag{5}$$
where i denotes the frequency band, $E_i$ and $N_i$ are the speech and noise spectrum levels (dB), respectively, and $I_i$ is the band importance function. The spectra $E_i$ are computed from $p_f$ while the pink noise level spectrum is assumed to be 50 dB, roughly comparable to noise levels in hospitals (Busch-Vishniac et al., 2005). To illustrate the effect of noise, we also consider pink noise of 0 and 25 dB.
Because we examine the SII in a nonspecific scenario here, we choose 1/3 octave bands and the average speech band importance function for calculating the SII, where Ii is given by Pavlovic (1987, Table 2) and illustrated in Fig. 2.
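A minimal sketch of an SII computation of this kind is given below, assuming the commonly used simplified form in which the audibility of each band rises linearly from 0 to 1 as the signal-to-noise ratio goes from −15 to +15 dB. It omits the other terms of the ANSI procedure (for example, spread of masking, level distortion, and hearing thresholds). The band importance values in the example are hypothetical placeholders, not the values of Pavlovic (1987).

```python
def band_audibility(speech_db, noise_db):
    """Audibility of one band: (SNR + 15) / 30, clipped to the interval [0, 1]."""
    return min(max((speech_db - noise_db + 15.0) / 30.0, 0.0), 1.0)


def simplified_sii(speech_levels_db, noise_levels_db, importance):
    """Importance-weighted sum of the band audibilities."""
    return sum(
        weight * band_audibility(speech, noise)
        for speech, noise, weight in zip(speech_levels_db, noise_levels_db, importance)
    )


if __name__ == "__main__":
    importance = [0.25, 0.25, 0.25, 0.25]  # hypothetical; must sum to 1
    speech = [60.0, 55.0, 50.0, 40.0]      # hypothetical band levels (dB)
    noise = [50.0, 50.0, 50.0, 50.0]       # flat 50 dB noise spectrum
    print(f"SII = {simplified_sii(speech, noise, importance):.3f}")
```

Because each band saturates at 0 or 1, a uniform change in level alters the SII only through the bands that are not already saturated.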
(Color online) The average band importance function from Pavlovic (1987, Table 2) used for computing the SII.
IV. PHONATION CHARACTERISTICS WITH MASKS
A. Acoustic effects of masks: Theoretical analysis
In this section, we theoretically analyze the attenuation behavior associated with the mask model introduced in Sec. II B, where, for simplicity, we neglect the coupling with the vocal tract system. Let the complex forward travelling pressure wave amplitude incident on the mask be denoted by $\mathbf{p}_i$ and the complex forward travelling pressure wave amplitude transmitted at the outlet of the mask be denoted by $\mathbf{p}_t$ with both occurring at the frequency ω. By considering the equilibrium of the pressure and velocity on both sides of the mask (Moholkar and Warmoeskerken, 2003), the attenuation ratio, $|\mathbf{p}_t/\mathbf{p}_i|$, can be computed as
$$\left|\frac{\mathbf{p}_t}{\mathbf{p}_i}\right| = \left|\frac{2 z_0\,(\bar{R}_m + j\omega \bar{L}_m)}{2 z_0 \bar{R}_m + j\omega \bar{L}_m\,(2 z_0 + \bar{R}_m)}\right|, \tag{6}$$
where $z_0 = \rho c$ is the characteristic acoustic impedance of air. Let us consider two limiting cases associated with the low and high frequencies. In the case when $\omega \to 0^+$, the attenuation ratio approaches a value of one (no attenuation). This is because the inductance term acts as a short circuit at zero frequency, resulting in an equivalent resistance of zero. Moreover, using a Taylor expansion at ω = 0, the attenuation behavior at low frequencies can be approximated using the quadratic expression
$$\left|\frac{\mathbf{p}_t}{\mathbf{p}_i}\right| \approx 1 - \frac{(4 z_0 + \bar{R}_m)\,\bar{L}_m^2}{8 z_0^2 \bar{R}_m}\,\omega^2. \tag{7}$$
Equation (7) shows that at low frequencies, the change in the attenuation behavior is minimal near zero frequency, then as the frequency increases, the attenuation starts to increase more significantly, implying a low-pass filtering behavior. The cutoff frequency associated with the low-pass filtering is given by
According to Eq. (8), increasing the mask mass density, $\bar{L}_m$, leads to a decrease in the cutoff frequency, whereas increasing the mask resistance does the opposite.
As $\omega \to \infty$, the attenuation ratio approaches the asymptotic value,
$$\lim_{\omega \to \infty}\left|\frac{\mathbf{p}_t}{\mathbf{p}_i}\right| = \frac{2 z_0}{2 z_0 + \bar{R}_m}, \tag{9}$$
which indicates a purely resistive behavior that is independent of the frequency. This implies that at high frequencies, the attenuation is approximately uniform. Equation (9) shows that increasing the mask resistance induces a higher asymptotic attenuation, and that asymptotic attenuation is independent of the mask mass. Figure 3 illustrates a typical attenuation curve of the mask model in this study. The low-pass behavior predicted from the theoretical analysis is in qualitative agreement with experimental observations of the acoustic effects of masks (Corey et al., 2020), which we discuss further in Sec. IV B.
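The limiting behavior described above can be checked numerically with the following sketch. It assumes a plane wave incident on a mask with a parallel R-L impedance per unit area, with air of characteristic impedance z0 on both sides, so that the transmitted-to-incident ratio is 2 z0 / (2 z0 + Z_m). The air properties are assumed values, the mask values are the nominal ones quoted in Sec. III, and the function names are hypothetical.

```python
import math

RHO = 1.2       # assumed air density (kg/m^3)
C = 343.0       # assumed speed of sound (m/s)
Z0 = RHO * C    # characteristic acoustic impedance of air (Pa s/m)


def mask_impedance(omega, resistance, area_density):
    """Parallel R-L impedance per unit area; zero at omega = 0, R as omega -> inf."""
    jwl = 1j * omega * area_density
    return resistance * jwl / (resistance + jwl)


def transmission_ratio(freq_hz, resistance=50.0, area_density=0.05):
    """|p_t / p_i| for scaled resistance (Pa s/m) and area density (kg/m^2)."""
    omega = 2.0 * math.pi * freq_hz
    return abs(2.0 * Z0 / (2.0 * Z0 + mask_impedance(omega, resistance, area_density)))


def high_frequency_limit(resistance=50.0):
    """Purely resistive asymptote of the transmission ratio."""
    return 2.0 * Z0 / (2.0 * Z0 + resistance)


if __name__ == "__main__":
    for f in (0.0, 100.0, 1000.0, 10000.0):
        ratio = transmission_ratio(f)
        print(f"{f:8.0f} Hz: ratio = {ratio:.4f} ({20.0 * math.log10(ratio):.2f} dB)")
    print(f"asymptote: {high_frequency_limit():.4f}")
```

Under these assumptions the ratio decreases monotonically from one at zero frequency to the resistive asymptote, which depends on the resistance but not on the area density.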
B. Analysis using phonation simulations
Herein, we look at the influence of masks on the acoustic pressure at a distance of 1 m from the mouth, computed from the phonation simulations using the model introduced in Sec. II (see also Fig. 1). Figure 4 shows the attenuation of the mask, computed using single octave bands as the ratio of the sound intensity with a mask to the sound intensity without, over varying mask parameters ($\bar{R}_m$, $\bar{L}_m$) for different vowels, including the average vowel.
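(Color online) The effect of the mask parameters on sound attenuation, shown with single octave bands. The mask parameters are varied by changing the scaled resistance $\bar{R}_m$ while keeping the scaled inductance $\bar{L}_m$ at its nominal value (left) and changing $\bar{L}_m$ while keeping $\bar{R}_m$ at its nominal value (right). The attenuation was calculated using $p_f$ at 1 m.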
The attenuation curves for most of the vowel sounds show a drop for low frequencies followed by an increase at higher frequencies. The low frequency attenuation and apparent asymptotic behavior at high frequencies are compatible with the theoretical analysis (see Fig. 3). The decrease in the attenuation for the mid range of frequencies in Fig. 4 is apparently due to acoustic coupling between the mask and vocal tract as the mask R-L circuit without a vocal tract is predicted to induce monotonic attenuation behavior (Sec. IV A). The experimental data of the acoustic effects of masks exhibit low-pass filtering trends on average; however, large variability in the attenuation behavior has been observed. This large variability is attributed to several factors, including the acoustic signals employed, experimental details, and types and mountings of the masks. The data from Corey et al. (2020), which are based on analyses of speech signals, show almost zero attenuation over the frequency range of approximately 0–1000 Hz for all of the mask types considered. At higher frequencies, attenuation is observed with the degree depending on the mask type and generally leveling off above roughly 4000 Hz. Some of the mask types do, in fact, exhibit slight reductions in the attenuation at the highest frequencies that they considered [see, for example, attenuation of the 2L jersey and 2L denim face coverings in Fig. 3 (right) in Corey et al. (2020)]. This shows that, in general, the predictions from our theoretical and numerical analyses of the attenuation behaviors of masks qualitatively agree with the available experimental data.
Figure 4 also illustrates that mask resistance and inductance affect the attenuation differently. Increasing the mask resistance generally increases the attenuation primarily in the mid to high ranges of the presented frequencies, as appears in the left column of Fig. 4 for most of the vowels. Increasing the mask inductance (mask area density) tends to affect all of the frequencies; see, for example, /i/ in the right column of Fig. 4. This is likely due to acoustic coupling as the pure mask model implies negligible influence of the mask area density at high frequencies. Higher attenuation at low frequencies with an increasing density is expected because increasing the mass decreases the cutoff frequency, which shrinks the low frequency range wherein the mask has negligible effect, in agreement with our theoretical analysis. Furthermore, Fig. 4 (right) shows that the influence of the mask inductance decreases at high values of the area density. This agrees partially with our theoretical analysis as increasing the mask density induces the attenuation to reach its asymptotic limit faster [observe that ω is always multiplied by $\bar{L}_m$ in Eq. (6)]. The frequency dependent behaviors at high area density values further highlight the influence of acoustic coupling on the attenuation as our theoretical analysis, which neglects coupling, predicts uniform attenuation at high frequencies and mask area densities.
Figure 5 presents the SPL, SII, and maximum collision forces as a function of the mask resistance (left) and mass (right) for a selection of vowels in the case of no compensation (that is, at a fixed subglottal pressure). The maximum collision force is normalized by its value with no mask, and the SPL uses the no mask condition as a reference. Figure 5 shows that increasing either the mask resistance or mask density decreases the SII and SPL, which agrees with the day-to-day experience of wearing masks. As in the case of the attenuation curves, the effect of increasing the mask resistance or density varies for different vowels.
(Color online) Trends in the BCM measures for different mask parameters at a fixed subglottal pressure. The background noise is assumed to be 50 dB for the SII calculations. The mask parameters are varied by changing the scaled resistance $\bar{R}_m$ while keeping the scaled inductance $\bar{L}_m$ at its nominal value (left) and changing $\bar{L}_m$ while keeping $\bar{R}_m$ at its nominal value (right). A subset of the investigated vowels is shown for clarity, but all of the vowels are considered in the average vowel case.
The modest reduction in the SII found in Fig. 5 with an increasing mask resistance/inductance generally agrees with the experimental results on the influence of masks on objective intelligibility measures (Palmiero et al., 2016). On the other hand, studies on the influence of masks on intelligibility as measured by the percentage of speech material correctly identified have found mixed effects, where some indicate significant reductions in intelligibility (Keerstock et al., 2020; Toscano and Toscano, 2021; Truong and Weber, 2021) and others note minor effects (Magee et al., 2020). These results do not conflict with the small changes in the SII found here, however, because of the differences in the experimental conditions (presence of visual cues from mouth motion, background noise, etc.) and because intelligibility, as measured by the percent of speech material understood, does not directly correlate with the SII (Kryter, 1962a,b).
Large decreases in the intelligibility do not necessarily imply large decreases in the SII (articulation index) as intelligibility is insensitive to the SII when it is high and sensitive to the SII when it is low (Kryter, 1962a,b) with the exact relationship depending on additional factors, such as the type of speech sound.
To illustrate this effect in terms of noise, Fig. 6 shows the SII score and corresponding estimated intelligibility (percent of words identified) based on Kryter (1962b, Fig. 15) for the “256 PB words” case. In all of the noise conditions, the effect of the mask on the SII is relatively small with a maximum decrease of about 0.1. As the noise level increases (or the SII in the no mask condition decreases), however, decreases in the corresponding intelligibility due to wearing a mask grow from nearly 0% at a noise level of 0 dB to 20% at a noise level of 50 dB. Therefore, the model investigated here can predict the small and large changes in intelligibility depending on the level of background noise or other factors that decrease the SII in the no mask condition. The experimental studies that found small changes in intelligibility due to a mask were conducted in low noise conditions (Magee et al., 2020), whereas experiments that found large changes in intelligibility due to a mask were conducted in high noise conditions (Keerstock et al., 2020; Toscano and Toscano, 2021; Truong and Weber, 2021).
(Color online) The relation between the SII and intelligibility as a function of the mask resistance at different levels of the background pink noise amplitude for the average vowel at a fixed subglottal pressure. The solid lines indicate the SII, and the dotted lines indicate the corresponding intelligibility.
Figure 5 also shows that as the mask resistance and inductance change, the collision forces are moderately influenced for an average vowel. Specific vowels, however, can undergo either increases or decreases in the collision force with an increasing mask resistance/inductance. For example, there is a slight increase in the collision force for the /a/ vowel and a more substantial decrease for the /i/ vowel. These changes in the collision force are the result of differing acoustic coupling effects under the increase in mask parameters. Several studies have shown that changes in the acoustic characteristics of the vocal tracts alter the dynamics of the VF oscillations (Lucero et al., 2012; Titze, 2008) with the specific effects depending on a variety of factors, such as the shape of the vocal tract and the resulting distribution of formants.
V. INFLUENCE OF SOCIAL DISTANCING ON ACOUSTIC SIGNALS
In this section, we briefly discuss the influence of social distancing on acoustic signals. Recall the simplified acoustic model of social distancing in Sec. II C and consider the far-field pressure amplitudes associated with the radiated flow rate given in Eq. (3). Let us fix the harmonic index $n$ and consider the pressure amplitudes associated with $n$ at different distances $d_1$ and $d_2$, namely, $|p_n(d_1)|$ and $|p_n(d_2)|$. Then, using Eq. (3), the ratio of the amplitudes is $|p_n(d_1)|/|p_n(d_2)| = d_2/d_1$, which is independent of the frequency $\omega_n$.
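Because the far-field amplitude ratio depends only on the two distances, the corresponding change in SPL can be computed directly. The following is a minimal sketch, assuming ideal spherical spreading (pressure amplitude inversely proportional to distance) with no reflections or air absorption; the function name `spl_change_db` is illustrative and does not come from the paper.

```python
import math


def spl_change_db(d1: float, d2: float) -> float:
    """Change in SPL (dB) when the listener moves from distance d1 to d2.

    Assumes the pressure amplitude scales as 1/d (far field, free space),
    so the change is the same for every harmonic.
    """
    if d1 <= 0 or d2 <= 0:
        raise ValueError("distances must be positive")
    # Amplitude ratio |p(d2)| / |p(d1)| = d1 / d2, expressed in decibels.
    return 20.0 * math.log10(d1 / d2)


# Doubling the distance from 1 m to 2 m lowers the SPL by about 6 dB.
print(round(spl_change_db(1.0, 2.0), 2))  # -6.02
```

Under this assumption, the SPL change is identical for all vowels, whereas the SII need not change uniformly because its band contributions can saturate.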
Figure 7 displays the influence of increasing the distance on the SPL and SII (assuming 50 dB of background noise). The effect of the distance on the SPL follows a decay regardless of the vowel sound. The effect of the distance on the SII follows a piecewise decay that depends on the vowel sound because of saturation effects in the different frequency bands. This inverse relation indicates a reduction in the sound intensity and intelligibility with increasing distance from the speaker. In Sec. VI, we investigate how compensating for reductions in the voice measures associated with social distancing and wearing masks affects the biomechanics of the VFs.
(Color online) The effect of the distance from the mouth on the SPL and SII for the average vowel at . The background noise is assumed to be 50 dB for the SII calculations. The influence of the distance on the SPL is not affected by the vowel tract shapes so all of the results appear identical in the upper plot.
VI. SUBGLOTTAL PRESSURE COMPENSATION
As shown in Secs. IV and V, wearing a mask and social distancing lead to reductions in several acoustic measures, including the SPL and SII. In this section, we explore how increasing the subglottal pressure as a compensatory mechanism influences the mechanics of phonation, particularly the VF collision forces, which are associated with phonotrauma.
Here, we consider the phonation simulations with increasing mask resistance/mass and various subglottal pressure values. Increasing the mask resistance with the fixed area density corresponds to increasing the mask filtration efficiency while keeping the mass fixed (e.g., wearing a surgical mask instead of a nonmedical mask of the same mass density), whereas increasing the mass density with a fixed resistance corresponds to increasing the thickness/number of layers of the low quality mask to attain a goal filtration efficiency level. For the given SPL and SII in the case of no mask at 1 m, we seek the requisite subglottal pressure that yields the same acoustic output at a given distance when wearing a mask (particular resistance/mass combination). This is performed at 1 and 2 m listener distances to account for the social distancing measures.
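The search for the requisite subglottal pressure can be framed as a one-dimensional root-finding problem. The sketch below is one possible way to set it up, assuming the acoustic measure increases monotonically with subglottal pressure over the search bracket. The bisection solver is our choice for illustration, and `toy_spl` is a hypothetical stand-in for the phonation simulation with arbitrary parameter values; neither comes from the paper.

```python
import math
from typing import Callable


def requisite_pressure(measure: Callable[[float], float], target: float,
                       lo: float, hi: float, tol: float = 1e-6) -> float:
    """Find the pressure in [lo, hi] at which measure(pressure) == target.

    Uses bisection and assumes `measure` is increasing on the bracket.
    """
    if not (measure(lo) <= target <= measure(hi)):
        raise ValueError("target is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if measure(mid) < target:
            lo = mid  # output too quiet: raise the pressure
        else:
            hi = mid  # output loud enough: lower the pressure
    return 0.5 * (lo + hi)


def toy_spl(ps_pa: float, distance_m: float = 1.0,
            mask_loss_db: float = 0.0) -> float:
    """Hypothetical placeholder for the simulated SPL (NOT the paper's model)."""
    return (20.0 * math.log10(ps_pa)
            - 20.0 * math.log10(distance_m)
            - mask_loss_db)


# Target: the toy output with no mask at 1 m and an arbitrary 800 Pa baseline.
target = toy_spl(800.0)
# Pressure needed at 2 m with an arbitrary 3 dB mask loss to match the target.
ps = requisite_pressure(lambda p: toy_spl(p, 2.0, 3.0), target, 100.0, 5000.0)
```

In practice, `measure` would wrap a full phonation simulation that returns the SPL or SII for a given subglottal pressure, mask resistance/mass combination, and listener distance.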
Figures 8 and 9 show the compensations in subglottal pressure required to achieve the same SPL or SII as in the no mask case at the baseline subglottal pressure (a reasonable value during speech) for various vowel sounds. Figure 8 considers an increasing mask resistance with a fixed area density, and Fig. 9 considers an increasing area density with a fixed resistance. For all of the vowels, an increasing mask resistance or mass generally necessitates increasing the subglottal pressure to sustain either the target SPL or SII. Unsurprisingly, doubling the distance from the speaker to the listener also requires an increase in the subglottal pressure.
(Color online) The compensation for the (left) /a/, (middle) /i/, and (right) average vowels as a function of the mask scaled resistance with fixed area density. The background noise is assumed to be 50 dB for the SII calculations. (Top row) The subglottal pressure and (bottom row) collision force are shown. The subglottal pressure is normalized by 1000 Pa while the collision force is normalized by the value in the no-mask case. A subset of the investigated vowels has been shown for clarity but all of the vowels are included in the average.
(Color online) The compensation for the (left) /a/, (middle) /i/, and (right) average vowels as a function of the mask area density with fixed resistance. The background noise is assumed to be 50 dB for the SII calculations. (Top row) The subglottal pressure and (bottom row) collision force are shown. The subglottal pressure is normalized by 1000 Pa while the collision force is normalized by the value in the no-mask case. A subset of the investigated vowels has been shown for clarity but all of the vowels are included in the average.
The compensations required for the SII and SPL are similar but reflect subtle differences due to the definitions of the two measures. The SII is a measure of intelligibility that includes frequency band weighting and saturation effects, whereas the SPL is purely a measure of the sound intensity. As a result, the SPL and SII are affected differently by the frequency dependent attenuation induced by wearing a mask (see Fig. 4) with the compensation required for the SII being generally greater. This is likely because the SII places a greater weight on the mid-band frequencies (Fig. 2), which are typically the frequencies that experience the greatest attenuation by the mask (Fig. 4) in our study.
Herein, the background noise was assumed to be 50 dB to model noise in typical environments, but other noise levels would affect the interpretation of the compensation results. In low noise environments (when the SII is high in the no mask condition), the minor decreases in the SII due to a mask, shown in Fig. 5 (about 0.1), would have minimal impacts on intelligibility (Kryter, 1962b). This suggests that in low noise environments, compensation for intelligibility may not be necessary because, although the mask does reduce intelligibility, the conversation context would likely be sufficient to fill in any gaps. In high noise environments, the SII in the no mask condition will already be low; therefore, the same decrease in the SII results in a much larger decrease in the intelligibility (see Fig. 6). As a result, the full compensatory increase predicted in Figs. 8 and 9 is more applicable. The effect of noise on intelligibility will also depend on additional factors. For example, Keerstock et al. (2020) found that non-native speaker intelligibility is greatly affected by masks and noise, and that using a clear speech style can reduce the effects of masks and noise on intelligibility. Effects like these would influence the relation between the SII and intelligibility and, thus, the level of compensation required.
Figures 8 and 9 show that the compensation required for doubling the distance is generally greater than compensation for increasing the mask resistance or mass, indicating that compensating for social distancing requires a relatively higher vocal effort in comparison with that to overcome the attenuation associated with wearing a mask.
Finally, Figs. 8 and 9 show that increases in the subglottal pressure to compensate for masking and social distancing measures lead to increased VF collision forces.
The trends in the VF collision forces with compensation for different masks and social distancing are caused by a combination of the effect of the mask itself on the collision force and the increased subglottal pressure for compensation. Generally, increasing the subglottal pressure increases the collision forces due to the increased vibration amplitudes; however, increasing the mask resistance/mass can either increase or decrease the collision forces, as seen in Fig. 5, depending on the vowel. In the case of the /a/ vowel, the mask itself has little impact on the collision force so the collision force primarily increases as a result of the increased subglottal pressure for compensation. In the case of the /i/ vowel, increasing the mask resistance/mass tends to decrease the collision force (Fig. 5) while the compensatory effects tend to increase it. As a result, the collision force trends for the /i/ vowel show an initial increase followed by a decrease due to these competing effects.
VII. COMMENTS ON MASK USAGE AND SOCIAL DISTANCING
We observed how compensating for reductions in the different voice measures associated with mask wearing and social distancing alters the mechanics of phonation and, in particular, leads to an increase in the subglottal pressure and VF collision. The collision pressure and resulting high stresses in the VF body have been hypothesized to play a large role in vocal trauma and the formation of VF nodules (Gunter, 2003; Tao and Jiang, 2007; Titze, 1994). The compensatory subglottal pressure and resulting increased collision forces seen here would likely contribute to an increased risk for vocal hyperfunction, which is in agreement with recent observations of vocal fatigue in healthcare workers who follow protective measures for long periods of time (McKenna et al., 2021). Forensic investigations on the prevalence of voice pathologies now and in the near future in comparison with pre-pandemic levels will shed additional light on the clinical repercussions of long term prophylactic use and social distancing, particularly for at risk groups, such as teachers.
Although masks and social distancing can contribute to hyperfunction and the development of voice disorders with prolonged usage, there are a few practical strategies that could mitigate the deleterious effects of these prophylactics on the voice while retaining their important role in the prevention of airborne disease transmission. First, light masks are preferable to heavy masks for the same particle filtration properties. This is because a larger mask mass increases the sound attenuation and will require larger compensatory effects. Second, the negative effects of masks on intelligibility can be greatly reduced by speaking in low noise environments. When intelligibility is high, mask wearing causes only minor decreases in intelligibility such that compensation may not be required (Sec. VI). As speech intelligibility can also be increased by reducing the distance between the speaker and audience, there are practical considerations in certain environments that could reduce fatigue. For example, seating configurations can be adjusted such that the distance from the speaker to the audience is similar for all of the audience members (e.g., arranging classroom desks in a circle to follow social distancing guidelines while simultaneously minimizing the distance to the furthest listener). Additionally, microphones should be employed when possible to eliminate the need for a speaker to raise their voice. Intelligibility can also be increased by changing the speech style, such as by speaking in a clear style (Keerstock et al., 2020). Combined, these strategies could greatly reduce the compensatory adjustments required while wearing a mask and maintaining a safe social distance.
VIII. LIMITATIONS
The current study aimed to investigate the consequences of following two of the recommended COVID-19 protective measures, namely, social distancing and mask wearing, on the biomechanics of phonation. Although our analysis was effective in revealing some of the potential consequences of the protective measures on the vocal health of individuals, the applicability of the results is limited by the implemented assumptions. First, the simplified acoustic model of social distancing neglects frequency dependent effects, which may influence how the distance affects intelligibility, especially if large distances are to be considered. Second, the model neglects the effects of the surroundings (walls, ceiling, corners) on acoustic signals (reflection, absorption, interference) and how such effects influence intelligibility and the acoustic attenuation associated with COVID-19 protective measures. Third, only the SII and SPL were considered when studying compensation. In addition, our measurement of intelligibility through the SII cannot capture some aspects of intelligibility; for example, context, body language, and phrasing in real speech could help improve intelligibility despite poor intelligibility from the acoustics alone. There are several vocal and nonvocal measures (such as the effects on breathing and mouth visibility) that are affected by the COVID-19 protective measures, and the compensation patterns associated with these other measures may be different. However, it is unclear what vocal and nonvocal measures besides “being heard” are compensated for in individuals when mask wearing and social distancing. Our usage of a subset of vowel sounds to represent average speech may also be limiting, as we observed in Sec. IV B that acoustic coupling has a significant effect on the attenuation behavior of masks. As a result, it may be important to consider running speech and consonant sounds to elucidate more accurately the acoustic effects of COVID-19 protective measures, especially mask wearing, on vocal health.
IX. CONCLUSION
In this paper, by means of numerical phonation simulations, we investigated the effects of wearing masks and social distancing on intelligibility and sound intensity. Moreover, we studied how compensating for reductions in the SII and SPL by means of increasing the subglottal pressure affects the mechanics of phonation.
Our analysis showed that masks have low-pass filtering effects, which agrees qualitatively with the available experimental observations. Furthermore, numerical simulations demonstrated how wearing masks and social distancing reduce the sound intensity and intelligibility. The simulations showed that decreases in the SII and SPL due to the mentioned protocols require compensatory subglottal pressure increases. These compensatory increases could potentially lead to vocal hyperfunction and, in turn, the development of other vocal disorders such as nodules.
The current study employed a simple acoustic model of masks that captures the general trends reasonably well, although there are some deviations from experimental observations of the acoustic effects of masks. In future work, we aim to develop and implement models that capture the acoustic effects of masks more accurately. Such models will then be used in phonation simulations to weigh the significance of increased VF collision pressures associated with compensating for reduced voice measures and to analyze the long term effects of wearing masks on the health of the VFs.
ACKNOWLEDGMENTS
The research reported in this work was supported, in part, by the National Institute on Deafness and Other Communication Disorders (NIDCD) of the National Institutes of Health (NIH) under Award No. P50DC015446, the National Science Foundation (CBET:2029548), and Agencia Nacional de Investigación y Desarrollo (ANID) under Award No. BASAL FB0008. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. J.J.D. and M.A.S. contributed equally to this paper.
In this study, we define a mask to be any thin porous textile material. This definition covers a wide variety of mask types, including surgical masks, nonmedical masks, and cloth face covers. As modeling assumptions, the flow through a mask is assumed to be proportional to the pressure difference across the mask when the mask mass is at rest, and the mask dynamics are driven by that pressure difference (see Sec. II B).