Superspreading events and overdispersion are hallmarks of the COVID-19 pandemic. However, the specific roles and influence of established viral and physical factors related to the mechanisms of transmission, on overdispersion, remain unresolved. We, therefore, conducted mechanistic modeling of SARS-CoV-2 point-source transmission by infectious aerosols using real-world occupancy data from more than 100 000 social contact settings in ten US metropolises. We found that 80% of secondary infections are predicted to arise from approximately 4% of index cases, which show up as a stretched tail in the probability density function of secondary infections per infectious case. Individual-level variability in viral load emerges as the dominant driver of overdispersion, followed by occupancy. We then derived an analytical function, which replicates the simulated overdispersion, and with which we demonstrate the effectiveness of potential mitigation strategies. Our analysis, connecting the mechanistic understanding of SARS-CoV-2 transmission by aerosols with observed large-scale epidemiological characteristics of COVID-19 outbreaks, adds an important dimension to the mounting body of evidence with regard to airborne transmission of SARS-CoV-2 and thereby emerges as a powerful tool toward assessing the probability of outbreaks and the potential impact of mitigation strategies on large scale disease dynamics.

## I. INTRODUCTION

Superspreading events and overdispersion are now well-established characteristics of the COVID-19 pandemic, similar to SARS and other outbreaks of respiratory viruses.^{1} The documented outbreaks at the Skagit Valley Chorale,^{2,3} at a restaurant in Guangzhou,^{4,5} and at a call center in Korea^{6} are examples of superspreading events, where one infectious index case led to tens of individuals infected within a few hours, at an order of magnitude higher than the basic reproduction number $2\u2264R0\u22643.6$ for the original SARS-CoV-2 variant.^{7} The term “superspreading” has generally been ascribed to any event (index case and exposures) that leads to more than the average number of secondary transmissions and, thus in probabilistic terms, could refer to any number of secondary cases to the right of the expectation.^{1} As such, it has been proposed that superspreading events are not exceptional events, but is an expected feature that emerges from the right-sided tail beyond the basic reproduction number. When this right-hand tail is further skewed with greater variability than expected, leading to an uneven distribution, the term overdispersion is applied (statistical definition). In the context of communicable diseases, overdispersion refers to a non-random pattern of clustering, and which often include a large number of zero cases and a small number of larger outbreaks.^{8} This pattern of overdispersion can be applied at the level of an event or index case (individual-level variation)^{8} or in the context of networks (population-level via onward transmission chains):^{9} both fall under the broader epidemiological umbrella of heterogeneity. In the context of the former, data suggest that such superspreading events are characterized by overdispersion in SARS-CoV-2 transmissions with 10%–20% of index cases responsible for 80% of secondary cases.^{10}

Understanding the nature and characteristics of superspreading events is, therefore, key to understanding the SARS-CoV-2 spread. Agent based modeling by Sneppen *et al.*^{11} suggested that non-repeating, random contacts, such as those in restaurants and bars, are a dominant contributor to the SARS-CoV-2 spread. Chen *et al.*^{12} argued that while social and micro-environmental factors affect transmissibility, overdispersion could result from an intrinsic characteristic of certain viruses. The knowledge gap that remains is the extent to which each of these factors contributes to the observed distribution of secondary SARS-CoV-2 transmissions per index case, by connecting variability in each of the components with our mechanistic understanding of SARS-CoV-2 transmission.

Emerging data confirm the importance of airborne transmission of SARS-CoV-2 by respiratory aerosols.^{13–17} A large number of small respiratory droplets (when size $<100\u2009\mu m$ at the point of exhalation often referred to as aerosols, as suggested by Prather *et al.*^{18}) can remain airborne in the liquid or semi-solid state,^{19} encapsulating the SARS-CoV-2 virus. As a result, the virus can remain infectious within the aerosols for substantial length of time.^{20} Several studies have analyzed airborne transmission in specific micro-environments. Bourouiba *et al.*^{21} identified that respiratory droplets and aerosols exhaled during violent expiratory events can travel long distances co-flowing with the moist air jet. Abkarian *et al.*^{22} analyzed exhaled air flow during speech and how certain phonetics produce a train of puffs. Chen *et al.*^{23} showed that for talking and coughing, a short range airborne route dominates transmission of respiratory infections. Analyzing a respiratory droplet/aerosol laden cough jet, modeling by Chaudhuri *et al.*^{19} showed that aerosols (droplet-nuclei) of initial size less than 50 *μ*m pose highest infection risk, and variation in the corresponding viral load could lead to large variation in the number of secondary infections. Using well-mixed assumptions and the Wells–Riley model,^{24,25} Buonanno *et al.*^{26} proposed a quantitative risk assessment for specific micro-environments with asymptomatic infectious cases and suggested that instead of specific superspreaders, it is a combination of several factors, including emission and exposure that lead to highly probable, superspreading events. Using a similar well-mixed approach, Bazant and Bush^{27} proposed to restrict the occupancy number and time spent in a room to mitigate airborne transmission.^{27} Schijven *et al.*^{28} estimated risk of infection resulting from sneezing, coughing, speaking, breathing, and singing at different viral loads. Analysis by Bond *et al.*^{29} quantified the importance of confinement in pathogen transport using “effective rebreathed air volume.” Dbouk and Drikakis^{30–32} studied the transport and evaporation of virus contaminated saliva droplets and looked at how these can be implemented in practical models. Further details on specific aspects of airborne disease transmission, including but not limited to aerosols, flow physics, and respiratory droplet size distribution, could be found in a recent review and opinion articles.^{33–37} Yet, to date mechanistic models describing airborne transmission have not been coupled with real-world distributions and occupancy information toward understanding the large scale features of disease dynamics, for example, overdispersion in transmissibility.

The overarching goal of this work is to leverage the mechanistic underpinnings of the airborne disease transmission to explore event-level overdispersion of the SARS-CoV-2 spread using real-world inputs from a large number of social gatherings. The specific goals are:

Develop an algorithm based on aerosol dispersion with randomized inputs and available occupancy data to generate distribution of the number of secondary infections per infectious case.

Explore if observed patterns of overdispersion in secondary transmissions could be reproduced via simulations using the above algorithm.

Derive an analytical function (and not a fit), which can describe the probability density function of the number of the secondary infections from the dynamics of the problem.

Identify the dominant variables that drive overdispersion and the resulting implications for mitigation measures.

To these ends, an aerosol spread model is solved over a hundred thousand random social-contact settings, utilizing real-world occupancy and area information coupled with realistic input distributions of viral-load and ventilation rates to obtain the probability distributions of the number of secondary infections per infectious case in those settings. We focus specifically on 103 679 restaurant settings from ten metropolitan areas in the US, motivated by the finding that these locations represent important centers of transmission in the early phase of the pandemic in the US.^{38} Transmission in such random, non-repetitive settings represents point source transmission. Emergence of later variants (delta, omicron) characterized by shorter generation times with respect to the original variant^{39,40} highlights the importance of understanding transmission and overdispersion in point source transmission settings. We note that the simulation in this paper addresses overdispersion in the number of secondary cases generated per infectious case over a period of about an hour in such hundred thousand instances, conditional on the presence of one index case at each location. The analytical expressions developed and validated with the simulations can address overdispersion over much wider time scales and contact settings, when coupled with appropriate inputs. In any case, in this paper, we do not directly address the number of secondary cases spawned by an infectious case over their entire course of infection experiencing different types of contact.

## II. ALGORITHM

A semi-analytical framework based on turbulent diffusion of infectious aerosols exhaled by an infectious, asymptomatic (or presymptomatic) individual is coupled with a dose response model^{41} to compute the number of secondary infections inside a room of specified dimensions and ventilation rate. This is described in detail in Sec. III. Using random samples of viral load, volume flow rate of ejected respiratory liquid, air changes per hour (*ACH*) generated from correspondingly observed distributions, along with real world occupancy and area data from SafeGraph, and exhaled aerosol size distribution, we use the algorithm depicted in Fig. 1 to generate the probability density function (pdf) of the number of secondary infections per infectious case.

The algorithm runs over *N _{s}* iterations where

*N*= 103 679 corresponding to the number of full service restaurants in ten US cities, available from SafeGraph data, used for generating a pdf at the end;

_{s}*i*being the iteration index. The restaurants serve as proxy for non-repeating, random, social contact settings and are referred to as points of interest (POI). First, at each such an indoor micro-environment of the POI, the volume of the mucosalivary fluid ejected per unit time due to speech and breath (random sample obtained from a truncated normal distribution whose mean is evaluated from the exhaled respiratory droplet volume size distribution, vsd) multiplied with a random sample from the log-normal viral load distribution is fed into the code to generate the number of virions emitted per unit time. The reader is requested to refer to the Appendix for details on the truncated normal distribution of the volume flow rate of ejected mucosalivary liquid. This is used in the turbulent diffusion solutions shown in Eqs. (4)–(7) in Sec. III, which establishes the spatio-temporally resolved virion concentration field. Random samples of exposure time, speaking time,

*ACH*, and area of the point of interest (POI), along with virus half-life, which is calculated separately, also act as input parameters for the solver. The resulting virus concentration field is used in the dose response model (with a specified dose response constant) to obtain probability of infection fields. Next, these probability fields are multiplied with the POI specific population density data (from SafeGraph) to obtain the number of secondary infections:

*Z*. Once the iteration index reaches

*N*, the iteration ends and the pdf of $Z$ is calculated using

_{s}*Z*values from every iteration. Note that the random variable—the number of secondary infections at specific POIs or individual infectivity is denoted by

*Z*, while the sample-space variable corresponding to

*Z*is denoted by $Z$. Pdf of $Z$, i.e., probability density function, defined as probability per unit distance in the sample space of $Z$,

^{42}is a convenient mathematical tool to describe overdispersion, irrespective of the specific number of events or samples. Furthermore, pdfs are amenable to analytical descriptions. Fundamentally, any long tailed pdf represents an inherently overdispersed random variable, because the long tail represents finite (but could be small) probability of an event where $Z\u226b$ mean of

*Z*, while any pdf other than that represented by a delta function represents heterogeneity. In view of these, in this paper, the analysis of overdispersion will be addressed using both simulated and analytical pdfs of $Z$.

Three key inputs—particle size distribution, viral load distribution, and occupancy information are described in Sec. III; details of the rest of the inputs: ventilation rate distribution, speech time, and exposure time distributions can be found in the Appendix. A table summarizing the important parameter values, distributions, and sources can also be found there (Table I).

Symbols . | Definitions . | Values . | References . |
---|---|---|---|

N _{s} | Total number of random samples used | 103 679 | Total number of data points from restaurants over ten US cities |

ρ | Viral load—log-normal distribution with mean μ and standard deviation σ | μ = 13.839 4 and σ = 3.630 1 for original variant | Data from Ref. 57 |

ACH | Air change rate (ACH)—measure of amount of air replaced in a room per unit time per unit volume of the room, log-normal distribution | $\mu ACH=0.7701,\sigma ACH=0.7554$ | Indoor ventilation rate is typically log-normally distributed. Parameters obtained from measured ACH distribution in restaurants from Ref. 59 |

$Q\u0307ls$ | Volume of respiratory liquid ejected during speaking per unit time | $\u27e8Q\u0307ls\u27e9=2.222\xd710\u22126\u2009ml/s,\sigma Q\u0307ls=3.078\xd710\u22126\u2009ml/s$ | Data from Refs. 36 and 66–69 were used to determine the normal distribution parameters |

N | Susceptible population at a given POI from SafeGraph data, exponential distribution | $\nu =7.707$ | SafeGraph occupancy and area datasets |

τ | Exposure time of a susceptible individual to the ejected aerosols, log-normal distribution | $\mu \tau =0,\sigma \tau =0.4$ | Distribution chosen such that average exposure time corresponds to the occupancy time over which the seven day averaged SafeGraph data are obtained. |

t _{s} | Duration of speaking of the infected individual, uniform distribution | $0.25\tau \u2264ts\u22640.5\tau $ | Estimated such that $ts\u22640.5\tau $ to ensure the speaking time is shared between infectors and susceptibles |

$Ds,0$ | Initial droplet diameter distribution (log-normal) | $10\u22122\mu m\u2264Ds,0\u2264102\mu m$ | Range used by Pöhlker et al.^{36} |

D _{i} | Mean geometric diameter of a mode of expiratory event (b—breath, s—speech) | $(D1)b=0.07\mu m,\u2009(D2)b=0.3\mu m$, $(D1)s=0.07\mu m,\u2009(D2)s=0.3\mu m,\u2009(D3)s=1.00\mu m,\u2009(D4)s=10\mu m,\u2009(D5)s=96\mu m$ | Input parameter in the log-normal fit function for respiration PSDs provided by Pöhlker et al.^{36} (Table VI) |

A _{i} | Number concentration at D _{i} | $(A1)b=7.7cm\u22123,\u2009(A2)b=1.1cm\u22123,\u2009(A1)s=9.8cm\u22123,\u2009(A2)s=1.4cm\u22123,\u2009(A3)s=1.7cm\u22123,\u2009(A4)s=0.03cm\u22123,\u2009(A5)s=0.17cm\u22123$ | Input parameter in the log-normal fit function for respiration PSDs provided by Pöhlker et al.^{36} (Table VI) |

σ _{i} | Modal geometric standard deviation | $(\sigma 1)b=0.9,\u2009(\sigma 2)b=0.9$, $(\sigma 1)s=0.9,\u2009(\sigma 2)s=0.9,\u2009(\sigma 3)s=0.9,\u2009(\sigma 4)s=0.98,\u2009(\sigma 5)s=0.97$ | Input parameter in the log-normal fit function for respiration PSDs provided by Pöhlker et al.^{36} (Table VI) |

$V\u0307$ | Volume of air exhaled in an expiratory event (b—breath, s—speech) | $V\u0307s=194.4cm3s\u22121,\u2009V\u0307b=100cm3s\u22121$ | Characteristic values used for calculations in Pöhlker et al.^{36} (Table I) |

$V\u0307b$ | Inhaled volume of air | $100cm3s\u22121$ | Inhalation and exhalation volume during one breath cycle considered to be equal. Value for exhalation volume obtained from Pöhlker et al.^{36} (Table I) |

H | Room height | 3 m | |

$u*$ | Friction velocity | $0.03ms\u22121$ | Chosen from the range of values used in Ref. 64. Figure 4 ^{64} shows that all values in the range have overlapping deposition velocity curves in the higher particle size range, which determines the average deposition velocity value |

T | Ambient temperature | $294.7K$ | Recommended typical indoor condition by ASHRAE |

RH | Relative humidity | 0.5 | Recommended typical indoor condition by ASHRAE |

UV _{index} | Strength of ultraviolet radiation | 0 | Recommended typical indoor condition by ASHRAE |

$t1/2$ | Half-life of airborne virus within aerosols | $32.07min$ | Based on above parameters at standard indoor conditions, from Refs. 20 and 51 estimated using the DHS calculator |

r _{v} | Dose response constant | 1/1440 copies | Obtained from Ref. 28 |

Symbols . | Definitions . | Values . | References . |
---|---|---|---|

N _{s} | Total number of random samples used | 103 679 | Total number of data points from restaurants over ten US cities |

ρ | Viral load—log-normal distribution with mean μ and standard deviation σ | μ = 13.839 4 and σ = 3.630 1 for original variant | Data from Ref. 57 |

ACH | Air change rate (ACH)—measure of amount of air replaced in a room per unit time per unit volume of the room, log-normal distribution | $\mu ACH=0.7701,\sigma ACH=0.7554$ | Indoor ventilation rate is typically log-normally distributed. Parameters obtained from measured ACH distribution in restaurants from Ref. 59 |

$Q\u0307ls$ | Volume of respiratory liquid ejected during speaking per unit time | $\u27e8Q\u0307ls\u27e9=2.222\xd710\u22126\u2009ml/s,\sigma Q\u0307ls=3.078\xd710\u22126\u2009ml/s$ | Data from Refs. 36 and 66–69 were used to determine the normal distribution parameters |

N | Susceptible population at a given POI from SafeGraph data, exponential distribution | $\nu =7.707$ | SafeGraph occupancy and area datasets |

τ | Exposure time of a susceptible individual to the ejected aerosols, log-normal distribution | $\mu \tau =0,\sigma \tau =0.4$ | Distribution chosen such that average exposure time corresponds to the occupancy time over which the seven day averaged SafeGraph data are obtained. |

t _{s} | Duration of speaking of the infected individual, uniform distribution | $0.25\tau \u2264ts\u22640.5\tau $ | Estimated such that $ts\u22640.5\tau $ to ensure the speaking time is shared between infectors and susceptibles |

$Ds,0$ | Initial droplet diameter distribution (log-normal) | $10\u22122\mu m\u2264Ds,0\u2264102\mu m$ | Range used by Pöhlker et al.^{36} |

D _{i} | Mean geometric diameter of a mode of expiratory event (b—breath, s—speech) | $(D1)b=0.07\mu m,\u2009(D2)b=0.3\mu m$, $(D1)s=0.07\mu m,\u2009(D2)s=0.3\mu m,\u2009(D3)s=1.00\mu m,\u2009(D4)s=10\mu m,\u2009(D5)s=96\mu m$ | Input parameter in the log-normal fit function for respiration PSDs provided by Pöhlker et al.^{36} (Table VI) |

A _{i} | Number concentration at D _{i} | $(A1)b=7.7cm\u22123,\u2009(A2)b=1.1cm\u22123,\u2009(A1)s=9.8cm\u22123,\u2009(A2)s=1.4cm\u22123,\u2009(A3)s=1.7cm\u22123,\u2009(A4)s=0.03cm\u22123,\u2009(A5)s=0.17cm\u22123$ | Input parameter in the log-normal fit function for respiration PSDs provided by Pöhlker et al.^{36} (Table VI) |

σ _{i} | Modal geometric standard deviation | $(\sigma 1)b=0.9,\u2009(\sigma 2)b=0.9$, $(\sigma 1)s=0.9,\u2009(\sigma 2)s=0.9,\u2009(\sigma 3)s=0.9,\u2009(\sigma 4)s=0.98,\u2009(\sigma 5)s=0.97$ | Input parameter in the log-normal fit function for respiration PSDs provided by Pöhlker et al.^{36} (Table VI) |

$V\u0307$ | Volume of air exhaled in an expiratory event (b—breath, s—speech) | $V\u0307s=194.4cm3s\u22121,\u2009V\u0307b=100cm3s\u22121$ | Characteristic values used for calculations in Pöhlker et al.^{36} (Table I) |

$V\u0307b$ | Inhaled volume of air | $100cm3s\u22121$ | Inhalation and exhalation volume during one breath cycle considered to be equal. Value for exhalation volume obtained from Pöhlker et al.^{36} (Table I) |

H | Room height | 3 m | |

$u*$ | Friction velocity | $0.03ms\u22121$ | Chosen from the range of values used in Ref. 64. Figure 4 ^{64} shows that all values in the range have overlapping deposition velocity curves in the higher particle size range, which determines the average deposition velocity value |

T | Ambient temperature | $294.7K$ | Recommended typical indoor condition by ASHRAE |

RH | Relative humidity | 0.5 | Recommended typical indoor condition by ASHRAE |

UV _{index} | Strength of ultraviolet radiation | 0 | Recommended typical indoor condition by ASHRAE |

$t1/2$ | Half-life of airborne virus within aerosols | $32.07min$ | Based on above parameters at standard indoor conditions, from Refs. 20 and 51 estimated using the DHS calculator |

r _{v} | Dose response constant | 1/1440 copies | Obtained from Ref. 28 |

## III. MATERIALS AND METHODS

In this section, first we describe the model to obtain the probability of infection resulting from inhalation of infectious aerosols generated from speaking and breathing, in an indoor, confined, and ventilated micro-environment. Next, we connect this model to an algorithm—a first of its kind to our knowledge that accepts randomized inputs from distributions of viral load, exhaled aerosol size distribution, ventilation rate, speech, and exposure time corresponding to specific inputs of the number of people occupying specific indoor areas. The occupancy information is obtained from a large SafeGraph dataset of full-service restaurants from ten major US cities. The restaurants serve as our study setting for time-limited, point source transmission via mostly non-repeating, random contacts, in contrast to households or workplaces where repeated contacts are made with the same individuals and over longer periods of time. In this work, we focus on disease spread by asymptomatic infectious cases (asymptomatic at the time of disease spread, therefore including presymptomatics) under the assumption that individuals with symptoms would not be engaged in indoor dining. Hence, we consider only speech and breath as the mechanisms by which respiratory aerosols are ejected into specific micro-environments. The algorithm developed for this work is shown in Fig. 1.

### A. Aerosol dispersion from an infectious case

We use the standard, turbulent diffusivity based closure^{42} to model the spatio-temporal evolution of ensemble averaged infectious aerosol concentration *c _{a}* (the number of aerosol particles per unit volume of air associated with emissions from the infector), as shown in Eq. (1). This approach was used by Drivas

*et al.*

^{43}to model indoor concentration fields from point sources. Venkatram and Weil

^{44}have also shown that gradient transport based turbulence models work reasonably well in indoor environments.

The turbulent diffusivity *D _{T}* in Eq. (1) is a function of the air-change rate

*a*and the area of the space given by

*A*. The second term on the RHS is a sink term that accounts for removal of particles by ventilation (

*a*is the air change rate) and by deposition, while

*w*is the deposition velocity and

_{d}*V*being the volume of the indoor space

We treat the infectious case as a continuous point source. The initial condition is $ca(x,y,z,0)=0$ with reflection boundary conditions at six walls. Since the virus remains embedded inside the infectious aerosols and they are non-volatile, the ratio of the number of virions to the number of aerosol particles in any given volume of air can be assumed to remain constant post-ejection. To retain analytical tractability of the solution, in this work, we consider fast evaporation and a constant, post-evaporation, aerosol volume-averaged *w _{d}*; see Refs. 19, 23, and 45 for detailed evaporation and deposition considerations of respiratory droplets. Therefore, the ensemble averaged concentration of virus RNA copies per unit volume of air

*c*is proportional to

*c*, or

_{a}The average number of virions within an aerosol particle, $Na$, is the constant of proportionality. Using the identity equation (2), we can immediately convert Eq. (1) into an evolution equation for *c*, as shown as

While one can use more complex approaches to model turbulent mixing and aerosol dispersion, the advantage of the relatively simple, yet sufficiently robust and accurate, diffusivity based closure is the analytical tractability and the inexpensive solution it offers. As will be seen later, such an attribute is pivotal to generating the large number of realizations of the *c* field, utilized in this work. Indeed, direct numerical simulation or large eddy simulation based computational fluid dynamics (CFD) could produce high fidelity solutions for one specific micro-environment at high computational cost similar to the studies by Pendar and Páscoa^{46} who looked at the fluid dynamics and transport characteristics of a sneeze within an indoor location, and Zheng *et al.*^{47} who studied the transmission of aerosolized viruses within a densely populated urban area. Other viable techniques like the Lagrangian particle model coupled with a continuous random walk model used by Wang *et al.*^{48} to simulate the droplet dynamics of expelled droplets also exist as alternatives. However, it is evident that for the 100 000 micro-environments investigated here, such techniques cannot be used. Investigating such a large number of cases behooves application of a robust reduced order model for aerosol dispersion. The robustness of the present reduced order model used is proven by the comparison with experimental results as in Ref. 49 as well as the comparison with experimental and CFD results given by Hathway *et al.*^{50} These will be discussed later in further details. As such, for a continuous point source $Qx0,y0,z0$ at $(x0,y0,z0)$, the solution of the concentration field $c(x,y,z,t\u2032)$ at time $t\u2032$ was given by^{43}

with wall reflection terms for $i,j,k\u22600$

*L*, *W*, and *H* are the length, width, and height of the room, respectively. Here, we have introduced a virus survivability function $\psi (t)$ in the solution, as in Ref. 19

Based on the experiments by Dabisch *et al.*^{20} and calculations from DHS,^{51} for $T=21.7\u2218C$, *RH *=* *0.50, *UV _{index}* = 0, i.e., typical ASHRAE recommended indoor air conditions, for SARS-CoV-2, half-life comes out as $t12=32.07$ min. Here,

*T*is the temperature,

*RH*is the relative-humidity, and

*UV*is the ultra-violet index inside the room of interest. The SARS-CoV-2 virus half-life reduces monotonically with temperature and non-monotonically with relative humidity

_{index}^{52}like other enveloped viruses, as shown by Marr

*et al.*

^{53}

The above solution given in Eqs. (4)–(7) [without $\psi (t)$] was validated by Cheng *et al.*^{49} who released CO from a point source and measured its spatiotemporal dispersion characteristics inside typical built environments. Indeed aerosols deposit (accounted for in this paper) unlike CO, but since their motion is predominantly controlled by turbulent diffusion inside a room (turbulent diffusivity $DT\u226b$ molecular diffusivity of CO or effective diffusivity of aerosol particles), the validation exercise is highly relevant. They suggested the following correlation among turbulent diffusivity, area of room, and air change rate:

which is used for the present study as well. Here, *L*_{0} is given as $(A\xd7H)1/3$, *ACH* is the air changes per hour, and therefore, $a=ACH/3600$. The model described through Eqs. (1)–(9) was further compared with the experimental and computational results provided by Hathway *et al.*^{50} for central point source bioaerosol dispersion within a ventilated room. The model used shows qualitative and quantitative match within the uncertainty limits. The detailed comparison is provided in the Appendix.

Denoting *τ* as the time duration of the event under considerations, the ensemble averaged number of infectious virions, generated from both speech and breath of the infected individual located at $x0,y0,zo$, that is inhaled at $x,y,z0$ up to time *τ* is denoted $Nv(x,y,z0,\tau )$, and it is given by

While infectious aerosol emissions from breathing take place over the entire time duration of the event, emissions from talking are assumed to occur only for a time $ts<\tau $ (see the Appendix for details). This is accounted for in our calculation of the corresponding time-varying virus concentration field. $V\u0307b$ is the average volume of air inhaled per second. The local probability of infection due to infectious aerosols produced by speaking and breathing $Ps+b(x,y,z0,\tau )$ is calculated using the dose-response model originally proposed by Haas.^{41} The dose-response constant is chosen as $rv=1/1440$ based on the estimations by Haas^{54} and Schijven *et al.*^{28} for the original SARS-CoV-2 variant

The number of secondary infections *Z* within a room is, thus, given by

Here, *ρ _{p}* is the susceptible population density (assumed to be uniform), estimated as $\rho p=n/A$ at the given point of interest (POI). POI is a term used in SafeGraph dataset and in Ref. 38 referring to a specific business location like a restaurant.

*A*is the indoor area of that POI. In this paper, we place one infectious case at each POI. Therefore,

*Z*is a measure of individual-level infectivity as well. Here,

*n*is the number of susceptible individuals present at a given POI, i.e., $n=np\u22121$, where

*n*is the total number of people present at that POI.

_{p}### B. Particle size distribution of exhaled aerosols

The particle size distributions at the source of the expiratory events, speaking and breathing, are obtained from the review by Pöhlker *et al.*^{36} A multimode log-normal fitting has been found to describe the corresponding distribution reasonably well. The number size distribution (nsd) for exhaled aerosol particles for different expiratory events can be described using a single function with event specific constants. The general form for the distribution can be expressed as

where *η* is the number concentration of particles, *D*_{0} is the particle diameter at ejection, i.e., at time *t *=* *0, and $Ds,0$ is the corresponding sample space variable. Moreover, *A _{i}* and

*σ*are constants that depend on the mode and type of the expiratory event. For further details, the reader is referred to the Appendix. Figure 2 shows the number and volume size distributions that are used in the simulation for the speaking and breathing events. The volume size distribution (vsd) is integrated over $Ds,0$ to obtain the volume of the exhaled respiratory liquid per unit volume of air for the given expiratory event. This term when multiplied with the volume flow of air per unit time gives us the mean volume flow rate of ejected respiratory liquid. (A truncated normal distribution is assumed for it, and more details can be found in the Appendix.) A random sample of this volume flow rate multiplied with the individual viral load

_{i}*ρ*yields the ejected number of virions (RNA copies) emitted per unit time, i.e., the source term $Qx0,y0,z0$ of Eq. (4). Here, we choose particles only with $Ds,0<100\u2009\mu m$ as the larger ones will settle in less than 10 s even after accounting for evaporation.

^{19}A detailed description of the particle deposition velocity calculation is presented in the Appendix.

### C. Viral load distribution

Viral load (*ρ*) has been associated with infectivity and, thus, the number of secondary infections for a given setting.^{55,56} Analyzing respiratory droplets and aerosol laden cough jets, Chaudhuri *et al.*^{19} showed that the corresponding number of infected individuals could vary by orders of magnitude due to variation in the viral load. Chen *et al.*^{10} analyzed a large number of SARS-CoV-2 viral load databases and suggested that the viral load is an important contributor to heterogeneity in secondary infections. They also showed that the viral load distributions were similar for symptomatic and asymptomatic stages of infection. This point was further amplified by direct measurements of Yang *et al.*,^{57} who showed that viral load distributions are nearly identical among hospitalized (symptomatic) and asymptomatic population. In this paper, we utilize measurements of viral load in asymptomatic (including presymptomatic) population from Ref. 57 as an input into the algorithm shown in Fig. 1. According to Ref. 57 at the time of saliva collection, the infected individual was either asymptomatic or presymptomatic. The pdf of $ln(\rho v)$ (*ρ _{v}* is the sample space variable of

*ρ*) is shown in Fig. 3(a). We generate and use samples of

*ρ*from this log-normal distribution in our calculations.

### D. Occupancy and area of different points of interest from SafeGraph data

Two of the important inputs in the simulation are the areas of different points of interest (POI) and the number of people occupying them during each time period of interest. These data were obtained from SafeGraph—a company that collects anonymous data from mobile devices. For our simulation, data that are available from individual confined spaces, rather than that from a collection of several indoor spaces, were felt to be most appropriate. Hence, we used SafeGraph data for areas of full service restaurants (POI) over ten cities in USA, namely, Atlanta, Chicago, Dallas, Houston, Los Angeles, Miami, New York City, Philadelphia, San Francisco, and Washington D.C. The SafeGraph-tabulated area for each restaurant is multiplied by 0.5 to convert the total area of a given restaurant to the corresponding sitting area since the dining area is estimated as 50% of the total restaurant area. (Based on typical restaurant design guides,^{58} change in this factor does not qualitatively change the observations.) The occupancy information in these POIs between hours 12:00–13:00 and 18:00–19:00 over seven days starting from March 1, 2020 is obtained from the datasets created by Chang *et al.*^{38} who developed a mobility network based SEIR model using the SafeGraph data. These two time periods represent typical lunch and dinner times and, hence, highest occupancy periods of any day. The pdf of the averaged number of susceptible individuals in restaurants (total occupancy minus one) between hours 12:00–13:00 and 18:00–19:00 over ten US cities is shown in Fig. 3(b). Also, shown in the figure is an exponential fit and the corresponding fitting parameter.

## IV. RESULTS AND DISCUSSION

### A. Spatial distributions of the particle concentration for fixed conditions

First, we present the spatial distribution of the airborne virus concentration (RNA copies*/*m^{3} of air, virions are encapsulated within aerosols of initial size 0.01–100 *μ*m) with the source at $x=2.5m$ and $y=2.5m$ from the origin (at the bottom left corner) for a specific micro-environment: a 10 m × 10 m × 3 m room, after 15 min of aerosol exhalation by speech and breath. In Fig. 4, the first column (a, c, and e) presents results with a constant viral load $\rho =109$ copies/ml, but with increasing *ACH*. Note that the third row represents a case without wall reflections and, hence, can simulate outdoor conditions. The second column of Figs. 4(b), 4(d), and 4(f) represents a constant but five times higher viral load of $\rho =5\xd7109$ copies/ml. The results demonstrate strong inhomogeneity of the virus concentration and also show that the contours scale linearly with *ρ* for the same *ACH* due the linear nature of the governing equation (3). Using the dose response model [Eq. (11)], the corresponding contours of probability of infection are shown in Fig. 5. It could be found that indeed near unity probability of infection (from speech and breath) $Ps+b$ are found near the source, and there is decay with distance from the source. The subscript *s *+* b* indicates infection caused by exhaled aerosols due to speech (*s*) and breath (*b*) from the infected individual. Increasing *ACH* invariably reduces virus concentration for a given *ρ*. However, the reduction in the probability of infection may not be proportional to the reduction in the virus concentration due to the non-linearity involved in the dose response. Interestingly, the simulated outdoor conditions shown in Figs. 4(e) and 4(f) show much smaller virus concentration both near and far from the source with respect to the confined cases. This can be attributed to the inherently higher *ACH* inside the volume of interest but primarily due to the absence of confinement, which allows the virus concentration to freely decay with space.

### B. Statistical distributions of secondary infections generated from realistic input distributions

A simulation based on the algorithm presented in Fig. 1 is run for each data point available from the predefined SafeGraph dataset, resulting in a sample size of *N _{s}* = 103 679. We place one infected individual at each POI, at random locations within its premises. As such, most inputs, including viral load,

*ACH*, exposure time, speaking time, volume of ejected respiratory liquid per unit time, and source location (

*x*

_{0},

*y*

_{0}), are randomized.

*ACH*is generated from a log-normal distribution with a median

*ACH*= 2.16 h

^{−1}, such that $\mu ACH=0.7701,\sigma ACH=0.7554$ based on measurements by Bohanon

*et al.*

^{59}The room height

*H*=

*3 m, the height of the source (seated) $z0=1$ m, indoor conditions ($T=21.7oC$,*

*RH*=

*0.50,*

*UV*= 0), and dose response constant $rv=1/1440$ [see Eq. (11)] are held constant. The resulting distribution of the number of secondary infections—pdf of $Z$ is presented in Fig. 6. An analytical solution $g(Z)$ to be derived in Sec. IV C is also shown in the figure.

_{index}The stretched tailed nature of the simulated pdf is immediately apparent. This shows that there is small but finite probability of tens of secondary infections per infected individual. For this simulation, the $mean(Z)s=\u27e8Z\u27e9s=0.14$ (where the subscript *s* represents the simulated result as opposed to an analytical result) obtained from the simulated pdf, indicates that over an exposure time of nearly an hour, on average less than one person got infected, per infector. The calculated total number of infections is 12 648 with many of the infections occurring in large superspreading events. As such, only 3.57% of the infected individuals infected 80% of the population over this time. This could also be the reason why it is generally difficult to culture the virus from the air, though that was unequivocally demonstrated by Lednicky *et al.*^{60} High probability of infection, which as shown in the paper, typically occurs at high viral load, could be correlated with high probability of virus detection in the air. Direct virus detection from air could, therefore, necessitate sampling from a large population of infected individuals. Clearly, the finite probability of $Z\u226b\u27e8Z\u27e9$ recovers the inherently overdispersed nature of SARS-CoV-2 transmission dynamics. Fitting a negative binomial probability distribution function to the $Z$-pdf yields a good fit with dispersion parameter *k *=* *0.03. While the fit quality worsens at the pdf tails, the dispersion parameter is in the same order as the corresponding values for SARS and measles estimated by Lloyd-Smith *et al.*^{1} However, it is to be noted that we are considering only infections over a period of about 1 h on average, as opposed to the entire course of infection; hence, $\u27e8Z\u27e9$ should not be interpreted as $R0$. Similarly, the qualitative *k* value, thus, obtained should be interpreted with care. Figure 7(a) shows the joint probability density function (jpdf) of *ρ* and $Z$.

The close correlation of the two random variables across nearly six orders of magnitude is immediately apparent from this figure. While correlation may not imply causality in general, it is reasonable in this case that the extreme variation in viral load is causing a similar variation in the number of secondary infections with other parameters controlling the slope and strength of the correlation.

We present the joint probability density function (jpdf) of *ACH* and *Z* in Fig. 7(b). It is apparent that the highest number of infections *Z* occurs at lower air exchange rates, as expected; however, the majority occur at an intermediate air exchange rate of about 1–1.5 *ACH*, in part because very low and very high air exchange rates are simply less common. Furthermore, it is also clear that dispersion of *Z* and *ACH* are indeed negatively correlated. The effect of universal, high ventilation rates, and masks will be taken up later.

### C. Analytical pdf of the number of secondary infections $Z$: $g(Z)$

What leads to the stretched tail pdf of the number of secondary infections? Can we model it from first principles? These questions are taken up in this subsection. First, we note that the viral load pdf from Ref. 57, obtained for asymptomatic individuals (including presymptomatics), can be well approximated by a log-normal distribution with parameters *μ* and *σ* given by $\mu =13.84$ and $\sigma =3.63$. This is shown by Yang *et al.* and also shown in Fig. 3(a). As such, we used the following pdf of viral load $f(\rho v)$ for generating inputs into the simulation:

Due to its extremely large [over $O(12)$] variation, our analysis shows that viral load *ρ* is one of the most dominant variables in controlling overdispersion of secondary infections *Z*, as apparent from Fig. 7(a). This result can also be presented in terms of secondary attack rate—a more generalized descriptor, defined as $Z\u0303=Z/n$, where *n* is the number of susceptible individuals present at the given POI over the period of interest *τ*. $Z\u0303$ is the sample space variable corresponding to $Z\u0303$. Variation of $Z\u0303$ with respect to *ρ* is shown in Fig. 8(a). Clearly, this plot reflects the dose-response function [Eq. (11)] on a macro scale given the dominant influence of *ρ* in controlling the probability of infection and eventually secondary attack rates. Therefore, we propose a function similar to the dose-response function to model the response of $Z\u0303$ to *ρ* variation. This is shown below

We can also write

The constant *α* can be estimated as the inverse of the average number of virions inhaled per unit volume of mucosalivary liquid ejected that is required for an infection probability of $1\u2212e\u22121=0.63$. Utilizing mean values of individual input distributions: average room volume $\u27e8V\u27e9=6.34\xd7102$ m^{3}, average speaking time $\u27e8ts\u27e9=1469$ s, average exposure time $\u27e8\tau \u27e9=3914$ s, average air change rate, $\u27e8ACH\u27e9=2.87$ h^{−1}, and deposition parameter based on the average room area and volume $\beta 0=0.002$ s^{−1}, *α* can be estimated as

The virus survivability function $\psi (t)$ is given in Eq. (8), and the average volume flow of the exhaled liquid per unit time is $\u27e8Q\u0307l\u27e9=2.222\xd710\u22126$ ml/s. Note that all the $\u27e8\u27e9$ quantities mention averages over the distributions used in the present simulation. Equation (17) yields $\alpha =1.113\xd710\u221210$ ml/copies. The comparison of Eq. (15) and the simulation results are shown in Fig. 8(a).

With the functional form of the pdf of the viral load known [given in Eq. (14)], we can immediately substitute Eq. (16) into Eq. (14) to eventually derive the pdf of $Z\u0303$ using the generalized equation below

Using the log-normal form of *f*, we derive the analytical function below, which could be used to model the pdf of the secondary attack rate $Z\u0303$:$\varphi (Z\u0303)$. However, the same method should be applicable to other functional forms of *f*, like a Weibull distribution as in Ref. 10 instead of log-normal

It is evident that Eq. (19) describes the simulation data to good quantitative accuracy. It is also remarkable that very important effects of area, *ACH*, virus kinetics, and exposure and speaking times could be encapsulated within one constant *α*. It is to be recognized that the equation is valid only for $Z\u0303<1$. This is an inherent feature emanating from the derivative of the functional form of the dose response model, which yields the $1\u2212Z\u0303$ term in the denominator. Importantly, the equation can describe the range $0\u2264Z\u0303<1$ with high degree of veracity. Now, we can write $Z=N(1\u2212e\u2212\alpha \rho )$ using Eq. (15), where *N* is the sample space variable corresponding to *n*. For a fixed *α*, clearly *N* and $1\u2212e\u2212\alpha \rho $ are independent random variables. Therefore, for any given pdf of *N* given by *h*(*N*), pdf of $Z/N$ given by $\varphi (Z/N)$ and a known *α*, and using the general equation that describe the pdf of the product of two independent random variables, we can write the pdf of $Z$ given by

Using the exponential distribution for occupancy at different points of interest: POIs (restaurants in our case) $h(N,\nu )=1\nu e\u2212N\nu $ shown in Fig. 3(b), we derive for $Z<N$

It is to be noted that the constants $\mu ,\sigma $ are properties of the viral load distribution, *ν* is the constant of the occupancy distribution, and *α* encapsulates $\tau ,ts,ACH,A,V,t12$, etc., according to Eq. (17). It is apparent that the pdf $g(Z)$ is stretched to higher (lower) $Z$ values when *μ*, *σ*, *ν*, or *α* increases (decreases). The remarkable match between this analytical function: *g*—the analytical pdf of the number of secondary infections and that obtained from the simulation data has been shown in Fig. 6. We revisit it here for further discussion. It is evident that the stretched tail of the pdf of $Z$ results from the two stretched exponential functional of $Z$ and *N*. This can be verified by noting that when either $\sigma \u21920$ or $\nu \u21920$, the overdispersion of the number of secondary infections vanishes. The first stretched exponential arising from the log-normal distribution of viral load *ρ _{v}* and the latter from the exponential distribution of the number of people at the different POIs. A more rigorous way to show this is to replace exponential distribution describing

*h*(

*N*) in Eq. (20) with a Dirac delta function $\delta (N\u2212N0)$ and using its sifting property to get

where unless $N0\u226b\nu $, the overdispersion in *Z* is greatly diminished with respect to the exponential distribution of *N*. While the near homogeneous population distribution could be representative of classrooms in elementary schools in a city, indoor occupancy of various social settings is expected to be overdispersed as in the case of restaurants. Therefore, in general, it is the joint contribution of overdispersed viral load and overdispersed occupancy that results in overdispersion of secondary infection numbers causing superspreading events. This is shown here with a single equation. The $\u27e8Z\u27e9$ obtained from the analytical pdf is given by $\u27e8Z\u27e9=\u222b0\u221eZg(Z)dZ$. We find mean and standard deviation as $\u27e8Z\u27e9=0.12,\u2009std(Z)=0.94$, respectively, in comparison to $\u27e8Z\u27e9s=0.14$ and $std(Z)s=0.76$ from the simulations. The analytical pdf $g(Z)$ is expected to be a generalized result and could be applied for any large number of indoor social-contact settings without much restriction on their type.

### D. Variants and mitigation measures

Finally, we test whether the derived Eq. (21) can describe overdispersed transmission associated with a different viral load distribution equally well. To this end, we generate a viral load distribution for the $\delta \u2212$variant. Jüni *et al.*^{61} have provided us with a representative curve for $\delta \u2212$variant viral load vs time based on the data from Ref. 62. Using this curve, the time-averaged median of the viral load log-normal distribution was obtained, and hence the parameter *μ* corresponding to the distribution. Thus, while Fig. 3(a) represents the pdf of the original variant, the new log-normal distribution characterized by $\mu =17.97$ and $\sigma =3.63$ might represent the pdf of viral load of the $\delta \u2212$variant infected individuals. The simulation results in terms of $Z$-pdf are presented in Fig. 9(a). The greater transmissibility of the $\delta \u2212$variant due to the higher mean viral load is immediately apparent. In comparison to the $\u27e8Z\u27e9=0.12$ for the original variant, the $\u27e8Z\u27e9=0.98$ for *δ* (from simulations $\u27e8Z\u27e9s=0.95$ for *δ*). Therefore, just based on viral load, according to the calculations and model, the $\delta \u2212$variant could be nearly 8.2× higher transmissible on average with respect to the original variant, over about an hour of contact. Interestingly, with just increased *μ*, Eq. (21) can capture the pdf of $Z$ for the $\delta \u2212$variant, remarkably well, alongside the one for the original variant. However, it is to be recognized that viral load and infectiousness potential (using a proxy of culture-positivity, for example), while monotonic in nature, may not demonstrate a linear relationship.^{63} Furthermore, $\u27e8Z\u27e9\u2260R0$. Hence, the enhancement factor, thus, found is valid only within the context of the assumptions.

Finally, we explore the effect of adopting uniformly high ventilation rates and masks on the distribution of secondary infections. To that end, we keep the ventilation rates constant at *ACH *=* *6 h^{−1} and assume that the entire population (including the infectors and susceptibles) is wearing masks that provide 50% reduction by volume of exhaled aerosols and 50% reduction in the correspondingly inhaled aerosols. The results are shown in Fig. 9(b).

We observe that such interventions result in significant reduction in transmissibility for the original variant; reducing the mean to $\u27e8Z\u27e9=0.05$ (from $\u27e8Z\u27e9=0.12$ obtained without any interventions) when the masks are blocking 50% of aerosols during both inhalation and exhalation events with significant reduction in the extension of the tail. Such intervention effects remain substantial for the *δ*-variant too, where $\u27e8Z\u27e9=0.47$ (with respect to $\u27e8Z\u27e9=0.98$ obtained without any interventions) though tail remains sufficiently stretched with some shift in the overall pdf toward lower $Z$. An even severe reduction in transmissibility is observed when the masks are set to block 90% of aerosols traveling through it, giving us $\u27e8Z\u27e9=0.004$ for the original variant and $\u27e8Z\u27e9=0.078$ for the $\delta \u2212$variant. Once again, we note that these numbers are obtained over nearly an hour of exposure time on average. Note that, in our model, *α* does not change between the original and the *δ* variant. Only the *μ* (median viral load) increases, resulting in an increase in the higher proportion of secondary attack rates close to unity as the virus strain switches from the original variant to the $\delta \u2212$variant. A more detailed analysis of the influence of individual parameters: mean viral load, mean occupancy, and mean ventilation rates on the mean and standard deviation of *Z* could be found in the Appendix.

Within the scope of the present study—social gatherings in restaurants, we ask what kind of spread could be expected for the $\delta \u2212$variant given the period of exposure and available occupancy data in a population where a large fraction is already vaccinated? This is shown in Fig. 9(b) by the green curves. Using the realistic *ACH* distribution and with masks that can reduce both emission and inhalation of aerosol volumes by 50%, respectively, we do a basic calculation including the effect of vaccination. Assuming vaccine efficacy $\eta vac=0.6$ and vaccination coverage efficiency $\eta cov=0.8$ representing fraction of the population vaccinated, we estimate the new population of susceptible individuals at a given POI as $nv=(1\u2212\eta vac\eta cov)n$. We do not consider any change in the viral load or change in distribution of infectious cases. Fitting an exponential distribution to *n _{v}*, the new constant: mean occupancy of susceptibles $\nu v=4.01$. Clearly, from Fig. 9(b), we observe a significant drop in the number of secondary infections and superspreading events. The pdf of the number of secondary infections with the

*δ*-variant with partially effective masks and vaccines is much less stretched than the original variant without masks or vaccine. Still the finite risk of superspreading event sustains. However, with 80% vaccination and 50% reduced occupancy, coupled with masks, a significant reduction in overdispersion is attained. This behooves the need for rapidly vaccinating the population alongside physical intervention measures like high-quality masks, reduced occupancy, and across the board higher ventilation rates.

## V. CONCLUSIONS

Understanding the mechanistic factors that lead to overdispersion in secondary cases has been considered long-standing scientific problems. We coupled an aerosol mixing model with real-world epidemiological and viral-biology inputs: exhaled aerosol size distribution for speech and breath, measured viral load distribution, and realistic ventilation rate distributions with occupancy information from more than hundred thousand social contact settings in major US cities to explore overdispersion in the number of secondary infections per infector. The simulated results demonstrate that the aerosol transmission route is consistent with the overdispersed individual infectivity with viral load variability being the dominant factor that controls secondary attack rates alongside the ventilation rate, exposure time, and speaking time. We also derived, for the first time, analytical expressions that accurately described the simulated pdfs of the secondary attack rates and the number of secondary infections, respectively, and further elucidated how the quantitative relationship between variability in individual-level viral load and event-level occupancy jointly control the overdispersion. Our findings suggest that even in the context of airborne transmission, approximately 4% of index cases in indoor settings would account for 80% of secondary cases, thus highlighting the importance of understanding and focusing mitigation efforts on drivers of superspreading events. The findings highlight the importance of measures to decrease exposures during periods of high viral shedding (such as via isolation by timely testing to detect periods of high viral shedding), as well as improving ventilation, and the increased probability of outbreaks with variants of concern associated with higher viral loads. Finally, the analytical function further offers an opportunity to estimate the spatially defined probability of outbreaks and outbreak size from point-source transmission events given viral load and occupancy across indoor settings.

## ACKNOWLEDGMENTS

UofT researchers acknowledge support from the Fields Institute for Research in Mathematical Sciences through their project Mathematics for Public Health and Variants of Concern sponsored by the Canadian Institutes of Health Research (Grant No. VS2-175577). S.C. acknowledges the Heuckroth Distinguished Faculty Award in Aerospace Engineering from UTIAS. S.M. acknowledges the Tier 2 Canada Research Chair in Mathematical Modeling and Program Science (Grant No. CRC-950–232643).

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

### APPENDIX A: COMPARISON WITH EXPERIMENTAL AND COMPUTATIONAL RESULTS

Hathway *et al.*^{50} carried out experiments and CFD simulations in a $3.35\xd74.26\xd72.26$ m^{3} room with ventilation. Though the model is validated with spatial CO measurements^{12} to add another layer of validation, we compared our model with results from Ref. 50 for *ACH* values of 6 and 12 h^{−1} with a bioaerosol ejector placed at the center of the room. Results are compared along three lines at a height of 1.15 m. Our model was normalized using the average value over all three lines so that they can be compared with the normalized experimental and computational results provided in the paper. Hathway *et al.*^{50} have attributed the large variance in certain experimental measurements as shown in Fig. 10 (points with large error bars) to loss of viability during nebulization and sampling. In spite of that, comparisons in Fig. 10 show qualitative and quantitative agreement at most experimental sampling points, and for the points with large variance, our results fall within the given range. As for comparison with the CFD results, the very high peaks at the center of the Z $=1.67$ m line do not appear in our model, which is purely an artifact of the difference in the turbulence diffusivity model involved in the two studies.

### APPENDIX B: PARTICLE SIZE DISTRIBUTION

From Ref. 36, we have the number size distribution as

where they have defined $\psi =d\eta /dlog10Ds,0$ (*η* is the number concentration of particles). *D _{i}* is the mode mean geometric diameter,

*σ*is the modal geometric standard deviation, and

_{i}*A*is the number concentration at

_{i}*D*. These constants are mode specific parameters, which have prescribed values for each of the

_{i}*i*th modes (refer to Table VI in Ref. 36 for the values of the constants). For our purpose, we require the function $\psi D,\eta =d\eta /dDs,0$

It is to be noted that *A _{i}* is a number concentration parameter, which means

*ψ*would correspond to number of particles per unit volume. Hence, to obtain the distribution for the total number of particles ejected per unit time, we have to multiply Eq. (B3) by the total volume exhaled per unit time $V\u0307$. The $V\u0307$ values are taken as $360Lh\u22121$ and $700Lh\u22121$ for breathing and speaking, respectively, both of which fall within the typical range for $V\u0307$ as mentioned in Table I of Ref. 36. The number size distribution can also be converted to the volume size distribution through the relation

where $v\eta $ is the volume concentration of particles.

### APPENDIX C: VOLUME FLOW RATE OF RESPIRATORY LIQUID

$Q\u0307l$ is obtained from integrating the volume size distribution over the entire range of particle size diameters and multiplying it with the volume flow rate of air per unit time. It is reasonable to expect this quantity to vary from one infector to another as we move from one POI to another. To account for this change, we assume a truncated normal distribution for the speaking $Q\u0307ls$ with the mean value obtained from VSD of Ref. 36 as $\u27e8Q\u0307ls\u27e9=2.222\xd710\u22126$ ml/s. The standard deviation value was calculated as $\sigma Q\u0307ls=3.078\xd710\u22126$ ml/s using similar particle size distribution plots provided by Johnson *et al.*^{66} using their BLO model for their own data along with the data from Refs. 67–69. It is to be noted that if we leave the normal distribution with its limits extending indefinitely on both sides of the mean then it would encompass extremely large and small values for $Q\u0307ls$, which would be unrealisitic. Hence, we created a truncated normal distribution with the value of Johnson *et al.*^{66} ($Q\u0307ls=1.292\xd710\u22128$ ml/s) put as the lower limit whereas an equal distance (as between Johnson *et al.*'s lower limit and Pöhlker *et al.*'s mean value) is taken toward the upper limit ($Q\u0307ls=4.433\xd710\u22126$ ml/s) from the mean. It is ensured that the Duguid *et al.*^{67} and Loudon *et al.*^{68,69} $Q\u0307ls$ values fall within that range. The contribution of breathing toward $Q\u0307ls$ is several orders of magnitude lower than that of the lower limit of speaking, and as such no distribution is considered for breathing and instead its contribution $Q\u0307lb$ is added to the randomly sampled $Q\u0307ls$ value.

### APPENDIX D: CHARACTERISTIC DEPOSITION VELOCITY

To obtain a unique characteristic deposition velocity over a surface for the entire range of particle size, we first require a functional form for the deposition velocity. Lai and Nazaroff^{64} developed a model for said deposition velocity on indoor surfaces, which took into account the effects of Brownian diffusion, turbulent diffusion, and gravitational settling. They provided a formulation for a quantity *β* (wall deposition coefficient) as a function of the particle diameter. This function can then be put through an averaging operation to obtain a value for the average characteristic deposition velocity *w _{d}* as

Here, *A _{v}*,

*A*, and

_{u}*A*are the surface areas of the vertical surfaces, upward-facing horizontal surfaces, and downward-facing horizontal surfaces, respectively. The corresponding characteristic deposition velocities are given by

_{d}*v*,

_{dv}*v*, and

_{du}*v*. Lai and Nazaroff

_{dd}^{64}wrote the wall deposition coefficient

*β*as a function of the deposition velocities and the room geometry as

The deposition velocities can be expressed in terms of the friction velocity $u*$ and gravitational settling velocity *u _{s}* as

*I* is an integrated quantity (refer to Ref. 64 for detailed formulation) that depends on the Schmidt number $Sc=\nu /D$, where *ν* is the kinematic viscosity and *D* is the Brownian diffusivity. Brownian diffusivity can be directly obtained^{65} from the particle diameter at equilibrium $Ds=Ds,0/5$, as

Here, *k _{B}* is the Boltzmann constant,

*T*is the absolute temperature in kelvin,

*μ*is the dynamic viscosity of air, and

*C*is a slip correction factor for small particles. For the remaining two unknowns, the friction velocity and the gravitational settling velocity, the former is an input variable while the later can be expressed as

_{c}^{65}

where *ρ* is the particle density. From Eqs. (D2)–(D5), an average wall deposition velocity on *i*th surface (where $i=v,u,d$), $(vdi)avg$ can be obtained by treating this quantity as the expected value of a function of a random variable (*D _{s}*)

It is to be noted that this is a volume averaging operation due to $\psi D,V$ being defined as a volume size distribution similar to Eq. (B5) but by using the equilibrium diameter *D _{s}* instead. This gives us a relation for

*β*that can in turn be related to the characteristics deposition velocity

_{avg}*w*as

_{d}### APPENDIX E: EXPOSURE TIME, SPEAKING TIME, AND *ACH*

Exposure time *τ* variation was modeled with a log-normal distribution such that median *τ* = 1 h, ($\mu \tau =0,\sigma \tau =0.40$), and speech time *t _{s}* was modeled using a uniform distribution. In particular, $0.25\tau \u2264ts\u22640.5\tau $. These yield the following averages: $\u27e8\tau \u27e9=1.0875$ h and $\u27e8ts\u27e9=0.409$ h. From the literature, it is found that ventilation rates in indoor built environments are typically log-normally distributed. In particular, Bohanon

*et al.*

^{59}measured the ventilation rate in 33 restaurants and found them to be log-normally distributed with a median 1.8 l/sm

^{2}and standard deviation of 2.1 l/sm

^{2}. Following this measured distribution, for

*H*=

*3 m, we found median*

*ACH*= 2.16 h

^{−1}and used $\mu ACH=0.7701,\sigma ACH=0.7554$ for the

*ACH*pdf.

### APPENDIX F: SENSITIVITY OF MEAN AND STANDARD DEVIATION OF THE NUMBER OF SECONDARY INFECTIONS

In this sub-section, we explore the sensitivity of the most important parameters: $\u27e8ACH\u27e9,\u2009\u27e8\rho \u27e9$, and $\u27e8N\u27e9$. To that end, we explore the effect of variation one parameter at a time on $\u27e8Z\u27e9$ and standard deviation (*Z*), keeping the other two constant. The results are shown in Table II and Figs. 11(a)–11(c). All three parameters affect $\u27e8Z\u27e9$ and standard deviation (*Z*). However, the effect of variation of $\u27e8\rho \u27e9$ is strongest on both the mean and standard deviation of *Z*, followed by that of $\u27e8N\u27e9$, and $\u27e8ACH\u27e9$.

$\u27e8ACH\u27e9$ . | α
. | μ
. | Σ
. | $\u27e8\rho \u27e9$ . | $\u27e8N\u27e9$ . | $\u27e8Z\u27e9$ . | Std (Z)
. |
---|---|---|---|---|---|---|---|

0.1 | 1.467 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 7.707 | 0.139 | 1.022 |

1 | 1.329 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 7.707 | 0.131 | 0.987 |

5 | 9.391 × 10^{−11} | 13.839 | 3.643 | 7.796 × 10^{8} | 7.707 | 0.106 | 0.881 |

10 | 6.864 × 10^{−11} | 13.839 | 3.643 | 7.796 × 10^{8} | 7.707 | 0.089 | 0.808 |

50 | 2.158 × 10^{−11} | 13.839 | 3.643 | 7.796 × 10^{8} | 7.707 | 0.046 | 0.589 |

100 | 1.149 × 10^{−11} | 13.839 | 3.643 | 7.796 × 10^{8} | 7.707 | 0.033 | 0.508 |

2 | 1.205 × 10^{−10} | 12.672 | 3.643 | 2.427 × 10^{8} | 7.707 | 0.065 | 0.697 |

2 | 1.205 × 10^{−10} | 14.672 | 3.643 | 1.793 × 10^{9} | 7.707 | 0.199 | 1.255 |

2 | 1.205 × 10^{−10} | 16.672 | 3.643 | 1.325 × 10^{10} | 7.707 | 0.571 | 2.268 |

2 | 1.205 × 10^{−10} | 18.672 | 3.643 | 9.789 × 10^{10} | 7.707 | 1.310 | 3.546 |

2 | 1.205 × 10^{−10} | 20.672 | 3.643 | 7.234 × 10^{11} | 7.707 | 2.599 | 5.027 |

2 | 1.205 × 10^{−10} | 21.672 | 3.643 | 1.966 × 10^{12} | 7.707 | 3.289 | 5.594 |

2 | 1.205 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 1.707 | 0.029 | 0.226 |

2 | 1.205 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 2.707 | 0.045 | 0.349 |

2 | 1.205 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 4.707 | 0.076 | 0.589 |

2 | 1.205 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 6.707 | 0.108 | 0.839 |

2 | 1.205 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 8.707 | 0.139 | 1.077 |

2 | 1.205 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 10.707 | 0.171 | 1.324 |

$\u27e8ACH\u27e9$ . | α
. | μ
. | Σ
. | $\u27e8\rho \u27e9$ . | $\u27e8N\u27e9$ . | $\u27e8Z\u27e9$ . | Std (Z)
. |
---|---|---|---|---|---|---|---|

0.1 | 1.467 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 7.707 | 0.139 | 1.022 |

1 | 1.329 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 7.707 | 0.131 | 0.987 |

5 | 9.391 × 10^{−11} | 13.839 | 3.643 | 7.796 × 10^{8} | 7.707 | 0.106 | 0.881 |

10 | 6.864 × 10^{−11} | 13.839 | 3.643 | 7.796 × 10^{8} | 7.707 | 0.089 | 0.808 |

50 | 2.158 × 10^{−11} | 13.839 | 3.643 | 7.796 × 10^{8} | 7.707 | 0.046 | 0.589 |

100 | 1.149 × 10^{−11} | 13.839 | 3.643 | 7.796 × 10^{8} | 7.707 | 0.033 | 0.508 |

2 | 1.205 × 10^{−10} | 12.672 | 3.643 | 2.427 × 10^{8} | 7.707 | 0.065 | 0.697 |

2 | 1.205 × 10^{−10} | 14.672 | 3.643 | 1.793 × 10^{9} | 7.707 | 0.199 | 1.255 |

2 | 1.205 × 10^{−10} | 16.672 | 3.643 | 1.325 × 10^{10} | 7.707 | 0.571 | 2.268 |

2 | 1.205 × 10^{−10} | 18.672 | 3.643 | 9.789 × 10^{10} | 7.707 | 1.310 | 3.546 |

2 | 1.205 × 10^{−10} | 20.672 | 3.643 | 7.234 × 10^{11} | 7.707 | 2.599 | 5.027 |

2 | 1.205 × 10^{−10} | 21.672 | 3.643 | 1.966 × 10^{12} | 7.707 | 3.289 | 5.594 |

2 | 1.205 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 1.707 | 0.029 | 0.226 |

2 | 1.205 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 2.707 | 0.045 | 0.349 |

2 | 1.205 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 4.707 | 0.076 | 0.589 |

2 | 1.205 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 6.707 | 0.108 | 0.839 |

2 | 1.205 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 8.707 | 0.139 | 1.077 |

2 | 1.205 × 10^{−10} | 13.839 | 3.643 | 7.796 × 10^{8} | 10.707 | 0.171 | 1.324 |