Mobility restriction is a crucial measure for controlling the transmission of COVID-19. Research has shown that effective distance, measured by the number of travelers instead of physical distance, can capture and predict the transmission of the virus. However, these efforts have been limited mainly to a single source of disease and have not been tested on finer spatial scales. Building on prior work on effective distances at the country level, we propose the multiple-source effective distance, a metric that captures the distance for the virus to propagate through the mobility network at the county level in the U.S. We then estimate how the change in the number of sources impacts the global mobility rate. Based on the findings, we propose a new method to locate sources and estimate the arrival time of the virus. The new metric outperforms the original single-source effective distance in predicting the arrival time. Last, we select two potential sources and quantify the arrival time delay caused by the national emergency declaration. In doing so, we provide quantitative answers on the effectiveness of the national emergency declaration.

Understanding the community transmission of COVID-19 in the U.S. is critical to evaluating the effectiveness of public policy and controlling future pandemics. In this study, we propose the multiple-source effective distance to identify potential sources of the SARS-CoV-2 virus in the U.S. The method identifies and ranks possible sources, and we pick the top combination, Santa Clara, CA and Suffolk, MA, for further analyses. We visualize the transmission paths that originated from these two sources and find that the prediction based on our new measure outperforms prior metrics in estimating the transmission of COVID-19. Last, the arrival time delay (ATD) is proposed to quantify the effectiveness of the mobility decrease caused by the declaration of the national emergency. On average, the mobility decrease delayed the arrival time of counties in the U.S. by 5.54 days as of April 12, one month after the declaration. However, because of the relatively late declaration of the national emergency, the policy only brought a positive arrival time delay to some counties in the Midwest, Southeast, and Alaska. The results demonstrate that our approach provides a powerful tool to evaluate the effectiveness of nonpharmaceutical interventions when facing epidemics.

Coronavirus disease 2019 (COVID-19) is continuing to spread and has devastated public health,1 economy,2 and social lives3 in the U.S. As of December 15, 2021, the virus has infected over 50 million and killed more than 800 000 in the U.S.4 The unforeseen outbreak and the destruction caused by COVID-19 highlight that it is critical to identify sources of diseases in the early stages of future pandemics.5–9 These early sources are culprits in exporting the deadly virus and causing large-scale community transmissions. Finding them promptly and efficiently can guide policy-making to provide extra time for the preparation of local hospitals and government and reduce the impact of infectious diseases on public health.10–15 

There are two critical challenges to identifying the source locations of COVID-19. First, the incubation period of SARS-CoV-2 can delay finding the early sources that may lead to community transmission in the U.S.16 Travel-related importations were detected from mainland China as early as January 2020 and then from European countries in February and March.17 However, treating them as sources of community transmission could be faulty and misleading: limited testing at the early stage means asymptomatic infections could have been missed, and thus, it is unknown when and where community transmission started. Second, local governments enacted inconsistent anti-contagion policies at varying times.18,19 The inconsistent and uncoordinated efforts might have suppressed local transmission but made it possible for the virus to spread to other places. These complications could have caused the multiple shifts of epicenters in the U.S.20 The dynamics indicate that community transmission sources are changing and are likely unknown sources rather than the initial outbreak locations.21 The two challenges call for a more effective approach to identifying sources of community transmission in the U.S.

Prior research has proposed several approaches to identifying sources of infectious diseases. Pinto et al.12 proposed the maximum probability of localization criterion to narrow down the sources. Building on it, Brockmann and Helbing22 and Iannelli et al.23 developed effective distance, a measure that considers human mobility, and reconstructed the transmission paths to locate the initial source. Their approaches successfully found the initial source using arrival-time data from air travel for the 2009 H1N1 pandemic and the 2003 SARS pandemic, requiring no epidemiological features of the diseases. However, the effective distance method does not consider the shift of threatening sources that results from human mobility changes caused by varying local interventions and the practice of social distancing. Our prior research proposed a new distance metric, namely, country distancing, that accounts for the spatiotemporal changes in human mobility networks and anti-contagion policies.21 

Despite these prior works, three research gaps remain. First, prior research has mostly assumed a single source rather than multiple sources. In reality, multiple sources can import infections in the early stage of an epidemic and transmit viruses to other locations. Second, previous research has used coarse spatial resolutions, i.e., country or regional levels. It is unknown if effective distance metrics could maintain their accuracy when predicting mobility on a finer spatial resolution with more perturbation on the mobility network. Last, static and aggregated mobility data sets (e.g., airline traffic, vehicle flows across country borders, etc.) have been used in calculating effective distances. It is unknown if more dynamic mobility data with higher resolutions, such as cell phone or GPS data sets, can be used to calculate effective distances and predict disease transmission.

To address these knowledge gaps, we propose a modified distance metric, i.e., multiple-source effective distance, to quantify the effective distances based on high-resolution, dynamic mobility data sets on the community level. The new metric allows us to identify multiple original risk sources that could have been responsible for the wide transmissions of COVID-19 in the U.S. It thus narrows the knowledge gap of targeting multiple sources, instead of a single one, at the early stage of an epidemic. In addition, we provide an estimation of the effectiveness of control strategies toward mobility networks using the national emergency declaration as an example.

This study used two data sets: time-series data of COVID-19 infections and daily human mobility data. The first contains county-level COVID-19 cases collected by the Johns Hopkins University Center for Systems Science and Engineering.24 The reported daily infected cases span January 21, 2020 to January 16, 2021; in this study, we used those from January 21, 2020 to June 30, 2020. We defined a county’s arrival time as the first day when the county had more than one reported infected case. The second contains daily mobility data in the U.S. from January 1, 2020 to June 30, 2020. The data set, anonymized and de-identified, was provided by Cuebiq. The data report real-time locations and are crowd-sourced from over 30 million devices that opted in to anonymous data sharing for research purposes through a CCPA- and GDPR-compliant framework. The data set has supported interdisciplinary studies on transportation and commuting patterns,25 urban accessibility,26 social isolation and segregation,27 and social distancing in COVID-19.28,29 We computed the county where each device was primarily located each day and the travels from one county to another every 24 h. The mobility flux is then the aggregated number of users traveling between the counties. The mobility data were compared with the 2018 American Community Survey (ACS) data, and the $R^2$ between the two data sets is over 0.9 throughout the study period (see Fig. S1 in the supplementary material).
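
The arrival-time definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `arrival_day`, the toy series, and the reading of "more than one reported infected case" as a cumulative count are ours, not part of the original data pipeline.

```python
from datetime import date, timedelta

def arrival_day(cumulative_cases, start, threshold=1):
    """Return the first date on which cumulative cases exceed `threshold`.

    cumulative_cases: cumulative confirmed-case counts, one per day,
    beginning on `start` (assumption: counts are cumulative).
    Returns None if the threshold is never exceeded.
    """
    for offset, count in enumerate(cumulative_cases):
        if count > threshold:
            return start + timedelta(days=offset)
    return None

# Hypothetical county series starting on January 21, 2020.
series = [0, 1, 1, 1, 2, 5, 9]
first = arrival_day(series, date(2020, 1, 21))
print(first)  # 2020-01-25
```

A county that never exceeds the threshold within the study window returns `None` and would be excluded from any fit against arrival times.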

Figure 1 shows that the inter-county flux dropped from $2.38 \times 10^{6}$ on March 13, 2020 (i.e., the date of the national emergency declaration) to $1.03 \times 10^{6}$ on April 8, 2020, a decline of 56.7%. However, it started to climb in early April and reached $2.18 \times 10^{6}$ on June 23, 2020, almost back to the pre-pandemic level.

FIG. 1.

The 7-day moving-average summation of influx and number of detected devices in log10 scale in the U.S. over time. The dotted vertical line indicates the date of the national emergency declaration, i.e., March 13, 2020.

1. Single-source effective distance

We first select the effective distance proposed in Ref. 22 as the metric for measuring single-source distance between counties in the U.S., for comparison with our multiple-source effective distance. To prevent confusion, this effective distance is named single-source effective distance in the rest of this article. In the equations detailed in Ref. 22, the mobility network is represented as $G$, which encapsulates the mobility influx both between and within counties. The network can also be represented by the influx matrix $F$, with $F_{mn} \geq 0$ quantifying the mobility flux from node $n$ to destination $m$. Accordingly, the influx fraction matrix $P$, with $P_{mn} \in [0,1]$, measures the fraction of outflux from location $n$, so that $P_{mn} = F_{mn}/N_n$, where $N_n$ is the number of devices recorded at location $n$. For a static mobility network $G$ with the initial source $n$, the single-source effective distance of county $m$ is defined as

$$d_{m|n}(t) = \min_{\tau_{m|n}} \sum_{(n_{i+1},\, n_i) \in \tau_{m|n}} \left(1 - \log P_{n_{i+1} n_i}(t)\right),$$
(1)

where $\tau_{m|n} = \{n, n_1, n_2, \ldots, m\}$ is the path from source $n$ to $m$ with $L_{m|n}$ steps in $G$. As validated in Ref. 22, given the initial source $n$, the single-source effective distance is a good predictor for counties’ arrival times $T_m$,

$$d_{m|n}(T_m) = T_m v_{\mathrm{eff}},$$
(2)

where $v_{\mathrm{eff}}$ represents the effective speed, which is determined by the epidemiological properties of a disease.
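
Because every step length $1 - \log P$ in Eq. (1) is at least 1 when $P \leq 1$, the minimum over paths can be found with a standard shortest-path search. The following is a minimal sketch using Dijkstra's algorithm, not the implementation used in this study; the function name, the nested-dictionary layout of the flux data, and the toy counties A, B, and C are assumptions made for illustration.

```python
import heapq
import math

def single_source_effective_distance(flux, devices, source):
    """Dijkstra over step lengths 1 - log(P), with P = F / N_n as in Eq. (1).

    flux[n][m] : travelers from county n to county m (assumed layout)
    devices[n] : devices recorded in county n (N_n); assumes flux <= devices
    Returns {county: effective distance from `source`} for reachable counties.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, n = heapq.heappop(heap)
        if d > dist.get(n, math.inf):
            continue  # stale heap entry, a shorter path was already found
        for m, f in flux.get(n, {}).items():
            if m == n or f <= 0:
                continue  # skip intra-county flux and empty edges
            p = f / devices[n]        # fraction of n's devices traveling to m
            step = 1.0 - math.log(p)  # >= 1 whenever p <= 1
            if d + step < dist.get(m, math.inf):
                dist[m] = d + step
                heapq.heappush(heap, (d + step, m))
    return dist

# Toy network with hypothetical counties.
flux = {"A": {"B": 100, "C": 1}, "B": {"C": 50}, "C": {}}
devices = {"A": 1000, "B": 500, "C": 200}
d = single_source_effective_distance(flux, devices, "A")
print(d)
```

In this toy network the two-step path A to B to C is effectively shorter than the weak direct edge from A to C, which mirrors the chained-transmission behavior that effective distance is designed to capture.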

2. Multiple-source effective distance

The transmission of the virus in the complex mobility network can lead to multiple threatening sources over time. To capture such dynamics, our prior study created a new distancing metric considering multiple threatening sources on the country level.21 Human mobility among countries was approximated by air travel. In this research, we developed the multiple-source effective distance, which can incorporate the high granularity and dynamics of human mobility in the U.S. Our model is shown below.

Given a set of threatening sources, $N_I = \{n_1, n_2, \ldots\}$, the distance to the target county $m$ from the threatening sources is defined as

$$D_{m|N_I} = \min_{n_i \in N_I}\left(L_{m|n_i}\right) - \log\left(\frac{1}{M} \sum_{n_i \in N_I} e^{L_{m|n_i} - d_{m|n_i}}\right),$$
(3)

where $M$ is the number of counties in the mobility network, in other words, the number of all possible sources; $L_{m|n_i}$ is the length of the shortest path from source $n_i$ to $m$; and $d_{m|n_i}$ is the single-source effective distance. Based on the assumption that $L_{m|N_I} \approx L_{m|n_i}$, Eq. (3) can be simplified into

$$D_{m|N_I} = \log(M) - \log \sum_{n_i \in N_I} \frac{1}{e^{d_{m|n_i}}}.$$
(4)

More details can be found in Ref. 21.
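
As an illustration of Eq. (4), the sketch below combines single-source effective distances into a multiple-source distance. It is a sketch under assumptions rather than the study's code: the function name and the toy values are hypothetical, and the smallest distance is factored out of the sum only for numerical stability, which does not change the result.

```python
import math

def multiple_source_distance(single_source_distances, n_counties):
    """Eq. (4): D = log(M) - log(sum_i exp(-d_i)).

    single_source_distances: d_{m|n_i} from each threatening source n_i
    to the target county m.
    n_counties: M, the number of counties in the network.
    """
    # log(sum exp(-d_i)) = -d_min + log(sum exp(-(d_i - d_min)))
    d_min = min(single_source_distances)
    total = sum(math.exp(-(d - d_min)) for d in single_source_distances)
    return math.log(n_counties) + d_min - math.log(total)

# Hypothetical target county in a 100-county network.
one_source = multiple_source_distance([6.0], 100)
two_sources = multiple_source_distance([6.0, 6.0], 100)
print(one_source, two_sources)
```

With a single source the metric reduces to $\log(M) + d_{m|n}$, and adding a second source at the same distance lowers it by $\log 2$, so additional sources can only bring a county effectively closer.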

In Ref. 22, it is proven that the speed of virus spreading $v_{\mathrm{eff}}$ can be viewed as a function of the mobility rate $\gamma(t) = \sum_{m,n \in G} F_{mn}(t) / \sum_{m \in G} N_m$. Therefore, in this research, based on our findings in Sec. II B 3 in the supplementary material, we assume that $v_{\mathrm{eff}}(t) = \alpha\gamma(t) + v_{\mathrm{eff}}^{0}$, where $\alpha$ and $v_{\mathrm{eff}}^{0}$ are two constants. Here, $\alpha$ is the ratio between the global mobility rate and the effective speed, which is determined by the epidemiological parameters of the dynamic system, and $v_{\mathrm{eff}}^{0}$ is the base effective speed, i.e., the speed when the global mobility rate is equal to zero. It is worth noting that although the speed is not zero when there is no mobility in the network, the effective distance and the arrival time are still infinite. Additionally, the effective speed is linear in the logarithm of the number of sources. However, as shown in Sec. II B 4 in the supplementary material, the ratio of the one-source effective speed to the two-source effective speed is close to one. Since we consider only two sources in the rest of the paper, the difference between the effective speeds is neglected. Because $\alpha$ and $v_{\mathrm{eff}}^{0}$ depend on the selection of sources, their values are optimized using the Levenberg–Marquardt algorithm when locating the threatening sources (see Sec. II B3).

According to our previous work,21 $T_m \sim D_{m|N_I}/v(t)$. As a result, on the arrival day $T_m$ of county $m$,

$$D_{m|N_I}(T_m) = \int_0^{T_m} \left(\alpha\gamma(t) + v_{\mathrm{eff}}^{0}\right) dt,$$
(5)

where day 0 is defined as the arrival time of the first source.

Because the average mobility rate $\gamma$ is a value derived from the mobility data, here we define $x(t) = \int_0^t \gamma(t')\,dt'$. The arrival time should satisfy the following equation:

$$D_{m|N_I}(T_m) = \alpha x(T_m) + v_{\mathrm{eff}}^{0} T_m.$$
(6)

3. Locating the threatening sources

To locate the source with the single-source effective distance,22 we examined every candidate source and selected the one with the maximum $R^2$ between counties’ single-source effective distances and their arrival times $T_m$. We also calculated the root mean square error (RMSE) between the estimated and actual values of the single-source effective distance based on Eq. (2), which is used for comparison with the results from the multiple-source effective distance. The optimal source $s$ meets

$$s = \arg\max_{m \in \Omega} \left[1 - \frac{\sum_m (T_m - f_m)^2}{\sum_m (T_m - \bar{T}_m)^2}\right],$$
(7)

where $\Omega$ is the set of initial candidates for the single-source effective distance ($\Omega = \{k \mid k \in G, k \neq n\}$), $f_m$ is the value of $T_m$ predicted by the linear model fitted to $d_{m|n}$ and $T_m$, and $\bar{T}_m$ is the average value of $T_m$.
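
The selection rule in Eq. (7) amounts to fitting a straight line of arrival time against effective distance for each candidate source and keeping the candidate with the highest $R^2$. A minimal sketch follows; the function names, the dictionary layout, and the toy candidates X and Y are hypothetical, and the ordinary least-squares line with an intercept is our assumption about the "linear model" referred to above.

```python
def r_squared(distances, arrival_times):
    """R^2 of the ordinary least-squares line T = slope * d + intercept."""
    n = len(distances)
    mean_d = sum(distances) / n
    mean_t = sum(arrival_times) / n
    sdd = sum((d - mean_d) ** 2 for d in distances)
    sdt = sum((d - mean_d) * (t - mean_t)
              for d, t in zip(distances, arrival_times))
    slope = sdt / sdd
    intercept = mean_t - slope * mean_d
    ss_res = sum((t - (slope * d + intercept)) ** 2
                 for d, t in zip(distances, arrival_times))
    ss_tot = sum((t - mean_t) ** 2 for t in arrival_times)
    return 1.0 - ss_res / ss_tot

def best_single_source(distance_by_source, arrival_times):
    """Return the candidate whose distances best explain the arrival times."""
    return max(distance_by_source,
               key=lambda s: r_squared(distance_by_source[s], arrival_times))

# Hypothetical arrival days for four counties and two candidate sources.
arrival_times = [10, 20, 30, 40]
distance_by_source = {"X": [1.0, 2.0, 3.0, 4.0], "Y": [4.0, 1.0, 3.0, 2.0]}
best = best_single_source(distance_by_source, arrival_times)
print(best)  # X
```

Candidate X orders the counties exactly as they were infected, so its $R^2$ is 1, while candidate Y scrambles the order and scores far lower.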

For the multiple-source effective distance, we searched through every combination of the candidate sources in the U.S. and found the optimal sequence with the lowest RMSE between the multiple-source effective distance and the distance predicted from Eq. (6). Thus, the viable candidate sources $s$ are the infected counties with the minimum RMSE,

$$s = \arg\min_{m \in \Omega} \sqrt{\frac{\sum_m \left(D_{m|N_I} - \hat{D}_{m|N_I}\right)^2}{N}},$$
(8)

where $\Omega$ is the set of possible combinations of candidate sources for the multiple-source effective distance metric ($\Omega = \{N_I \mid N_I \subset G\}$), $N$ is the number of counties $m$, and $\hat{D}_{m|N_I}$ is the estimate of the multiple-source effective distance based on Eq. (6). For simplicity, we denote both the single-source effective distance $d_{m|n}$ and the multiple-source effective distance $D_{m|N_I}$ as $D_{ms}$. Considering the mean incubation period in Refs. 30 and 31 and the period of mobility dynamics, we take the 7-day moving average $\bar{D}_{ms}$ before the arrival time,

$$\bar{D}_{ms} = \frac{\sum_{i=0}^{6} D_{ms}(T_m - i)}{7},$$
(9)

where $D_{ms}(t)$ is the distance from candidate source $s$ to $m$ at time $t$.
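
The two ingredients of the search, the trailing 7-day average of Eq. (9) and the RMSE of Eq. (8), can be sketched as follows. The function names, the candidate labels, and the toy numbers are hypothetical, and raising an error when fewer than seven days of history exist is our assumption, since the text does not say how such early arrivals are handled.

```python
import math

def trailing_mean(series, arrival_day, window=7):
    """Eq. (9): mean of series[T - i] for i = 0, ..., window - 1."""
    if arrival_day < window - 1:
        raise ValueError("not enough history before the arrival day")
    return sum(series[arrival_day - i] for i in range(window)) / window

def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

# Hypothetical daily distances for one county, arrival on day 9.
daily_distance = [float(i) for i in range(10)]
smoothed = trailing_mean(daily_distance, 9)  # mean of days 3..9
print(smoothed)  # 6.0

# Pick the source combination whose predicted distances fit best.
candidates = {
    ("A", "B"): ([7.0, 8.0], [7.5, 8.5]),  # (observed, predicted)
    ("A", "C"): ([7.0, 8.0], [9.0, 6.0]),
}
best = min(candidates, key=lambda c: rmse(*candidates[c]))
print(best)  # ('A', 'B')
```

In a full search, each candidate combination and transition day would supply its own observed and predicted distances, and the combination with the smallest RMSE would be retained.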

We select candidate threatening sources from the infected counties based on either of the following two criteria. First, the county had a relatively large number of confirmed cases before the declaration of the national emergency. Second, the county had a high number of confirmed cases before April 14, when 99% of counties in the U.S. had at least two confirmed cases. Consequently, we list 48 counties as candidate threatening sources. Among them, 6 counties are from California, 4 from Massachusetts, 8 from New Jersey, 6 from New York State, and the remaining 24 counties are from Colorado, Connecticut, Florida, Illinois, Iowa, Louisiana, Michigan, Nebraska, Nevada, Oregon, Pennsylvania, Tennessee, Texas, and Washington. Detailed county names are listed in Sec. III in the supplementary material.

According to Ref. 32, before the outbreak of COVID-19, there were three large components in the U.S. mobility network: (1) the West, centered around the Pacific region, (2) the Midwest region, and (3) a large part of the Eastern and Southern regions. According to Fig. 2(a), only one county in the Midwest region had more than two infected cases by March 6, 2020, when other regions already had multiple infected counties. Thus, we assume that there are two major sources, one in the West and one in the East. However, in searching for potential sources, we also included counties in other regions to consider all possibilities. We assume that each of the two sources exported the virus mainly to counties in its own cluster before widespread transmission occurred around mid-April. The transition day, i.e., the day when the sources evolve from one to multiple locations, is searched from January 21 to April 7, the date when 90% of counties in the U.S. had been infected and when the recovery of mobility begins (see Fig. 1). The two sources are allowed to start diffusion on the same date, i.e., the transition day equals day 0. However, the transition day cannot be earlier than the arrival time of the second source.

FIG. 2.

Single-source effective distance fails to predict arrival times in the U.S. (a)–(d) Temporal snapshots of the pandemic in the U.S. Red circles represent the infected counties, and the gray circles are the uninfected counties. (e)–(h) Correlations between the 7-day moving average of the single-source effective distance from the initial source (Suffolk, MA) and the corresponding arrival times for counties that already had more than two confirmed cases on the dates of (a)–(d). (i)–(l) Shortest-path trees for counties in (a)–(d).

Using the single-source effective distance metric, we identify Suffolk County, Massachusetts, as the initial threatening source with an optimal $R^2 = 0.36$. The low $R^2$ indicates that the counties’ single-source effective distances from the initial source fail to predict their arrival times. Figure 2 shows the low prediction accuracy if Suffolk, MA is considered the original and sole threatening source that caused the widespread transmission of COVID-19 in the U.S. Figures 2(a)–2(h) show that 2181 counties were infected by April 7, 2020. Their single-source effective distances have a low correlation with their arrival times ($R^2 \approx 0.36$). Panel (j) shows that in this hypothesized scenario, many counties in the outer circle were infected before the inner circle had become fully infected. The inconsistency indicates that in this scenario, the initial source transmits the virus to remote counties before its neighbors get infected. This is contrary to the assumption that the arrival time should be positively correlated with the counties’ single-source effective distance. Therefore, the result confirms our assumption that there was more than one threatening source in the U.S.

After searching all of the combinations of counties in Fig. 3(a), we find that Santa Clara, CA was the first threatening source, starting on February 3, and Suffolk, MA became the second threatening source on March 20. As shown in Fig. 3(b), the 7-day moving-average multiple-source effective distance from the two sources correlates with the integral of mobility rate before the counties’ arrival times, i.e., $x(T_m)$, with an RMSE of around 2.48. Interestingly, although Santa Clara had an early second case on February 3, its third case was not discovered until February 29, followed by a rapid local increase in confirmed cases. However, considering that three counties, King, WA, Sacramento, CA, and San Diego, CA, were infected during this period and are close to Santa Clara in terms of effective distance, it is still reasonable to follow the definition of arrival time and use February 3 as its arrival time. It is worth noting that other combinations can also provide relatively small RMSE values. For example, the RMSE is around 2.88 if Cook, IL is considered as the first source and New York, NY as the second one. The low RMSEs of these combinations are consistent with their early arrival times: Santa Clara, CA and Cook, IL got their second confirmed cases on February 3 and January 31, respectively. There are also many options for the second source. For example, the RMSE would be 2.58 if the first source was Santa Clara, CA, the second source was New York, NY, and the transition day remained March 20. This is mainly because the multiple-source effective distances from New York, NY to other counties in the East are similar to those from Suffolk, MA. Interestingly, King County, WA, the county with the first confirmed case, is not the first source. This result is consistent with the observation that, despite the first confirmed case on January 22, 2020, the second case was reported one month later on February 29, 2020. It indicates that the infection first found in King County, WA might not have caused the spreading of the virus.

FIG. 3.

Multiple-source effective distance outperforms single-source effective distance in predicting arrival times in the U.S. when multiple sources co-exist. (a) Swarm plot of RMSE for various combinations of sources and transition days. The solid curve shows the combination with minimum RMSE on a given transition day. (b)–(e) Correlation between 7-day moving-average multiple-source effective distance and the integral of mobility rate when Santa Clara, CA and Suffolk, MA are the threatening sources. The dates in this row represent the transition day when the second source starts to function. (f)–(i) Shortest-path trees for (b)–(e) by integrating the two identified threatening sources as one.

Combinations with low RMSEs are listed in Sec. IV in the supplementary material. In this study, we selected Santa Clara, CA and Suffolk, MA as the two sources and March 20 as the transition day because this combination has the minimum RMSE. In this case, $\alpha = 124.96$ and $v_{\mathrm{eff}}^{0} = 0.23$. The remaining analysis is based on this hypothesis, although the methods can be applied to other combinations of sources.

Figure 4 visualizes the counties that may have imported cases from the two threatening sources. Compared to the connection map before the epidemic, there is a sharp decrease in the number of edges spanning long geographical distances after the outbreak of COVID-19. Additionally, we observe that, except for the counties neighboring Suffolk County, other counties have a larger likelihood of importing cases from Santa Clara [Fig. 4(b)]. Moreover, the threatening sources connect to their surrounding areas. In contrast, before the outbreak of the virus, long-distance edges were the dominant links connecting sources to remote counties. Those edges were replaced by chains of relatively short-distance edges during the epidemic. In other words, the infection of a remote county is more likely to be the result of chained infections that propagate from a threatening source than of long-distance mobility from it.

FIG. 4.

Virus transmission paths from threatening sources to the rest of the counties in the U.S. The visualization only displays 20% of the paths longer than 300 km from Suffolk, MA and 1000 km from Santa Clara, CA. The complete paths are shown in Fig. S2 in the supplementary material. (a) Connection map shows routes on January 26 from threatening sources to counties that were infected before March 27. (b) Connection map shows routes on March 27 from threatening sources to infected counties before March 27.

This result further supports the observation in Sec. III A that many combinations of threatening sources share similar RMSEs because edges of short effective distance connect potential threatening counties. In this case, many counties are intermediate counties on the paths from threatening sources. When they suffer outbreaks, they may become new threatening sources, exporting the virus to other destinations. When policies to control the sources fail, controlling the intermediate counties could become an alternative strategy for mitigating the spread of the virus.

The sharp drop and recovery in mobility significantly impact the effective distances as shown in Fig. 5. The distances increase from an average of 7.57 on March 13 to 9.47 on April 13 and then slowly decrease to 7.71 on June 23. Limiting mobility has been a key policy in slowing and containing the transmissions of COVID-19. Our results validate that these policies have helped distance counties from the disease.

FIG. 5.

The median effective distance over time. We take the 7-day moving average and the shaded area shows the area between 25th and 75th percentiles.

As shown in Fig. 3, the multiple-source effective distance is correlated with $x(T_m)$. Based on Eq. (6), the arrival time for each county can be calculated as follows. On day $t$, county $m$’s effective distance to the sources is $D_m(t)$, and the global mobility rate is $\gamma(t)$. Assuming that mobility stays the same in the future as on day $t$, i.e., $D_m(t') = D_m(t)$ and $\gamma(t') = \gamma(t)$ when $t' \geq t$, the predicted arrival time $\hat{T}_{m|t}$ of each county on day $t < T_m$ can be calculated by the following equation. More details on the derivation of Eq. (10) are included in Sec. II C in the supplementary material,

$$\hat{T}_{m|t} = \frac{D_m(t) - \alpha x(t) + \alpha t \gamma(t)}{\alpha\gamma(t) + v_{\mathrm{eff}}^{0}}.$$
(10)

To avoid the fluctuation of mobility in the period of mobility dynamics, a 7-day moving average is applied to $x(t)$ and $\gamma(t)$. Thus, the arrival time delay (ATD) can be defined as

$$\mathrm{ATD}(t) = \sum_{t'=0}^{t} \Delta T(t') = \sum_{t'=0}^{t} \left(\hat{T}_{m|t'} - \hat{T}_{m|t'-1}\right).$$
(11)

After the first time when $\hat{T}_{m|t} \leq t$, the county is labeled as infected, and the ATD stays the same for the rest of the time.
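
Equations (10) and (11) can be sketched for a single county as follows. This is an illustration under assumptions, not the study's implementation: the function names are hypothetical, the forecast series is synthetic, and the accumulation is assumed to start from the first forecast in the list (for example, the day of the national emergency declaration), which serves as the baseline with zero delay.

```python
def predicted_arrival(distance, x_t, gamma_t, t, alpha, v0):
    """Eq. (10): arrival day forecast on day t, holding D_m and gamma fixed."""
    return (distance - alpha * x_t + alpha * t * gamma_t) / (alpha * gamma_t + v0)

def arrival_time_delay(forecasts, first_day=0):
    """Eq. (11): running sum of day-to-day changes in the forecast.

    forecasts[k] is the forecast made on day first_day + k. Once a forecast
    is no later than the day it is made, the county counts as infected and
    the ATD is frozen.
    """
    atd = [0.0]
    infected = forecasts[0] <= first_day
    for k in range(1, len(forecasts)):
        if infected:
            atd.append(atd[-1])
            continue
        atd.append(atd[-1] + forecasts[k] - forecasts[k - 1])
        infected = forecasts[k] <= first_day + k
    return atd

# With a constant mobility rate, x(t) = gamma * t and Eq. (10) reduces to
# D / (alpha * gamma + v0), whatever day the forecast is made on.
forecast = predicted_arrival(9.0, 0.05 * 4, 0.05, 4, 124.96, 0.23)
print(forecast)

# Synthetic forecasts: mobility drops on days 1-2, the county is infected on day 6.
delays = arrival_time_delay([3.0, 5.0, 6.0, 6.0, 6.0, 6.0, 6.0, 9.0])
print(delays)  # [0.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
```

Because the sum telescopes, the ATD on any day before infection is simply the current forecast minus the baseline forecast, so the later forecast of 9.0 is ignored once the county has been labeled infected on day 6.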

The result in Fig. 6(a) (black line) indicates that the mobility drop caused by the declaration of the national emergency delayed the arrival times by an average of 5.54 days by April 12. Because most counties had been infected by the time mobility started to recover, the recovery brought only a slight decrease in the average ATD, eventually leading to an overall ATD of 4.82 days after May 15, when all counties had been infected. If the national emergency had been declared 7 days earlier [Fig. 6(a), blue line], the average ATD would have increased to 20.49 days on April 5. In comparison, if the mobility drop had been delayed by 7 days [Fig. 6(a), orange line], over 95.5% of counties in the U.S. would have been infected before the delayed declaration of the national emergency, and the policy would have contributed little to the ATD.

FIG. 6.

(a) The mean value of arrival time delay over time if the declaration of national emergency (NE) is on March 6 (blue line), March 13 (black line), and March 20 (orange line). (b)–(d) The arrival time delay distribution in three different cases on April 7 accumulating from each NE day.

The ATD map in Fig. 6(c) indicates that the hypothesized national emergency declaration on March 6 would not have a significant impact on counties in the West and Northeast regions because many counties there had been infected before the national emergency. However, a national emergency declaration on March 6 could give counties in the Midwest and South regions much more time to prepare for the arrival of the virus because of their relatively long distance to threatening locations [Fig. 6(c)]. Bethel and Southeast Fairbanks in Alaska would benefit most; their arrival delays would be 72.85 days and 71.29 days, respectively, by April 7. Figure 6(d) shows a significant decline in ATD in most counties when the national emergency declaration is delayed by 7 days to March 20. Many counties in the central region with long arrival time delays in Fig. 6(b) would no longer have any delay in this case.

There are three main findings in this study. First, building on the previous work in Refs. 21 and 22, we introduce a multiple-source effective distance and compare it to the single-source effective distance in a dynamic mobility network to understand the transmission of COVID-19 in the U.S. The multiple-source effective distance shows that, given identified risk locations, the integral of mobility rate $x(T_m)$ fits well with the 7-day moving-average multiple-source effective distance. The finding aligns with the results from Ref. 21, and, thus, the research extends the application of the metric to cases where the mobility rate varies over time.

Second, we use the concept of effective speed and quantify its relationship to the number of outbreak locations and the mobility rate. It provides a new perspective for understanding how changes in the mobility network affect the effective distance, the effective speed, and, as a result, the arrival time. The concept further yields a more precise mathematical model to quantify disease transmission in complex dynamic networks. Additionally, based on the finding on the effective speed, we develop a method to locate sources and the transition day based on RMSE values. We use the two most likely sources for our analyses: Santa Clara, CA, which was infected on February 3, and Suffolk, MA, which started to spread the virus on March 20. This allows us to analyze the spreading of the virus to both local counties neighboring these sources and remote counties far from them. Our results indicate that the spreading of the virus is the result of chained infections. The method could further be applied to arrival time prediction and arrival time delay estimation given the sources and the transition day. Because the method is built on the concept of the effective speed, it can be extended in future research to include other factors mentioned in Ref. 22, such as the number of outbreak locations (OLs), the local reproduction ratio, and the mean recovery rate.

Last, the arrival time delay provides a reliable approach to quantify the benefits and timing of mobility-related policies.10,18 In this study, we take the declaration of the national emergency as an example and reveal the benefits of the policy in different regions over time. Moreover, we discuss the consequences of different declaration times in terms of the arrival time of the virus in U.S. counties.
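A minimal sketch of how a per-county ATD could be tabulated is given below. It assumes that ATD is the difference between the predicted arrival day with the policy and the predicted arrival day without it, and that a county infected before the declaration receives zero delay. Both conventions, the function names, and the input dictionaries are assumptions for illustration only.

```python
def arrival_time_delay(with_policy, without_policy, declaration_day):
    """Per-county ATD in days; positive values mean the policy delayed arrival.

    with_policy / without_policy: {county: predicted arrival day}.
    ASSUMPTION: counties reached on or before `declaration_day` get zero delay,
    since the policy cannot affect an arrival that has already happened.
    """
    atd = {}
    for county in with_policy.keys() & without_policy.keys():
        if without_policy[county] <= declaration_day:
            atd[county] = 0.0
        else:
            atd[county] = with_policy[county] - without_policy[county]
    return atd


def mean_delay(atd):
    """Average ATD over all counties (0.0 for an empty table)."""
    return sum(atd.values()) / len(atd) if atd else 0.0
```

Re-running the same tabulation with a different `declaration_day`, and with arrival days predicted for that scenario, gives the kind of comparison across declaration times discussed above.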

There are also some limitations in this research. First, the approach of searching for the optimal sources proposed in Sec. III A is limited by computational power. It is possible that there were more than two threatening sources or areas in the virus diffusion process in the U.S. In that case, the number of outbreak locations should be taken into consideration when using Eq. (6). Second, in the multiple-source effective distance network, many counties are close to each other, especially around large cities. In this case, it can be difficult to tell exactly which combination is closest to reality. Third, the method for targeting sources relies heavily on the correct number of confirmed cases, especially in the early stage. The massive under-reporting and under-testing of COVID-19 cases at the early stage of the pandemic could lead to biased arrival times of counties,33–35 which further complicates the search for the optimal sources and may lead to errors in locating sources and the transition day. Last, because of the limitation of the dataset, we could not cross-validate the method in other countries, in other periods of time, or with different pandemics. Therefore, in our future work, we will simulate the pandemic to further investigate other factors that could influence the effective speed. Additionally, the method could be applied to datasets from other countries or regions during different pandemics to investigate the local virus diffusion pattern. The method could also be used to investigate neighborhood-level virus spreading and, following our prior research,36 provide a new perspective on the racial disparities among communities.

See the supplementary material for details on the county list of candidate threatening sources, the validation of mobility data, the list of the ten combinations with the highest R2 values, and the additional connection map with all routes.

Q.W. and J.D. acknowledge support from the U.S. National Science Foundation (Nos. 2027744 and 2027708). J.G. acknowledges support of the U.S. National Science Foundation under Grant No. 2047488 and the Rensselaer-IBM AI Research Collaboration.

The authors have no conflicts to disclose.

All data were collected through a CCPA and GDPR compliant framework and utilized for research purposes. Our usage agreement with Cuebiq does not allow us to make public or otherwise share the anonymized mobile phone data used in this study. Researchers interested in aggregated data and/or summary statistics, where permitted under said agreement, should contact the corresponding authors. Computer codes used to process and analyze the data are openly available at https://github.com/Edwarddd/Multiple-source-effective-distance (Ref. 37).

1. S. Galea, R. M. Merchant, and N. Lurie, “The mental health consequences of covid-19 and physical distancing: The need for prevention and early intervention,” JAMA Int. Med. 180, 817–818 (2020).
2. COVID-19: Fiscal Impact to States and Strategies for Recovery (The Council of State Governments, 2020).
3. D. B. G. Tai, A. Shah, C. A. Doubeni, I. G. Sia, and M. L. Wieland, “The disproportionate impact of COVID-19 on racial and ethnic minorities in the United States,” Clin. Infect. Dis. 72(4), 703–706 (2020).
4. COVID-19 Weekly Epidemiological Update (WHO, 2021).
5. S. Lai, N. W. Ruktanonchai, L. Zhou, O. Prosper, W. Luo, J. R. Floyd, A. Wesolowski, M. Santillana, C. Zhang, X. Du et al., “Effect of non-pharmaceutical interventions to contain COVID-19 in China,” Nature 585(7825), 410–413 (2020).
6. M. Gilbert, G. Pullano, F. Pinotti, E. Valdano, C. Poletto, P.-Y. Boëlle, E. d’Ortenzio, Y. Yazdanpanah, S. P. Eholie, M. Altmann et al., “Preparedness and vulnerability of African countries against importations of COVID-19: A modelling study,” Lancet 395, 871–877 (2020).
7. M. Chinazzi, J. T. Davis, M. Ajelli, C. Gioannini, M. Litvinova, S. Merler, A. P. Y Piontti, K. Mu, L. Rossi, K. Sun et al., “The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak,” Science 368, 395–400 (2020).
8. M. U. Kraemer, C.-H. Yang, B. Gutierrez, C.-H. Wu, B. Klein, D. M. Pigott, L. Du Plessis, N. R. Faria, R. Li, W. P. Hanage et al., “The effect of human mobility and control measures on the COVID-19 epidemic in China,” Science 368, 493–497 (2020).
9. J. S. Jia, X. Lu, Y. Yuan, G. Xu, J. Jia, and N. A. Christakis, “Population flow drives spatio-temporal distribution of COVID-19 in China,” Nature 582, 389–394 (2020).
10. M. Zanin and D. Papo, “Travel restrictions during pandemics: A useful strategy?,” Chaos 30, 111103 (2020).
11. A. L. Horn and H. Friedrich, “Locating the source of large-scale outbreaks of foodborne disease,” J. R. Soc. Interface 16, 20180624 (2019).
12. P. C. Pinto, P. Thiran, and M. Vetterli, “Locating the source of diffusion in large-scale networks,” Phys. Rev. Lett. 109, 068702 (2012).
13. W. Luo, W. P. Tay, and M. Leng, “How to identify an infection source with limited observations,” IEEE J. Select. Top. Signal Process. 8, 586–597 (2014).
14. Z. Shen, S. Cao, W.-X. Wang, Z. Di, and H. E. Stanley, “Locating the source of diffusion in complex networks by time-reversal backward spreading,” Phys. Rev. E 93, 032301 (2016).
15. J. Yang, Q. Zhang, Z. Cao, J. Gao, D. Pfeiffer, L. Zhong, and D. D. Zeng, “The impact of non-pharmaceutical interventions on the prevention and control of COVID-19 in New York city,” Chaos 31, 021101 (2021).
16. S. A. Lauer, K. H. Grantz, Q. Bi, F. K. Jones, Q. Zheng, H. R. Meredith, A. S. Azman, N. G. Reich, and J. Lessler, “The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application,” Ann. Int. Med. 172, 577–582 (2020).
17. CDC COVID-19 Response Team, M. A. Jorden, S. L. Rudman, E. Villarino, S. Hoferka, M. T. Patel, K. Bemis, C. R. Simmons, M. Jespersen, J. I. Johnson, E. Mytty et al., “Evidence for limited early spread of COVID-19 within the United States, January–February 2020,” MMWR Morb. Mortal. Wkly. Rep. 69(22), 680–684 (2020).
18. S. Pei, S. Kandula, and J. Shaman, “Differential effects of intervention timing on COVID-19 spread in the United States,” Sci. Adv. 6, eabd6370 (2020).
19. W. A. Chiu, R. Fischer, and M. L. Ndeffo-Mbah, “State-level needs for social distancing and contact tracing to contain COVID-19 in the United States,” Nat. Hum. Behav. 4, 1080–1090 (2020).
20. R. Yeip, “How the COVID-19 surge shifted to the south and west,” Wall Street Journal (2020), https://www.wsj.com/articles/in-the-u-s-coronavirus-tells-a-tale-of-two-americas-11593797658.
21. L. Zhong, M. Diagne, W. Wang, and J. Gao, “Country distancing increase reveals the effectiveness of travel restrictions in stopping COVID-19 transmission,” Commun. Phys. 4, 1–12 (2021).
22. D. Brockmann and D. Helbing, “The hidden geometry of complex, network-driven contagion phenomena,” Science 342, 1337–1342 (2013).
23. F. Iannelli, A. Koher, D. Brockmann, P. Hövel, and I. M. Sokolov, “Effective distances for epidemics spreading on complex networks,” Phys. Rev. E 95, 012313 (2017).
24. E. Dong, H. Du, and L. Gardner, “An interactive web-based dashboard to track COVID-19 in real time,” Lancet Infect. Disease. 20, 533–534 (2020).
25. F. Wang, J. Wang, J. Cao, C. Chen, and X. J. Ban, “Extracting trips from multi-sourced data for mobility pattern analysis: An app-based data example,” Transport. Res. C 105, 183–202 (2019).
26. A. Akhavan, N. E. Phillips, J. Du, J. Chen, B. Sadeghinasr, and Q. Wang, “Accessibility inequality in Houston,” IEEE Sensors Lett. 3, 1–4 (2019).
27. Q. Wang, N. E. Phillips, M. L. Small, and R. J. Sampson, “Urban mobility and neighborhood isolation in America’s 50 largest cities,” Proc. Natl. Acad. Sci. U.S.A. 115, 7735–7740 (2018).
28. J. Valentino-DeVries, D. Lu, and G. J. X. Dance, “Location data says it all: Staying at home during coronavirus is a luxury,” The New York Times (2020).
29. A. Aleta, D. Martín-Corral, A. P. Y Piontti, M. Ajelli, M. Litvinova, M. Chinazzi, N. E. Dean, M. E. Halloran, I. M. Longini, Jr., S. Merler et al., “Modelling the impact of testing, contact tracing and household quarantine on second waves of COVID-19,” Nat. Hum. Behav. 4, 964–971 (2020).
30. Q. Li, X. Guan, P. Wu, X. Wang, L. Zhou, Y. Tong, R. Ren, K. S. Leung, E. H. Lau, J. Y. Wong et al., “Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia,” N. Engl. J. Med. 382(13), 1199–1207 (2020).
31. Y. Wang, Z. Cao, D. D. Zeng, Q. Zhang, and T. Luo, “The collective wisdom in the COVID-19 research: Comparison and synthesis of epidemiological parameter estimates in preprints and peer-reviewed articles,” Int. J. Infect. Dis. 104, 1–6 (2021).
32. H. Deng, J. Du, J. Gao, and Q. Wang, “Network percolation reveals adaptive bridges of the mobility network response to COVID-19,” PLoS One 16, e0258868 (2021).
33. H. Lau, T. Khosrawipour, P. Kocbach, H. Ichii, J. Bania, and V. Khosrawipour, “Evaluating the massive underreporting and undertesting of COVID-19 cases in multiple global epicenters,” Pulmonology 27(2), 110–115 (2020).
34. F. J. Angulo, L. Finelli, and D. L. Swerdlow, “Estimation of US SARS-CoV-2 infections, symptomatic infections, hospitalizations, and deaths using seroprevalence surveys,” JAMA Netw. Open 4, e2033706 (2021).
35. S. L. Wu, A. N. Mertens, Y. S. Crider, A. Nguyen, N. N. Pokpongkiat, S. Djajadi, A. Seth, M. S. Hsiang, J. M. Colford, A. Reingold et al., “Substantial underestimation of SARS-CoV-2 infection in the United States,” Nat. Commun. 11, 1–10 (2020). doi:10.1038/s41467-020-18272-4
36. Y. Wang, A. Ristea, M. Amiri, D. Dooley, S. Gibbons, H. Grabowski, J. L. Hargraves, N. Kovacevic, A. Roman, R. K. Schutt, J. Gao, Q. Wang, and D. T. O’Brien, “Vaccination intentions generate racial disparities in the societal persistence of COVID-19,” Sci. Rep. 11, 19906 (2021).
37. Y. Wang (2021). “Multiple-source-effective-distance manuscript,” Github. https://github.com/Edwarddd/Multiple-source-effective-distance
