We consider a stochastic susceptible-exposed-infected-recovered (SEIR) epidemiological model. Through the use of a normal form coordinate transform, we are able to analytically derive the stochastic center manifold along with the associated, reduced set of stochastic evolution equations. The transformation correctly projects both the dynamics and the noise onto the center manifold. Therefore, the solution of this reduced stochastic dynamical system yields excellent agreement, both in amplitude and phase, with the solution of the original stochastic system for a temporal scale that is orders of magnitude longer than the typical relaxation time. This new method allows for improved time series prediction of the number of infectious cases when modeling the spread of disease in a population. Numerical solutions of the fluctuations of the SEIR model are considered in the infinite population limit using a Langevin equation approach, as well as in a finite population simulated as a Markov process.

The reduction in high-dimensional, stochastic systems is an important and fundamental problem in nonlinear dynamical systems. In this article, we present a general theory of stochastic model reduction, which is based on a normal form coordinate transform. This nonlinear, stochastic projection allows for the deterministic and stochastic dynamics to interact correctly on the lower-dimensional manifold so that the dynamics predicted by the reduced, stochastic system agrees well with the dynamics predicted by the original, high-dimensional stochastic system. Although the method may be applied to any physical or biological system with well-separated time scales, here we apply the method to an epidemiological model. We show that when compared with the original stochastic epidemic model, the reduced model properly captures the initial and recurrent disease outbreaks, both in amplitude and phase. This long-term accuracy of the reduced model allows for the application of effective disease control where phase differences between outbreak times and vaccine controls are important. Additionally, in practice, one can only measure the number of infectious individuals in a population. Our method allows one to predict the number of unobserved exposed individuals based on the observed number of infectious individuals.

## I. INTRODUCTION

The interaction between deterministic and stochastic effects in population dynamics has played, and continues to play, an important role in the modeling of infectious diseases. The mechanistic modeling side of population dynamics is well known and established.^{1,2} These models typically are assumed to be useful for infinitely large, homogeneous populations, and arise from the mean field analysis of probabilistic models. On the other hand, when one considers finite populations, random interactions give rise to internal noise effects, which may introduce new dynamics. Stochastic effects are quite prominent in finite populations, which can range from ecological dynamics^{3} to childhood epidemics in cities.^{4,5} For homogeneous populations with seasonal forcing, noise also comes into play in the prediction of large outbreaks.^{6–8} Specifically, external random perturbations change the probabilistic prediction of epidemic outbreaks as well as its control.^{9}

When geometric structure is applied to the population, the interactions are modeled as a network.^{10,11} Many types of static networks which support epidemics have been considered. Some examples include small world networks,^{12} hierarchical networks,^{13} and transportation networks of patch models.^{14} In addition, the fluctuation of epidemics on adaptive networks, where the wiring between nodes changes in response to the node information, has been examined.^{15} In adaptive network models, even the mean field can be high dimensional, since nodes and links evolve in time and must be approximated as an additional set of ordinary differential equations.

Another aspect of epidemic models, which is often of interest, involves the inclusion of a time delay. The delay term makes the analysis significantly more complicated. However, it is possible to approximate the delay by creating a cascade consisting of a large number of compartments.^{16} For example, one could simulate the delay associated with a disease exposure time with several hundred “exposed” compartments.

These model examples are just a few of the types of very high-dimensional models that are currently of interest. As a result of the high dimensionality, there is much computation involved, and the analysis is quite difficult. In particular, real-time computation is not currently possible. However, there are usually many time scales that are well separated (due typically to a large range in order of magnitude of the parameters) when considering such high-dimensional problems. In the presence of well-separated time scales, a model reduction method is needed to examine the dynamics on a lower-dimensional space. It is known that deterministic model reduction methods may not work well in the stochastic realm, which includes epidemic models.^{17} The purpose of this article is to examine a method of nonlinear, stochastic projection so that the deterministic and stochastic dynamics interact correctly on the lower-dimensional manifold and predict correctly the dynamics when compared with the full system. Because the noise affects the timing of outbreaks, it is essential to produce a low-dimensional system that captures the correct timing of the outbreaks as well as the amplitude and phase of any recurrent behavior.

We will demonstrate that our stochastic model reduction method properly captures the initial disease outbreak and continues to accurately predict the outbreaks for time scales, which are orders of magnitude longer than the typical relaxation time. Furthermore, in practice, real disease data include only the number of infectious individuals. Our method allows us to predict the number of unobserved exposed individuals based on the observed number of infectious individuals.

For stochastic model reduction, there exist several potential methods for general problems. For a system with certain spectral requirements, the existence of a stochastic center manifold was proven in Ref. 18. Nonrigorous stochastic normal form analysis (which leads to the stochastic center manifold) was performed in Refs. 19–22. Rigorous theoretical analysis of normal form coordinate transformations for stochastic center manifold reduction was developed in Refs. 23 and 24. Later, another method of stochastic normal form reduction was developed,^{25} in which any anticipatory convolutions (integrals into the future of the noise processes) that appeared in the slow modes were removed. Since this latter analysis makes the construction of the stochastic normal form coordinate transform more transparent, we use this method to derive the reduced stochastic center manifold equation.

Figure 1 shows a schematic demonstrating our approach to the problem. We consider a high-dimensional system along with its corresponding reduced low-dimensional system. In this article, two types of low-dimensional system are discussed: a reduced system based on deterministic center manifold analysis and a reduced system based on a stochastic normal form coordinate transform. Regardless of the type of low-dimensional system being considered, a common noise is injected into both the high-dimensional and low-dimensional models, and an analysis of the solutions found using the high and low-dimensional systems is performed.

In this article, as a first study of a high-dimensional system, we consider the susceptible-exposed-infected-recovered (SEIR) epidemiological model with stochastic forcing. As previously mentioned, we could easily consider a SEIR-type model where the exposed class was modeled using hundreds of compartments. Since the analysis is similar, we consider the simpler standard SEIR model to demonstrate the power of our method. Section II provides a complete description of this model. Section III describes how to transform the deterministic SEIR system to a new system that satisfies the spectral requirements needed to apply the center manifold theory. After the theory is used to find the evolution equations that describe the dynamics on the center manifold, we show in Sec. IV how the reduced model that is found using the deterministic result incorrectly projects the noise onto the center manifold. Section V demonstrates the use of a stochastic normal form coordinate transform to correctly project the noise onto the stochastic center manifold. A discussion and the conclusions are contained, respectively, in Secs. VI and VII.

## II. THE SEIR MODEL FOR EPIDEMICS

We begin by describing the stochastic version of the SEIR model found in Ref. 26. We assume that a given population may be divided into the following four classes that evolve in time:

Susceptible class $s(t)$ consists of those individuals who may contract the disease.

Exposed class $e(t)$ consists of those individuals who have been infected by the disease but are not yet infectious.

Infectious class $i(t)$ consists of those individuals who are capable of transmitting the disease to susceptible individuals.

Recovered class $r(t)$ consists of those individuals who are immune to the disease.

Furthermore, we assume that the total population size, denoted as $N$, is constant and can be normalized to $S(t)+E(t)+I(t)+R(t)=1$, where $S(t)=s(t)/N$, $E(t)=e(t)/N$, $I(t)=i(t)/N$, and $R(t)=r(t)/N$. Therefore, the population class variables $S$, $E$, $I$, and $R$ represent fractions of the total population. The governing equations for the stochastic SEIR model are

Additionally, $\mu $ represents a constant birth and death rate, $\beta $ is the contact rate, $\alpha $ is the rate of infection, so that $1/\alpha $ is the mean latency period, and $\gamma $ is the rate of recovery, so that $1/\gamma $ is the mean infectious period. Although the contact rate $\beta $ could be given by a time-dependent function (e.g., due to seasonal fluctuations), for simplicity, we assume $\beta $ to be constant. Throughout this article, we use the following parameter values: $\mu =0.02(yr)\u22121$, $\beta =1575.0(yr)\u22121$, $\alpha =1/0.0279(yr)\u22121$, and $\gamma =1/0.01(yr)\u22121$. Disease parameters correspond to typical measles values.^{26,27} Note that any other biologically meaningful parameters may be used as long as the basic reproductive rate $R0=\alpha \beta /[(\alpha +\mu )(\gamma +\mu )]>1$. The interpretation of $R0$ is the number of secondary cases produced by a single infectious individual in a population of susceptibles in one infectious period.

As a first approximation of stochastic effects, we have considered additive noise. This type of noise may result from migration into and away from the population being considered.^{28} Since it is difficult to estimate fluctuating migration rates,^{29} it is appropriate to treat migration as an arbitrary external noise source. Also, fluctuations in the birth rate manifest itself as additive noise. Furthermore, as we are not interested in extinction events in this article, it is not necessary to use multiplicative noise. In general, for the problem considered here, it is possible that a rare event in the tail of the noise distribution may cause one or more of the $S$, $E$, and $I$ components of the solution to become negative. In this article, we will always assume that the noise is sufficiently small so that a solution remains positive for a long enough time to gather sufficient statistics. Even though it is difficult to accurately estimate the appropriate noise level from real data, our choices of noise intensity lie within the huge confidence intervals computed in Ref. 29. The case for multiplicative noise will be considered in a separate paper.

Although $S+E+I+R=1$ in the deterministic system, one should note that the dynamics of the stochastic SEIR system will not necessarily have all of the components sum to unity. However, since the noise has zero mean, the total population will remain close to unity on average. Therefore, we assume that the dynamics is sufficiently described by Eqs. (1a)–(1c). It should be noted that even if $E(t)+I(t)=0$ for some $t$, the noise allows for the reemergence of the epidemic.

## III. DETERMINISTIC CENTER MANIFOLD ANALYSIS

One way to reduce the dimension of a system of equations is through the use of deterministic center manifold theory. In general, a nonlinear vector field can be transformed so that the linear part (Jacobian) of the vector field has a block diagonal form where the first matrix block has eigenvalues with positive real part, the second matrix block has eigenvalues with negative real part, and the third matrix block has eigenvalues with zero real part.^{30,31} These blocks are associated, respectively, with the unstable eigenspace, the stable eigenspace, and the center eigenspace. If we suppose that there are no eigenvalues with positive real part, then orbits will rapidly decay to the center eigenspace.

In order to make use of the center manifold theory, we must transform Eqs. (1a)–(1c) to a new system of equations that has the necessary spectral structure. The theory will allow us to find an invariant center manifold passing through the fixed point to which we can restrict the transformed system. Details regarding the transformation can be found in Sec. III A, and the computation of the center manifold can be found in Sec. III B.

### A. Transformation of the SEIR model

Our analysis begins by considering the governing equations for the stochastic SEIR model given by Eqs. (1a)–(1c). We neglect the $\sigma i\varphi i(t)$ terms in Eqs. (1a)–(1c) so that we are considering the deterministic SEIR system. This deterministic system has two fixed points. The first fixed point is

and corresponds to a disease free or extinct equilibrium state. The second fixed point corresponds to an endemic state and is given by

To ease the analysis, we define a new set of variables, $S\xaf$, $E\xaf$, and $I\xaf$, as $S\xaf(t)=S(t)\u2212S0$, $E\xaf(t)=E(t)\u2212E0$, and $I\xaf(t)=I(t)\u2212I0$. These new variables are substituted into Eqs. (1a)–(1c).

Then, treating $\mu $ as a small parameter, we rescale time by letting $t=\mu \tau $. We may then introduce the following rescaled parameters: $\alpha =\alpha 0/\mu $ and $\gamma =\gamma 0/\mu $, where $\alpha 0$ and $\gamma 0$ are $O(1)$. The inclusion of the parameter $\mu $ as a new state variable means that the terms in our rescaled system which contain $\mu $ are now nonlinear terms. Furthermore, the system is augmented with the auxiliary equation $d\mu /d\tau =0$. The addition of this auxiliary equation contributes an extra simple zero eigenvalue to the system and adds one new center direction that has trivial dynamics. The shifted and rescaled augmented system of equations is

The Jacobian of Eqs. (5a)–(5d) is computed to zeroth order in $\mu $ and is evaluated at the origin. Ignoring the $\mu $ components, the Jacobian has only two linearly independent eigenvectors. Therefore, the Jacobian is not diagonalizable. However, it is possible to transform Eqs. (5a)–(5c) to a block diagonal form with the eigenvalue structure that is needed to use the center manifold theory. We use a transformation matrix $P$ consisting of the two linearly independent eigenvectors of the Jacobian along with a third vector chosen to be linearly independent. There are many choices for this third vector; our choice is predicated on keeping the vector as simple as possible. This transformation matrix is given as

Using the fact that $(S\xaf,E\xaf,I\xaf)T=P\u22c5(U,V,W)T$, then the transformation matrix leads to the following definition of new variables: $U$, $V$, and $W$,

The application of the transformation matrix to Eqs. (5a)–(5c) leads to the transformed evolution equations given by

### B. Center manifold equation

The Jacobian of Eqs. (8a)–(8d) to zeroth order in $\mu $ and evaluated at the origin is

which shows that Eqs. (8a)–(8d) may be rewritten in the form

where $x=(U)$, $y=(V,W)$, $A$ is a constant matrix with eigenvalues that have negative real parts, $B$ is a constant matrix with eigenvalues that have zero real parts, and $f$ and $g$ are nonlinear functions in $x$, $y$, and $\mu $. In particular,

Therefore, the system will rapidly collapse onto a lower-dimensional manifold given by center manifold theory.^{32} Furthermore, we know that the center manifold is given by

where $h$ is an unknown function.

In general, it is not possible to solve the center manifold condition for the unknown function $h(V,W,\mu )$. Therefore, we perform the following Taylor series expansion of $h(V,W,\mu )$ in $V$, $W$, and $\mu $:

where $h0$, $h2$, $h3$, $h\mu $, $\cdots $ are unknown coefficients that are found by substituting the Taylor series expansion into the center manifold condition and equating terms of the same order. By carrying out this procedure using a second-order Taylor series expansion of $h$, the center manifold equation is

where $\u03f5=|(V,W,\mu )|$ so that $\u03f5$ provides a count of the number of $V$, $W$, and $\mu $ factors in any one term. Substitution of Eq. (17) into Eqs. (8b) and (8c) leads to the following reduced system of evolution equations, which describe the dynamics on the center manifold

## IV. INCORRECT PROJECTION OF THE NOISE ONTO THE STOCHASTIC CENTER MANIFOLD

### A. Transformation of the stochastic SEIR model

We now return to the stochastic SEIR system given by Eqs. (1a)–(1c). The shift in the fixed point to the origin will not have any effect on the noise terms, so that the stochastic version of the shifted equations is

As Eqs. (19a)–(19c) are transformed using Eqs. (7a)–(7c), the $\alpha =\alpha 0/\mu $ scaling, the $\gamma =\gamma 0/\mu $ scaling, and the $t=\mu \tau $ time scaling, the noise also is scaled so that the stochastic transformed equations are given by

The system of equations given by Eqs. (20a) and (21c) is an exact transformation of the system of equations given by Eqs. (1a)–(1c). We numerically integrate the original stochastic system of the SEIR model [Eqs. (1a)–(1c)] along with the transformed stochastic system [Eqs. (20a)–(20c)] using a stochastic fourth-order Runge–Kutta scheme with a constant time step size. The original system is solved for $S$, $E$, and $I$, while the transformed system is solved for $U$, $V$, and $W$. In the latter case, once the values of $U$, $V$, and $W$ are known, we compute the values of $S\xaf$, $E\xaf$, and $I\xaf$ using the transformation given by Eqs. (7a)–(7c). We shift $S\xaf$, $E\xaf$, and $I\xaf$, respectively, by $S0$, $E0$, and $I0$ to find the values of $S$, $E$, and $I$.

Figure 2 compares the time series of the fraction of the population that is infected with a disease $I$, computed using the original stochastic system of equations of the SEIR model with the time series of $I$ computed using the transformed stochastic system of equations of the SEIR model.

Although the two time series shown in Fig. 2 generally agree very well, there is some discrepancy. This discrepancy is due to the fact that the noise processes $\sigma 4\varphi 4$, $\sigma 5\varphi 5$, and $\sigma 6\varphi 6$ of the transformed system are new, independent noise processes with different variance than the $\sigma 1\varphi 1$, $\sigma 2\varphi 2$, and $\sigma 3\varphi 3$ noise processes found in the original system. If we carefully take the noise realization from the original system, transform this noise using Eqs. (21a)–(21c), and use this realization to solve the transformed system, then the two solutions would be identical.

### B. Reduction in the stochastic SEIR model

It is tempting to consider the reduced stochastic model found by substitution of Eq. (17) into Eqs. (20b) and (20c), so that one has the following stochastic evolution equations (that hopefully describe the dynamics on the stochastic center manifold):

One should note that Eqs. (22a) and (22b) also can be found by naïvely adding the stochastic terms to the reduced system of evolution equations for the deterministic problem [Eqs. (18a) and (18b)]. This type of stochastic center manifold reduction has been done for the case of discrete noise.^{27} Additionally, in many other fields (e.g., oceanography, solid mechanics, fluid mechanics), researchers have performed stochastic model reduction using a Karhunen–Loève expansion (principal component analysis, proper orthogonal decomposition).^{33,34} However, this linear projection does not properly capture the nonlinear effects. Furthermore, one must subjectively choose the number of modes needed for the expansion. Therefore, even though the solution to the reduced model found using this technique may have the correct statistics, individual solution realizations will not agree with the original, complete solution.

We will show that Eqs. (22a) and (22b) do not contain the correct projection of the noise onto the center manifold. Therefore, when solving the reduced system, one does not obtain the correct solution. Such errors in stochastic epidemic modeling impact the prediction of disease outbreak when modeling the spread of a disease in a population.

Using the same numerical scheme previously described, we numerically integrate the complete stochastic system of transformed equations of the SEIR model [Eqs. (20a)–(20c)] along with the reduced system of equations that is based on the deterministic center manifold with a replacement of the noise terms [Eqs. (22a) and (22b)]. The complete system is solved for $U$, $V$, and $W$, while the reduced system is solved for $V$ and $W$. In the latter case, $U$ is computed using the center manifold equation given by Eq. (17). Once the values of $U$, $V$, and $W$ are known, we compute the values of $S\xaf$, $E\xaf$, and $I\xaf$ using the transformation given by Eqs. (7a)–(7c). We shift $S\xaf$, $E\xaf$, and $I\xaf$, respectively, by $S0$, $E0$, and $I0$ to find the values of $S$, $E$, and $I$.

Figures 3(a) and 3(b) compare the time series of the fraction of the population that is infected with a disease $I$, computed using the complete stochastic system of transformed equations of the SEIR model [Eqs. (20a)–(20c)] with the time series of $I$ computed using the reduced system of equations of the SEIR model that is based on the deterministic center manifold with a replacement of the noise terms [Eqs. (22a) and (22b)]. Figure 3(a) shows the initial transients, while Fig. 3(b) shows the time series after the transients have decayed. One can see that the solution computed using the reduced system quickly becomes out of phase with the solution of the complete system. Use of this reduced system would lead to an incorrect prediction of the initial disease outbreak. Additionally, the predicted amplitude of the initial outbreak would be incorrect. The poor agreement, both in phase and amplitude, between the two solutions continues for long periods of time, as seen in Fig. 3(b). We also have computed the cross correlation of the two time series shown in Figs. 3(a) and 3(b) to be approximately 0.34. Since the cross correlation measures the similarity between the two time series, this low value quantitatively suggests a poor agreement between the two solutions.

Using the same systems of transformed equations, we compute 140 years worth of time series for 500 realizations. Ignoring the first 40 years of transient solution, the data are used to create a histogram representing the probability density $pSI$ of the $S$ and $I$ values. Figure 4(a) shows the histogram associated with the complete stochastic system of transformed equations, while Fig. 4(b) shows the histogram associated with the reduced system of equations with a replacement of the noise terms. The color-bar values in Figs. 4(a) and 4(b) have been normalized by $10\u22123$.

One can see by comparing Fig. 4(a) with Fig. 4(b) that the two probability distributions qualitatively look the same. It is also possible to compare the two distributions using a quantitative measure. The Kullback–Leibler divergence or relative entropy measures the difference between the two probability distributions as

where $Pi,j$ refers to the $(i,j)th$ component of the probability density found using the complete stochastic system of transformed equations [Fig. 4(a)] and $Qi,j$ refers to the $(i,j)th$ component of the probability density found using the reduced system of equations [Fig. 4(b)]. In our numerical computation of the relative entropy, we have added $10\u221210$ to each $Pij$ and $Qij$. This eliminates the possibility of having a $Qij=0$ in the denominator of Eq. (23) and does not have much of an effect on the relative entropy.

If the two histograms were identical, then the relative entropy given by Eq. (23) would be $dKL=0$. The two histograms shown in Figs. 4(a) and 4(b) have a relative entropy of $dKL=0.0391$, which means that the two histograms, while not identical, are quantitatively very similar. However, one cannot rely entirely on the histograms alone to say that the solutions of the complete system and the reduced system agree. As we have seen in Figs. 3(a) and 3(b), the two solutions have differing amplitudes and are out of phase with one another. It is important to note that these features are not picked up by the histograms of Fig. 4.

## V. CORRECT PROJECTION OF THE NOISE ONTO THE STOCHASTIC CENTER MANIFOLD

To project the noise correctly onto the center manifold, we will derive a normal form coordinate transform for the complete stochastic system of transformed equations of the SEIR model given by Eqs. (20a)–(20c). The particular method we use to construct the normal form coordinate transform not only reduces the dimension of the dynamics, but also separates all of the fast processes from all of the slow processes.^{25} This technique has been modified and applied to the large fluctuations of multiscale problems.^{17}

Many publications^{19–22} discuss the simplification of a stochastic dynamical system using a stochastic normal form transformation. In some of this work,^{19,22} anticipative noise processes appeared in the normal form transformations, but these integrals of the noise process into the future were not dealt with rigorously.

Later, the rigorous theoretical analysis needed to support normal form coordinate transforms was developed in Refs. 23 and 24. The technical problem of the anticipative noise integrals also was dealt with rigorously in this work. Even later, another stochastic normal form transformation was developed.^{25} This new method allows for the “[removal of] anticipation… from the slow modes with the result that no anticipation is required after the fast transients decay” (Ref. 25, p. 13). The removal of anticipation leads to a simplification of the normal form. Nonetheless, this simpler normal form retains its accuracy with the original stochastic system.^{25}

We shall use the method of Ref. 25 to simplify our stochastic dynamical system to one that emulates the long-term dynamics of the original system. The method involves five principles, which we recapitulate here for completeness:

Avoid unbounded secular terms in both the transformation and the evolution equations to ensure a uniform asymptotic approximation.

Decouple all of the slow processes from the fast processes to ensure a valid long-term model.

Insist that the stochastic slow manifold is precisely the transformed fast processes coordinate being equal to zero.

To simplify matters, eliminate as many as possible of the terms in the evolution equations.

Try to remove all fast processes from the slow processes by avoiding as much as possible the fast time memory integrals in the evolution equations.

In practice, the original stochastic system of equations (which satisfies the necessary spectral requirements) in $(U,V,W)T$ coordinates is transformed to a new $(Y,X1,X2)T$ coordinate system using a near-identity stochastic coordinate transform given as

Several iterations lead to coordinate transforms for $U$, $V$, and $W$ along with evolution equations describing the $Y$-dynamics, $X1$-dynamics, and $X2$-dynamics in the new coordinate system. The $Y$-dynamics have exponential decay to the $Y=0$ slow manifold. Substitution of $Y=0$ leads to the coordinate transforms

All of the stochastic terms in Eqs. (25a)–(25c) consist of integrals of the noise process into the past (convolutions), as given by Eqs. (26) and (27). These memory integrals are fast-time processes. Since we are interested in the long term slow processes and since the expectation of $G$ equals $e\u2212\u2135\tau \u2217E[\varphi ]$, where $E[\varphi ]=0$, we neglect the memory integrals and the higher-order multiplicative terms found in Eqs. (25a)–(25c) so that

Substitution of $Y=0$ and neglecting all multiplicative noise terms and memory integrals using the argument from above (so that we consider only first-order noise terms) leads to the following reduced system of evolution equations on the center manifold:

We numerically integrate the complete stochastic system of transformed equations of the SEIR model [Eqs. (20a)–(20c)] along with the reduced system of equations that is found using the stochastic normal form coordinate trans form [Eqs. (29a), (29b), (B1a), and (B1b)]. The complete system is solved for $U$, $V$, and $W$, while the reduced system is solved for $X1=V$ and $X2=W$. In the latter case, $U$ is computed using the center manifold equation given by Eq. (28a). As before, once the values of $U$, $V$, and $W$ are known, we compute the values of $S\xaf$, $E\xaf$, and $I\xaf$ using the transformation given by Eqs. (7a)–(7c). We shift $S\xaf$, $E\xaf$, and $I\xaf$, respectively, by $S0$, $E0$, and $I0$ to find the values of $S$, $E$, and $I$.

Figures 5(a) and 5(b) compare the time series of the fraction of the population that is infected with a disease $I$, computed using the complete stochastic system of transformed equations of the SEIR model [Eqs. (20a)–(20c)] with the time series of $I$ computed using the reduced system of equations of the SEIR model that is found using the stochastic normal form coordinate transform [Eqs. (29a), (29b), (B1a), and (B1b) ]. Figure 5(a) shows the initial transients, while Fig. 5(b) shows the time series after the transients have decayed. One can see that there is an excellent agreement between the two solutions. The initial outbreak is successfully captured by the reduced system. Furthermore, Fig. 5(b) shows that the reduced system accurately predicts recurrent outbreaks for a time scale that is orders of magnitude longer than the relaxation time. This is not surprising since the solution decays exponentially throughout the transient and then remains close to the lower-dimensional center manifold. Since we are not looking at periodic orbits, there are no secular terms in the asymptotic expansion, and the result is valid for all time. Additionally, any noise drift on the center manifold results in bounded solutions due to sufficient dissipation transverse to the manifold. The cross correlation of the two time series shown in Fig. 5 is approximately 0.98, which quantitatively suggests there is an excellent agreement between the two solutions.

Using the same systems of transformed equations, we compute 140 years worth of time series for 500 realizations. As before, we ignore the first 40 years worth of transient solution, and the data are used to create a histogram representing the probability density $pSI$ of the $S$ and $I$ values. Figure 6(a) shows the histogram associated with the complete stochastic system of transformed equations, while Fig. 6(b) shows the histogram associated with the reduced system of equations found using the normal form coordinate transform. The color-bar values in Figs. 6(a) and 6(b) have been normalized by $10\u22123$.

As we saw with Figs. 4(a) and 4(b), the probability distribution shown in Fig. 6(a) looks qualitatively the same as the probability distribution shown in Fig. 6(b). Using the Kullback–Leibler divergence given by Eq. (23), we have found that the two histograms shown in Figs. 6(a) and 6(b) have a relative entropy of $dKL=0.0953$. Since this value is close to zero, the two histograms are quantitatively very similar.

In addition to computing the cross correlation between the solution of the original system and the solutions of the two reduced systems for $\sigma i=0.0005$, we have computed the cross correlation for the case of zero noise as well as for noise intensities ranging from $\sigma =5.0\xd710\u221210$ to $\sigma =5.0\xd710\u22125$. These cross correlations were computed using time series from $t=800$ to $t=1000$. For the deterministic case (zero noise), the cross correlation between the time series which were computed using the original system and the reduced system based on the deterministic center manifold is 1.0, since the agreement is perfect. The cross correlation between the original system and the reduced system found using the stochastic normal form is also 1.0. Figure 7 shows the cross correlation between the original system and the two reduced systems for various values of $\sigma $.

One can see in Fig. 7 that the solutions found using the reduced system based on the deterministic center manifold compare poorly with the original system at very low noise values. Furthermore, as the noise increases, the agreement between the two solutions gets worse. On the other hand, Fig. 7 shows that the solutions computed using the reduced system found using the normal form coordinate transform compare very well with the solutions to the original system across a wide range of small noise intensities.

## VI. DISCUSSION

We have demonstrated that the normal form coordinate transform method reduces the Langevin system so that both the noise and dynamics are accurately projected onto the lower-dimensional manifold. It is natural to consider (a) the replacement of the stochastic term by a deterministic period drive of small amplitude and (b) the extension to finite populations. These cases are discussed, respectively, in Secs. VI A and VI B.

### A. The case of deterministic forcing

A single time series realization of the noise might be thought of as a deterministic function of small amplitude driving the system. One could rederive the normal form for such a deterministic function. However, since our derived normal form holds specifically for the case of white noise, we show that a simple replacement of the stochastic realization with a deterministic realization does not work. As an example, one could consider the following sinusoidal functions:

Figures 8(a) and 8(b) compare the time series of the fraction of the population that is infected with a disease $I$, computed using the complete system of transformed equations of the SEIR model [Eqs. (20a)–(20c)] with the time series of $I$ computed using the reduced system of equations of the SEIR model that is found using the stochastic normal form coordinate transform [Eqs. (29a), (29b), (B1a), and (B1b)], but where the stochastic terms of both systems have been replaced by the deterministic terms given by Eqs. (30a)–(30c). Figure 8(a) shows the initial transients, while Fig. 8(b) shows a piece of the time series after the transients have decayed. One can see in Figs. 8(a) and 8(b) that although the two solutions eventually become relatively synchronized with one another, there is a poor agreement, both in phase and amplitude, throughout the transient.

### B. The case of finite populations

The solutions to the original system and both reduced systems are continuous solutions based on an infinite population assumption and are found using Langevin equations having Gaussian noise. It is interesting to examine the effects of general noise by using a Markov simulation to compare solutions of the original and reduced systems.

The complete system in the original variables (see p. 2) will evolve in time $t$ in the following way:

Using a total population size of $N=10\xd7106$, we have performed a Markov simulation of the system. After completing the Markov simulation, we divided $s$, $e$, and $i$ by $N$ to find $S$, $E$, and $I$. Figure 9(a) shows a time series, after the transients have decayed, of the fraction of the population that is infected with a disease $I$. The results reflect both the mean and the frequency of the deterministic system. Performing the simulation for 500 realizations allows us to create a histogram representing the probability density $pSI$ of the $S$ and $I$ values. This histogram is shown in Fig. 9(b), and one can see that the probability density reflects the amplitude, which varies with the population size of $S$ and $I$. The color-bar values in Fig. 9(b) have been normalized by $10\u22124$

The complete system in the transformed variables has the stable endemic equilibrium at the origin. To bound the dynamics to the first octant, we use the fact that $s\u22650$, $e\u22650$, and $i\u22650$ to derive the appropriate inequalities for the transformed discrete variables $u$, $v$, and $w$. These inequalities can be found in Appendix C as Eq. (C1). These inequalities enable us to define new discrete variables $Y1$, $Y2$, and $Y3$ given by Eqs. (C2a)–(C2c) in Appendix C.

In the $Yi$ variables, we define evolution relationships similar to those found in Eq. (31). The complete transformed system will evolve in time $\tau $ according to the transition and rates given by Eq. (C3) in Appendix C.

After performing a Markov simulation of Eq. (C3) with a population size of $N=10\xd7106$, we can compare the dynamics of the transformed system with the dynamics of the original system by transforming the $Yi$ variables in the time series back to the original $s$, $e$, and $i$ variables. Dividing by $N$ yields $S$, $E$, and $I$. Figure 10(a) shows a time series, after the transients have decayed, of the fraction of the population that is infected with a disease $I$. The mean and the frequency agree with those found from the Markov simulation of the original system. We have performed the simulation for 500 realizations, and a histogram representing the probability density $pSI$ is shown in Fig. 10(b). The color-bar values in Fig. 10(b) have been normalized by $10\u22124$. One can see in Fig. 10(a) that the relative fluctuations of the $I$ component have nearly doubled. While the fluctuation size was 0.152 for the original system, it is 0.310 for the transformed system. Additionally, the two histograms shown in Figs. 9(b) and 10(b) have a relative entropy of $dKL=0.9519$, which means they are not in agreement. Because the simulation of the stochastic dynamics in the complete system of transformed variables does not qualitatively (or quantitatively) resemble the original stochastic system, we cannot expect that the reduced system will agree with either the original or the transformed systems. Therefore, much care should be exercised when extending the model reduction results (which show outstanding agreement) derived for a specific type of noise in the limit of infinite population to finite populations with a more general type of noise.

## VII. CONCLUSIONS

We have considered the dynamics of a SEIR epidemiological model with stochastic forcing in the form of additive Gaussian noise. We have presented two methods of model reduction, whereby the goal is to project both the noise and the dynamics onto the stochastic center manifold. The first method uses the deterministic center manifold found by neglecting the stochastic terms in the governing equations, while the second method uses a stochastic normal form coordinate transform.

Since the original system of governing equations does not have the necessary spectral structure to employ either deterministic or stochastic center manifold theory, the system of equations has been transformed using an appropriate linear transformation coupled with appropriate parameter scaling. At this stage, the first method of model reduction can be performed by computing the deterministic center manifold equation. Substitution of this equation into the complete stochastic system of transformed equations leads to a reduced system of stochastic evolution equations.

The solutions of the complete stochastic system of transformed equations as well as the reduced system of equations were computed numerically. We have shown that the individual time series does not agree because the noise has not been correctly projected onto the stochastic center manifold. However, by comparing histograms of the probability density $pSI$ of the $S$ and $I$ values, we saw that there was a very good agreement. This is caused by the fact that although the two solutions are out of phase with one another, their range of amplitude values is similar. The phase difference is not represented in the two histograms. This is a real drawback when trying to predict the timing of outbreaks and leads to potential problems when considering epidemic control, such as the enhancement of disease extinction through random vaccine control.^{35}

To accurately project the noise onto the manifold, we derived a stochastic normal form coordinate transform for the complete stochastic system of transformed equations. The numerical solution to this reduced system was compared with the solution to the original system, and we showed that there was an excellent agreement both qualitatively and quantitatively. As with the first method, the histograms of the probability density $pSI$ of the $S$ and $I$ values agree very well.

It should be noted that the use of these two reduction methods is not constrained to problems in epidemiology, but rather may be used for many types of physical problems. For some generic systems, such as the singularly perturbed, damped Duffing oscillator, either reduction method can be used since the terms in the normal form coordinate transform which lead to the average stochastic center manifold being different from the deterministic center manifold occur at very high order.^{17} In other words, the average stochastic center manifold and deterministic center manifold are virtually identical. For the SEIR model considered in this article, there are terms at low order in the normal form transform, which cause a significant difference between the average stochastic center manifold and the deterministic manifold. Therefore, as we have demonstrated, when working with the SEIR model, one must use the normal form coordinate transform method to correctly project the noise onto the center manifold.

In summary, we have presented a new method of stochastic model reduction that allows for impressive improvement in time series prediction. The reduced model captures both the amplitude and phase accurately for a temporal scale that is many orders of magnitude longer than the typical relaxation time. Since sufficient statistics of disease data are limited due to short time series collection, the results presented here provide a potential method to properly model real, stochastic disease data in the time domain. Such long-term accuracy of the reduced model will allow for the application of effective control of a disease where phase differences between outbreak times and vaccine controls are important. Additionally, since our method is general, it may be applied to very high-dimensional epidemic models, such as those involving adaptive networks. From a dynamical systems viewpoint, the reduction method has the potential to accurately capture new, emergent dynamics as we increase the size of the random fluctuations. This could be a means to identify new noise-induced phenomena in generic stochastic systems.

## ACKNOWLEDGMENTS

The authors benefited from the comments and suggestions of anonymous reviewers. We gratefully acknowledge the support from the Office of Naval Research and the Air Force Office of Scientific Research. E.F. is supported by the National Research Council Research Fellowship and L.B. is supported by the National Institute of General Medical Sciences (Award No. R01GM090204). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health.

### APPENDIX A: DETAILS OF THE ITERATIVE PROCEDURE FOR A SIMPLE EXAMPLE

We consider the system given by

Replacing $Y\u2032=\u2202Y/\u2202\tau $ with $\u2212Y+G$ [Eq. (A3b)], noting that $\u2202X/\u2202\tau =0$ [Eq. (A2b)], and ignoring the term $\u2202\eta /\u2202Y\u22c5G$ since it is a product of small corrections leads to

Equation (A5) must now be solved for $G$ and $\eta $. In order to keep the evolution equation [Eq. (A3b)] as simple as possible [principle (4) of Sec. V], we let $G=0$, which means that the coordinate transform [Eq. (A3a)] is modified by $\eta =X\u2212X3$. Therefore, the new approximation of the coordinate transform and its dynamics are given by

For the second iteration, we seek a correction to the $x$ coordinate (slow process) with the form

Replacing $X\u2032=\u2202X/\u2202\tau $ with $F$ [Eq. (A7b)], replacing $\u2202Y/\u2202\tau $ with $\u2212Y$ [Eq. (A6b)], and ignoring the term $\u2202\xi /\u2202X\u22c5F$ since it is a product of small corrections gives the equation

Equation (A9) must now be solved for $F$ and $\xi $. As in the first step, we employ principle (4) and keep the evolution equation [Eq. (A7b)] as simple as possible. However, since the terms $\mu (X\u2212X3)$ located on the right-hand side of Eq. (A9) do not contain $\tau $ or $Y$, these terms must be included in $F$. Therefore, one piece of $F$ will be $F=\mu (X\u2212X3)$.

The remaining deterministic term on the right-hand side of Eq. (A9) contains $Y$. This term can therefore be integrated into $\xi $. The equation to be solved is

whose solution is given as $\xi =\u2212\mu Y$.

To abide by principle (4), we would like to integrate the stochastic piece on the right-hand side of Eq. (A9) into $\xi $ by solving the equation

However, the solution of Eq. (A11) is given by

which has secular growth like a Wiener process. Since this would violate principle (1), we must let $F=\mu \sigma \varphi $.

Putting the three pieces together yields $\xi =\u2212\mu Y$ and $F=\mu (X\u2212X3)+\mu \sigma \varphi $. Therefore, the new approximation of the coordinate transform and its dynamics are given by

The construction of the normal form continues by seeking corrections $\xi $ and $F$ to the $x$ coordinate transform and the $X$ evolution using the updated residual of the $x$ equation [Eq. (A1a)], and by seeking corrections $\eta $ and $G$ to the $y$ coordinate transform and the $Y$ evolution equation using the updated residual of the $y$ equation [Eq. (A1b)].

### APPENDIX B: REDUCED STOCHASTIC SEIR MODEL: CORRECT PROJECTION OF THE NOISE

The specific form of $F$ and $G$ in Eqs. (29a) and (29b) is given as

### APPENDIX C: MARKOV SIMULATION FOR TRANSFORMED SEIR MODEL

The complete system in the transformed variables has the stable endemic equilibrium at the origin. To bound the dynamics to the first octant, we transform the new variables by using the original properties of $s\u22650$, $e\u22650$, and $i\u22650$, so that