The use of numerical simulation to predict the characteristics of chaotic dynamical systems inherently involves unpredictable processes. In this work, we develop a model for the expected error in the simulation of ergodic, chaotic ordinary differential equation (ODE) systems, which accounts for both discretization effects and the statistical effects that arise from unpredictability. Using this model, we then develop a framework for understanding the relationship between the sampling cost of a simulation and the expected error in the result and explore the implications of the various simulation parameters. Finally, we generalize the framework to consider the total cost—including unsampled spin-up timesteps—of simulations and consider the implications of parallel computational environments to give a realistic model of the relationship between wall-clock time and the expected error in the simulation of a chaotic ODE system.
I. INTRODUCTION
Progress in computational processing capabilities and numerical methods has led to increasing use of unsteady, scale-resolving simulation methods for turbulent flows in applied research.1–3 In spite of these advances, the cost of such simulations, particularly large-eddy simulation (LES), remains an obstacle to industrial use. Cost reduction remains a key research area for these systems, including more aggressive modeling approaches.1,4 On the other hand, advancements in numerical methods have demonstrated that stable, high-order solution of turbulent systems is possible,5 with ongoing developments enabling new efficiencies.6 As these advances approach maturity, a key outstanding question is how to use them to extract estimates as efficiently as possible when simulations are unsteady and chaotic and discretization methods are not significantly constrained by explicit stability limits.
The resolution of the turbulent—and therefore chaotic—nature of the underlying flows is a primary complication for estimation using these more complex simulations. For chaotic systems, estimation of long-time behavior is challenging because the ordinary differential equations (ODEs) to which they can be reduced have limited predictability.7 Within the general class of chaotic systems, a subset is ergodic systems, whose long-term states are drawn from a stationary distribution, independent of initial conditions (ICs).8 For ergodic chaotic problems, we frequently want to quantify the unique infinite-time average of some instantaneous quantity of interest of the system,
$$\overline{J} = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} g(u(t))\, dt,$$
where g is the instantaneous output functional and the state u(t) is governed by a dynamical system of the form
$$\frac{du}{dt} = f(u),$$
with a given initial condition (IC), u(0) = u0.
Often, the complexity of chaotic systems of interest is high, and accordingly, the cost of an accurate computational estimate of such averages becomes formidable.9–12 As the scale, cost, and complexity of a computational simulation increase, efficient discretization methods become critical for accurately estimating quantities of interest.
Understanding the error in approximations of such averages is nontrivial because statistical error (error due to a finite-time approximation) and discretization error (error due to the numerical approximation of solutions) are always simultaneously present. In the largest direct numerical simulation (DNS) and LES cases, for example, it is typical to fix the period of simulation time over which an average is performed at some large number of characteristic times and to validate that the discretization error converges as expected, assuming negligible sampling errors.13–16 Recent work has sought to quantify the effect of statistical error more robustly, using turbulent flow theory,17 advanced spatiotemporal statistical post-processing methods,18 statistical windowing techniques,19 or an extension of the concept of Richardson extrapolation to chaotic flows using auto-regressive models and Bayesian methods.20 The latter work is notable for its use to estimate the statistical errors in the DNS of a high-Reynolds-number turbulent channel flow.21
The objective of this paper is to investigate the behavior of statistical and discretization errors as a function of the computational cost for ergodic systems. Following an approach similar to that of Oliver et al.,20 we propose a simple error model for finite-time, discrete approximations of infinite-time averages on attractors. Using the Lorenz system as an example, we demonstrate that the discretization error converges as the time step size decreases. However, if we denote by “sampling time” the amount of simulation time over which the integral is taken (after discarding spin-up), we show that the error does not increase exponentially with sampling time, as might be expected from classical numerical analysis, but rather asymptotes to a constant, time-step-dependent value. Furthermore, for a given computational cost (e.g., the number of timesteps), an optimal choice of discretization (i.e., time step) exists that minimizes the expected error in the simulation when accounting for the effects of both discretization error and sampling error. We show that this optimal choice results in a convergence rate with respect to computational cost that is bounded by the sampling convergence rate, with a minor impact from the discretization order of accuracy. Finally, we consider the implications of spin-up time (i.e., unsampled time needed to arrive at the stationary distribution) and parallelism on the optimal error. We develop a method for estimating transient-related errors and then evaluate optimal choices incorporating these results.
II. PROPOSED ERROR MODEL ON THE ATTRACTOR
To approximate the infinite-time average, we compute finite-time, discrete estimates of the outputs of interest of the true system as
where the notation here represents the quadrature approximation of the integral of a quantity between a and b. Here, we have made a discrete approximation of the state using an order-p discretization with a temporal grid with characteristic size , where an order-p discretization is one for which the discretization error behaves as
when the discretization is applied to a well-posed (non-chaotic) system. Then we sample a discrete state over a finite sampling period, Ts, starting at some initial time t0. We can define the error that is incurred as
By introducing a third value,
we can re-write the error using an identity
Here, we define the “discretization error” and “sampling error,” respectively, as
We can take the absolute value of both sides of (7) and apply the triangle inequality
Thus, the total error incurred by the approximation is bounded by the sum of the absolute discretization and sampling errors. Next, we define the attractor of the operator f, , as the set of long-term states toward which all trajectories converge independently of the initial condition.22 We can define the expectation for a generic function as the expectation taken over all the trajectories that can result from starting from points on the attractor, ,
For the case in question, we will be considering either
or
with, for these examples, . Given these definitions, we can now take the expectation of (10), giving
by linearity.
From here, we propose asymptotic forms for the two right-hand side terms in (12). Consider the definition of eT in (9)
Assuming that we choose t0 such that the state at the start of sampling is effectively an independent draw from the attractor's stationary distribution, the finite-time average, and hence eT, is a random variable drawn from a stationary distribution. The states of ergodic systems, in general, are not independent in time, but as long as the system has sufficiently strong mixing properties, the central limit theorem (CLT) can be applied to finite-time averages of its outputs. This is the case whenever the condition of α-mixing is met,23,24 which has been shown for the Lorenz system.25 Thus, we can write eT as
with a constant coefficient A0, where $\mathcal{N}(\mu, \sigma^2)$ denotes the normal distribution with mean μ and variance σ². If we take the absolute value of this random variable, the result follows a half-normal distribution
where the half-normal distribution is the distribution of the absolute value of the corresponding zero-mean normal variable. The expectation of the half-normal distribution is well defined, allowing
as Ts goes to infinity.
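As a quick numerical illustration of this scaling (not taken from the study itself), the following sketch checks that the expectation of the absolute value of a zero-mean normal variable with standard deviation A0/√Ts is √(2/π)·A0/√Ts, so that the expected absolute sampling error decays as Ts^(-1/2); the value of A0 and the sampling times used here are arbitrary placeholders.

```python
import numpy as np

# Check of the half-normal expectation used above: if e_T ~ N(0, sigma^2)
# with sigma = A0 / sqrt(Ts), then E[|e_T|] = sqrt(2/pi) * A0 / sqrt(Ts),
# i.e., the expected absolute error decays as Ts^(-1/2).
# A0 and the Ts values below are illustrative placeholders.
rng = np.random.default_rng(0)
A0 = 1.0
for Ts in [1e2, 1e3, 1e4]:
    sigma = A0 / np.sqrt(Ts)
    samples = rng.normal(0.0, sigma, size=200_000)
    print(Ts, np.abs(samples).mean(), np.sqrt(2.0 / np.pi) * sigma)
```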
Now consider the use of a time-stepping method to give a discrete approximation of for each . Following the classical analysis,26 we might expect that the discretization error should take a form
This analysis is based on bounding the growth of the local truncation error at each time step by the Lipschitz constant, Λ, of the underlying system, with Cp being a constant parameter that depends on the choice of method. However, Viswanath showed27 that the global error could be modeled by a form
where the coefficient could be bounded by a constant for some nonlinear but non-chaotic systems that are exponentially stable. While this result has not been extended to ergodic systems, the expected convergence onto the attracting set suggests a bound of the form
As our results in Sec. III will show, (19) is a good description of the expected discretization error.
Combining the sampling and discretization error models gives the total error model
$$\mathbb{E}\left[\,|e|\,\right] \leq C_q\, \Delta t^{\,q} + A_0\, T_s^{-r},$$
that bounds the expected error when the time step is small enough and Ts is large enough to satisfy the asymptotic assumptions. Here, we have used Ts = NsΔt, where Ns is the number of timesteps computed for sampling; additionally, q refers to the observed discretization convergence rate, which, in practice, may differ from p due to numerical cancellations or if the solutions of the system are insufficiently regular. Similarly, r is the observed sampling convergence rate coefficient, which we expect to be 1/2 asymptotically under the CLT.
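For concreteness, a minimal sketch of this combined model as a Python function follows; it assumes the additive form described above, with Ts = NsΔt, and the coefficient values in the example call are placeholders rather than the fitted values reported later.

```python
import numpy as np

def expected_error(dt, Ns, Cq, q, A0, r=0.5):
    """Combined error model: discretization term Cq*dt^q plus
    sampling term A0*Ts^(-r) with Ts = Ns*dt (asymptotic regime only)."""
    Ts = Ns * dt
    return Cq * dt**q + A0 * Ts**(-r)

# Illustrative evaluation; the coefficients here are placeholders, not the
# fitted values reported in Table II.
dts = np.logspace(-4, -1, 50)
errs = expected_error(dts, Ns=1_000_000, Cq=1000.0, q=3.0, A0=1.5)
print(dts[np.argmin(errs)], errs.min())
```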
III. EVALUATION OF THE PROPOSED ERROR MODEL ON THE LORENZ SYSTEM
In this section, we fit numerical results for the Lorenz system to determine q, r, Cq, and A0 and show that this model is representative of the observed behavior. The Lorenz system is given by28
$$\dot{u}_1 = \sigma\,(u_2 - u_1), \qquad \dot{u}_2 = u_1\,(\rho - u_3) - u_2, \qquad \dot{u}_3 = u_1 u_2 - \beta u_3,$$
where the state is u = (u1, u2, u3) and f(u) denotes the corresponding right-hand side. The Lorenz system is known to be chaotic for the classic parametrization29 (σ, ρ, β) = (10, 28, 8/3), which is used everywhere in this text. For the output, we choose g = u2. We consider a set of explicit methods: forward Euler (FE, p = 1), third-order Runge–Kutta (RK3, p = 3), and fourth-order Runge–Kutta (RK4, p = 4). For all of these methods, we expect the discrete finite-time average to converge to JT at a rate of at least order p for non-chaotic systems.30 The explicit stability limits of these methods lie above the time steps at which asymptotic convergence is observed; tests with implicit methods indicated similar error trends at higher cost and are, therefore, omitted.
For any given discrete instance, we will start the simulation at an initial state at t = 0 that is sampled randomly from a normal distribution
To guarantee that the initial sampling state at t0 is on the attractor (as well as further guaranteeing independence from the other Monte Carlo instances), we evolve the state of any given Lorenz system discretization from its starting state for the spin-up period before sampling begins; we refer to this process of evolving the solution until it is on the attractor as “spin-up.” Then, we evolve the state over the next Ts, during which we integrate the output and compute (3) using the same numerical integration scheme that was used for the state itself.
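A minimal sketch of this sampling procedure is given below, assuming the classical Lorenz parameters and the output g = u2 stated above; the initial-condition distribution, spin-up time, sampling time, and time step in the example call are illustrative placeholders and not the values used in the study.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # classical Lorenz parameters

def rhs(v):
    # Augmented state v = (u1, u2, u3, J), where dJ/dt = g(u) = u2, so the
    # quadrature of the output uses the same scheme as the state itself.
    u1, u2, u3, _ = v
    return np.array([SIGMA * (u2 - u1),
                     u1 * (RHO - u3) - u2,
                     u1 * u2 - BETA * u3,
                     u2])

def rk4_step(v, dt):
    k1 = rhs(v)
    k2 = rhs(v + 0.5 * dt * k1)
    k3 = rhs(v + 0.5 * dt * k2)
    k4 = rhs(v + dt * k3)
    return v + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def sample_mean_output(u0, dt, t0, Ts):
    """Spin up from u0 for time t0 (unsampled), then accumulate the
    time average of g = u2 over the sampling window Ts."""
    v = np.array([*u0, 0.0])
    for _ in range(int(round(t0 / dt))):        # spin-up: discard transient
        v = rk4_step(v, dt)
    v[3] = 0.0                                  # reset the accumulated integral
    n = int(round(Ts / dt))
    for _ in range(n):
        v = rk4_step(v, dt)
    return v[3] / (n * dt)                      # finite-time average J_T

# IC drawn from a placeholder normal distribution (cf. (22); the exact
# parameters are not reproduced here).
rng = np.random.default_rng(1)
print(sample_mean_output(rng.normal(0.0, 1.0, 3), dt=0.005, t0=15.0, Ts=100.0))
```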
To approximate the error, we must first estimate the infinite-time average with a reference value, calculated using an ensemble mean over a large number of independent instances of the Lorenz system. Each instance is started from a different initial condition as given in (22) and simulated using RK4 with a small time step and a long sampling time. The resulting reference value is
with a 95% confidence estimate based on the ensemble mean estimator.
The computation of the reference value allows us to estimate the errors. For a given time step and Ts pair, we then approximate the expected error using a Monte Carlo method over independent instances of the discrete system, each started from an initial state drawn from (22) and spun up to an independent sampling starting point on the attractor,
In Figs. 1–3, we compare the results of simulations with the FE, RK3, and RK4 discretizations for different values of Ns. In these figures, Ts scales with the time step for a given Ns, so the Ts values on the x-axis vary between lines on the plot. The fits shown are computed with truncated data in order to eliminate non-convergent data at small Ts or large time steps; the limits used for truncation are found in Table I. The results of the nonlinear least squares fits for Ns = 10^4, 10^5, and 10^6 are given in Table II. In the table, we observe that r approaches its expected asymptotic value as the discretization error is reduced, either by increasing Ns or by pushing p higher. These figures demonstrate that (19) has explanatory value, as the errors in the discretization-dominated region collapse independently of Ts. It is also worth noting that Table II demonstrates higher-than-expected discretization error convergence rates for FE and RK4. In Fig. 4, we can examine the sampling error behavior across discretization methods for a single shared choice of Ns. Here, we can see that the sampling error effects on the left-hand side of the plot collapse independently of the discretization method. This indicates that the statistical effects are properties of the dynamical system, not artifacts of the discretization, as we might expect in the limit of vanishing time step. We note that Table II shows some Ns-dependence in the fit values of r; we expect that this is an artifact of the fit process, in which discretization effects and statistical effects are intermingled. As Ns or q becomes small, the region in which the convergent CLT behavior is the dominant effect shrinks, and for these cases, the value of r detected is subject to error and deviation from the expected rate r = 1/2.
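A sketch of how such a fit could be set up with a standard nonlinear least squares routine follows; the data arrays (time steps, sampling times, and Monte Carlo error estimates restricted to the asymptotic region of Table I) are assumed to be available, and the initial guess is an arbitrary placeholder.

```python
import numpy as np
from scipy.optimize import curve_fit

def error_model(x, A0, r, Cq, q):
    # x = (dt, Ts); fitted model E[|e|] ~ Cq*dt^q + A0*Ts^(-r)
    dt, Ts = x
    return Cq * dt**q + A0 * Ts**(-r)

def fit_coefficients(dt_data, Ts_data, err_data):
    # dt_data, Ts_data, err_data: Monte Carlo estimates of the expected error
    # collected on a (dt, Ts) grid and truncated to the asymptotic region
    # (cf. Table I). These arrays are assumed inputs, not the paper's data.
    p0 = [1.0, 0.5, 100.0, 3.0]                  # rough initial guess
    popt, _ = curve_fit(error_model, (dt_data, Ts_data), err_data,
                        p0=p0, maxfev=20000)
    return dict(zip(["A0", "r", "Cq", "q"], popt))
```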
Expected relative error as a function of for forward Euler discretization of the Lorenz equations. Nonlinear least squares fit based on data.
Expected relative error as a function of for RK3 discretization of the Lorenz equations. Nonlinear least squares fit based on data.
Expected relative error as a function of for RK4 discretization of the Lorenz equations. Nonlinear least squares fit based on data.
Fit boundaries for nonlinear least squares fits.
Method |  | 
---|---|---|
FE | 1.0 | |
RK3 | 1.0 | |
RK4 | 1.0 |
Values of error model coefficients computed from nonlinear least squares fits to Monte Carlo study data.
 | FE | RK3 | RK4
---|---|---|---|
(a) | |||
A0 | 2.19 | 1.74 | 1.63 |
r | 0.975 | 0.721 | 0.683 |
Cq | 4995 | 942 | 85 900 |
q | 1.65 | 2.70 | 4.83 |
(b) | |||
A0 | 1.94 | 1.50 | 1.41 |
r | 0.820 | 0.648 | 0.620 |
Cq | 1410 | 1310 | 96 100 |
q | 1.40 | 2.76 | 4.84 |
(c) | |||
A0 | 1.52 | 0.978 | 0.918 |
r | 0.693 | 0.553 | 0.538 |
Cq | 714.6 | 2740 | 165 000 |
q | 1.273 | 2.96 | 5.02 |
Expected relative error as a function of for discretizations of the Lorenz equations.
Finally, we attempt to compare the computational costs across the various discretizations. In this case, the number of timesteps Ns is not a good proxy for fixed cost, since the computation time for a time step varies between methods. Instead, we now fix Us, the total number of evaluations of the right-hand side f used in sampling timesteps. For the explicit schemes used in this work, each time step requires p right-hand side evaluations (e.g., forward Euler has p = 1 right-hand side evaluations), and thus, Us = pNs. In Fig. 5, we can see the effect of changing the time step at fixed sampling cost Us across discretizations. The error that can be achieved with the Runge–Kutta methods is lower than that of the forward Euler scheme, with a factor of 4.8 improvement in the error from FE to RK4. However, the best-case improvement for going from the third-order to the fourth-order Runge–Kutta scheme is only a factor of about 1.4. Moreover, the results show that, to achieve the lowest possible error, the optimal time step is discretization dependent. We investigate this further in Sec. IV.
Expected percent error as a function of for discretizations of the Lorenz equations at a number of sampling residual evaluations. All fits evaluated at .
IV. OPTIMAL TIMESTEPPING ON THE ATTRACTOR
We now study the implications of the error model (20), specifically seeking to understand the convergence of the error with respect to computational effort. In this analysis, we will assume that r = 1/2.
Consider a non-dimensional form of the error model in which the error is normalized by the standard deviation of the instantaneous output, σg, and the timescales, the time step and Ts, are normalized by the decorrelation time, Td. The decorrelation time relates the variance of independent draws from the distribution on the attractor to the variance of the finite-time mean estimators based on the correlated output signal, through the relation:31
In general, Td is hard to estimate accurately; this is a crux of the work of Oliver et al.20 In our formulation of the error model, we instead identify A0, which avoids outright estimation of Td. However, for the purpose of understanding the behavior of the error, Td is an intrinsic timescale that can be used to normalize the time step and Ts.
The resulting non-dimensional form of the error model is
We can also write the optimizers and optimal value of (27) in terms of the non-dimensional variables. These are given by
In terms of convergence with respect to sampling costs, the error model will scale at best as
In the limit as q → ∞, the rate approaches 1/2: the CLT limits the convergence rate. Table III gives the rates of convergence (28) for various values of q.
Convergence rates for combined error with respect to sampling timesteps implied in (28) at common high-order discretization error convergence rates.
q | 1 | 2 | 3 | 4 | 5 | ⋯ | ∞
---|---|---|---|---|---|---|---
Rate | 1/3 | 2/5 | 3/7 | 4/9 | 5/11 | ⋯ | 1/2
Using the reference simulation, we can also find
where is an estimate of the standard deviation of g. Together, these allow us to estimate
With these values, we can plot the non-dimensional error model with fixed r = 1/2, which is shown in Fig. 6.
Expected non-dimensional error as a function of the non-dimensional time step for discretizations of the Lorenz equations. r = 1/2 assumed.
We now consider the implications of these results for increasing Ns. To focus solely on control of the discretization error, increases in Ns can be used to refine the time step, with Ts fixed. On the other hand, to focus solely on controlling the sampling error, Ts can be increased, holding the time step fixed. In Fig. 7, the two approaches are compared with the optimal use of resources. In orange is the discretization error control strategy. In this approach, the simulations converge at a high-order rate in Ns toward the optimal error behavior; once the error reaches this optimum, however, it asymptotes to a constant: statistical errors limit the estimation of the mean output. On the other hand, the sampling error control approach is shown in blue. In this approach, the central limit convergence rate of 1/2 is initially achieved until the error asymptotes to a constant: discretization errors limit the estimation of the mean output. In other words, this latter case represents having increasingly many significant figures of certainty about an answer that is significantly offset from the true quantity of interest. In the literature for large simulations, discussed in the Introduction, simulations tend to be planned using either the discretization or the statistical error control approach. What (20) implies and Fig. 7 demonstrates is that, in fact, there is a particular optimal scheme, in which the time step and Ts are simultaneously varied, that will extract the most accurate estimate of the mean output as Ns increases.
Refinement study comparison for fixed , fixed Ts, and optimized and Ts using RK3 discretization to compute the expectation of the Lorenz system output g = u2.
V. INVESTIGATION OF THE GLOBAL DISCRETIZATION ERROR MODEL
In this section, we show that our simulations of chaotic, ergodic ODEs are consistent with a bounded relationship between the local and global discretization errors. Consider an estimate of the global error based on Ns timesteps
where
and is the exact solution integrated from through
In (31), we have assumed that the error from any given local state perturbation is propagated forward in time by the dynamics before being transformed into an error in the output; this process is captured by a propagation operator. Because the effect of a local error propagates forward and not backward in time, the operator vanishes for earlier times; moreover, we assume that, due to ergodicity, its effect becomes negligible once the elapsed time exceeds the decorrelation time Td associated with the attractor. This allows us to write
Now, we assume that a constant exists such that
for all , and , where is the set of states possible by perturbation of u that remain in the basin of attraction of the attractor of f. When this is the case, we can create a bound on the magnitude of ehp
We now attempt to bound this value for the Lorenz system by approximating the local truncation error. To make an estimate, we compute both the solution at the next time step and a surrogate for the true solution at each time step: the former is computed with one time step of the method of interest, and the latter is always computed with the highest-accuracy method available, RK4, subdividing the step into ten consecutive substeps rather than one. Both are always advanced from the same previous state. This allows us to estimate the local truncation error locally
In Fig. 8, we characterize the convergence of the local error estimates. Computations are run with Ts = 100 fixed, varying the time step. At each time step, the local truncation error is estimated by computing (37). The figure shows the computed estimates and demonstrates that the expected rate of p + 1 is nearly exactly achieved.
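A sketch of this local-error surrogate for the forward Euler scheme is shown below; the trajectory length, time step, and the choice to march the trajectory itself with RK4 are illustrative assumptions rather than the exact settings behind Fig. 8.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0

def f(u):
    return np.array([SIGMA * (u[1] - u[0]),
                     u[0] * (RHO - u[2]) - u[1],
                     u[0] * u[1] - BETA * u[2]])

def fe_step(u, dt):                 # forward Euler, the "method of interest" here
    return u + dt * f(u)

def rk4_step(u, dt):                # reference integrator for the surrogate
    k1 = f(u); k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2); k4 = f(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def local_error_estimate(u, dt):
    """One step of the method of interest vs. a surrogate 'truth': RK4 with
    the step subdivided into ten substeps, both advanced from the same u."""
    u_method = fe_step(u, dt)
    u_ref = u.copy()
    for _ in range(10):
        u_ref = rk4_step(u_ref, dt / 10.0)
    return np.linalg.norm(u_method - u_ref)

# Accumulate local-error estimates along a trajectory; dt and the trajectory
# length here are illustrative placeholders.
u = np.array([1.0, 1.0, 1.0])
dt = 0.005
errs = []
for _ in range(int(100.0 / dt)):
    errs.append(local_error_estimate(u, dt))
    u = rk4_step(u, dt)             # march the trajectory itself with RK4
print(np.mean(errs))
```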
Convergence of estimated local truncation error with respect to . Fits to shown (with offset for presentation).
Using (36), we can estimate a bounding value for by
where cp is the leading truncation error coefficient fit in Fig. 8 and Cq and q are taken from Table II. Of course, when q differs from p + 1, there will be a residual time step dependence. However, as (38) requires that the discretization error has reached its asymptotic behavior, we will only consider time steps in the asymptotic convergence regions given in Table I to compute the bound. In Fig. 9, we show the values of the right-hand side quantity in (38), which allow us to make an estimate
Next, we use classical truncation error estimates26 to relate the discretization error to properties of the solution. We will assume that the local truncation error is bounded by a form
where the local truncation constant term depends on the numerical method and the norm in this context refers to the maximum value in time of the inf-norm of a vector-valued, time-dependent quantity. The derivatives of the state can be computed by evaluating f and its derivatives using solutions from a reference RK4 solution of the Lorenz system with Ts = 1000. Norms of the derivatives are shown in Fig. 10. The resulting values that can now be derived by fitting the asymptotic behavior in Fig. 8 can be found in Table IV. The result of these estimates is that we can reliably bound the global error of a dynamical system as an accumulation of the local errors over a region of correlation.
Norm of analytic derivatives of u computed on the attractor of f. State u computed with RK4 at and Ts = 1000 after discarding .
Rate and coefficient fit for convergence of local truncation error of the discrete Lorenz system. The bounding constant is estimated using the cp fit from Fig. 8.
p | rate (observed) |  | 
---|---|---|---|
1 | 2.00 | 7.33 | |
3 | 4.02 | 156 | |
4 | 4.93 | 76.5 |
We now want to consider how the global error behavior demonstrated here might extrapolate to more complicated systems by evaluating the spectral behavior of the Lorenz system. Using a discrete Fourier transform (DFT) with a Hann window function,32 we perform a spectral analysis on the states of the Lorenz system with sampling time Ts = 1000. The resulting spectrum can be found in Fig. 11. The Lorenz system tends to have most of its content at low frequencies, with a region of exponential decay at higher frequencies. On the smallest scales, machine-precision plateaus are observed and are omitted here.
Fourier spectrum of . Computed with DFT using the Hann window function on data from RK4 discretization of the Lorenz system with Ts = 1000, , and . Gray dashed line: fit assuming with a = 0.872 and b = 2.58.
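A minimal sketch of such a windowed spectral estimate is given below; the normalization and the assumption that a sampled signal of g = u2 is available as an array are choices made for illustration.

```python
import numpy as np

def hann_spectrum(g_samples, dt):
    """Windowed spectral estimate: apply a Hann window to a sampled signal of
    g = u2 (assumed to come from a long reference trajectory of the discrete
    Lorenz system) and take a real FFT."""
    g = np.asarray(g_samples) - np.mean(g_samples)   # remove the mean first
    w = np.hanning(len(g))                            # Hann window
    G = np.fft.rfft(w * g)
    freqs = np.fft.rfftfreq(len(g), d=dt)
    # Normalize by the window energy so amplitudes are comparable across lengths.
    power = np.abs(G)**2 / np.sum(w**2)
    return freqs, power
```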
The fact that the Lorenz spectrum is an exponentially decreasing function of frequency f makes the use of high-order methods theoretically appealing for the spectral convergence of hp-refinement strategies.33 Unfortunately, the effect of statistical error in (28) limits the impact of this exponential decay, such that the benefits of higher-order discretization methods are limited compared to their steady-state and non-chaotic applications. The convergence toward the central limit rates can be seen in Fig. 12, which shows the convergence of (28) with the total sampling cost. Increasing the order improves the convergence rate in (28) toward the CLT-implied asymptotic rate and decreases the value of the leading constant, but the error never achieves the spectral rates possible with hp-refinement in the steady case. Nevertheless, the cost to achieve a given amount of error in expectation—in terms of function evaluations—is significantly less with higher-order methods. Achieving 1% non-dimensional error in expectation is possible with RK4 at a cost ten times less than would be possible using FE; this factor grows larger than 100 as the tolerance is tightened further.
Convergence of optimal error with sampling costs for FE, RK3, and RK4 discretizations of the Lorenz output g = u2. Asymptotic rate implied by the central limit theorem is shown.
VI. IMPACT OF ENSEMBLE AVERAGING AND SPIN-UP
In this section, we consider how the error behaves when ensemble averaging over multiple parallel instances is used and when spin-up effects are present.
A. Ensemble averaging on the attractor
Sampling error can be reduced at a fixed wall clock time by ensemble averaging across multiple parallel processes.34 Consider a Monte Carlo approach to approximate with a set of independent realizations
We can write a modified version of (20) to approximate the error that we expect in the Monte Carlo estimator in (41)
with an equivalent non-dimensional version, assuming
and an optimum given by
at
and
Equation (44) shows that, for finite values of q, the Monte Carlo method has a diminished return compared to its purely stochastic application as in Ref. 34; the optimal error scales with the ensemble size at a rate of q/(2q+1) as opposed to 1/2. However, parallelization can achieve perfect scaling in the expected error, in the sense that running an ensemble of instances with Ns sampling timesteps each will have an error in expectation equivalent to simulating all of those timesteps in series. As the ensemble size is varied along the set of optimal solutions, (45) and (46) indicate that the time step and the sampling time should be adjusted by the same factor to achieve perfect scaling.
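A sketch of the ensemble form of the error model, and of the perfect-scaling property, follows; the additive form with the pooled sampling time is our reading of the modified model, and all coefficients in the example are placeholders.

```python
import numpy as np

def expected_error_ensemble(dt, Ts, Ne, Cq, q, A0, r=0.5):
    """Ensemble version of the error model: Ne independent instances, each
    sampled over Ts, pool their statistics, so the sampling term sees Ne*Ts
    while the discretization term is unchanged."""
    return Cq * dt**q + A0 * (Ne * Ts)**(-r)

def optimal_error(Ns_total, Cq, q, A0, Ne=1):
    """Minimum of the model over dt at a fixed total number of sampling
    timesteps Ns_total, split evenly over Ne instances."""
    dts = np.logspace(-5, -1, 400)
    Ns_each = Ns_total / Ne
    errs = expected_error_ensemble(dts, Ns_each * dts, Ne, Cq, q, A0)
    return errs.min()

# "Perfect scaling" check: Ne instances of Ns_total/Ne steps each give the
# same optimal expected error as one serial run of Ns_total steps.
print(optimal_error(1e7, Cq=1000.0, q=3.0, A0=1.5, Ne=1),
      optimal_error(1e7, Cq=1000.0, q=3.0, A0=1.5, Ne=10))
```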
B. Spin-up transient modeling
So far, we have considered the error and cost on the attractor, neglecting the impact of spin-up from t = 0 to t = t0. This spin-up is necessary because simulations of ergodic systems invariably need some time for the state to proceed onto the attractor from the initial condition.
Consider , a solution of the ergodic chaotic system f from an arbitrary initial condition in the basin of attraction of an attractor, . The existence of the attractor implies the nonlinear stability of the system, such that all will converge to trajectories on the attractor . Denote by a trajectory that is on the attractor for all t and to which collapses as . The perturbation that describes the IC, therefore, exists in a stable subspace of perturbations to and can be associated with the negative Lyapunov exponents of the system. Thus, we can assume that such perturbations are governed asymptotically by
with being a characteristic time associated with the stable Lyapunov modes. In practice, we are interested in averages of quantities on the attractor , but we can only calculate quantities that will include some effect—if small—of the spin-up transient.
Next, we seek to quantify the effect of this gap on estimates . Consider the computation of JT. In (7), we have effectively found an estimate of
by choosing t0 sufficiently large. We now want to consider an error model of the form
where a new error is introduced, associated with the spin-up transient. The model for eT in (9) will apply without modification, while the model for ehp will be subject to slightly different assumptions: whereas in (8) Cq was bounded by its value on the attractor, here we must assume that Cq is bounded from t = 0 onward, including both the attractor and the transient part of the trajectory. We only require that the transient part of the trajectory remain in the basin of attraction. We assume that a model of the form used in (8) applies in expectation when the transient component is included.
Next, we concentrate on ,
We now assume that, like u, g will decay exponentially in t as (47), such that
will apply for , with being a constant that can be related to the deviation between and .
From this assumption,
Taking the absolute value, we can find a bounding model
As before, manipulation of (49) allows
Now, we take an expectation of the absolute value of ,
where gives the expectation on the basin of attraction of . Here, the expectation of does not reduce to an expectation on the attractor. The statistical term is handled on the attractor as before, and we have assumed that the discretization error is bounded by the same form in expectation on as on . Finally, the expectation of is taken on the set of initial conditions used. This allows us to take the expectation of (53) to complete (56). Because we anticipate will be bounded by a constant for a given system, this is given by
If a and the decay time can be identified by observation for a given initial condition, the transient contribution is no longer stochastic and can be bounded directly as in (53).
Putting all the pieces together, we can now give an error model that incorporates the effects of spin-up and ensemble estimation
where the transient coefficient can be estimated either on an instance-by-instance basis or by estimating the expectation over the family of initial conditions. Under this model, the transient error term decays exponentially when t0 is large relative to the decay time. Even when t0 is small, (53) suggests that the decay-induced error term will still scale as 1/Ts, faster than the expected CLT rate of $T_s^{-1/2}$, and thus, it will be dominated as Ts grows. This implies two “paths” to control spin-up errors: either choosing t0 long enough to shrink the mean offset error from t0 or choosing Ts long enough that the mean offset contribution to the simulation error is small in spite of the error at t = t0.
C. Identification of the spin-up transient model
We will now develop a method to fit the error model. In order to do so, consider observations gn of the output at times tn in the simulated interval. We will assume that the spacing between the tn is large enough that the solution at each tn is effectively independent. If this is the case, then each gn will be an independent and identically distributed (i.i.d.) draw from a bounded, stationary distribution whose mean decays toward the attractor value. The exact distributions are, in general, not known. In order to facilitate an estimate of the mean behavior, we will assume the gn are i.i.d. draws from a normal distribution whose mean follows the decay model. Then, we have
where the relationship between gn and is taken from (51).
In order to understand the implications of this model, we can use a set of reference RK4 simulations of the Lorenz system with timesteps sampled without spin-up over a period T = 100, starting from initial conditions similar to those given in (22) but with a scaled-up standard deviation of 100 in all three variables to highlight the initial transient. In order to treat each of the parameters in (59), including σg, as unknowns, we use Hamiltonian Monte Carlo with the likelihood function implied by (59). We discard the data from t = 0 to t = 5 and then take 10 000 equispaced samples from t = 5 to t = 100. For the prior models, we start by computing naïve estimators of the mean and standard deviation of the downsampled trace signal and then use
where
It should be noted that, in this specification, the Bayesian fit only requires a user-supplied prior for the decay time and for the uncertainty in the standard deviation, and the fitting method only requires these assumptions to be reasonable.
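A minimal sketch of such a fit using PyMC (whose default NUTS sampler is an adaptive Hamiltonian Monte Carlo variant) is given below; the exponential-decay mean model, the specific prior families, and the prior scales are assumptions for illustration, not the exact specification used in the study.

```python
import numpy as np
import pymc as pm

def fit_transient(t_obs, g_obs, tau_prior_scale=1.0):
    """Fit g_n ~ N(g_bar + a*exp(-t_n/tau), sigma_g) to observations g_obs at
    times t_obs. Priors are centered on naive estimators of the trace mean and
    standard deviation; the prior scales here are illustrative placeholders."""
    g_mean0, g_std0 = np.mean(g_obs), np.std(g_obs)
    with pm.Model():
        g_bar = pm.Normal("g_bar", mu=g_mean0, sigma=g_std0)
        a = pm.Normal("a", mu=0.0, sigma=10.0 * g_std0)
        tau = pm.Exponential("tau", lam=1.0 / tau_prior_scale)
        sigma_g = pm.HalfNormal("sigma_g", sigma=2.0 * g_std0)
        mu = g_bar + a * pm.math.exp(-t_obs / tau)
        pm.Normal("obs", mu=mu, sigma=sigma_g, observed=g_obs)
        idata = pm.sample(1000, tune=1000)   # NUTS, an adaptive HMC variant
        map_est = pm.find_MAP()              # maximum a posteriori point estimate
    return map_est, idata
```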
A sample fit and trace are found in Fig. 13, for which the maximum a posteriori (MAP) estimate gives the decay parameters. For the Lorenz system, the initial transient onto the attractor is very rapid, almost negligible. Applying the Bayesian fit procedure to an ensemble of 1000 runs generated in the same way as Fig. 13, we can find MAP estimates of the variables in the decay model. In Figs. 14 and 15, histograms of these variables are shown, which are needed to determine (53). We can see that the fit procedure identifies values
for greater than 97% of initial conditions, up to two standard deviations above the mean. Using these values as a conservative estimate for the mean offset, we can now model the effect of the transient behavior.
MAP estimate for Lorenz system transient. Collected over 1000 Lorenz trajectories with , Ts = 100, and randomized . Outliers truncated, greater than 97% of data in the pictured range.
MAP estimate for Lorenz system transient. Collected over 1000 Lorenz trajectories with , Ts = 100, and randomized . Outliers truncated, greater than 97% of data in the pictured range.
VII. OPTIMAL TIME-STEPPING INCLUDING SPIN-UP
Now we can consider how the cost and error impact of spin-up is incorporated into the model for error at a fixed cost. The spin-up time requires the use of N0 timesteps
with N being the total number of timesteps used, given by
where Ns is the number of timesteps during sampling, from t0 to t0 + Ts, on a given instance.
By normalizing (58) then substituting (63), we arrive at a transient-inclusive non-dimensional model for the error
Using this result, we can solve numerically for the optimal time step and spin-up time via (64).
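A sketch of one way to perform this numerical optimization is shown below; the transient term used here, (a·tau/Ts)·exp(-t0/tau), is our reading of the decay model, the cost constraint assumes p right-hand side evaluations per time step, and all coefficient values in the example call are placeholders.

```python
import numpy as np
from scipy.optimize import minimize

def total_error(x, U, p, Ne, Cq, q, A0, a, tau):
    """Transient-inclusive error model evaluated under a per-processor budget
    of U right-hand side evaluations: discretization term, CLT sampling term
    pooled over Ne instances, and an assumed spin-up term."""
    dt, t0 = np.exp(x)                      # optimize in log space for positivity
    Ts = (U / p) * dt - t0                  # cost constraint: U = p*(t0 + Ts)/dt
    if Ts <= 0.0:
        return np.inf                       # infeasible: no sampling time left
    e_disc = Cq * dt**q
    e_samp = A0 * (Ne * Ts) ** (-0.5)
    e_spin = a * tau * np.exp(-t0 / tau) / Ts
    return e_disc + e_samp + e_spin

def optimize_schedule(U, p, Ne, Cq, q, A0, a, tau):
    res = minimize(total_error, x0=np.log([1e-3, 10.0]),
                   args=(U, p, Ne, Cq, q, A0, a, tau),
                   method="Nelder-Mead")
    dt_opt, t0_opt = np.exp(res.x)
    return dt_opt, t0_opt, res.fun

# Placeholder coefficients; not the fitted Lorenz values.
print(optimize_schedule(U=1e7, p=3, Ne=16, Cq=1000.0, q=3.0,
                        A0=1.5, a=5.0, tau=0.5))
```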
Consider a Lorenz simulation for which a budget of right-hand side evaluations is available on each of a set of parallel processors. We start by studying the error under (64) as the time step and t0 vary, with a conservative estimate for the transient behavior using the bounding values in (61). In Fig. 16, we show the resulting error for forward Euler at a fixed cost. (The optimum is denoted by a red star.) Moving to the right, discretization error becomes the dominant factor as the time step grows. The diagonal boundary gives the edge of the feasible region, at which, under the cost constraint, sampling no longer occurs (Ts = 0). Moving from the optimum toward the bottom left, the time step and spin-up time decrease, and thus the transient error and sampling error become dominant. Similar plots for RK3 and RK4 are found in Figs. 17 and 18. The optimal errors and optimizing simulations are described in Table V. We can see from these results that, at a fixed budget, the effect of increasing the discretization order is to make a smaller error possible with a larger time step, which means fewer timesteps are needed to traverse the spin-up time. These two effects combine to allow for an increase in the available sampling time Ts, allowing significantly less sampling error for RK3 compared to FE, and an additional—albeit smaller—benefit moving from RK3 to RK4, holding cost fixed.
Dependence of normalized error expectation on normalized time step and normalized spin-up time with total cost set at for forward Euler. Red star denotes optimum, and dashed line indicates optimal t0 given .
Dependence of normalized error expectation on normalized time step and normalized spin-up time with total cost set at for third-order Runge–Kutta. Red star denotes optimum, and dashed line indicates optimal t0 given .
Dependence of normalized error expectation on normalized time step and normalized spin-up time with total cost set at for fourth-order Runge–Kutta. Red star denotes optimum, and dashed line indicates optimal t0 given .
Optimal Lorenz simulations for output g = u2 under budget of right-hand side evaluations using .
Method | p |  |  | t0 | Ts |
---|---|---|---|---|---|
FE | 1 | 0.0502 | 30.2 | 275 | |
RK3 | 3 | 0.0130 | 35.5 | 3370 | |
RK4 | 4 | 0.0224 | 36.7 | 6670 |
In Fig. 19, we take another perspective on these results for RK3 by varying the time step and plotting the optimal t0, Ts, and error. As the time step gets large, the optimal choice of t0 grows logarithmically, and as the time step becomes small, the optimal choice of t0 rapidly falls to zero. Parallelization has a small but non-zero effect on the optimal choice of sampling time, confined to a small region of time steps; outside that region, Ts scales with the time step in both limits.
Dependence of normalized spin-up time , sampling time , and model error on normalized time step with total cost set at for third-order Runge–Kutta.
The bottom plot of Fig. 19 shows the variation of the error with the time step. In this plot, we can see three distinct regions. For large time steps, discretization error dominates, and the convergence follows the discretization error rate. Approaching the optimum, sampling error becomes the dominant error contribution. In this region, the convergence is around the CLT-implied 1/2 rate, and the effect of parallelization is clearly seen. For the smallest time steps, however, the spin-up error becomes the dominant error contribution. The optimal choice of t0 begins to fall rapidly, as the sampling and spin-up must compete for computational resources under the budget. Once the spin-up error dominates, the paradigm by which (53) is controlled shifts from the exponential-decay term to the 1/Ts term, since resolving Ts delivers both spin-up and sampling error control.
This interdependence will evidently have an effect on the overall scaling between cost and error, which we now seek to understand. Here, we study the variation of the optimal expected error with U under the optimal choices and evaluate how well the model approximates experimental data. In Fig. 20, the variation of the optimal error computed via (64) is shown as a function of the processor count and U. From this figure, we can see that, in the limit of small error, the sampling costs dominate and the best possible rate is given by the estimate in (44), limited by the CLT. On the other hand, when the cost is more moderate, the scaling of the error is close to the discretization error convergence rate in (19). In this region, the spin-up costs are significant, and high-order discretization brings the state more efficiently to the start of sampling. In the spin-up-dominated region, the effect of the parallel ensemble approach is minimal, since spin-up must be overcome on each processor.
Optimal non-dimensional error under model as a function of total cost U for RK3. Theory totem on the left-hand side: discrete convergence rate, ; on right-hand side: rate from (44).
Now, we validate the total error model for the Lorenz system with a final numerical experiment. At each choice of processor count and U, we generate 1000 individual realizations at the computed optimal time step and spin-up time, using the model fit given in Table II. In Figs. 21–23, we show the model predictions and the results of Monte Carlo estimates of the expected error for our three discretizations. These results validate the model, with significant discrepancies only when the asymptotic assumptions—small time step and large Ts—do not hold due to budget limitations in the limit of small U.
Total cost model and Monte Carlo validation as a function of total cost U for FE.
Total cost model and Monte Carlo validation as a function of total cost U for RK3.
Total cost model and Monte Carlo validation as a function of total cost U for RK4.
VIII. CONCLUSIONS AND FORTHCOMING WORK
In this manuscript, we have developed a theoretical framework for the total error incurred by the discrete sampling of mean outputs of ergodic ODEs. These findings are validated by Monte Carlo studies of the Lorenz system using Runge–Kutta methods. We incorporate effects of parallelization and spin-up and validate that the models match observed results in experiments. Using these models, we are able to develop a comprehensive understanding of the relationship between the wall-clock cost of a simulation and the amount of error in expectation that it might achieve.
The first result of these studies is to demonstrate that the central limit theorem sets a limiting convergence rate of the error with respect to sampling cost in a simulation. The effect of discretization error is to penalize the observed convergence rate away from this limit, yielding slower convergence than the central limit theorem alone would imply. In order to minimize the error that might be expected in a given simulation, there exists a specific choice of time step that achieves the minimum expected error for a given number of timesteps Ns available for computation.
In addition to this finding, we show that ensemble simulations reduce the sampling error, albeit with a similar penalty for discrete simulations. Moreover, the effect of spin-up costs—assuming optimal choices—is to increase the error at a given amount of the total simulation cost, which cannot be reduced by parallelization. The effect of spin-up, however, is negligible in the limit of infinite costs, but the results suggest it should be considered in cost-constrained contexts.
A key obstacle to the applicability of the research presented in this paper is the expense of identifying the parameters of the error model. In order to overcome this, we believe that leveraging a Bayesian approach as in Oliver et al.20 can allow us to approximate the model in (20) at relatively small cost and then exploit the result to conduct a high-fidelity simulation at (approximately) optimal discretizations. Furthermore, the framework must be extended to handle chaotic partial differential equation (PDE) systems as opposed to ordinary differential equation systems. Though many discrete PDE systems are discretized in a form that reduces to an ODE system, a rigorous model for the error and cost of a PDE system should account for the contributions of both temporal and spatial discretization. These will be the primary concerns of our forthcoming work. Finally, for the most expensive systems, it is likely that cost constraints will make simulations that reside comfortably inside the asymptotic convergence region prohibitively expensive. Further investigation of the convergence behavior as the time step leaves the asymptotic region, or at the limits of convergence, is left to future research.
ACKNOWLEDGMENTS
The authors would like to acknowledge the support of The Boeing Company (technical monitor Dr. Andrew Cary).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Cory V. Frontin: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Software (equal); Writing – original draft (equal). David L. Darmofal: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Project administration (equal); Resources (equal); Software (equal); Supervision (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.