Forecasting complex systems is important for understanding and predicting phenomena. Due to the complexity and error sensitivity inherent in these predictive models, forecasting proves challenging. This paper presents a novel approach to assimilating system observations into predictive models. The approach makes use of a recursive partitioning algorithm both to facilitate the computation of local sets of model corrections and to provide a data structure for traversing the model space. These local sets of corrections act as samples from a piecewise stochastic process. Appending these corrections to the predictive model incorporates hidden residual dynamics, resulting in improved forecasting performance. Numerical experiments demonstrate that this approach improves forecasting for the Lorenz 1963 model. In addition, comparisons are made between two types of corrections: Vector Difference and Gaussian. Vector Difference corrections provide the best computational efficiency and forecasting performance. To further justify the effectiveness of this approach, it is successfully applied to more complex systems: Lorenz for various chaotic parameterizations, coupled Lorenz, and cubic Lorenz.
Predictive models of complex systems play a crucial role in understanding phenomena. These models capture the underlying dynamics of phenomena and facilitate predictions. Despite computational advancements, modeling and forecasting these systems proves challenging. By making use of data assimilation and forecasting techniques, model inadequacies are addressed and effective forecasts are made. In some cases, such techniques are insufficient. Regardless, it is advantageous to have longer forecast horizons with greater confidence. In this paper, a novel approach to forecasting complex systems is introduced through the use of stochastic forcing, where the stochastic forcing is a piecewise stochastic process that approximately maps the model state to its associated model error. Applying the forcing to the state’s evolution equation as an additional term results in improved forecasting performance.
I. INTRODUCTION
Outside of trivial systems, the perfect model scenario, where model predictions are equivalent to system outcomes, is a fiction. This is because the dynamics that contribute to a system’s properties cannot be captured perfectly: either the dynamics are imperceptible, introducing them into the model is computationally intractable,1 or a combination of the two. In the Imperfect Model Scenario (IPMS), which is reality, models are unable to account for system dynamics in their entirety. The IPMS poses challenges for forecasting, especially when the system in question displays Sensitive Dependence on Initial Conditions (SDIC), where a small perturbation to the initial condition results in an exponential divergence between the perturbed and unperturbed trajectories. SDIC leads to non-linear scaling of model error, shortening the forecast horizon (the period of time over which the model can adequately predict the future state of the system) of the predictive model. Despite the limitations posed by the IPMS, predictive models provide useful forecasts, informing both policy makers and scientists and enabling decision making. Given that systems of interest can display SDIC, it is important to develop methodologies that extend their forecast horizons. Systems such as those found in oceanography and meteorology2–8 are examples of phenomena that display SDIC and are of interest.
Previous research has made use of data assimilation to incorporate observations of the system into the predictive model, thus encoding missing dynamics into it. Duffin et al. take a Bayesian approach to the problem of determining model parameters given observations of the system.9,10 Through the statFEM method originally introduced by Girolami et al.,11 Duffin et al. use Gaussian Processes12 as model parameters, which they then fit to the data for use in the finite element method, allowing for improved forecasts with uncertainty quantification. Another approach is theoretically outlined by Judd and Smith.13,14 Their approach is to take a deterministic model and append a stochastic process to account for residual dynamics. The stochastic process acts as an approximation of the map between the model state and the associated model error. By appending the stochastic process to the predictive model, model error is reduced.
This paper addresses the problem of model error in chaotic systems by appending a piecewise stochastic process to the model, in line with the theory outlined by Judd and Smith.14 In the context of chaotic systems, the definition of model error by Judd et al. is utilized: namely, a combination of projection and direction error,15 where projection error occurs when the model’s attractor is in the wrong location and direction error occurs when the trajectories of the model move in the wrong direction compared to the projection of reality into model space. To develop the methodology, the Lorenz ’63 system16 serves as a toy model, where the standard-parameter Lorenz system serves as the predictive model and a misspecified Lorenz system serves as the system to forecast. The methodology is then tested against varying chaotic parameterizations of Lorenz, coupled Lorenz, and cubic Lorenz systems.
Through the application and development of a recursive partitioning algorithm, the model space is partitioned and through these partitions a piecewise stochastic process is generated. The contributions provided by this paper are as follows: a recursive data structure to facilitate data assimilation, a piecewise stochastic process utilizing the recursive data structure, a comparison between the performance of two approximations of model errors, and an assessment of the robustness of this approach to varying forecasting scenarios.
The paper is organized as follows: Sec. II introduces the theory and background. Section III introduces the Lorenz system16 and the process of assembling the piecewise stochastic process using recursive partitioning. Section IV introduces both Vector Difference corrections and Gaussian corrections. Section V introduces the metric to compare the two types of model corrections, the implementation of the forecasting scheme, and an analysis of the robustness of this approach to varying forecasting scenarios. Finally, Sec. VI summarizes the findings and suggests directions for future research.
This paper shows that Vector Difference corrections outperform Gaussian corrections in terms of effective forecast duration and computational efficiency, and demonstrates the effectiveness of appending a piecewise stochastic process, using either correction type, when compared to the original model. Through simulations of the model using Vector Difference corrections, it is found that increasing the resolution of partitions leads to the greatest increase in the forecast horizon, as opposed to increasing the quantity of corrections associated with any given partition; additional contributions to the correction samples quickly become negligible. Finally, this method performs effectively on more complex examples using Vector Difference corrections.
II. THEORY
Figure 1 shows how corrections improve the predictive model, as well as the variability in model error over model space. It is because of this variability that the stochastic forcing is piecewise, allowing local errors to inform local trajectories and ensuring that the corrections applied are relevant to the model state.
The process of applying corrections from the piecewise stochastic process to the predictive model trajectory at a fixed interval of time steps, resulting in the model shadowing the system.
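To make the theory concrete, the forced model can be written schematically as below. This is a hedged reconstruction from the surrounding text rather than the paper’s exact statement (cf. Eqs. (3)–(5)); f denotes the deterministic model map and the partitions are those constructed in Sec. III:

```latex
% Stochastically forced model: the forcing \eta approximates the map from
% model state to model error and is piecewise over the partitions P_i.
x_{t+1} = f(x_t) + \eta(x_t), \qquad
\eta(x) \sim \mathcal{S}_i \quad \text{for } x \in P_i ,
```

where each correction set S_i is the local sample of model errors stored in partition P_i and is drawn from uniformly when a correction is applied.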
III. PARTITIONING
Figure 2(a) shows the difference in scale between the two realizations. By scaling the system realization to the model realization, the error resulting from the difference in scale is mitigated. Figure 2(b) shows the result of this scaling, with both the predictive model and the scaled system having the same sample variance. With the system realization projected into the model space, the next step is to partition the model space. Carroll and Byers utilized a grid-based partitioning algorithm for the purpose of characterizing invariant properties of chaotic attractors.18 In this paper, grid-based partitioning is used to assemble a piecewise stochastic process in line with Eq. (4).
Model and system realizations before and after scaling. Figure 2(a) shows the system realization prior to scaling. Figure 2(b) shows the result of the scaling transformation of Eq. (7).
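As a sketch of this step (the exact transformation is Eq. (7), which is not reproduced here; a single scalar factor is assumed, following the later remark that additional data are “scaled by the same factor”, and the function name is illustrative):

```python
import numpy as np

def scale_to_model(system_traj: np.ndarray, model_traj: np.ndarray) -> np.ndarray:
    """Rescale a system realization so that its sample variance matches
    that of the model realization (cf. Eq. (7)). Both inputs are arrays
    of shape (n_steps, dim)."""
    factor = model_traj.std() / system_traj.std()  # scalar factor assumed
    return system_traj * factor
```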
It begins with a single cube that covers the scaled system realization in the model space. This cube is then split into 2^d equally sized cubes, with d being the dimension. Cubes continue to split recursively, and empty cubes are discarded. The stopping criterion for the partitioning algorithm is met when a cube contains a certain number of points from the realization; here, the stopping criterion is set to five points. Once a cube has five or fewer points, it ceases to split. The stopping criterion was selected through trial and error: five points provided effective results and high resolution partitions to address SDIC. The question of optimal parameters for partitioning and how to obtain them is outside the scope of this paper. There are, however, indicators that the stopping criterion is too high. For example, if the corrections stored by a partition are highly contradictory, specifically if they differ greatly in direction and magnitude, then the partition is too big and thus too general, in which case a lower stopping criterion should be set. Although the stopping criterion can be set, the partitioning algorithm cannot be finely tuned due to computational costs and the general unpredictability inherent in its operations.
Figures 3(a) and 3(b) demonstrate the algorithm for the first level of recursion. Figure 3(c) shows the end result of the algorithm. Partitioning until a partition contains a fixed number of points allows the resolution of the partitioning algorithm to adjust to the density of points. Figure 3(c) demonstrates this through high resolution partitions occurring around regions of high density such as the separatrix. The number of partitions resulting from this algorithm depends on the stopping criterion and the properties of the time series; as a result, it is only known after partitioning.
Recursive partitioning algorithm for the first level of recursion and the final result of the algorithm. Figure 3(a) shows the data contained by a single bounding box. Figure 3(b) then shows the box in Fig. 3(a) being split into eight equally sized cubes with half the side length. This process of splitting cubes into eight equally sized cubes continues; when a cube is split, sub-cubes that do not contain points are removed. This repeats until each cube contains at least one point and at most five points, at which point the cubes are referred to as partitions. Figure 3(c) shows the final result of the partitioning algorithm.
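A minimal sketch of the recursive partitioning under the description above (class and function names are illustrative; boundary handling and degenerate cases are simplified):

```python
import numpy as np

MAX_POINTS = 5  # stopping criterion: cubes with at most five points stop splitting

class Cube:
    """One node of the recursive partition structure (an octree in 3D)."""

    def __init__(self, corner, side, indices, points, parent=None):
        self.corner, self.side = corner, side  # lower corner and side length
        self.indices = indices                 # indices of the points inside
        self.parent, self.children = parent, []
        if len(indices) > MAX_POINTS:
            self._split(points)

    def _split(self, points):
        half = self.side / 2.0
        dim = points.shape[1]
        # split into 2**dim equally sized sub-cubes; empty ones are discarded
        for offsets in np.ndindex(*([2] * dim)):
            corner = self.corner + half * np.array(offsets)
            mask = np.all((points[self.indices] >= corner) &
                          (points[self.indices] < corner + half), axis=1)
            inside = self.indices[mask]
            if inside.size:
                self.children.append(Cube(corner, half, inside, points, parent=self))

def build_partitions(points):
    """Wrap the data in a single bounding cube and partition recursively.
    Leaves of the resulting tree are the partitions."""
    corner = points.min(axis=0)
    side = (points.max(axis=0) - corner).max() * (1 + 1e-9)  # keep boundary points inside
    return Cube(corner, side, np.arange(len(points)), points)
```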
With the model space partitioned, the partitions are populated with additional scaled observations. Sorting additional points into the partitions makes the sets of corrections denser, so they contain more information about the dynamics. The sorting of these additional points is facilitated through the existing recursive structure, and the additional data are scaled by the same factor as in Eq. (7).
Figure 4 demonstrates how additional points are sorted into partitions. This approach exploits the fact that subsequent elements of the time series are nearby. Initially, the partition that contains the starting point is unknown, so the algorithm starts at the top of the recursive structure and moves down through the partitions containing the point until it either finds a partition at the bottom or the closest candidate at the bottom. Once a partition is found, it serves as the starting point for locating the subsequent point in the time series: the algorithm moves up the recursive structure until it reaches a partition that contains the point, then, as before, moves down the recursive structure until it either finds the partition at the bottom that contains the point or finds the closest candidate.
Sorting a point into a partition given the partition of the preceding point in the time series. To find the partition containing the new point, first find the smallest partition containing both points, then find the sub-partition that splits the new point from the old one. If that sub-partition is at the bottom of the recursive structure, it is the partition containing the new point; otherwise, continue down through its sub-partitions.
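A sketch of this up-then-down traversal, expressed in terms of the Cube structure above (the fallback to the closest candidate leaf is illustrative):

```python
import numpy as np

def contains(cube, p):
    """True if point p lies inside the cube."""
    return bool(np.all((p >= cube.corner) & (p < cube.corner + cube.side)))

def sort_point(start_leaf, p):
    """Find the partition (leaf cube) for p, starting from the leaf that
    held the previous point of the time series."""
    node = start_leaf
    # move up until a cube containing p is found (or the root is reached)
    while node.parent is not None and not contains(node, p):
        node = node.parent
    # move down to the leaf containing p, or the closest candidate leaf
    while node.children:
        inside = [c for c in node.children if contains(c, p)]
        node = inside[0] if inside else min(
            node.children,
            key=lambda c: np.linalg.norm(p - (c.corner + c.side / 2.0)))
    return node
```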
IV. CORRECTIONS
Figure 5 demonstrates the application of Vector Difference corrections when forecasting, with the process repeating at a fixed interval of time steps. Since the true correction for the current state is unknown, it is approximated by adding a correction drawn from the state’s partition, allowing nearby information to inform the correction; however, if data are sparse, there is more uncertainty in the corrections. To account for this additional uncertainty, Gaussian corrections are introduced, which make use of multivariate normals to produce model corrections.
The current state of the stochastically forced predictive model is contained by a partition. Evolving the state forward and then applying a correction from that partition yields the next state. The corrections are sampled uniformly from the partition’s correction set, in line with Eq. (5), and are generated using Eq. (8).
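A sketch of this forecast loop (the step function, the correction interval tau, and the attribute holding each partition’s corrections are illustrative names, not the paper’s):

```python
import numpy as np

def forecast_vd(x0, model_step, find_partition, n_steps, tau, rng):
    """Forecast with Vector Difference corrections: every tau steps, draw
    a stored correction uniformly from the partition containing the
    current state (cf. Eq. (5)) and add it to the state."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x.copy()]
    for k in range(1, n_steps + 1):
        x = model_step(x)                   # one step of the predictive model
        if k % tau == 0:
            part = find_partition(x)        # e.g. via sort_point above
            corrections = part.corrections  # correction vectors stored in the partition
            x = x + corrections[rng.integers(len(corrections))]
        trajectory.append(x.copy())
    return np.array(trajectory)
```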
The implementation of Gaussian corrections is inspired by Particle Filters;23–25 as a result, the implementation of Gaussian corrections borrows from Particle Filter implementations. There are two implementations of Gaussian corrections: one with locally defined parameters and the other with globally defined parameters.
Figure 6 shows the process of applying Gaussian corrections when forecasting; this process repeats at a fixed interval of time steps. Having covered both types of corrections and their implementations, they are now compared to assess their overall effectiveness at mitigating model error.
Given that the current state is in a partition, it is evolved forward, where Eq. (10) gives the model’s prediction for the next state in the forecast. Candidate samples are drawn from the two distributions, and a sample is accepted if it satisfies the acceptance condition. The weighted sample mean of a random subset of the accepted samples is computed and becomes the next point.
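Since Eqs. (9)–(11) and the precise acceptance condition are not reproduced here, the following is only a loose sketch of a Particle-Filter-style Gaussian correction; the acceptance rule, the weights, and all names are assumptions:

```python
import numpy as np

def gaussian_correction(x_pred, mean, cov, rng, n_samples=200, keep=0.5):
    """Draw candidate states from a multivariate normal around the model
    prediction, accept those sufficiently close to it, and return the
    weighted sample mean of a random subset of the accepted samples."""
    samples = rng.multivariate_normal(x_pred + mean, cov, size=n_samples)
    dist = np.linalg.norm(samples - x_pred, axis=1)
    accepted = samples[dist <= np.median(dist)]  # stand-in acceptance rule
    size = max(1, int(keep * len(accepted)))
    subset = accepted[rng.choice(len(accepted), size=size, replace=False)]
    weights = 1.0 / (1e-12 + np.linalg.norm(subset - x_pred, axis=1))
    return np.average(subset, axis=0, weights=weights)
```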
V. RESULTS
In the case of Gaussian corrections, the forecast state is defined by Eq. (10). In line with Stemler and Judd,27 each forecast corresponds to one initial condition, with both system and model initialized identically. Figure 7 shows the Mean Separation Times as defined by Eqs. (12)–(14). For each forecasting scheme with varying Ensemble size, each point in Fig. 7 corresponds to the mean over simulations with different initial conditions, using a fixed partition structure. It is evident from Fig. 7 that the Ensemble size has a limited effect on all forecasting schemes, excepting the Global parameter Gaussian corrections.
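As a sketch of how such a separation time can be computed for a single forecast (the paper’s exact metric is given by Eqs. (12)–(14), which are not reproduced here; the threshold is an assumed parameter):

```python
import numpy as np

def separation_time(model_traj, system_traj, dt, threshold):
    """First time at which the model and system trajectories separate by
    more than the threshold; returns the full duration if they never do."""
    dist = np.linalg.norm(model_traj - system_traj, axis=1)
    exceeded = np.nonzero(dist > threshold)[0]
    return (exceeded[0] if exceeded.size else len(dist)) * dt
```

The Mean Separation Time is then the average of this quantity over the set of forecasts with different initial conditions.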
Figure 8 shows that sorting additional points into partitions has a negligible effect on performance; it is better to increase the resolution of the partitions instead. By using more data to construct partitions, the stopping criterion for the algorithm is met for smaller partitions, resulting in higher resolution partitioning. For high resolution partitions, the variance of corrections is small; as a result, including additional observations in a given partition does not meaningfully improve the effectiveness of corrections. Despite high resolution providing the most significant performance gains, the improvements afforded by increasing the partition resolution eventually plateau, with the Mean Separation Time approaching 4.4 s.
For larger Ensemble sizes, performance improved marginally for the Local parameter Gaussian corrections and Vector Difference corrections, while the Global parameter Gaussian corrections improved significantly. The Vector Difference corrections outperformed both parameterizations of Gaussian corrections, more than doubling each of their Separation Times. Because a covariance matrix is computed for Local parameter Gaussian corrections, all partitions that contain only one point are removed.
The Mean Separation Time for forecasts using Vector Difference corrections, a fixed Ensemble Size, and varying allocations of data. The heatmap results from linear spline interpolation.
To further justify its effectiveness, the approach is applied to Lorenz systems with varying chaotic parameterizations and structural changes. A set of parameterizations is selected by visually inspecting the resulting attractors. It is important to ensure the parameterizations are chaotic: periodic parameterizations result in seemingly indefinite separation times, which would bias the results, and collapsed attractors cause the partitioning algorithm to fail by reaching the maximum recursion depth. Whether a parameterization of Lorenz is chaotic can only be determined after generating realizations,28 so sampled parameterizations are tested and discarded if they do not produce chaotic realizations. Visual inspection of the qualitative properties of each realization is used instead of numerical methods like Lyapunov exponents because it requires no computation time and the set of parameterizations to inspect is small. The structural changes to the Lorenz system are described in the following systems of ODEs:
Equation (15) defines a cubic Lorenz system and Eqs. (16)–(18) define two coupled Lorenz systems introduced by Just et al.29
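For reference, a sketch of the baseline predictive model, the standard Lorenz ’63 vector field of Eq. (6) with its usual parameters, and the Euler integration mentioned below; the cubic and coupled variants of Eqs. (15)–(18) modify this vector field and are not reproduced here, and dt is an assumed value:

```python
import numpy as np

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Standard Lorenz '63 vector field (the predictive model, Eq. (6))."""
    x, y, z = state
    return np.array([sigma * (y - x),
                     x * (rho - z) - y,
                     x * y - beta * z])

def euler_step(state, dt=0.01):
    """One Euler step, the integration method used for the cubic Lorenz
    experiments below."""
    return state + dt * lorenz(state)
```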
Figure 9 illustrates the Lorenz system perturbed by structural error or misparameterization. Such parameterizations and structural changes have a notable effect on the dynamics of each realization. Mean Separation Time is then computed for a set of parameterizations of each system, beginning with misparameterizations of the standard Lorenz system.
Figure 10 shows the Mean Separation Times for 10 000 simulations over the selected parameterizations. The more complex and contrary the dynamics are with respect to the predictive model, the shorter the Separation Time. The variability in Separation Times results from the variance in dynamics between parameterizations, with similar parameterizations in some cases having dissimilar Separation Times. To further generalize the performance of this approach, Separation Times are generated for the cubic Lorenz system [Eq. (15)] and the coupled Lorenz system [Eqs. (16) and (17)].
Figure 11 shows the Mean Separation Times for cubic Lorenz systems and coupled Lorenz systems. Figure 11(a) shows the Mean Separation Time in response to an increase in the contribution of the cubic component of Eq. (15). Using the Euler method of integration, chaotic attractors are observed over a range of the cubic parameter. Similarly, in Fig. 11(b), chaotic attractors are observed for a subset of parameterizations of the coupled system; outside of these parameterizations, the realizations collapse to a fixed point or the numerical integration is unstable. As the contribution of the cubic component of Eq. (15) increases, the model misspecification increases along with its complexity. Similarly, increasing the parameters of Eq. (17) and increasing the coupling strength decreases the Separation Time.
To-scale realizations of misparameterized Lorenz [Eq. (6)], cubic Lorenz [Eq. (15)], coupled Lorenz [Eqs. (16)–(18)], and standard Lorenz [Eq. (6)].
Mean Separation Times for misparameterized Lorenz systems over 10 000 simulations. Figures 10(a)–10(c) show Separation Times for each of the varied parameters in turn.
Mean Separation Times for cubic Lorenz systems and coupled Lorenz systems over 10 000 simulations. Figure 11(a) shows Separation Times for cubic Lorenz and Fig. 11(b) shows Separation Times for coupled Lorenz.
VI. CONCLUSION
The approach put forth by this paper makes use of a piecewise stochastic process to account for model error in the predictive model. It is shown that effective forecast horizons are achievable using this approach and that Vector Difference corrections are superior to Gaussian corrections in both efficacy and computational efficiency. The effectiveness of recursive partitioning in generating the piecewise stochastic process is also demonstrated: not only did it account for varying data densities, but it did so computationally efficiently, both in the number of system observations used to build the partition structure and in the number of additional system observations sorted into the partitions. Considering Fig. 8, sorting additional points into the partitions is unnecessary; the methodology can therefore be streamlined by its omission. In addition, the design of the data structure facilitates efficient serialization,30,31 allowing predictive models to be efficiently stored and accessed, aiding the practicality of the method. This paper makes use of noiseless data in the IPMS. In reality, data are rarely, if ever, noiseless; as a result, there are research opportunities in applying this method to chaotic systems using noisy data. There are also opportunities to investigate different approximations of the model error map of Eq. (3). Currently, the piecewise structure is inflexible because the recursive partitioning algorithm cannot be precisely tuned: it determines the partitioning structure solely from the number of observations and the stopping criterion, parameters that set general properties rather than providing fine control. As a result, there are research opportunities in replacing it, possibly with a Gaussian Process, which has historically performed well as a function approximator.12 Overall, this work offers a novel approach to the problem of model error in forecasting and opens new avenues for research. To recreate the results, the code is available on GitHub at https://github.com/marcus-dyson/RPS-Forecasting.
ACKNOWLEDGMENTS
All authors are supported by the ARC ITRH for Transforming Energy Infrastructure through Digital Engineering (TIDE), Grant No. IH200100009. This research is also supported by an Australian Government Research Training Program (RTP) Scholarship.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
M. Dyson: Conceptualization (equal); Methodology (lead); Software (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (lead). T. Stemler: Conceptualization (equal); Methodology (supporting); Project administration (lead); Resources (lead); Supervision (lead); Writing – review & editing (supporting).
DATA AVAILABILITY
Data sharing is not applicable to this article as no new data were created or analyzed in this study.