Regime switching is ubiquitous in many complex dynamical systems with multiscale features, chaotic behavior, and extreme events. In this paper, a causation entropy boosting (CEBoosting) strategy is developed to facilitate the detection of regime switching and the discovery of the dynamics associated with the new regime via online model identification. The causation entropy, which can be efficiently calculated, provides a logical (true/false) indicator for each candidate function in a pre-determined library. The reversal of one or a few such causation entropy indicators associated with the model calibrated for the current regime implies the detection of regime switching. Despite the short length of each batch formed by the sequential data, the accumulated value of the causation entropy over a sequence of data batches leads to a robust indicator. With the detected rectification of the model structure, the subsequent parameter estimation becomes a quadratic optimization problem, which is solved using closed analytic formulas. Using the Lorenz 96 model, it is shown that the causation entropy indicator can be efficiently calculated and that the method applies to moderately large dimensional systems. The CEBoosting algorithm also adapts to situations with partial observations. It is shown via a stochastic parameterized model that the CEBoosting strategy can be combined with data assimilation to identify regime switching triggered by unobserved latent processes. In addition, the CEBoosting method is applied to a nonlinear paradigm model for topographic mean flow interaction, demonstrating the online detection of regime switching in the presence of strong intermittency and extreme events.

Online nonlinear system identification with sequential data has recently become important in many applications, e.g., extreme weather events, climate change, and autonomous systems. In this work, we develop a causation entropy boosting (CEBoosting) framework for online nonlinear system identification. For each sequential data batch, this framework calculates the causation entropy that evaluates the contribution of each function in a large set of candidate functions to the system dynamics. The causation entropies based on multiple data batches are then aggregated to identify a few candidate functions that have significant impacts on the system dynamics. With the identified sparse set of functions, the framework further fits a model of the system dynamics. We test the proposed framework on complex systems with features including chaotic behavior, high dimensionality, intermittency and extreme events, and partial observations. The results show that the CEBoosting method can capture regime switching and then fit models of system dynamics for various types of complex dynamical systems based on a limited amount of sequential data.

Regime switching is ubiquitous in many complex dynamical systems in geoscience, engineering, neural science, and material science.1–7 The switching is usually associated with sudden changes in internal states or appears when specific external forcing is exerted. Dynamical systems often display distinct behavior across regime switching. One example is atmospheric jets, which meander in different directions when the atmosphere alternates between blocked and unblocked regimes.8,9 Similarly, an excitable medium is susceptible to finite perturbations, which trigger regime switching from a quiescent state to one with various wave patterns.10–12 Regime switching can also induce an increased occurrence of extreme events, leading to, for example, extreme weather and climate patterns,13,14 bursting neurons,15 or extreme ductile damage.16 Detecting regime switching and the corresponding underlying dynamics, which relies on appropriate model identification methods, has significant social and scientific impacts. Challenges in detecting regime switching are associated with the intrinsic properties of many complex dynamical systems, including high dimensionality, partial or incomplete observations, and the intermittent occurrence of rare and extreme events.17–21

Efficient model identification has received significant attention. Both physical knowledge and observational data facilitate learning the underlying model dynamics. In traditional knowledge-based model identification, the model structure is often established using physical intuition, and the primary task then becomes the estimation of model parameters. Linear models are natural candidates for simple problems22,23 and can potentially be skillful for short-term forecasts. Other families of models with pre-determined structures, such as the physics-constrained nonlinear regression models24,25 and conditional Gaussian nonlinear systems,26,27 are alternative nonlinear models aiming to capture specific underlying dynamical features. On the other hand, recent progress has been made in data-driven model identification. Data-driven reduced-order models have been widely used in scientific and engineering applications.28–31 Sparse model discovery methods also appear as advanced model identification tools that allow automatic learning of the model structure and parameters from data and lead to nonlinear models with parsimonious structures via sparse regression.32–39 Recently, an information-flow-based approach for system identification has gained popularity, which utilizes causation entropy to identify the sparse model structure of an unknown dynamical system.40 An iterative approach called entropic regression, which is robust to noise and outliers, has further been developed and applied in cognitive neuroscience to improve the accuracy of recovering the structural and functional connectivity between anatomical regions.41–43 With a limited amount of indirect data, derivative-free optimization methods44–46 have been explored as model identification tools. In addition, non-parametric and machine learning models have been built to characterize complex dynamical systems.47–52

Among various model discovery approaches, online model identification is particularly useful in practice, as it sequentially determines the model structure and estimates the model parameters as new observations arrive.53–58 It is the primary model identification strategy in many geophysical and engineering problems, where the limited amount of historical data is insufficient to robustly discover the underlying dynamics. It should be noted that online model identification can be further combined with data assimilation to handle noisy observations or to recover the unobserved state variables in situations with partial observations.59–61 However, unlike online parameter estimation, which can be efficiently addressed by standard filtering methods,62 the lack of knowledge about the proper model structure poses a unique challenge in online model identification. The sequentially arriving data play a vital role in progressively rectifying the model and reducing the uncertainty in the identified system. Although existing system identification methods can identify the proper model structure by promoting sparsity in the offline setting, where a model is fit to abundant data, promoting sparsity through model fitting may eliminate some important model structures in the context of sequential learning. To address this challenge, we incorporate causation entropy to achieve robust online model identification. As regime switching often occurs and completes within a short transient period, developing suitable online identification methods that discover regime switching by exploiting transition data is essential and of practical importance.

In this paper, a causation entropy boosting (CEBoosting) strategy is developed. It is incorporated into an online model identification method to detect regime switching and discover the nonlinear dynamics associated with the new regime. Different from many existing sparse model identification algorithms, such as those relying on LASSO (least absolute shrinkage and selection operator) regression46,63 or thresholding,32,35 the method developed here separates the estimation of model parameters from the recurrent identification of the nonlinear model structure. Such a separation allows using closed analytic formulas for the entire online learning algorithm and, therefore, the overall computational cost is significantly reduced. In this new strategy, causation entropy64,65 is utilized to provide a logical value (i.e., true or false) for each candidate function in a pre-determined library throughout the online learning process. By examining the causation entropy on the newly arrived data, the reversal of one or a few such causation entropy indicators associated with the model calibrated for the current regime implies the detection of regime switching. In other words, the causation entropy indicator, which can be efficiently calculated, is employed to decide if the existing terms in the current model need to be rectified and if the system demands additional terms as a response to regime switching. Note that the sequential data in online learning are collected within a short time window to form a batch of time series, which is utilized to compute the causation entropy. As each batch contains a short amount of data, it may embody only part of the dynamical properties. Nevertheless, as time evolves, the accumulated value of causation entropy corresponding to a sequence of batches leads to a robust indicator of the model structure in response to regime switching. The concept of accumulating causation entropy calculated from sequential data relates to the statistical method of bagging.66 With the detected rectification of the model structure, the subsequent parameter estimation becomes a quadratic optimization problem, which is solved using closed analytic formulas. When regime switching occurs multiple times, a summation of residual models is calibrated, which relates to the statistical method of boosting.67–70

The proposed new strategy has several unique features. First, causation entropy takes into account the interdependence between all the candidate functions in the pre-determined library and, therefore, it can eliminate superficial causal relationships. Model identification exploiting the causation entropy has been shown to reach a higher selection accuracy than LASSO regression or the elastic net.65 The causation-based learning approach also yields robust results in the presence of indirect coupling between features and stochastic noise,71 which are crucial features of complex systems. Second, causation entropy is only utilized to indicate the terms that need to be added to or removed from the existing model. In other words, although computing the exact value of the causation entropy is challenging, closed analytic formulas are available for efficiently approximating this causation entropy indicator, which allows an effective detection of the model structure. Third, the parameter estimation only needs to be carried out after the model structure is entirely determined. Therefore, the overall computational cost is reduced compared with applying LASSO regression, which requires detecting the model structure and estimating the model parameters simultaneously for each batch of data. Lastly, compared to the pioneering works of using causation entropy for system identification (e.g., Refs. 40, 64, and 72), this work focuses on dynamical systems with regime switching, and the proposed CEBoosting method utilizes online data to (i) identify the regime switching and (ii) estimate the sparse model structure and calibrate the unknown model parameters for the new regime. Therefore, the key contribution of this work is to demonstrate the use of causation entropy for robust online sparse system identification with a sequential and limited amount of data, by leveraging the statistical concepts of bagging and boosting. It is worth highlighting that the causation entropy indicator is easy to calculate and applicable to moderately large dimensional systems. The method developed here also adapts to situations with partial observations, where data assimilation is incorporated into the learning process to recover the unobserved state variables when identifying regime switching resulting from the latent processes. In addition, the method is not limited to Gaussian data. It can be applied to dynamical regimes with strong intermittency and extreme events. Applications to nonlinear dynamical systems with moderately large dimensions, partial observations, and extreme events are all studied in the paper.

The remainder of the paper is organized as follows. The development of the online identification method utilizing the CEBoosting strategy is presented in Sec. II. Section III includes four test cases. In addition to a standard chaotic model as a proof of concept, the other three test cases emphasize applying the method to systems with moderately large dimensions, partial observations, and extreme events, respectively. The paper is concluded in Sec. IV.

The online sparse identification method aims to (i) detect regime switching of dynamical systems via causation entropy and (ii) determine the resulting dynamics after the regime switching. The dynamical system has the following general form:
$\dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t)) + \boldsymbol{\sigma}\,\dot{\mathbf{W}}(t),$  (2.1)
where $\mathbf{x}(t) = [x_1(t), x_2(t), \ldots, x_p(t)]^\top \in \mathbb{R}^p$ is the multi-dimensional state variable and $\dot{\mathbf{x}}(t) \in \mathbb{R}^p$ is the associated temporal derivative, $\mathbf{f}: \mathbb{R}^p \to \mathbb{R}^p$ is a vector-valued nonlinear function (i.e., vector field) of the state variable, $\dot{\mathbf{W}}(t) \in \mathbb{R}^q$ is a white noise vector, and $\boldsymbol{\sigma} \in \mathbb{R}^{p \times q}$ is a matrix of noise magnitudes. In the absence of random noise forcing, $\boldsymbol{\sigma}$ becomes a zero matrix. Assume that the regime switching occurs at $t = t_s$, i.e., the vector field $\mathbf{f}$ of the original dynamical system changes to $\mathbf{f}'$. The goal of this work is to detect such a regime switching and to identify the new model after the regime switching.

As regime switching often results from a sudden change of a small number of the model parameters or specific components of the model structure, the residual model $\delta\mathbf{f} = \mathbf{f}' - \mathbf{f}$ typically has a sparse structure that can be calibrated using relatively short data, which is precisely the case of the online identification problems. Therefore, instead of learning the entire model associated with the new regime, the focus is to estimate the residual part $\delta\mathbf{f} = \mathbf{f}' - \mathbf{f}$. Once the residual part $\delta\mathbf{f}$ is identified, it is added to the existing model, which provides the new system as a response to the regime switching.

The limited amount of data is assumed to arrive sequentially in the form of batches. The $k$th batch represents data $\mathbf{x}(t)$ [or a subset of the vector $\mathbf{x}(t)$ in the partial observation case] for $t \in [t_{B_k}, t_{B_{k+1}})$. It should be noted that, as $t_s$ is typically unknown in practice, the identification algorithm usually does not start from $t_s$ (namely, the left point of the first interval $t_{B_1} \neq t_s$). Yet, for the simplicity of presentation, $t_{B_1}$ is chosen to be $t_s$ for the numerical examples in this work. This does not affect the identification algorithm, as applying the algorithm to batches prior to $t_s$ will not indicate regime switching. But this setup facilitates counting the length of data needed to detect the regime switching once it occurs at $t_s$.

Denote by $\Phi = [\phi_1, \phi_2, \ldots, \phi_N]$ a vector containing all candidate basis functions, which are knowledge-based and pre-determined. Each $\phi_n$ in $\Phi$ is a scalar-valued function $\phi_n := \phi_n(\mathbf{x})$ that gives a map $\mathbb{R}^p \to \mathbb{R}$. The representation of $\delta\mathbf{f} = \mathbf{f}' - \mathbf{f}$ is approximated by a linear combination of these basis functions:
$\delta f_i(\mathbf{x}) = \sum_{n=1}^{N} \xi_{in}\,\phi_n(\mathbf{x}), \qquad i = 1, \ldots, p,$  (2.2)
where $\delta f_i$ is the $i$th scalar component of $\delta\mathbf{f}$. A sparse representation of (2.2) means that most of the coefficients $\xi_{in}$ are zeros in the identified model. To obtain such a sparse representation of (2.2), a CEBoosting method is developed to effectively determine which basis functions should take non-zero coefficients.
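To make the construction of $\Phi$ concrete, the following minimal Python sketch (our own illustration, not the authors' code; function and variable names are hypothetical) assembles a library of linear and quadratic monomials from a batch of state data, which is the type of library used in the numerical examples of Sec. III.

```python
import numpy as np
from itertools import combinations_with_replacement

def build_library(X):
    """Evaluate linear and quadratic candidate functions on a data batch.

    X : array of shape (T, p), one row per time step.
    Returns Phi of shape (T, N) and a list of N term labels.
    """
    T, p = X.shape
    columns, labels = [], []
    for j in range(p):                     # linear terms x_1, ..., x_p
        columns.append(X[:, j])
        labels.append(f"x{j + 1}")
    for j, k in combinations_with_replacement(range(p), 2):
        columns.append(X[:, j] * X[:, k])  # quadratic terms x_j * x_k, j <= k
        labels.append(f"x{j + 1}*x{k + 1}")
    return np.column_stack(columns), labels
```

For the Lorenz 63 example below ($p = 3$), this produces exactly the nine candidates $\{x, y, z, x^2, xy, xz, y^2, yz, z^2\}$ used in Sec. III A.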
Causation entropy is based on the general concept of conditional mutual information.73 Other popular information measures that also build on conditional mutual information include directed information74 and transfer entropy,75 which has found applications in many areas, e.g., turbulence modeling76,77 and neuroscience,78 with a comprehensive review in Ref. 79. The idea of utilizing the causation entropy to detect the influence between different variables has been developed in Refs. 40–43 and studied in Refs. 64, 65, 72, 80, 81, and 82. It can be naturally applied to the context of system identification. As the white noise does not explicitly contribute to the causal relationship, the calculation of the causation entropy mainly focuses on the candidate functions that constitute the deterministic part of the dynamics, namely, the function $\mathbf{f}$ in (2.1). To this end, consider the deterministic part of (2.1),
$\dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t)) = \Xi\,\Phi.$  (2.3)
The causation entropy $C_{\phi_n \to \dot{x}_i \mid [\Phi \setminus \phi_n]}$ is utilized to quantify the contribution from the candidate function $\phi_n$ to the dynamics $\dot{x}_i$ (i.e., the time derivative of the $i$th state variable $x_i$) conditioned on the remaining candidate functions $\Phi \setminus \phi_n$, namely, all the candidate functions except $\phi_n$. This causation entropy reflects the causal influence of $\phi_n$ on the dynamics $\dot{x}_i$, and we enforce $\xi_{in} = 0$ if the causation entropy is small. This procedure is repeated over all $n = 1, \ldots, N$ and $i = 1, \ldots, p$ to form the matrix $\Xi$. As only a few candidate functions will have an actual causal influence on the dynamics, the matrix $\Xi$ is expected to have a sparse structure. The causation entropy $C_{\phi_n \to \dot{x}_i \mid [\Phi \setminus \phi_n]}$ is defined as follows:
$C_{\phi_n \to \dot{x}_i \mid [\Phi \setminus \phi_n]} = H\big(\dot{x}_i \,\big|\, [\Phi \setminus \phi_n]\big) - H\big(\dot{x}_i \,\big|\, \Phi\big),$  (2.4)
where $H(\cdot \mid \cdot)$ is the conditional entropy, which is defined as
$H(U \mid V) = -\iint p(u, v)\,\ln p(u \mid v)\,\mathrm{d}u\,\mathrm{d}v,$  (2.5)
where $p(u, v)$ is the corresponding probability density function (PDF), which can be determined by a histogram from the time series assuming ergodicity. On the right-hand side of (2.4), the difference between the two conditional entropies indicates the information in $\dot{x}_i$ contributed by the specific function $\phi_n$ given the contributions from all the other functions in the library $\Phi$. Thus, it tells if $\phi_n$ provides additional information to $\dot{x}_i$. It is worth highlighting that the causation entropy in (2.4) is fundamentally different from directly computing the correlation between $\dot{x}_i$ and $\phi_n$, as the causation entropy also considers the influence of the other library functions. If both $\dot{x}_i$ and $\phi_n$ are caused by a common factor $\phi_m$, then $\dot{x}_i$ and $\phi_n$ can be highly correlated. Yet, in such a case, the causation entropy $C_{\phi_n \to \dot{x}_i \mid [\Phi \setminus \phi_n]}$ will be zero, as $\phi_n$ is not the cause of $\dot{x}_i$.
In practice, the conditional entropy in (2.5) can involve expensive high-dimensional integrals, which are computationally challenging.83 Nevertheless, a Gaussian approximation of the PDFs inside the integrand can be utilized to calculate the causation entropy.72 By approximating all the joint and marginal distributions as Gaussians, the causation entropy is calculated as follows:
$C_{W \to U \mid V} = \tfrac{1}{2}\ln\det\!\big(R_{UV}\big) - \tfrac{1}{2}\ln\det\!\big(R_{V}\big) - \tfrac{1}{2}\ln\det\!\big(R_{UVW}\big) + \tfrac{1}{2}\ln\det\!\big(R_{VW}\big),$  (2.6)
where $R$ denotes the covariance matrix of the corresponding vector, e.g., $R_{UVW}$ corresponds to the covariance matrix of the vector $[U, V, W]^\top$. It should be noted that Eq. (2.6) employs the general symbols $U$, $V$, and $W$ to indicate the calculation and the approximation of the causation entropy. In terms of the connection to the system defined in Eq. (2.3), $U$ corresponds to $\dot{x}_i$, $W$ represents $\phi_n$ (i.e., the $n$th basis function), and $V$ is $[\Phi \setminus \phi_n]$ (i.e., the basis functions excluding $\phi_n$). The dynamics $\dot{x}_i$ can either be measured directly or obtained by numerically evaluating the temporal derivative of $\mathbf{x}(t)$. In this paper, the latter method is adopted, and fine temporal resolution data have been used to avoid the numerical discretization error caused by coarse temporal resolution. It should be noted that the numerical approximation of the temporal derivative is a non-trivial task. A common practice is to denoise the data before approximating the temporal derivative, as suggested by Ref. 32. If the actual frequency of the noise is much higher than that of the signal, it is possible to obtain a robust numerical approximation with sub-sampled data. The calculation of the causation entropy is robust to large but temporally uncorrelated noise, which is demonstrated in Sec. III C.

The explicit expression in (2.6) based on the Gaussian approximation allows the causation entropy to be computed efficiently, even for systems with a moderately large dimension, which is typically the case in many practical situations. For a $p$-dimensional system and $N$ basis functions, $p \times N$ causation entropies need to be calculated to form the final causation entropy matrix. The time complexity of computing the causation entropy can be reduced by parallel computing and localized basis functions. Since the computation of the causation entropy for the dynamics of each state variable is independent of the others, the causation entropies for different dynamics can be calculated in parallel. The localized basis is a technique of dictionary design, which constructs basis functions for each state using itself and its adjacent states. For example, each state variable $x_i$ is expanded only with $\phi(x_i, x_{i\pm 1}, \ldots, x_{i\pm s})$ for a small $s$. For dynamical systems whose state variables only interact with nearby ones, the technique of localized basis functions can be utilized to decrease $N$ and reduce the time complexity of the causation entropy computation.

It is worth noting that the Gaussian approximation may lead to certain errors in computing the causation entropy if the actual distribution is highly non-Gaussian. Nevertheless, the primary goal is not to obtain the exact value of the causation entropy. Instead, it suffices to detect if the causation entropy $C_{\phi_n \to \dot{x}_i \mid [\Phi \setminus \phi_n]}$ is nonzero (or, practically, above a small threshold value). In most applications, if a significant causal relationship is detected in the higher-order moments, it is very likely to be detected in the Gaussian approximation as well. This allows us to efficiently determine the sparse model structure. The exact values of the nonzero coefficients on the right-hand side of the identified model will be calculated via a simple least square estimation, to be discussed in the following. Note that the Gaussian approximation is taken directly from the statistics associated with the nonlinear time series from the underlying nonlinear model. Therefore, the Gaussian approximation still includes the nonlinear dynamical information. It is very different from linearizing a nonlinear complex system and computing the resulting Gaussian distribution. Such a Gaussian approximation of nonlinear time series has been widely applied to compute various information measurements and leads to reasonably accurate results.17,84–86 Note that, with linear and Gaussian assumptions, causation entropy has been demonstrated to be equivalent to Granger causality,87 which has been a popular tool for analyzing time series data for decades.88 But the focus here is more toward identifying nonlinear models.
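As an illustration of Eq. (2.6), the following Python sketch (a hedged implementation of ours, with covariances estimated empirically from a single batch) evaluates the Gaussian approximation of the causation entropy from the log-determinants of the four covariance matrices.

```python
import numpy as np

def _logdet_cov(Z):
    """Log-determinant of the sample covariance of the columns of Z."""
    R = np.atleast_2d(np.cov(Z, rowvar=False))
    return np.linalg.slogdet(R)[1]

def causation_entropy(U, V, W):
    """Gaussian approximation of C_{W -> U | V} as in Eq. (2.6).

    U : (T,)   target dynamics, e.g., xdot_i or the residual r_i
    V : (T, m) conditioning set, the library Phi without phi_n
    W : (T,)   the candidate function phi_n
    """
    UV = np.column_stack([U, V])
    VW = np.column_stack([V, W])
    UVW = np.column_stack([U, V, W])
    return 0.5 * (_logdet_cov(UV) - _logdet_cov(V)
                  - _logdet_cov(UVW) + _logdet_cov(VW))
```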
Below, for notational conciseness, $C_{in}$ is utilized as a short-hand notation for $C_{\phi_n \to \dot{x}_i \mid [\Phi \setminus \phi_n]}$. To impose sparsity in practical computational scenarios, a threshold $\bar{C}$ is prescribed, and $C_{in} = 0$ is enforced when $C_{\phi_n \to \dot{x}_i \mid [\Phi \setminus \phi_n]} \leq \bar{C}$. Such a threshold value is adopted mainly to exclude the small causation entropy values due to the sampling error from using a finite time series. After applying this threshold value, a causation entropy matrix (CEM) $C$ is obtained, whose $(i, n)$th entry is given by
$C_{in} = \begin{cases} C_{\phi_n \to \dot{x}_i \mid [\Phi \setminus \phi_n]}, & \text{if } C_{\phi_n \to \dot{x}_i \mid [\Phi \setminus \phi_n]} > \bar{C}, \\ 0, & \text{otherwise.} \end{cases}$  (2.7)
Based on the matrix $C$, sparsity can be further enforced in $\Xi$ by setting $\xi_{in} = 0$ when $C_{in} = 0$. It has been demonstrated that a correct sparse model $\Xi\,\Phi$ can be obtained in the offline learning setting where the time series is long enough.72 However, in the online learning setting, all the covariance matrices in (2.6) are estimated from only a limited amount of batch data and may not provide correct information for imposing sparsity. To address this challenge in the online learning setting, the following CEBoosting algorithm is introduced.
The $k$th batch data correspond to a time series $\mathbf{x}(t)$ for $t \in [t_{B_k}, t_{B_{k+1}})$. Assume that, by exploiting all the past batch data $\mathbf{x}(t)$ for $t \in [0, t_{B_1})$, the causation entropy of the system has been calculated in an offline mode, so that the model structure and unknown parameters in the current regime can be estimated based on the calculated causation entropy matrix. More specifically, the model in the current regime is estimated as
$\dot{\mathbf{x}} = \Xi^{(0)}\,\Phi.$  (2.8)
With the first new incoming batch data $\mathbf{x}(t)$ for $t \in [t_{B_1}, t_{B_2})$, the true dynamics $\dot{\mathbf{x}}$ within this period can be obtained by differentiating $\mathbf{x}(t)$. On the other hand, the predicted dynamics using the current model (2.8) is given by $\dot{\tilde{\mathbf{x}}} = \Xi^{(0)}\,\Phi$. Then, the residual dynamics $\delta\mathbf{f}$ for the first batch is given by $\mathbf{r} = \dot{\mathbf{x}} - \dot{\tilde{\mathbf{x}}}$. Therefore, the true dynamics is the temporal derivative from the new batch of time series, the predicted dynamics corresponds to the current model, and the residual dynamics is the difference between the true and predicted dynamics. If there is no regime switching for this batch, then it is expected that $\delta\mathbf{f} = 0$ and the estimated $\mathbf{r}$ is approximately Gaussian white noise. As an analog to (2.2), the matrix form of the residual dynamics $\mathbf{r}$ can be written as
$\mathbf{r} = \Xi_r\,\Phi.$  (2.9)
To detect regime switching, identify the sparse pattern of $\Xi_r$, and estimate the remaining parameters of $\Xi_r$ in (2.9), one can estimate the causation entropy $C_{\phi_n \to r_i \mid [\Phi \setminus \phi_n]}$ between the basis functions $\Phi$ and the residual dynamics $\mathbf{r}$ according to (2.4) and (2.6) and then obtain the corresponding causation entropy matrix $C$ by (2.7).
Mathematically, if $C = 0$, no candidate function in $\Phi$ is significant to the residual dynamics $\mathbf{r}$, which leads to $\Xi_r = 0$ in (2.9). In other words, the current model $\Xi^{(0)}\,\Phi$ fits the dynamics of the first batch and makes the residual dynamics $\mathbf{r}$ behave like white noise. This means there is no regime switching ($\delta\mathbf{f} = 0$) if $C = 0$, and the model is unchanged. Otherwise, if $C$ contains some non-zero entries, then a regime switching occurs ($\delta\mathbf{f} \neq 0$), and the goal is to identify the sparse structure of $\Xi_r$ in (2.9). However, despite the mathematical justification, with a limited amount of data in each batch, the true sparse structure of $\delta\mathbf{f}$ cannot be accurately identified. It is often the case that $C$ contains multiple entries that are nonzero due to the sampling error from the short time series. The sampling error can originate from insufficiently sampled data within a single data batch or from measurement noise during the data collection process, such as for sensor-based data. Therefore, when a batch of time series is short, the sampling error will cause a biased statistical estimation of the CEM. This means that directly imposing sparsity in $\Xi_r$ based on the CEM $C$ computed from such short single-batch data may lead to an incorrect residual system $\Xi_r\,\Phi$. To resolve this sampling problem, we introduce the aggregated CEM $C^{+}(K)$ based on the average of the causation entropies from a series of data batches,
$\big(C^{+}(K)\big)_{in} = \begin{cases} 1, & \text{if } \dfrac{1}{K}\displaystyle\sum_{k=1}^{K} C^{(k)}_{in} > \bar{C}, \\ 0, & \text{otherwise,} \end{cases}$  (2.10)

where $C^{(k)}$ denotes the CEM computed from the $k$th data batch.
Denote by $D$ the number of batches over which the aggregated CEM has not changed, namely,
$C^{+}(K) = C^{+}(K-1) = \cdots = C^{+}(K-D).$  (2.11)
In the CEBoosting algorithm, $D$ is a hyper-parameter and needs to be pre-determined. The algorithm is robust to the choice of $D$ as long as $D$ is not too small. If the averaged causation entropy matrix with a chosen value of $D$ can successfully identify all the important basis functions, then any larger value of $D$ should also complete the identification task with a consistent pattern of important basis functions. Based on our tests of all the examples, $D = 4$ is a good empirical choice to achieve a consistent and correct pattern of identified basis functions. We further define a stable aggregated causation entropy matrix $\overline{C^{+}} = C^{+}(K)$ with the smallest $K$ that satisfies the criterion in (2.11).
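The aggregation step (2.10) and the stability criterion (2.11) can be sketched as follows (a hedged illustration of ours; the exact stopping convention is an assumption consistent with the description above).

```python
import numpy as np

def aggregate_cem(cem_history, threshold):
    """Average the per-batch CEMs and threshold into a binary pattern, cf. (2.10).

    cem_history : list of (p, N) causation entropy matrices, one per batch.
    """
    C_avg = np.mean(cem_history, axis=0)
    return (C_avg > threshold).astype(int)   # aggregated binary CEM C+(K)

def is_stable(patterns, D=4):
    """Check the stability criterion (2.11): the last D+1 aggregated CEMs agree."""
    if len(patterns) < D + 1:
        return False
    return all(np.array_equal(patterns[-1], P) for P in patterns[-(D + 1):-1])
```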

With this stable aggregated causation entropy matrix $\overline{C^{+}}$, we then impose sparsity in $\Xi_r$ by setting $\delta\xi_{in} = 0$ when $(\overline{C^{+}})_{in} = 0$ and extract the set of remaining coefficients $\Xi_r = \{\delta\xi_{in} \mid (\overline{C^{+}})_{in} = 1\}$.

Once sparsity is imposed in $\Xi_r$ and the set of remaining coefficients is extracted, the model $\Xi_r\,\Phi$ in (2.9) can be calibrated based on the accumulated batches of data $\mathbf{x}(t)$ for $t \in [t_{B_1}, t_{B_{K+1}})$. Assume a discrete approximation of the continuous data with a fixed time step $\Delta t$ such that $t_{B_{k+1}} - t_{B_k} = M\,\Delta t$ for any $k$. The model calibration is performed by solving the following least squares problem:
$\Xi_r = \underset{\Xi}{\arg\min} \sum_{k=1}^{K} \sum_{m=1}^{M} \big\| \mathbf{r}(t_{B_k} + m\,\Delta t) - \Xi\,\Phi\big(\mathbf{x}(t_{B_k} + m\,\Delta t)\big) \big\|^2,$  (2.12)
where $\|\cdot\|$ denotes the vector norm in $\mathbb{R}^p$. Note that the method also works with adaptive time steps; the assumption of a fixed time step in (2.12) is made for simplicity of illustration.
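Since the optimization (2.12) is quadratic, it reduces to ordinary least squares on the retained columns. The sketch below (our own illustration, with hypothetical names) solves it row by row via the closed-form normal equations.

```python
import numpy as np

def fit_residual_model(Phi, R, support):
    """Least squares fit of the sparse residual coefficients, cf. Eq. (2.12).

    Phi     : (T, N) library evaluated on the accumulated data batches
    R       : (T, p) residual dynamics, one column per state variable
    support : (p, N) stable aggregated binary CEM; 1 marks retained terms
    """
    p, N = support.shape
    Xi_r = np.zeros((p, N))
    for i in range(p):
        cols = np.flatnonzero(support[i])
        if cols.size:  # closed-form solution of the quadratic problem
            Xi_r[i, cols] = np.linalg.lstsq(Phi[:, cols], R[:, i], rcond=None)[0]
    return Xi_r
```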

After the sparse parameter matrix $\Xi_r$ for the residual dynamics model (2.9) is obtained, the current model $\Xi^{(0)}$ is updated by adding the information from the residual dynamics $\Xi_r$. The CEBoosting algorithm repeats the above procedure when a new batch of data arrives. A schematic illustration based on the Lorenz 63 system with regime switching is displayed in Fig. 1, and more details of the CEBoosting algorithm can be found in Algorithm 1 presented in Appendix A. It should be noted that the notation adopted in this section assumes the regime switching time $t_s \in [t_{B_1}, t_{B_2})$ for simplicity of illustration. In practice, regime switching can happen at $t_s \in [t_{B_k}, t_{B_{k+1}})$ with $k > 1$, for which Algorithm 1 presents the detailed procedures of detecting regime switching, aggregating the causation entropy matrix, identifying a sparse model structure, and then fitting the model parameters.
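Putting the pieces together, one round of the online procedure might look like the following sketch (using the helper functions defined above; the interface and stopping convention are our assumptions, and Algorithm 1 in Appendix A is the authoritative statement of the method).

```python
import numpy as np

def ceboosting_round(Xi0, batches, threshold, D=4):
    """Process sequential batches until a stable residual structure is found.

    Xi0     : (p, N) coefficient matrix of the current model.
    batches : iterable of (X, Xdot) pairs, each of shape (T, p).
    """
    cem_history, patterns, Phi_all, R_all = [], [], [], []
    for X, Xdot in batches:
        Phi, _ = build_library(X)
        R = Xdot - Phi @ Xi0.T                    # residual dynamics r
        cem = np.array([[causation_entropy(R[:, i],
                                           np.delete(Phi, n, axis=1),
                                           Phi[:, n])
                         for n in range(Phi.shape[1])]
                        for i in range(Xdot.shape[1])])
        cem_history.append(np.abs(cem))
        patterns.append(aggregate_cem(cem_history, threshold))
        Phi_all.append(Phi)
        R_all.append(R)
        if patterns[-1].any() and is_stable(patterns, D):
            Xi_r = fit_residual_model(np.vstack(Phi_all),
                                      np.vstack(R_all), patterns[-1])
            return Xi0 + Xi_r                     # boosted model for the new regime
    return Xi0                                    # no regime switching detected
```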

FIG. 1.

Schematic of the CEBoosting algorithm (based on the Lorenz 63 model). Panel (a): data are generated from the Lorenz 63 system with regime switching. The parameters in regime 1 are $\sigma = 10$, $\beta = 8/3$, $\rho = 28$, while $\rho$ is changed to 38 in regime 2. The index $k$ of the incoming batch data starts with 1. Panel (b): $\Xi^{(0)}$ is the current model parameter matrix defined in (2.3), with $\Phi$ being the basis functions and $\dot{x}_i$ being the dynamics of state $x_i$. With $\Xi^{(0)}$, the residual dynamics $r_i$ and the causation entropy between $r_i$ and $\Phi$ can be calculated for each batch. CEM($k$) is the aggregated causation entropy from batch 1 to $k$. $C^{+}(k)$ is the binary matrix of CEM($k$) defined in (2.10), indicating the sparse structure of the residual model. $D$ is defined in (2.11), indicating the number of batches over which $C^{+}(k)$ has remained stable, i.e., the pattern is consistent with the previous $D$ aggregated causation entropy matrices. Panel (c): with $C^{+}(k)$ and the data batches from the batch where the new regime is detected to the one with a stable $C^{+}$, the residual model is calibrated by least square estimation.


In recent years, SINDy has been a popular framework for sparsely identifying nonlinear dynamics from data.32 SINDy exploits an iterative thresholding regularization to determine the model structure and estimate the model parameters simultaneously. The iterative thresholding regularization guarantees a parsimonious model structure. The original SINDy method was designed for offline model identification using abundant training data (e.g., long time series). To work better with a limited amount of data, a recent extension of SINDy34 leveraged the statistical approach of bagging and achieved more robust learning performance.

Compared to the SINDy methods, the CEBoosting algorithm is designed mainly for robust online learning with limited sequential data at a low computational cost. The key difference from the SINDy methods is that imposing sparsity is decoupled from the parameter estimation in the CEBoosting algorithm. Specifically, causation entropy is utilized only to determine a parsimonious model structure, without dealing with the model calibration. The concept of bagging is further introduced with sequential batches of data to ensure a robust estimation of the causation entropy before imposing a parsimonious model structure. This results in utilizing the minimum amount of data to determine the model structure. With a robust estimation of the parsimonious model structure, the CEBoosting algorithm exploits a simple least square estimate to solve for the parameter values, a quadratic optimization problem with a closed analytic solution. This avoids repeatedly estimating the parameters, as in the LASSO-type or thresholding-based regression approaches, when determining the model structure by examining each data batch. In online learning, the CEBoosting algorithm, by design, can also avoid accidentally removing essential basis functions due to an incorrectly calibrated model based on a limited amount of data.

In this section, the performance of the CEBoosting method is demonstrated using four different chaotic or turbulent systems, which mimic crucial features in many science and engineering disciplines. The presentation starts with the three-dimensional Lorenz 63 system, a classical chaotic system. This test is utilized as a proof of concept to demonstrate the detailed steps of the method in the online identification of nonlinear dynamics with regime switching. To illustrate the efficiency of the algorithm in capturing regime switching behavior in a relatively high-dimensional case, the 40-dimensional Lorenz 96 system is utilized as a second test, where the localization technique is incorporated to mitigate the curse of dimensionality. As strongly non-Gaussian statistics, intermittency, and extreme events appear in many climate, atmospheric, and ocean science problems, a multi-mode layered topographic model that captures these crucial turbulent features is adopted as the next test model. The last test case aims to deal with a more realistic scenario where only a subset of the state variables is observed. Data assimilation is, therefore, essential in such a partial observation case. A stochastic parameterized extended Kalman filter (SPEKF) model is used to estimate and simulate the hidden system state variables. Below, the starting model and its parameters are assumed to be available for all the numerical examples. The goal is to learn the regime switching and the corresponding residual model that adjusts the original model to a new regime based on online sequential data.

For all four systems, the numerical simulation time step size is chosen as $\mathrm{d}t = 0.001$. The hyper-parameter $D$ in (2.11) is chosen as 4 for the numerical examples of this work.

The Lorenz 63 (L63) system was proposed by Lorenz in 1963.89 It is a simplified mathematical model for atmospheric convection. The equations relate the properties of a two-dimensional fluid layer uniformly warmed from below and cooled from above. In particular, the equations describe the rate of change of three quantities with respect to time: $x$ is proportional to the rate of convection, $y$ to the horizontal temperature variation, and $z$ to the vertical temperature variation. The constants $\sigma$, $\rho$, and $\beta$ are system parameters proportional to the Prandtl number, the Rayleigh number, and certain physical dimensions of the layer itself.90 The L63 model is also widely used as a simplified model for lasers, dynamos, thermosyphons, brushless DC motors, electric circuits, chemical reactions, and forward osmosis.91–97 The governing equations of the Lorenz 63 system are as follows:
$\dot{x} = \sigma(y - x), \qquad \dot{y} = x(\rho - z) - y, \qquad \dot{z} = xy - \beta z.$  (3.1)
The standard parameters $\sigma = 10$, $\beta = 8/3$, and $\rho = 28$ that create the butterfly profile are utilized as the starting regime. The new regime takes a different value of the parameter, $\rho = 38$. The sudden change of the parameter from $\rho = 28$ to $\rho = 38$ occurs at $t = 100$. The goal is to (i) detect the regime switching and (ii) learn the residual model that adjusts the original system to the new one.
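For concreteness, a minimal simulation sketch of this setup is given below (forward Euler at the time step quoted above; the initial condition is an arbitrary assumption of ours, not taken from the paper).

```python
import numpy as np

def simulate_l63(T=200.0, dt=1e-3, t_switch=100.0):
    """Lorenz 63 trajectory with the regime switching of this test case (sketch)."""
    n = int(T / dt)
    X = np.zeros((n, 3))
    X[0] = (1.0, 1.0, 1.0)                 # arbitrary initial condition
    sigma, beta = 10.0, 8.0 / 3.0
    for k in range(n - 1):
        rho = 28.0 if k * dt < t_switch else 38.0   # regime switch at t = 100
        x, y, z = X[k]
        X[k + 1] = X[k] + dt * np.array([sigma * (y - x),
                                         x * (rho - z) - y,
                                         x * y - beta * z])
    return X
```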

Figure 2(a) compares the trajectories in the phase space between the original (regime 1) and new (regime 2) regimes. It can be seen that the two systems are on different manifolds in the phase space. Figure 2(c) shows the time series of the three state variables. With the regime switching at t = 100, it can be seen in Fig. 2(c) that the patterns of time series change accordingly, especially for the variable z that demonstrates a shift of its mean value. The autocorrelation function (ACF) of each state variable for the original system is presented in Fig. 2(b), which shows a rapid decay of correlation within one time unit, except for the variable z that has some oscillations in its ACF. Similar behavior of the ACF is observed in Fig. 2(d) after the regime switching. Finally, Fig. 2(e) shows the ensemble mean of the state variable z as a function of time from an independent simulation with 5000 ensemble members, which reveals that the transition time of the regime switching is about 20 time units. This also indicates that the transition time depends on the property of the transient feature and is very different from the decorrelation time. Nevertheless, the decorrelation time of the original system provides a natural way to determine the batch size. Thus, the batch size is chosen as one time unit here.
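Since the batch size is tied to the decorrelation time, the following small sketch (our own rough heuristic, assuming a statistically stationary scalar series) estimates the decorrelation time from the empirical ACF.

```python
import numpy as np

def decorrelation_time(x, dt):
    """Integrated autocorrelation time of a scalar series, computed up to the
    first zero crossing of the empirical ACF (a rough guide for the batch size)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    acf /= acf[0]
    crossing = np.argmax(acf < 0)          # 0 if the ACF never turns negative
    cutoff = crossing if crossing > 0 else acf.size
    return dt * np.sum(acf[:cutoff])
```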

FIG. 2.

Lorenz 63 system with regime switching. (a) Trajectories of the original system and the new one in phase space. (c) Time series of system state variables. (b) and (d) Autocorrelation function (ACF) of the original system and the new regime. (e) Ensemble mean of variable z before and after regime switching at t = 100.


The CEBoosting algorithm is employed to detect the regime switching and identify the sparse structure of the residual model. The candidate basis functions include all the linear and quadratic nonlinear functions: $\{x, y, z, xy, xz, yz, x^2, y^2, z^2\}$. With the identified sparse structure, the coefficients of the residual model are then determined via the least square estimation. To detect the regime switching and identify the sparse structure of the residual model, the aggregated causation entropy matrix is gradually updated until its structure becomes stable, i.e., the causation entropy values of some candidate basis functions remain significantly greater than the others. The stable causation entropy matrix structure is obtained after 6 time units and is presented in Table I, where only one entry has a significant value. It is worthwhile to highlight that the total time (6 units) needed to discover the regime switching and determine the model parameters in the new regime is shorter than the transition time (20 units). In other words, the information from the transient period is sufficient for the CEBoosting algorithm to determine the model response to the regime switching in this test case.

TABLE I.

The L63 model—The causation entropy matrix (CEM) after 6 time units. The pattern of the CEM becomes stable and does not change as more batch data are incorporated. The entry with a significant value of the causation entropy is highlighted in bold font. According to the pattern of this CEM, a residual model will be built.

10−4 x y z xy xz yz x2 y2 z2
x ˙  11.0437  10.6292  5.1156  4.6996  6.7041  9.2123  8.9308  3.0866  4.0888 
y ˙  63.1474  3.7597  5.0283  2.5584  2.6370  5.5017  4.3689  2.4472  4.4099 
z ˙  9.0692  6.6605  9.5399  11.3740  8.5529  10.6165  10.8299  10.8725  10.5916 

Finally, based on the sparse structure identified in Table I, a residual model can be calibrated via the least square estimation applied to the residuals of the original model. The residual model is then added to the original model as a correction term. The coefficients of the corrected model for the new regime are listed in Table II. It can be seen that the corrected model successfully captures the crucial parameter value $\rho = 38$ that induces the new regime.

TABLE II.

The L63 model—Updated model parameters for the new regime.

x y z xy xz yz x2 y2 z2
x ˙  −9.9749  9.9671 
y ˙  38.0357  −1.0129  −1.0016 
z ˙  −2.6670  1.0017 
The Lorenz 96 (L96) system is a classical model exhibiting chaotic to turbulent dynamics.98 It can be regarded as a coarse discretization of atmospheric flow on a latitude circle with complicated wave-like and chaotic behavior. It is also widely used as a testbed for data assimilation, prediction, and uncertainty quantification.99–102 The L96 model reads
$\dfrac{\mathrm{d}x_j}{\mathrm{d}t} = (x_{j+1} - x_{j-2})\,x_{j-1} - x_j + F, \qquad j = 1, \ldots, J,$  (3.2)
where the periodic boundary conditions $x_{-1} = x_{J-1}$, $x_0 = x_J$, $x_{J+1} = x_1$ and the dimension $J = 40$ are adopted. The L96 system is utilized here to demonstrate that the CEBoosting algorithm, combined with localization techniques, can detect regime switching for relatively high-dimensional systems. The motivation here is that the number of candidate functions quickly increases with the dimension of the system if all possible combinations of state variables up to a particular order are included. As a result, the computational cost becomes unaffordable without suitable treatment of such a curse of dimensionality. To this end, the idea of localization is exploited here. Localization means the dynamics of each state variable depend only on the nearby ones. In fact, advection, diffusion, and dispersion are all local operators.103,104 Similarly, localization applies to many parameterization problems for the subgrid scales, which also depend only on the nearby corresponding large-scale state variables.105–107 The idea of localization is also widely utilized in data assimilation and prediction.108–110 Localization reduces the number of candidate functions for the dynamics of each $x_j$ by including only the terms that represent local interactions in $\mathrm{d}x_j/\mathrm{d}t$, as illustrated in the sketch below. Note that the total number of candidate functions in the entire library can remain large, but only a relatively small number of functions will be examined for the causal relationship to the dynamics of each $x_j$.
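The following sketch (a hedged illustration of ours; the neighborhood width and exact term set are assumptions, not the authors' specification) constructs such a localized quadratic library for the periodic L96 lattice, so that each $\mathrm{d}x_j/\mathrm{d}t$ is screened only against functions of its neighbors.

```python
import numpy as np

def localized_library(X, s=2):
    """Localized quadratic library: for each x_j, include only terms built from
    the neighbors x_{j-s}, ..., x_{j+s} (indices wrap around the periodic lattice).

    X : array of shape (T, J). Returns one (T, N_loc) library per state variable.
    """
    T, J = X.shape
    libraries = []
    for j in range(J):
        nbrs = [(j + d) % J for d in range(-s, s + 1)]
        cols = [X[:, m] for m in nbrs]                       # linear terms
        cols += [X[:, m] * X[:, n]                           # local quadratic terms
                 for a, m in enumerate(nbrs) for n in nbrs[a:]]
        libraries.append(np.column_stack(cols))
    return libraries
```

With $s = 2$, each state is screened against a few dozen candidates instead of the full library of all $O(J^2)$ quadratic combinations.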

The original system (regime 1) has a forcing coefficient $F = 8$, which makes the system exhibit strongly chaotic behavior. After the regime switching at $t = 100$, the new system (regime 2) takes the forcing value $F = 16$. Meanwhile, the coefficient of $x_j$, representing the damping effect, changes from 1 to 1.5. This leads to a fully turbulent regime.111 Some weak coherent structures can still be observed in the strongly chaotic regime, but they disappear in the fully turbulent one. See Fig. 3 for the two regimes. Figure 4 shows the detailed statistical properties of the state variable $x_{10}$. Note that the model has homogeneous dynamics, so the statistical properties of different state variables are the same. Figure 4(a) illustrates how the time series of $x_{10}$ changes with the regime switching. From the strongly chaotic to the fully turbulent regime, the variance of all the state variables becomes larger, and the decorrelation time becomes shorter. A batch size of 1 time unit is chosen here, which is again of the same order as the decorrelation time. The total transition time is about 1–2 units, according to Figs. 4(f) and 4(g).

FIG. 3.

The Lorenz 96 system with regime switching. Top: the 40-dimensional system state evolving over the time interval $[0, 200]$ with a regime switching at $t = 100$. Regime 1 is chaotic with $F = 8$, and regime 2 is turbulent with $F = 16$ and the linear damping term $-1.5 x_j$. Bottom: zoom-in view of the regime switching over the time interval $[90, 110]$.

FIG. 4.

Statistical properties of the state variable $x_{10}$ of the Lorenz 96 system with regime switching. Panel (a): time series of $x_{10}$ in regimes 1 and 2. Panels (b) and (c): the PDFs of $x_{10}$ in both regimes. Panels (d) and (e): the ACFs of $x_{10}$ in both regimes. Panels (f) and (g): the ensemble mean and the ensemble variance of $x_{10}$ before and after the regime switching at $t = 100$.


A set of candidate basis functions $\{x_j,\, x_j^2,\, x_jx_{j-1},\, x_jx_{j+1},\, x_jx_{j-2},\, x_jx_{j+2},\, \ldots\}$ is built by exploiting polynomials up to the second order for each state $x_j$, $j = 1, 2, \ldots, J$. Note that the residual dynamics of each state $x_j$ is assumed to have contributions only from the basis functions that consist of its adjacent states. The CEBoosting algorithm identifies the sparse structure of the residual model using about two batches of data (i.e., the causation entropy matrix pattern does not change after two time units). Table III shows the CEM based on utilizing two batches of data, which is about the same length as the estimated transition time of the Lorenz 96 system [see Figs. 4(f) and 4(g)]. It is seen that the causation entropy entries associated with the actual residual terms are at least one order of magnitude larger than the others. Therefore, the CEBoosting algorithm successfully identifies the correct sparse structure of the residual model based on data within the time interval of the transient period.

TABLE III.

The L96 model—The CEM after 2 time units. The entries with significant values of the causation entropy are highlighted in bold font. The pattern of the CEM becomes stable and does not change as more data batches are incorporated. According to the pattern of this CEM, a residual model will be built.

10−4 x1 x 1 2 x2x40 x39x40 x2 x 2 2 x3x1 x40x1 ⋅⋅ ⋅ x40 x 40 2 x1x39 x38x39
x 1 ˙  436.0597  4.2015  2.2964  1.4823  0.0976  3.4716  1.3792  0.9251  ⋅⋅ ⋅  0.1869  3.0008  0.7236  8.0802 
x 2 ˙  8.4920  9.8888  6.0428  8.0793  19.8015  8.3095  10.1492  9.6844  ⋅⋅ ⋅ 
⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅ 
x 40 ˙  2.7772  3.1363  4.5714  8.4405  9.6327  5.3498  4.8701  7.6239  ⋅⋅ ⋅  200.3710  2.5200  7.6225  2.7748 

After obtaining the sparse structure according to the CEM in Table III, the residual dynamics are calibrated via a linear combination of candidate basis functions, and the coefficients of the linear combination are obtained by the least square estimation. The L96 model in the new regime is then updated by adding this residual model to the original model. The coefficients of the updated model are summarized in Table IV. The identified model shows good agreement with the true system of the new regime, with $F = 16$ and the linear damping term $-1.5 x_j$.

TABLE IV.

The L96 model—Coefficients of the updated model for the new regime. The entries with bold font indicate the parameters that contribute to the regime switching.

1 x1 x 1 2 x2x40 x39x40 x2 x 2 2 x3x1 x40x1 ⋅⋅ ⋅ x40 x 40 2 x1x39 x38x39
x 1 ˙  16.0248  −1.4938  1.0000  −1.0016  ⋅⋅ ⋅ 
x 2 ˙  16.0490  −1.5077  1.0000  −1.0000  ⋅⋅ ⋅ 
⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅  ⋅⋅ ⋅ 
x 40 ˙  16.0027  ⋅⋅ ⋅  −1.5017  1.0000  −1.0000 
The topographic model is an ideal model to study the complex nonlinear interaction between the large-scale and small-scale flows and the role of the topography.2,112 The topographic model can generate intermittency and extreme events. The corresponding PDF is often highly non-Gaussian with heavy tails. Therefore, it provides a challenging test case for detecting regime switching behavior. Here, the small-scale flow is given in terms of the stream function $\psi$. The large-scale velocity field only has the zonal component $u(t)$, and the topography is given by the function $h$. The parameter $\beta > 0$ is the contribution from the beta-plane effect. A common simplified version of the topographic model assumes a layered topography along the $y$ direction, which means $\psi$ is only a function of $y$. In addition, the model contains only the leading two Fourier wavenumbers of the stream function (with $k = \pm 1$ and $\pm 2$). With these simplifications, the resulting model is reduced to a five-dimensional system containing $u$, $\psi_{\pm 1}$, and $\psi_{\pm 2}$. For simplicity of notation, a change of variables defines the new state variables $v_1, \ldots, v_4$, which are linked with $\psi_{\pm 1}$ and $\psi_{\pm 2}$ via
where $i$ is the imaginary unit. Similarly, $\omega_1$ and $\omega_3$ are the two parameters standing for the Fourier coefficients of the topographic effect from $h$. The model reads
(3.3)
The original system (regime 1) takes the parameters $d_{v_1} = d_{v_2} = d_{v_3} = d_{v_4} = d_u = 0.005$, $\beta = 1$, $\sigma_{v_1} = \sigma_{v_2} = \sigma_{v_3} = \sigma_{v_4} = 1/(20\sqrt{2})$, $\sigma_u = 1/\sqrt{2}$, $\omega_1 = \sqrt{2}/2$, and $\omega_3 = \sqrt{2}/4$. The new model (regime 2) utilizes different parameters for the topographic effect, with $\omega_1 = \omega_3 = 3\sqrt{2}/2$. It is worthwhile to note that the noise level in the dynamics of the zonal flow $u$ is much larger than those in the dynamics of $v_i$ in (3.3). This is a typical situation, as $u$ is the only mode that explicitly describes the zonal feature of the flow. A large noise is taken to mimic the unresolved dynamical features.

Figure 5 displays the regime switching and statistical properties of all five variables in the topographic model. The pattern and amplitude of each time series demonstrate a noticeable change after the regime switching happens at $t = 2500$. Note that long time series of both regimes are utilized here, ensuring that the associated statistics are computed with a sufficiently large number of sample points. The PDFs behave like Laplace distributions with heavy tails, indicating many extreme events in the model simulation. The ACF of $u$ in the original model (regime 1) decays very slowly, taking about 30 time units until the ACF approaches zero. Because of this, a batch size of 30 time units is utilized in the CEBoosting algorithm. Due to the strengthening of the topographic effect after the regime switching, the time series of $v_2$ and $v_4$ in the new model (regime 2) display multiscale features, where the ACFs decay quickly at the beginning but then relax to zero slowly after 50 time units.

FIG. 5.

Topographic model with regime switching. Middle column: time series of each state in the time interval [0, 5000] with regime switching at t = 2500. First and fifth columns: probability density function (PDF) of each state in both regimes. Second and fourth columns: autocorrelation function (ACF) of each state in both regimes.


Similar to the previous test models, the library contains the polynomials up to the second order. In other words, the following 20 basis functions are employed to build the library: $\{v_1, v_2, v_3, v_4, u, v_1^2, v_2^2, v_3^2, v_4^2, u^2, v_1v_2, v_1v_3, v_1v_4, v_1u, v_2v_3, v_2v_4, v_2u, v_3v_4, v_3u, v_4u\}$.

As the noise levels of the state variables are significantly different from each other, the magnitudes of the causation entropy for the different residual dynamics are not the same either. Therefore, the selection of the candidate functions is based on the causation entropy values for the dynamics of each state variable separately. To determine the entire residual model, 60 time units of data are used. See Table V.

TABLE V.

The topographic model—The CEM after 60 time units. The pattern of the CEM does not change as more batch data are incorporated. The entries with significant values of the causation entropy are highlighted in bold font.

10−3 v1 v2 v3 v4 u v1u v2u v3u v4u ⋅⋅ ⋅
v 1 ˙  0.0115  0.0227  0.0165  0.0614  120.9879  0.0216  0.0365  0.0407  0.0249  ⋅⋅ ⋅ 
v 2 ˙  0.0124  0.0327  0.0066  0.0150  0.0313  0.0002  0.0090  0.0039  0.0284  ⋅⋅ ⋅ 
v 3 ˙  0.0023  0.0033  0.0142  0.0099  51.3949  0.0537  0.0291  0.0572  0.0082  ⋅⋅ ⋅ 
v 4 ˙  0.0359  0.0619  0.0150  0.0134  0.0253  0.0180  0.0127  0.0099  0.0210  ⋅⋅ ⋅ 
u ˙  0.1080  0.0022  0.0471  0.0146  0.0065  0.0068  0.0097  0.0049  0.0063  ⋅⋅ ⋅ 

Based on the sparse model structure identified in Table V, the residual model is calibrated via the least square estimation. The coefficients of the calibrated model for the new regime are summarized in Table VI. It can be seen that the corrected model successfully updates the changed parameters $\omega_1$ and $\omega_3$ in the new regime. The resulting model can reproduce the strongly non-Gaussian features with intermittency and extreme events.

TABLE VI.

The topographic model—Updated model for the new regime.

v1 v2 v3 v4 u v1u v2u v3u v4u ⋅⋅ ⋅
v 1 ˙  −0.0496  −1.0008  −4.2463  0.9998  ⋅⋅ ⋅ 
v 2 ˙  0.9988  −0.0501  −1.0009  ⋅⋅ ⋅ 
v 3 ˙  −0.0554  −0.4968  −2.1200  2.0015  ⋅⋅ ⋅ 
v 4 ˙  0.4998  −0.0523  −1.9996  ⋅⋅ ⋅ 
u ˙  2.1123  4.5713  −0.0599  ⋅⋅ ⋅ 

It is worthwhile to remark that, as the noise levels in the dynamics of $v_i$ are lower than that in $u$, a shorter time series (and shorter batch length), with a total of only 15 time units, can be utilized to reach a stable CEM for the $v_i$ components. See Table VII. Figure 6 shows the time evolution of the ensemble mean and the ensemble variance of the topographic model, including the time instant of regime switching. The results here can be utilized to infer the transition time, which is about 20 time units. Therefore, in this topographic model with large noise in the $u$ dynamics, the stable pattern of all $v_i$ variables can be identified by the CEBoosting algorithm within the transition period. Yet, 60 time units of data are needed to identify the model structure associated with the variable $u$, mainly due to the larger noise in $u$.

FIG. 6.

Ensemble mean and variance of each state variable in the topographic model. Regime switching happens at t = 2500.

TABLE VII.

The topographic model—the CEM after 15 time units. The pattern of all the v variables is stable. The entries with significant values of the causation entropy are highlighted in bold font.

(×10⁻³)   v1       v2       v3       v4       u         v1u      v2u      v3u      v4u      ⋯
v̇1        0.0182   0.0233   0.0322   0.0679   22.665    0.0244   0.0022   0.1726   0.0285   ⋯
v̇2        0.0768   0.0071   0.0002   0.0004   0.0879    0.0582   0.0007   0.0171   0.1059   ⋯
v̇3        0.0187   0.0006   0.1123   0.0411   10.3880   0.0133   0.0024   0.0843   0.0166   ⋯
v̇4        0.0087   0.1908   0.0171   0.0023   0.000     0.0001   0.0520   0.0015   0.0016   ⋯
u̇         0.0008   0.0007   0.0078   0.0194   0.0167    0.0250   0.0054   0.0223   0.0107   ⋯
In many practical situations, the observations are only available for a subset of the state variables, a scenario known as partial observations. It will be shown in the following that the CEBoosting algorithm can be naturally applied to the case with partial observations when data assimilation is appropriately incorporated. To illustrate the CEBoosting algorithm in the partial-observation scenario, a simple yet practically useful nonlinear model is utilized as a testbed: the so-called stochastic parameterized extended Kalman filter (SPEKF) model,113,114

$$\begin{aligned}
\frac{du}{dt} &= \left[-\hat{\gamma} - \gamma(t) + i\left(\hat{\omega} + \omega(t)\right)\right]u(t) + \hat{b} + b(t) + \sigma_u \dot{W}_u,\\
\frac{d\gamma}{dt} &= -d_\gamma\, \gamma(t) + \sigma_\gamma \dot{W}_\gamma,\\
\frac{d\omega}{dt} &= -d_\omega\, \omega(t) + \sigma_\omega \dot{W}_\omega,\\
\frac{db}{dt} &= -d_b\, b(t) + \sigma_b \dot{W}_b.
\end{aligned} \tag{3.4}$$

In this SPEKF model, u(t) is a complex-valued state variable and is the only observed variable in the system. The observed variable u(t) is driven by three hidden variables γ(t), ω(t), and b(t). The parameters d_γ, d_ω, and d_b are all positive, serving as damping factors. The parameters σ_u, σ_γ, σ_ω, and σ_b are noise coefficients, which are also positive. The white noises Ẇ_u and Ẇ_b are complex-valued, while Ẇ_γ and Ẇ_ω are real. The governing equations of γ(t), ω(t), and b(t) are Ornstein–Uhlenbeck (OU) processes,115 with γ and ω taking real values and b taking complex values. The three constants γ̂, ω̂, and b̂ in the dynamics of u represent the mean damping, mean phase, and mean forcing, respectively.
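To make the dynamics in (3.4) concrete, the following is a minimal Euler–Maruyama simulation sketch; the parameter values are illustrative placeholders and not the settings used in the experiments.

```python
import numpy as np

def simulate_spekf(T=50.0, dt=1e-3, g_hat=1.0, w_hat=2.0, b_hat=1.0,
                   d_g=0.5, d_w=0.5, d_b=0.5,
                   s_u=0.5, s_g=0.5, s_w=0.5, s_b=0.5, seed=0):
    """Euler-Maruyama integration of the SPEKF model (3.4).
    u and b are complex; gamma and omega are real (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    u = np.zeros(n, dtype=complex)
    b = np.zeros(n, dtype=complex)
    g = np.zeros(n)
    w = np.zeros(n)
    for k in range(n - 1):
        # complex white noise: independent real and imaginary increments
        dWu = (rng.normal() + 1j * rng.normal()) * np.sqrt(dt / 2)
        dWb = (rng.normal() + 1j * rng.normal()) * np.sqrt(dt / 2)
        u[k + 1] = u[k] + ((-g_hat - g[k] + 1j * (w_hat + w[k])) * u[k]
                           + b_hat + b[k]) * dt + s_u * dWu
        g[k + 1] = g[k] - d_g * g[k] * dt + s_g * np.sqrt(dt) * rng.normal()
        w[k + 1] = w[k] - d_w * w[k] * dt + s_w * np.sqrt(dt) * rng.normal()
        b[k + 1] = b[k] - d_b * b[k] * dt + s_b * dWb
    return u, g, w, b
```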

The SPEKF model (3.4) has been widely used as an approximate model to describe a spectral mode of a complex turbulent system, especially in the context of data assimilation and ensemble prediction.111,114,116 Physically, the variable u(t) represents one of the resolved (i.e., observable) modes in the turbulent signal, while the three hidden variables γ(t), ω(t), and b(t) are surrogates for the nonlinear interaction between u(t) and the other unobserved modes in the original governing equation after applying a spectral decomposition. The idea of the SPEKF model is that the small-scale or unresolved variables are stochastically parameterized by inexpensive linear and Gaussian processes, representing the stochastic damping γ(t), stochastic phase ω(t), and stochastic forcing b(t). Despite the model error in using such Gaussian approximations for the original unresolved nonlinear dynamics, these Gaussian processes succeed in providing accurate statistical feedback from the unresolved scales to the resolved ones. Thus, the intermittency and non-Gaussian features observed in the resolved variables can be accurately recovered. The statistics of the SPEKF model can also be solved with exact and analytic formulas, which allows an accurate and efficient estimation of the model states. The SPEKF type of model has been used for filtering multiscale turbulent dynamical systems,111 stochastic superresolution,117 and filtering the Navier–Stokes equations with model error.118 It has been shown that the SPEKF model has much higher skill in capturing the irregularity and intermittency in nature than classical Kalman filters based on the so-called mean stochastic model (MSM).

In the following, the system experiences regime switching twice. In the starting regime (regime 1), all three hidden variables have significant contributions to the dynamics of the observed process u. In regime 2, ω is set to zero; therefore, the stochastic phase does not influence the dynamics of the resolved variable u. Finally, in regime 3, the stochastic damping γ is removed from the dynamics of u, but the contribution from ω is added back. Figure 7 shows the observed trajectories and the associated statistics of the real part of the observed variable u. Due to the stochastic damping, the time series in regimes 1 and 2 are intermittent, with multiple extreme events occurring when the overall damping γ̂ + γ(t) becomes negative. As a result, the PDFs are non-Gaussian and fat-tailed. In contrast, very few extreme events are found in regime 3, and the associated PDF is Gaussian. Meanwhile, the ACF in regime 2 shows a clear oscillatory pattern as it decays, due to a dominant frequency of the time series coming from the constant phase ω̂. Such a regular oscillation in the ACF becomes less significant when stochasticity is added to the phase term via ω(t).

FIG. 7.

The SPEKF model with three different regimes (two regime switches) as time evolves. In regime 1, all three hidden variables γ, ω, and b have significant contributions to the dynamics of u. In regime 2, ω is set to zero. In regime 3, the stochastic damping γ is removed from the dynamics of u and the contribution from ω is added back. First row: the time series of the real part of the observed variable u. Second row: the PDF of u in each regime. Third row: the ACF of u in each regime.


The goal here is to detect the regime switching and reveal the dynamics in each regime. It is worth highlighting that the dimensions of the system in the three regimes are different. In the absence of ω or γ, the dimension reduces from 4 to 3 from regime 1 to regimes 2 and 3. However, such a change is unknown in practice, and the learning algorithm has to detect it. In particular, there are no observed time series of γ, ω, and b. Therefore, recovering these variables becomes an essential step in the online learning algorithm. To this end, the following procedure is adopted to incorporate data assimilation into the CEBoosting algorithm. Assume the model in regime 1 is known. Each time a new batch of the time series of u is obtained, this model is utilized to sample a trajectory of the three unobserved variables γ, ω, and b conditioned on the observed trajectory of u. The sampled trajectory can be thought of as the analog of one ensemble member of the ensemble Kalman smoother solution conditioned on the observed signal of u,119 although the sampled trajectory of the SPEKF model can be written down using closed analytic formulas.27 See Ref. 39 for the implementation details of using such closed analytic formulas for data assimilation. This augments the unobserved components of each batch of data. Then, the CEBoosting algorithm is utilized to compute the causal relationships in light of the observed time series of u and the sampled time series of γ, ω, and b. If the causation entropy from any function involving γ to the dynamics of u is zero, then the γ equation is eliminated from the final model structure. A similar logic applies to ω and b. Note that, as u and b are complex-valued variables, the actual computation treats the real and imaginary parts as two separate processes when computing the causation entropy.
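A minimal sketch of this augmentation step is given below; the callables sample_hidden, build_library, current_model, and cem_row are hypothetical stand-ins for the corresponding steps described above, and averaging the real- and imaginary-part CEM rows is one simple illustrative choice rather than the exact treatment in the paper.

```python
import numpy as np

def partial_obs_cem(u, dt, sample_hidden, build_library, current_model, cem_row):
    """One CEBoosting step under partial observations (a sketch).

    u: (M,) complex observed trajectory of one batch.
    sample_hidden: returns sampled (gamma, omega, b) given u, e.g., via (B5).
    build_library: evaluates the candidates, e.g., u*gamma, u*omega, b, ...
    current_model: evaluates the currently identified dynamics of u.
    cem_row: computes one CEM row (as sketched earlier)."""
    g, w, b = sample_hidden(u)     # data assimilation: sample the hidden processes
    Phi = build_library(u[:-1], g[:-1], w[:-1], b[:-1])
    du = np.diff(u) / dt           # finite-difference estimate of du/dt
    r = du - current_model(u[:-1], g[:-1], w[:-1], b[:-1])  # residual dynamics
    # u is complex valued: treat the real and imaginary parts as two processes
    row_re = cem_row(r.real, Phi.real)
    row_im = cem_row(r.imag, Phi.imag)
    return 0.5 * (row_re + row_im)
```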

The main focus here is on identifying the dynamics of u. In particular, we aim to investigate whether all three unobserved variables contribute to the observed process u. To this end, the linear Gaussian models of γ, ω, and b are assumed to be fixed, and a function library $\{u\gamma, u\omega, b, u^2, \gamma\omega, \gamma^2, \omega^2, b^2, \gamma b, \omega b, ub\}$ is designed to learn the structure of the u dynamics.

Figure 8 shows the real part of u with regime switching and the sampled trajectories of γ, ω, and b across the three regimes. When the unobserved variables γ, ω, or b contribute to the dynamics of u, the sampled processes match the truth quite well. On the other hand, when ω and γ disappear in regimes 2 and 3, respectively, the sampling provides random trajectories, and the causation entropy from the terms involving these trajectories makes no contribution to the dynamics of u. The first row of Table VIII, showing the causation entropies of regime 1, is based on a time series with 200 time units. The second and third rows show the causation entropies for the detected regimes 2 and 3, respectively, each using one batch of data with a length of 20 time units. It is seen that the term uω has a nearly zero causation entropy to u̇ in regime 2, where ω is sampled. Similarly, the sampled γ leads to a nearly zero causation entropy from uγ to u̇ in regime 3.

FIG. 8.

Trajectory sampling for the SPEKF model with two regime switches. The black curve represents the conditional sampling of each hidden process given the observed variable u in all three regimes.

TABLE VIII.

The SPEKF model—the CEM with data assimilation. First row: causation entropy based on the existing regime 1 data (200 time units). Second row: causation entropy of the first batch (20 time units) of regime 2. Third row: causation entropy of the first batch (20 time units) of regime 3.

        uγ       uω       b        u²       γω       γ²       ω²       b²       γb       ωb       ub
u̇(1)   0.2527   0.5272   0.2289   0.0077   0.0006   0.0028   0.0003   0.0011   0.0090   0.0084   0.0137
u̇(2)   0.4525   0.0208   0.1163   0.0491   0.0070   0.0080   0.0019   0.0015   0.0113   0.0035   0.0066
u̇(3)   0.0844   0.7032   0.3486   0.0048   0.0027   0.0019   0.0017   0.0001   0.0068   0.0106   0.0063

Figure 9 shows the conditional mean and the uncertainty (two standard deviations) of γ and ω from the smoother solution. When γ and ω contribute to the dynamics of u in regime 1, the conditional mean follows the truth quite well. However, the conditional mean of ω in regime 2 and of γ in regime 3 behaves like a random trajectory with relatively large uncertainty. Thus, the sampled trajectories are dominated by randomness and uncertainty and carry no causal influence on the observed variable u.

FIG. 9.

The smoother mean and the smoother uncertainty (two standard deviations) of γ and ω from the SPEKF model with two regime switches. The black curve shows the posterior mean of each hidden process conditioned on the observed u. The shaded gray area corresponds to the 95% confidence interval (two standard deviations) of the corresponding mean estimate.


Online nonlinear system identification with sequential data has recently become important in many applications, e.g., extreme weather events, climate change, and autonomous systems. In this work, we developed a causation entropy boosting (CEBoosting) framework for online nonlinear system identification. The CEBoosting algorithm aims to (i) discover a sparse residual model structure based on the aggregated causation entropy calculated from sequential data and (ii) calibrate the residual model with the identified sparse structure via least squares estimation. If the true system experiences multiple regime switches, the proposed framework gradually identifies a summation of residual models, which has a close analogy to the statistical technique of boosting. We tested the proposed framework on complex systems with features including chaotic behavior, high dimensionality, intermittency and extreme events, and partial observations. The results show that the CEBoosting method can capture the regime switching and then calibrate residual models for various types of complex dynamical systems based on a limited amount of sequential data. It is worth noting that constraints can be naturally added to the learning algorithm. One important constraint is the so-called physics constraint,24 which requires the total energy in the quadratic nonlinear terms to be conserved and guarantees the long-term stability of the identified system. Such a constraint has yet to be explicitly incorporated into the current framework, although the resulting parameters in the various non-Gaussian test cases shown in this work already roughly satisfy it. Adding constraints can be easily achieved by imposing simple relationships between model parameters in the parameter estimation step, which still allows the use of closed analytic formulas for finding the parameters; see, for example, Ref. 39 for details. Other future work includes the uncertainty quantification of the aggregated causation entropy and the further study of other causality metrics.

N.C. is partially funded by Office of Naval Research N00014-19-1-2421 and Army Research Office W911NF-23-1-0118. J.W. and C.C. are supported by the University of Wisconsin-Madison, Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation.

The authors have no conflicts to disclose.

Chuanqi Chen: Conceptualization (supporting); Investigation (lead); Methodology (supporting); Software (lead); Writing – original draft (supporting); Writing – review & editing (supporting). Nan Chen: Conceptualization (equal); Methodology (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal). Jin-Long Wu: Conceptualization (equal); Methodology (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal).

The data that support the findings of this study are available from the corresponding author upon reasonable request. The codes and examples that support the findings of this study are available at https://github.com/ChuanqiChenCC/CEBoosting.120

Algorithm 1 presents the detailed procedures of the CEBoosting method, including (i) detecting regime switching, (ii) aggregating the causation entropy matrix (CEM) until a consistent pattern of the CEM is obtained, (iii) identifying a sparse model structure according to the aggregated CEM, and (iv) fitting the model parameters. The consistency of the CEM pattern is determined by (2.11), i.e., the aggregated CEM does not change for D consecutive data batches. In this work, we choose D = 4 for all the numerical examples.

Algorithm 1

Causation Entropy Boosting.

1:  f ← ΞΦ                                                  ▷ Current model
2:  for k = 1, 2, … do
3:      Φ ← Φ(x(t_{B_k} + mΔt))
4:      ẋ ← [x(t_{B_k} + (m+1)Δt) − x(t_{B_k} + mΔt)]/Δt
5:      r ← ẋ − ΞΦ                                          ▷ Residual dynamics
6:      C ← C_{φ_n→r_i | [Φ∖φ_n]} ≥ C̄                      ▷ Threshold the CEM of the current batch
7:      if C = 0 then                                       ▷ Same regime; output current model
8:          f ← ΞΦ
9:      else                                                ▷ New regime detected
10:         K ← 0, d ← 1, D ← 4
11:         while d ≤ D − 1 do                              ▷ Aggregate CEM until it is consistent for D iterations
12:             K ← K + 1
13:             C ← C + C^{(k+K)}_{φ_n→r_i | [Φ∖φ_n]}
14:             C₊(K) ← (1/K)·C ≥ C̄                        ▷ Threshold aggregated CEM to get C₊ with 0/1 values
15:             if C₊(K) = C₊(K − 1) then
16:                 d ← d + 1
17:             else
18:                 d ← 1
19:             end if
20:         end while
21:         K* ← K                                          ▷ Smallest K that satisfies the criterion in (2.11)
22:         Ξ_r[C₊(K*) = 0] ← 0                             ▷ Select a sparse model structure and estimate parameters
23:         Ξ_r[C₊(K*) = 1] ← argmin_{Ξ_r} Σ_{m=1}^{MK*} ‖r(t_{B_k} + mΔt) − Ξ_r Φ(t_{B_k} + (m−1)Δt)‖²
24:         Ξ ← Ξ + Ξ_r
25:         f ← ΞΦ
26:     end if
27:     return f
28: end for
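The batch-aggregation logic in lines 10–21 translates directly into a short routine; a sketch is given below, where next_cem and threshold are illustrative placeholders for the per-batch CEM computation and the threshold C̄.

```python
import numpy as np

def aggregate_until_consistent(next_cem, threshold, D=4, max_batches=1000):
    """Aggregate CEMs of incoming batches and stop once the thresholded 0/1
    pattern is unchanged for D consecutive batches (Algorithm 1, lines 10-21).
    next_cem: callable returning the CEM (2D array) of the next data batch."""
    C_sum, prev_pattern, d, K = None, None, 1, 0
    while d <= D - 1 and K < max_batches:
        K += 1
        C = next_cem()                        # CEM of the K-th batch after detection
        C_sum = C if C_sum is None else C_sum + C
        pattern = (C_sum / K) >= threshold    # the 0/1 matrix C_plus(K)
        if prev_pattern is not None and np.array_equal(pattern, prev_pattern):
            d += 1                            # pattern unchanged for one more batch
        else:
            d = 1                             # pattern changed; restart the count
        prev_pattern = pattern
    return pattern, K                         # sparse structure and the smallest K
```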

The data assimilation technique27 used for the SPEKF model in Sec. III D of the numerical experiments is summarized here.

The general form of the dynamical systems arising in fluid mechanics and geophysics can be represented as

$$\frac{d\mathbf{u}}{dt} = (L + D)\mathbf{u} + B(\mathbf{u}, \mathbf{u}) + \mathbf{F}(t) + \boldsymbol{\sigma}\dot{\mathbf{W}}(t), \tag{B1}$$

where u ∈ ℂ^N is the state variable, (L + D)u captures the linear dispersion and dissipation effects, B(u, u) is an energy-conserving quadratic form introducing the nonlinear effects, F(t) is a deterministic external forcing term, σ ∈ ℂ^{N×K} is the noise matrix, and Ẇ ∈ ℂ^K is white noise.

This general form (B1) can be approximated by the conditional Gaussian nonlinear system (CGNS),

$$\begin{aligned}
\frac{d\mathbf{X}}{dt} &= \mathbf{A}_0(\mathbf{X}, t) + \mathbf{A}_1(\mathbf{X}, t)\,\mathbf{Y} + \mathbf{B}_1(\mathbf{X}, t)\,\dot{\mathbf{W}}_1(t),\\
\frac{d\mathbf{Y}}{dt} &= \mathbf{a}_0(\mathbf{X}, t) + \mathbf{a}_1(\mathbf{X}, t)\,\mathbf{Y} + \mathbf{b}_2(\mathbf{X}, t)\,\dot{\mathbf{W}}_2(t),
\end{aligned} \tag{B2}$$

with the original state u decomposed into X ∈ ℂ^{N₁} and Y ∈ ℂ^{N₂}. Here, A₀, a₀, A₁, a₁, B₁, and b₂ are vectors or matrices that can depend nonlinearly on X and t, and Ẇ₁(t) and Ẇ₂(t) are independent white noises.
For the CGNS (B2), the conditional distribution of Y(t) given the observational trajectory X(s), s ∈ [0, t], is Gaussian, and the optimal filter solution N(μ_f(t), R_f(t)) satisfies

$$\begin{aligned}
d\boldsymbol{\mu}_f &= (\mathbf{a}_0 + \mathbf{a}_1\boldsymbol{\mu}_f)\,dt + (\mathbf{R}_f\mathbf{A}_1^*)(\mathbf{B}_1\mathbf{B}_1^*)^{-1}\left(d\mathbf{X} - (\mathbf{A}_0 + \mathbf{A}_1\boldsymbol{\mu}_f)\,dt\right),\\
d\mathbf{R}_f &= \left(\mathbf{a}_1\mathbf{R}_f + \mathbf{R}_f\mathbf{a}_1^* + \mathbf{b}_2\mathbf{b}_2^* - (\mathbf{R}_f\mathbf{A}_1^*)(\mathbf{B}_1\mathbf{B}_1^*)^{-1}(\mathbf{A}_1\mathbf{R}_f)\right)dt.
\end{aligned} \tag{B3}$$

(The notation ·* denotes the complex conjugate transpose.)

The above filter solution only exploits the information up to the current time, not the entire time interval. A smoother solution N(μ_s(t), R_s(t)) that utilizes the information over the whole time interval is more accurate,

$$\begin{aligned}
\overleftarrow{d\boldsymbol{\mu}_s} &= \left(-\mathbf{a}_0 - \mathbf{a}_1\boldsymbol{\mu}_s + (\mathbf{b}_2\mathbf{b}_2^*)\mathbf{R}_f^{-1}(\boldsymbol{\mu}_f - \boldsymbol{\mu}_s)\right)dt,\\
\overleftarrow{d\mathbf{R}_s} &= -\left(\left(\mathbf{a}_1 + (\mathbf{b}_2\mathbf{b}_2^*)\mathbf{R}_f^{-1}\right)\mathbf{R}_s + \mathbf{R}_s\left(\mathbf{a}_1^* + \mathbf{R}_f^{-1}(\mathbf{b}_2\mathbf{b}_2^*)\right) - \mathbf{b}_2\mathbf{b}_2^*\right)dt.
\end{aligned} \tag{B4}$$

[The backward arrow in ←d·/dt means that (B4) is solved backward from the end point of the interval [0, T], with μ_s(T) = μ_f(T) and R_s(T) = R_f(T).]

The conditional sampling formula of Y(t) given X(s), s ∈ [0, T], can then be derived as

$$\overleftarrow{d\mathbf{Y}} = \left(-\mathbf{a}_0 - \mathbf{a}_1\mathbf{Y} + (\mathbf{b}_2\mathbf{b}_2^*)\mathbf{R}_f^{-1}(\boldsymbol{\mu}_f - \mathbf{Y})\right)dt + \mathbf{b}_2\,d\mathbf{W}'(t), \tag{B5}$$

where W′(t) is an independent white noise and (B5) is solved backward from an end-point value Y(T) drawn from N(μ_f(T), R_f(T)).

The SPEKF model (3.4) follows the structure of the conditional Gaussian nonlinear system (B2): X corresponds to u, and Y corresponds to (γ, ω, b)ᵀ, with

$$\begin{gathered}
\mathbf{A}_0 = -\hat{\gamma}u + i\hat{\omega}u + \hat{b}, \quad \mathbf{A}_1 = (-u,\; iu,\; 1), \quad \mathbf{B}_1 = \sigma_u, \quad \dot{\mathbf{W}}_1 = \dot{W}_u,\\
\mathbf{a}_0 = \mathbf{0}, \quad \mathbf{a}_1 = \mathrm{diag}(-d_\gamma, -d_\omega, -d_b), \quad \mathbf{b}_2 = \mathrm{diag}(\sigma_\gamma, \sigma_\omega, \sigma_b), \quad \dot{\mathbf{W}}_2 = (\dot{W}_\gamma,\; \dot{W}_\omega,\; \dot{W}_b)^{\mathsf{T}}.
\end{gathered}$$

Given the observed variable u, the three hidden variables γ, ω, and b are sampled according to the sampling formula (B5) and presented in Fig. 8. The smoother estimates in Fig. 9 are obtained from the smoother formulas (B4). Each sampled trajectory satisfies the posterior distribution at each time instant and respects the temporal correlations across different time instants. This formula is used to generate the trajectories of the unobserved variables for computing the causation entropy in Sec. III D.
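As a minimal numerical illustration of the filter equations (B3) under the SPEKF-to-CGNS mapping above, the sketch below advances the conditional mean and covariance by one Euler step; the function name, array shapes, and the explicit matrix inverse are illustrative simplifications.

```python
import numpy as np

def cgns_filter_step(mu, R, x, x_next, dt, A0, A1, B1, a0, a1, b2):
    """One Euler step of the CGNS filter (B3).

    mu: (3,) conditional mean of Y = (gamma, omega, b); R: (3, 3) covariance.
    x, x_next: (1,) current and next observations of u.
    A0, a0: (1,), (3,); A1: (1, 3); B1: (1, 1); a1, b2: (3, 3)."""
    BB = B1 @ B1.conj().T                        # B1 B1*
    gain = R @ A1.conj().T @ np.linalg.inv(BB)   # R_f A1* (B1 B1*)^{-1}
    innovation = (x_next - x) - (A0 + A1 @ mu) * dt
    mu_next = mu + (a0 + a1 @ mu) * dt + gain @ innovation
    R_next = R + (a1 @ R + R @ a1.conj().T + b2 @ b2.conj().T
                  - gain @ (A1 @ R)) * dt
    return mu_next, R_next
```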

1. A. J. Majda, Introduction to Turbulent Dynamical Systems in Complex Systems (Springer, 2016).
2. A. Majda and X. Wang, Nonlinear Dynamics and Statistical Theories for Basic Geophysical Flows (Cambridge University Press, 2006).
3. S. H. Strogatz, Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering (CRC Press, 2018).
4. D. Baleanu, J. A. T. Machado, and A. C. J. Luo, Fractional Dynamics and Control (Springer Science & Business Media, 2011).
5. T. Deisboeck and J. Yasha Kresh, Complex Systems Science in Biomedicine (Springer Science & Business Media, 2007).
6. J. Stelling, A. Kremling, M. Ginkel, K. Bettenbrock, and E. D. Gilles, Foundations of Systems Biology (MIT Press, 2001).
7. S. A. Sheard and A. Mostashari, "Principles of complex systems for systems engineering," Syst. Eng. 12(4), 295–311 (2009).
8. J. M. Amundson, M. Fahnestock, M. Truffer, J. Brown, M. P. Lüthi, and R. J. Motyka, "Ice mélange dynamics and implications for terminus stability, Jakobshavn Isbræ, Greenland," J. Geophys. Res.: Earth Surf. 115(F1), F01005 (2010).
9. A. J. Majda and B. Gershgorin, "Elementary models for turbulent diffusion with complex physical features: Eddy diffusivity, spectrum and intermittency," Philos. Trans. R. Soc. A 371(1982), 20120184 (2013).
10. A. V. Holden, M. Markus, and H. G. Othmer, Nonlinear Wave Processes in Excitable Media (Springer, 2013), Vol. 244.
11. B. Lindner, J. García-Ojalvo, A. Neiman, and L. Schimansky-Geier, "Effects of noise in excitable systems," Phys. Rep. 392(6), 321–424 (2004).
12. N. Chen, A. J. Majda, and X. T. Tong, "Spatial localization for nonlinear dynamical stochastic models for excitable media," Chin. Ann. Math. Ser. B 40(6), 891–924 (2019).
13. W. K.-M. Lau and D. E. Waliser, Intraseasonal Variability in the Atmosphere-Ocean Climate System (Springer Science & Business Media, 2011).
14. A. J. Clarke, An Introduction to the Dynamics of El Niño and the Southern Oscillation (Elsevier, 2008).
15. E. G. Altmann and H. Kantz, "Recurrence time analysis, long-term correlations, and extreme events," Phys. Rev. E 71(5), 056106 (2005).
16. C. A. Bronkhorst, H. Cho, P. W. Marcy, S. A. Vander Wiel, S. Gupta, D. Versino, V. Anghel, and G. T. Gray III, "Local micro-mechanical stress conditions leading to pore nucleation during dynamic loading," Int. J. Plast. 137, 102903 (2021).
17. A. J. Majda and N. Chen, "Model error, information barriers, state estimation and prediction in complex multiscale systems," Entropy 20(9), 644 (2018).
18. S. Wiggins, Introduction to Applied Nonlinear Dynamical Systems and Chaos (Springer Science & Business Media, 2003), Vol. 2.
19. E. Kalnay, Atmospheric Modeling, Data Assimilation and Predictability (Cambridge University Press, 2003).
20. K. Law, A. Stuart, and K. Zygalakis, Data Assimilation: A Mathematical Introduction (Springer, 2015), Vol. 62.
21. M. Branicki, N. Chen, and A. J. Majda, "Non-Gaussian test models for prediction and state estimation with model errors," Chin. Ann. Math. Ser. B 34(1), 29–64 (2013).
22. D. A. Freedman, Statistical Models: Theory and Practice (Cambridge University Press, 2009).
23. X. Yan and X. Su, Linear Regression Analysis: Theory and Computing (World Scientific, 2009).
24. A. J. Majda and J. Harlim, "Physics constrained nonlinear regression models for time series," Nonlinearity 26(1), 201 (2012).
25. J. Harlim, A. Mahdi, and A. J. Majda, "An ensemble Kalman filter for statistical estimation of physics constrained nonlinear regression models," J. Comput. Phys. 257, 782–812 (2014).
26. N. Chen and A. J. Majda, "Conditional Gaussian systems for multiscale nonlinear stochastic systems: Prediction, state estimation and uncertainty quantification," Entropy 20(7), 509 (2018).
27. N. Chen, Y. Li, and H. Liu, "Conditional Gaussian nonlinear system: A fast preconditioner and a cheap surrogate model for complex nonlinear systems," Chaos 32(5), 053122 (2022).
28. S. E. Ahmed, S. Pawar, O. San, A. Rasheed, T. Iliescu, and B. R. Noack, "On closures for reduced order models—A spectrum of first-principle to machine-learned avenues," Phys. Fluids 33(9), 091301 (2021).
29. S. Hijazi, G. Stabile, A. Mola, and G. Rozza, "Data-driven POD-Galerkin reduced order model for turbulent flows," J. Comput. Phys. 416, 109513 (2020).
30. K. K. Lin and F. Lu, "Data-driven model reduction, Wiener projections, and the Koopman-Mori-Zwanzig formalism," J. Comput. Phys. 424, 109864 (2021).
31. B. Peherstorfer and K. Willcox, "Dynamic data-driven reduced-order models," Comput. Methods Appl. Mech. Eng. 291, 21–41 (2015).
32. S. L. Brunton, J. L. Proctor, and J. N. Kutz, "Discovering governing equations from data by sparse identification of nonlinear dynamical systems," Proc. Natl. Acad. Sci. 113(15), 3932–3937 (2016).
33. S. H. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz, "Data-driven discovery of partial differential equations," Sci. Adv. 3(4), e1602614 (2017).
34. U. Fasel, J. N. Kutz, B. W. Brunton, and S. L. Brunton, "Ensemble-SINDy: Robust sparse model discovery in the low-data, high-noise limit, with active learning and control," Proc. R. Soc. A 478(2260), 20210904 (2022).
35. H. Schaeffer, R. Caflisch, C. D. Hauck, and S. Osher, "Sparse dynamics for partial differential equations," Proc. Natl. Acad. Sci. 110(17), 6634–6639 (2013).
36. S. A. Billings and H.-L. Wei, "Sparse model identification using a forward orthogonal regression algorithm aided by mutual information," IEEE Trans. Neural Networks 18(1), 306–310 (2007).
37. R. Mojgani, A. Chattopadhyay, and P. Hassanzadeh, "Discovery of interpretable structural model errors by combining Bayesian sparse regression and data assimilation: A chaotic Kuramoto–Sivashinsky test case," Chaos 32(6), 061105 (2022).
38. M. Quade, M. Abel, J. N. Kutz, and S. L. Brunton, "Sparse identification of nonlinear dynamics for rapid model recovery," Chaos 28(6), 063116 (2018).
39. N. Chen, "Learning nonlinear turbulent dynamics from partial observations via analytically solvable conditional statistics," J. Comput. Phys. 418, 109635 (2020).
40. P. Kim, J. Rogers, J. Sun, and E. Bollt, "Causation entropy identifies sparsity structure for parameter estimation of dynamic systems," J. Comput. Nonlinear Dyn. 12(1), 011008 (2017).
41. A. A. R. AlMomani, J. Sun, and E. Bollt, "How entropic regression beats the outliers problem in nonlinear system identification," Chaos 30(1), 013107 (2020).
42. J. Fish, A. DeWitt, A. A. R. AlMomani, P. J. Laurienti, and E. Bollt, "Entropic regression with neurologically motivated applications," Chaos 31(11), 113105 (2021).
43. A. A. AlMomani and E. Bollt, "ERFit: Entropic regression fit MATLAB package, for data-driven system identification of underlying dynamic equations," arXiv:2010.02411 (2020).
44. H. Xiao, J.-L. Wu, J.-X. Wang, R. Sun, and C. J. Roy, "Quantifying and reducing model-form uncertainties in Reynolds-averaged Navier–Stokes simulations: A data-driven, physics-informed Bayesian approach," J. Comput. Phys. 324, 115–136 (2016).
45. X.-L. Zhang, H. Xiao, X. Luo, and G. He, "Ensemble Kalman method for learning turbulence models from indirect observation data," J. Fluid Mech. 949, A26 (2022).
46. T. Schneider, A. M. Stuart, and J.-L. Wu, "Ensemble Kalman inversion for sparse learning of dynamical systems from time-averaged data," J. Comput. Phys. 470, 111559 (2022).
47. S. Pawar, S. E. Ahmed, O. San, and A. Rasheed, "Data-driven recovery of hidden physics in reduced order modeling of fluid flows," Phys. Fluids 32(3), 036602 (2020).
48. A. Moosavi, R. Stefanescu, and A. Sandu, "Efficient construction of local parametric reduced order models using machine learning techniques," arXiv:1511.02909 (2015).
49. O. San and R. Maulik, "Extreme learning machine for reduced order modeling of turbulent geophysical flows," Phys. Rev. E 97(4), 042322 (2018).
50. A. Beck and M. Kurz, "A perspective on machine learning methods in turbulence modeling," GAMM-Mitteilungen 44(1), e202100002 (2021).
51. J.-X. Wang, J.-L. Wu, and H. Xiao, "Physics-informed machine learning approach for reconstructing Reynolds stress modeling discrepancies based on DNS data," Phys. Rev. Fluids 2(3), 034603 (2017).
52. J.-L. Wu, H. Xiao, and E. Paterson, "Physics-informed machine learning approach for augmenting turbulence models: A comprehensive framework," Phys. Rev. Fluids 3(7), 074602 (2018).
53. H.-J. Rong, N. Sundararajan, G.-B. Huang, and P. Saratchandran, "Sequential adaptive fuzzy inference system (SAFIS) for nonlinear system identification and prediction," Fuzzy Sets Syst. 157(9), 1260–1275 (2006).
54. T. J. J. Lombaerts, H. O. Huisman, Q. P. Chu, J. A. Mulder, and D. A. Joosten, "Nonlinear reconfiguring flight control based on online physical model identification," J. Guid. Control Dyn. 32(3), 727–748 (2009).
55. Y. Kopsinis, K. Slavakis, and S. Theodoridis, "Online sparse system identification and signal reconstruction using projections onto weighted ℓ1 balls," IEEE Trans. Signal Process. 59(3), 936–952 (2010).
56. N. Kalouptsidis, G. Mileounis, B. Babadi, and V. Tarokh, "Adaptive algorithms for sparse system identification," Signal Process. 91(8), 1910–1919 (2011).
57. T. Chen, M. S. Andersen, L. Ljung, A. Chiuso, and G. Pillonetto, "System identification via sparse multiple kernel-based regularization using sequential convex optimization techniques," IEEE Trans. Autom. Control 59(11), 2933–2945 (2014).
58. S. Srinivasan, J. Billeter, and D. Bonvin, "Sequential model identification of reaction systems—The missing path between the incremental and simultaneous approaches," AIChE J. 65(4), 1211–1221 (2019).
59. G. A. Gottwald and S. Reich, "Supervised learning from noisy observations: Combining machine-learning techniques with data assimilation," Physica D 423, 132911 (2021).
60. A. Wikner, J. Pathak, B. R. Hunt, I. Szunyogh, M. Girvan, and E. Ott, "Using data assimilation to train a hybrid forecast system that combines machine-learning and knowledge-based components," Chaos 31(5), 053114 (2021).
61. T. Schneider, A. M. Stuart, and J.-L. Wu, "Learning stochastic closures using ensemble Kalman inversion," Trans. Math. Appl. 5(1), tnab003 (2021).
62. R. E. Kalman, "A new approach to linear filtering and prediction problems," J. Basic Eng. 82(1), 35–45 (1960).
63. R. Tibshirani, "Regression shrinkage and selection via the LASSO," J. R. Stat. Soc., Ser. B 58(1), 267–288 (1996).
64. J. Elinger, "Information theoretic causality measures for parameter estimation and system identification," Ph.D. thesis (Georgia Institute of Technology, 2020).
65. J. Elinger and J. Rogers, "Causation entropy method for covariate selection in dynamic models," in 2021 American Control Conference (ACC) (IEEE, 2021), pp. 2842–2847.
66. L. Breiman, "Bagging predictors," Mach. Learn. 24, 123–140 (1996).
67. Y. Freund, R. E. Schapire et al., "Experiments with a new boosting algorithm," in ICML (Citeseer, 1996), Vol. 96, pp. 148–156.
68. J. H. Friedman, "Stochastic gradient boosting," Comput. Stat. Data Anal. 38(4), 367–378 (2002).
69. R. E. Schapire and Y. Freund, "Boosting: Foundations and algorithms," Kybernetes 42(1), 164–166 (2013).
70. T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016), pp. 785–794.
71. C. J. Quinn, N. Kiyavash, and T. P. Coleman, "Directed information graphs," IEEE Trans. Inf. Theory 61(12), 6887–6909 (2015).
72. N. Chen and Y. Zhang, "A causality-based learning approach for discovering the underlying dynamics of complex systems from partial observations with stochastic parameterization," Physica D 449, 133743 (2023).
73. A. D. Wyner, "A definition of conditional mutual information for arbitrary ensembles," Inf. Control 38(1), 51–59 (1978).
74. J. Massey et al., "Causality, feedback and directed information," in Proceedings of International Symposium on Information Theory and Its Applications (ISITA-90) (IEEE, 1990), pp. 303–305.
75. T. Schreiber, "Measuring information transfer," Phys. Rev. Lett. 85(2), 461 (2000).
76. A. Lozano-Durán, H. Jane Bae, and M. P. Encinar, "Causality of energy-containing eddies in wall turbulence," J. Fluid Mech. 882, A2 (2020).
77. A. Lozano-Durán and G. Arranz, "Information-theoretic formulation of dynamical systems: Causality, modeling, and control," Phys. Rev. Res. 4(2), 023195 (2022).
78. R. Vicente, M. Wibral, M. Lindner, and G. Pipa, "Transfer entropy—a model-free measure of effective connectivity for the neurosciences," J. Comput. Neurosci. 30, 45–67 (2011).
79. T. Bossomaier, L. Barnett, M. Harré, and J. T. Lizier, Transfer Entropy (Springer, 2016).
80. J. Sun and E. M. Bollt, "Causation entropy identifies indirect influences, dominance of neighbors and anticipatory couplings," Physica D 267, 49–57 (2014).
81. N. Branchini, V. Aglietti, N. Dhir, and T. Damoulas, "Causal entropy optimization," arXiv:2208.10981 (2022).
82. J. Sun, D. Taylor, and E. M. Bollt, "Causal network inference by optimal causation entropy," SIAM J. Appl. Dyn. Syst. 14(1), 73–106 (2015).
83. R. E. Bellman, Dynamic Programming Treatment of the Traveling Salesman Problem (RAND Corporation, 1961).
84. M. K. Tippett, R. Kleeman, and Y. Tang, "Measuring the potential utility of seasonal climate predictions," Geophys. Res. Lett. 31(22), L22201, https://doi.org/10.1029/2004GL021575 (2004).
85. R. Kleeman, "Information theory and dynamical system predictability," Entropy 13(3), 612–649 (2011).
86. M. Branicki and A. J. Majda, "Quantifying uncertainty for predictions with model error in non-Gaussian systems with intermittency," Nonlinearity 25(9), 2543 (2012).
87. L. Barnett, A. B. Barrett, and A. K. Seth, "Granger causality and transfer entropy are equivalent for Gaussian variables," Phys. Rev. Lett. 103(23), 238701 (2009).
88. A. Shojaie and E. B. Fox, "Granger causality: A review and recent advances," Annu. Rev. Stat. Appl. 9, 289–319 (2022).
89. E. N. Lorenz, "Deterministic nonperiodic flow," J. Atmos. Sci. 20(2), 130–141 (1963).
90. C. Sparrow, The Lorenz Equations: Bifurcations, Chaos, and Strange Attractors (Springer Science & Business Media, 2012), Vol. 41.
91. H. Haken, "Analogy between higher instabilities in fluids and lasers," Phys. Lett. A 53(1), 77–78 (1975).
92. E. Knobloch, "Chaos in the segmented disc dynamo," Phys. Lett. A 82(9), 439–440 (1981).
93. M. Gorman, P. J. Widmann, and K. A. Robbins, "Nonlinear dynamics of a convection loop: A quantitative comparison of experiment with theory," Physica D 19(2), 255–267 (1986).
94. N. Hemati, "Strange attractors in brushless DC motors," IEEE Trans. Circuits Syst. I: Fundam. Theory Appl. 41(1), 40–45 (1994).
95. K. M. Cuomo and A. V. Oppenheim, "Circuit implementation of synchronized chaos with applications to communications," Phys. Rev. Lett. 71(1), 65 (1993).
96. D. Poland, "Cooperative catalysis and chemical chaos: A chemical model for the Lorenz equations," Physica D 65(1–2), 86–99 (1993).
97. S. I. Tzenov, "Strange attractors characterizing the osmotic instability," arXiv:1406.0979 (2014).
98. E. N. Lorenz, "Predictability: A problem partly solved," in Proceedings of Seminar on Predictability (Reading, 1996), Vol. 1.
99. D. S. Wilks, "Effects of stochastic parametrizations in the Lorenz '96 system," Q. J. R. Meteorol. Soc. 131(606), 389–407 (2005).
100. Y. Lee and A. J. Majda, "State estimation and prediction using clustered particle filters," Proc. Natl. Acad. Sci. 113(51), 14609–14614 (2016).
101. H. M. Arnold, I. M. Moroz, and T. N. Palmer, "Stochastic parametrizations and model uncertainty in the Lorenz '96 system," Philos. Trans. R. Soc. A 371(1991), 20110479 (2013).
102. N. Chen and A. J. Majda, "Beating the curse of dimension with accurate statistics for the Fokker–Planck equation in complex turbulent systems," Proc. Natl. Acad. Sci. 114(49), 12864–12869 (2017).
103. A. Majda, Introduction to PDEs and Waves for the Atmosphere and Ocean (American Mathematical Society, 2003), Vol. 9.
104. G. K. Vallis, Atmospheric and Oceanic Fluid Dynamics (Cambridge University Press, 2017).
105. W. W. Grabowski, "An improved framework for superparameterization," J. Atmos. Sci. 61(15), 1940–1952 (2004).
106. D. J. Gagne, H. M. Christensen, A. C. Subramanian, and A. H. Monahan, "Machine learning for stochastic parameterization: Generative adversarial networks in the Lorenz '96 model," J. Adv. Model. Earth Syst. 12(3), e2019MS001896 (2020).
107. A. Chattopadhyay, P. Hassanzadeh, and D. Subramanian, "Data-driven predictions of a multiscale Lorenz 96 chaotic system using machine-learning methods: Reservoir computing, artificial neural network, and long short-term memory network," Nonlinear Process. Geophys. 27(3), 373–389 (2020).
108. K. Bergemann and S. Reich, "A localization technique for ensemble Kalman filters," Q. J. R. Meteorol. Soc. 136(651), 1636–1643 (2010).
109. J. L. Anderson, "Exploring the need for localization in ensemble data assimilation using a hierarchical ensemble filter," Physica D 230(1–2), 99–111 (2007).
110. T. Janjić, L. Nerger, A. Albertella, J. Schröter, and S. Skachko, "On domain localization in ensemble-based Kalman filter algorithms," Mon. Weather Rev. 139(7), 2046–2060 (2011).
111. A. J. Majda and J. Harlim, Filtering Complex Turbulent Systems (Cambridge University Press, 2012).
112. A. J. Majda, I. Timofeyev, and E. Vanden-Eijnden, "Systematic strategies for stochastic mode reduction in climate," J. Atmos. Sci. 60(14), 1705–1722 (2003).
113. B. Gershgorin, J. Harlim, and A. J. Majda, "Improving filtering and prediction of spatially extended turbulent systems with model errors through stochastic parameter estimation," J. Comput. Phys. 229(1), 32–57 (2010).
114. B. Gershgorin, J. Harlim, and A. J. Majda, "Test models for improving filtering with model errors through stochastic parameter estimation," J. Comput. Phys. 229(1), 1–31 (2010).
115. C. W. Gardiner, Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, Springer Series in Synergetics Vol. 13 (Springer-Verlag, Berlin, Heidelberg, 2004).
116. N. Chen, Stochastic Methods for Modeling and Predicting Complex Dynamical Systems: Uncertainty Quantification, State Estimation, and Reduced-Order Models (Springer Nature, 2023).
117. M. Branicki and A. J. Majda, "Dynamic stochastic superresolution of sparsely observed turbulent systems," J. Comput. Phys. 241, 333–363 (2013).
118. M. Branicki, A. J. Majda, and K. J. H. Law, "Accuracy of some approximate Gaussian filters for the Navier–Stokes equation in the presence of model error," Multiscale Model. Simul. 16(4), 1756–1794 (2018).
119. G. Evensen and P. Jan van Leeuwen, "An ensemble Kalman smoother for nonlinear dynamics," Mon. Weather Rev. 128(6), 1852–1867 (2000).
120. C. Chen, "CEBoosting," GitHub (2023), https://github.com/ChuanqiChenCC/CEBoosting.