Big data have become a critically enabling component of emerging mathematical methods aimed at the automated discovery of dynamical systems, where first principles modeling may be intractable. However, in many engineering systems, abrupt changes must be rapidly characterized based on limited, incomplete, and noisy data. Many leading automated learning techniques rely on unrealistically large data sets, and it is unclear how to leverage prior knowledge effectively to re-identify a model after an abrupt change. In this work, we propose a conceptual framework to recover parsimonious models of a system in response to abrupt changes in the low-data limit. First, the abrupt change is detected by comparing the estimated Lyapunov time of the data with the model prediction. Next, we apply the sparse identification of nonlinear dynamics (SINDy) regression to update a previously identified model with the fewest changes, either by addition, deletion, or modification of existing model terms. We demonstrate this sparse model recovery on several examples for abrupt system change detection in periodic and chaotic dynamical systems. Our examples show that sparse updates to a previously identified model perform better with less data, have lower runtime complexity, and are less sensitive to noise than identifying an entirely new model. The proposed abrupt-SINDy architecture provides a new paradigm for the rapid and efficient recovery of a system model after abrupt changes.

Dynamical systems modeling is a cornerstone of modern mathematical physics and engineering. The dynamics of many complex systems (e.g., neuroscience, climate, and epidemiology) may not have first-principles derivations, and researchers are increasingly using data-driven methods for system identification and the discovery of dynamics. Related to the discovery of dynamical systems models from data is the recovery of these models following abrupt changes to the system dynamics. In many domains, such as aviation, model recovery is mission critical, and must be achieved rapidly and with limited noisy data. This paper leverages recent advances in sparse optimization to identify the fewest terms required to recover a model, introducing the concept of parsimony of change. In other words, many abrupt system changes, even catastrophic bifurcations, may be characterized with relatively few changes to the terms in the underlying model. In this work, we show that sparse optimization enables rapid model recovery that is faster, requires less data, is more accurate, and has higher noise robustness than the alternative approach of re-characterizing a model from scratch.

The data-driven discovery of physical laws and dynamical systems is poised to revolutionize how we model, predict, and control physical systems. Advances are driven by the confluence of big data, machine learning, and modern perspectives on dynamics and control. However, many modern techniques in machine learning (e.g., neural networks) often rely on access to massive data sets, have limited ability to generalize beyond the attractor where data are collected, and do not readily incorporate known physical constraints. These limitations frame many state-of-the-art research efforts around learning algorithms,1 especially as they pertain to generalizability, limited data, and one-shot learning.2–4 Such limitations also frame the primary challenges associated with data-driven discovery for real-time control of strongly nonlinear, high-dimensional, multi-scale systems with abrupt changes in the dynamics. Whereas traditional methods often require unrealistic amounts of training data to produce a viable model, this work focuses on methods that take advantage of prior experience and knowledge of physics to dramatically reduce the data and time required to characterize dynamics. Our methodology is similar in philosophy to the machine learning technique of transfer learning,5 which allows networks trained on one task to be efficiently adapted to another task. Our architecture is designed around the goal of rapidly extracting parsimonious, nonlinear dynamical models that identify only the fewest important interaction terms so as to avoid overfitting.

There are many important open challenges associated with the data-driven discovery of dynamical systems for real-time tracking and control. When abrupt changes occur in the system dynamics, an effective controller must rapidly characterize and compensate for the new dynamics, leaving little time for recovery based on limited data.6 The primary challenge in real-time model discovery is the reliance on large quantities of training data. A secondary challenge is the ability of models to generalize beyond the training data, which is related to the ability to incorporate new information and quickly modify the model. Machine learning algorithms often suffer from overfitting and a lack of interpretability, although the application of these algorithms to physical systems offers a unique opportunity to enforce known symmetries and physical constraints (e.g., conservation of mass). Inspired by biological systems, which are capable of extremely fast adaptation and learning based on very few trials of new information,7–9 we propose model discovery techniques that leverage an experiential framework, where known physics, symmetries, and conservation laws are used to rapidly infer model changes with limited data.

There is a wealth of regression techniques for the characterization of system dynamics from data, with varying degrees of generality, accuracy, data requirements, and computational complexity. Classical linear model identification algorithms include Kalman filters,10–12 the eigensystem realization algorithm (ERA),13 dynamic mode decomposition (DMD),14–17 and autoregressive moving average (ARMA) models,18,19 to name only a few. The resulting linear models are ideal for control design but are unable to capture the underlying nonlinear dynamics or structural changes. Increasingly, machine learning is being used for nonlinear model discovery. Neural networks have been used for decades to identify nonlinear systems,20 and are experiencing renewed interest because of the ability to train deeper networks with more data1,21–25 and the promise of transformations that linearize dynamics via the Koopman operator.26,27 Neural networks show good capacity to recover the dynamics in a so-called “model-free” way.28,29 These methods are also known as “reservoir computers,” “liquid state machines,” or “echo state networks,” depending on the context. However, real-time application is unrealistic, and the output is generally not analytically interpretable. In another significant vein of research, genetic programming30,31 is a powerful bio-inspired method that has successfully been applied to system identification,32–35 time-series prediction,36,37 and control.38,39 However, evolutionary methods in their pure form, including genetic programming, are computationally complex and thus not suitable for real-time tracking.

Recently, interpretability and parsimony have become important themes in nonlinear system identification.32,33 A common goal now is to identify the fewest terms required to represent the nonlinear structure of a dynamical system model while avoiding overfitting.40 Symbolic regression methods33,40–42 are generally appealing for system identification of structural changes, although they may need to be adapted to the low-data limit and for faster processing time. Nonparametric additive regression models43–45 require a backfitting loop, which allows general transformations but may be prohibitively slow for real-time applications. Generalized linear regression methods are slightly less general but admit fast evaluation and sparse representation.40,42 These leading approaches to identifying dynamical equations from data usually rely on past data and aim at a reliable reproduction of a stationary system, i.e., one whose underlying equations do not change in the course of time.33,40,45

In this work, we develop an adaptive modification of the sparse identification of nonlinear dynamics (SINDy) algorithm40 for real-time recovery of a model following abrupt changes to the system dynamics. We refer to this modeling framework as abrupt-SINDy. Although this is not the only approach for real-time change detection and recovery, parsimony and sparsity are natural concepts to track abrupt changes, focusing on the fewest modifications to an existing model. SINDy already requires relatively small amounts of data,46 is based on fast regression techniques, and has been extended to identify partial differential equations (PDEs),47,48 to include known constraints and symmetries,49 to work with limited measurements50 and highly corrupted and noisy data,51,52 to include control inputs,46,53 and to incorporate information criteria to assess the model quality,54 which will be useful in abrupt model recovery.

Here, we demonstrate that the abrupt-SINDy architecture is capable of rapidly identifying sparse changes to an existing model to recover the new dynamics following an abrupt change to the system. The first step in the adaptive identification process is to detect a system change using the divergence of the prediction from measurements. Next, an existing model is updated with sparse corrections, including parameter variations, deletions, and additions of terms. We show that identifying sparse changes to an existing model requires less data and less computation and is more robust to noise than identifying a new model from scratch. Further, we maintain a critical perspective and note the limitations of the proposed approach, highlighting when it can break down and suggesting further investigation. The overarching framework is illustrated in Fig. 1.

FIG. 1.

Schematic overview of abrupt-SINDy method. (top) Illustration of single variation in a term, giving rise to an abrupt change in the dynamics, from black to blue. (bottom) Canonical sparse changes, including parameter variation, addition, deletion, or a combination. The data and results for the top panel are from the example in Sec. IV A below.


Recently, sparse regression in a library of candidate nonlinear functions has been used for sparse identification of nonlinear dynamics (SINDy) to efficiently identify a sparse model structure from data.40 The SINDy architecture bypasses an intractable brute-force search through all possible models, leveraging the fact that many dynamical systems of the form

$$\frac{d}{dt}\mathbf{x} = \mathbf{f}(\mathbf{x}) \tag{1}$$

have dynamics $\mathbf{f}$ that are sparse in the state variable $\mathbf{x} \in \mathbb{R}^n$. Such models may be identified using a sparsity-promoting regression55–57 that penalizes the number of nonzero terms $\xi_{ij}$ in a generalized linear model

$$\hat{f}_i(\mathbf{x}) = \sum_{j=1}^{p} \xi_{ij}\,\theta_j(\mathbf{x}), \tag{2}$$

where the $\theta_j(\mathbf{x})$ form a set of nonlinear candidate functions. The candidate functions may be chosen to be polynomials, trigonometric functions, or a more general set of functions.40,42 With a poor choice of candidate functions $\theta_j$, i.e., if the library functions are non-orthogonal and/or overdetermined, the SINDy approach may fail to identify the correct model.

Sparse models may be identified from time-series data, which are collected and formed into the data matrix

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_m \end{bmatrix}^{T}. \tag{3}$$

We estimate the time derivatives using a simple forward Euler finite-difference scheme, i.e., the difference of two consecutive data points divided by the time step

$$\dot{\mathbf{X}} = \begin{bmatrix} \dot{\mathbf{x}}_1 & \dot{\mathbf{x}}_2 & \cdots & \dot{\mathbf{x}}_m \end{bmatrix}^{T}. \tag{4}$$

This estimation procedure is numerically ill-conditioned if data are noisy, although there are many methods that handle noise very well if used correctly.58,59 Noise-robust derivatives were investigated in the original SINDy algorithm.40 Next, we consider a library of candidate nonlinear functions Θ(X) of the form

$$\Theta(\mathbf{X}) = \begin{bmatrix} \mathbf{1} & \mathbf{X} & \mathbf{X}^2 & \cdots & \mathbf{X}^d & \cdots & \sin(\mathbf{X}) & \cdots \end{bmatrix}. \tag{5}$$

Here, the matrix $\mathbf{X}^d$ denotes a matrix whose column vectors are given by all possible time-series of d-th degree polynomials in the state x. The terms in Θ can be functional forms motivated by knowledge of the physics. Within the proposed work, they may parameterize a piecewise-affine dynamical model. Following best practices of statistical learning,56 we preprocess by mean-subtracting each column of Θ and normalizing it to unit variance. The dynamical system can now be represented in terms of the data matrices as

$$\dot{\mathbf{X}} \approx \Theta(\mathbf{X})\,\Xi. \tag{6}$$
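To make the construction concrete, the data matrices of Eqs. (3)–(6) can be assembled in a few lines of Python. The following is a minimal sketch, not the authors' implementation: it assumes an (m, n) array X of uniformly sampled states with time step dt = 0.004 (consistent with the point counts quoted in Fig. 7 below) and uses scikit-learn's PolynomialFeatures for the polynomial part of the library; the constant column is omitted, since the subsequent mean subtraction absorbs it.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# X is an (m, n) array of state samples, uniformly spaced in time by dt.
dt = 0.004
X_dot = np.diff(X, axis=0) / dt   # forward Euler derivative estimate, Eq. (4)
X = X[:-1]                        # align the states with the derivative samples

# Polynomial library up to third degree, Eq. (5); trigonometric or other
# candidate functions could be appended as additional columns.
Theta = PolynomialFeatures(degree=3, include_bias=False).fit_transform(X)
# Mean-subtract and scale each column to unit variance before the regression.
Theta = StandardScaler().fit_transform(Theta)
```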

The coefficients in the column $\Xi_k$ of $\Xi$ determine the active terms in the k-th row of Eq. (2). A parsimonious model has the fewest terms in $\Xi$ required to explain the data. One option to obtain a sparse model is via convex $\ell_1$-regularized regression

$$\Xi = \arg\min_{\Xi'} \left\| \dot{\mathbf{X}} - \Theta(\mathbf{X})\,\Xi' \right\|_2 + \gamma \left\| \Xi' \right\|_1. \tag{7}$$

The hyperparameter γ balances complexity and sparsity of the solution. Sparse regression, such as the least absolute shrinkage and selection operator (LASSO)55 and sequential thresholded least-squares,40 improves the robustness of identification for noisy overdetermined data, in contrast to earlier methods60 using compressed sensing.61,62 Other regularization schemes may be used to improve performance, such as the elastic net regression.63
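For reference, the $\ell_1$-regularized regression of Eq. (7) can be sketched with scikit-learn's Lasso; the weight passed as alpha below plays the role of γ in Eq. (7), and its value is an assumption for illustration.

```python
from sklearn.linear_model import Lasso

# Fit all n columns of X_dot at once; coef_ has shape (n, p), so its
# transpose is the coefficient matrix Xi of Eq. (6).
lasso = Lasso(alpha=0.1, fit_intercept=False, max_iter=10000)
Xi = lasso.fit(Theta, X_dot).coef_.T
```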

In this paper, we use sequentially thresholded ridge regression,47 which iteratively solves the ridge regression problem

$$\Xi = \arg\min_{\Xi'} \left\| \dot{\mathbf{X}} - \Theta(\mathbf{X})\,\Xi' \right\|_2 + \alpha \left\| \Xi' \right\|_2 \tag{8}$$

and then thresholds any coefficient that is smaller than γ. The procedure is repeated on the non-zero entries of Ξ until the model converges. The convergence of the SINDy architecture has been discussed in Ref. 64. After a sparse model structure has been identified in normalized coordinates, it is necessary to regress onto this sparse structure in the original unnormalized coordinates. Otherwise, non-physical constant terms appear when transforming back from normalized coordinates due to the mean-subtraction.
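A minimal sketch of this sequentially thresholded ridge regression for a single column of Ξ is given below; the default values of alpha and gamma are assumptions, and the final de-biasing least-squares fit in unnormalized coordinates is omitted for brevity.

```python
import numpy as np
from sklearn.linear_model import Ridge

def stridge(theta, x_dot, alpha=0.05, gamma=0.1, max_iter=10):
    # Sequentially thresholded ridge regression for one column of X_dot.
    p = theta.shape[1]
    xi = np.zeros(p)
    active = np.ones(p, dtype=bool)      # start with all library terms active
    for _ in range(max_iter):
        ridge = Ridge(alpha=alpha, fit_intercept=False)
        coef = ridge.fit(theta[:, active], x_dot).coef_
        keep = np.abs(coef) >= gamma     # hard-threshold small coefficients
        xi[:] = 0.0
        xi[np.flatnonzero(active)[keep]] = coef[keep]
        if keep.all():                   # nothing was thresholded: converged
            break
        active = np.abs(xi) > 0
        if not active.any():             # all terms eliminated
            break
    return xi
```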

In La Cava et al.,35 the authors pursue a complementary, although more computationally intensive, idea of adaptive modeling in the context of generalized linear models. Starting from an initial guess for the model, a brute-force search is conducted to scan a larger set of candidate functions $\theta \rightarrow \theta \otimes \tilde{\theta}^{\gamma}$, where the $\tilde{\theta}$ are multiplicative extensions of the initial set of candidate functions and the γ are real-valued exponents. The intended use of this method is the refinement of first-principles models by the discovery of coupling terms. It is possible to combine this refinement with our proposed scheme for dealing with abrupt changes. In addition, sparse sensors65 and randomized algorithms66 may improve speed.

The viewpoint of sparsity extends beyond model discovery, and we propose to extend SINDy to identify systems undergoing abrupt changes. It may be the case that abrupt model changes will only involve the addition, deletion, or modification of a few terms in the model. This is a statement of the parsimony of change, and indicates that we can use sparse regression to efficiently identify the new or missing terms with considerably less data than required to identify a new model from scratch. In general, each additional term that must be identified requires additional training data to distinguish between joint effects. Thus, having only a few changes reduces the amount of data required, making the model recovery more rapid. This section will describe a procedure that extends SINDy to handle three basic types of model changes:

  • Variation of a term. If the structure of the model is unchanged and only the parameters vary, we will perform least-squares regression on the known structure to identify the new parameters. This is computationally fast, and it is easy to check if the model explains the new dynamics, or if it is necessary to explore possible additions or deletions of terms.

  • Deletion of a term. If the model changes by the removal of a few terms, then SINDy regression can be applied on the sparse coefficients in order to identify which terms have dropped out.

  • Addition of a term. If a term is added, then SINDy regression will find the sparsest combination of inactive terms that explain the model error. Since least-squares regression scales asymptotically as $O(p^3)$, where p is the number of columns in the library, this is computationally less expensive than regression in the entire library.

Combinations of these changes, such as a simultaneous addition and deletion, are more challenging and will also be explored. This approach is known as abrupt-SINDy, and it is depicted schematically in Fig. 2.

FIG. 2.

Adaptive SINDy flow chart. For the initial model and hyperparameter selection, a gridsearch is conducted. Next, we apply a predictor-corrector scheme, checking every $t_{\text{error}}$ for model divergence using the estimated Lyapunov time, and, if necessary, update the model in a two-step fashion.


First, we must identify a baseline SINDy model, and we use a gridsearch to determine the optimal hyperparameters. In a gridsearch, all combinations of hyperparameters are tested, and the best performing set is selected. This search is performed only once, locking in the hyperparameters for future updates. The baseline model is characterized by the sparse coefficients in $\Xi_0$.
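A sketch of this gridsearch, using scikit-learn utilities and the parameter ranges of Table I, is shown below; fit_sindy is a hypothetical helper that builds the library of the given degree, runs the thresholded regression of Sec. II, and returns a model with a predict method for the derivatives.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid, KFold
from sklearn.metrics import explained_variance_score

grid = ParameterGrid({"alpha": [0, 0.2, 0.4, 0.6, 0.8, 0.95],
                      "gamma": [0.1, 0.2, 0.4],
                      "degree": [2, 3]})

def cv_score(X, X_dot, params, n_folds=5, seed=42):
    # Average explained variance over k folds for one parameter set.
    scores = []
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        model = fit_sindy(X[train], X_dot[train], **params)  # hypothetical helper
        scores.append(explained_variance_score(X_dot[test], model.predict(X[test])))
    return np.mean(scores)

best_params = max(grid, key=lambda p: cv_score(X, X_dot, p))
```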

It is essential to rapidly detect any change in the model, and we employ a classical predictor-corrector scheme.12 The predictor step is performed over a time $\tau_{\text{pred}}$ in the interval $[t, t+\tau_{\text{pred}}]$ using the model valid at time t. The divergence of the predicted and measured state is computed at $t+\tau$ as $\Delta\mathbf{x} = \hat{\mathbf{x}}(t+\tau) - \mathbf{x}(t+\tau)$, where $\hat{\mathbf{x}}$ is the prediction and $\mathbf{x}$ is the measurement. The idea is to identify when the model and the measurement diverge faster than predicted by the dynamics of the system. For a chaotic system, the divergence of a trajectory is measured by the largest Lyapunov exponent of the system,67 although a wealth of similar measures has been suggested.68 The Lyapunov exponent is defined as

$$\lambda = \lim_{\tau \to \infty} \lim_{\Delta\mathbf{x}(t_0) \to 0} \frac{\log\!\left( \|\Delta\mathbf{x}(t_0+\tau)\| \,/\, \|\Delta\mathbf{x}(t_0)\| \right)}{\tau}, \tag{9}$$

and its inverse sets the fastest timescale. Here, we invoke the analogy between ensemble and time averages, more precisely their local, finite-time equivalents.69,70 An improvement can be achieved by exploiting an ensemble, e.g., by adding noise to the state x that corresponds to the given measurement accuracy. Since we know the dynamical system for the prediction step, the Lyapunov exponent is determined by evolving the tangent space with the system.71,72

In our detection algorithm, we fix a fluctuation tolerance Δx and test whether the observed divergence time deviates from the expectation. If data are noisy, this tolerance must be significantly larger than the typical fluctuation scale of the noise. Formally, the model and measurements diverge if the timescale given by the local Lyapunov exponent and the prediction horizon disagree. The local Lyapunov exponent is computed directly from the eigenvalues of the dynamical system.71,72 The prediction horizon T(t) is the first-passage time at which the prediction $\hat{\mathbf{x}}(t+\Delta t)$, with initial condition $\mathbf{x}(t)$, and the measurement $\mathbf{x}(t+\Delta t)$ differ by more than Δx

$$T(t) = \arg\max_{\Delta t} \left\{ \left\| \hat{\mathbf{x}}(t+\Delta t) - \mathbf{x}(t+\Delta t) \right\| < \left\| \Delta\mathbf{x} \right\| \right\}. \tag{10}$$

Analogous to the local Lyapunov exponent, we compute the ratio $\log\left( \|\Delta\mathbf{x}(t_0+\tau)\| / \|\Delta\mathbf{x}(t_0)\| \right)$ as a measure of the divergence based on the measurement. For the model, we compute the local Lyapunov exponent as the averaged maximum eigenvalue $\bar{\lambda}(t) = \langle \lambda(t') \rangle_{t' \in [t, t+T]}$ with $\lambda(t) = \max_i \lambda_i(t)$, where the $\lambda_i$ solve $\lambda_i \mathbf{v}_i(t) = \left. \frac{\partial f_j}{\partial x_k} \right|_{\mathbf{x}(t)} \mathbf{v}_i(t)$. Thus, we compare the expected and observed trajectory divergence. Model and measurement have diverged at time t if the model timescale and the measured one differ:

$$\bar{\lambda}(t) > \alpha\, \frac{\log\left( \|\Delta\mathbf{x}\| \right) - \log\left( \bar{\Delta}(t) \right)}{T(t)}. \tag{11}$$

If the model is not chaotic, but the measurement is chaotic, one must invert the inequality, as in Fig. 3. The empirical factor α accounts for finite-time statistics.
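The detection logic can be condensed into the following sketch, under the assumptions that the model prediction and the measurement are available on a common time grid and that jacobian(x) returns the model Jacobian at state x; neither helper is part of the original implementation.

```python
import numpy as np

def prediction_horizon(x_pred, x_meas, dt, delta_x):
    # First-passage time at which prediction and measurement differ by
    # more than the tolerance delta_x, Eq. (10).
    err = np.linalg.norm(x_pred - x_meas, axis=1)
    diverged = np.flatnonzero(err >= delta_x)
    return (diverged[0] + 1) * dt if diverged.size else err.size * dt

def local_lyapunov(jacobian, x_traj):
    # Finite-time estimate: average the largest real part of the Jacobian
    # eigenvalues along the trajectory segment.
    return np.mean([np.linalg.eigvals(jacobian(x)).real.max() for x in x_traj])

def has_diverged(x_pred, x_meas, x_traj, jacobian, dt, delta_x, delta_bar, a=1.0):
    # Divergence test of Eq. (11), with empirical factor a.
    T = prediction_horizon(x_pred, x_meas, dt, delta_x)
    return local_lyapunov(jacobian, x_traj) > a * (np.log(delta_x) - np.log(delta_bar)) / T
```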

FIG. 3.

Sketch of the prediction horizon estimation. We use the observation x(t) as initial condition for the current model. Integration gives x̂(t). The prediction horizon T(t) is calculated according to Eq. (10). The prediction horizon is a function of time and the current model. It indicates the divergence of model and observation. For details see text.


This method depends heavily on the particular system under investigation, including the dynamics, timescales, and sampling rate. In a practical implementation, these considerations must be handled carefully and automatically. It is important to note that we are able to formulate the divergence in terms of dynamical systems theory because our model is a dynamical system; in other cases, such as artificial neural networks, this is not possible because such models lack the required mathematical framework.

After a change is detected, the following procedure is implemented to rapidly recover the model:

  1. First, the new data are regressed onto the existing sparse structure Ξ0 to identify varying parameters.

  2. Next, we identify deletions of terms by performing the sparse regression on the sparse columns of Θ that correspond to nonzero rows in Ξ0. This is more efficient than identifying a new model, as we only seek to delete existing terms from the model.

  3. Finally, if there is still a residual error, then a sparse model is fit for this error in the inactive columns of Θ that correspond to zero rows in Ξ0. In this way, new terms may be added to the model.

If the residual is sufficiently small after any step, the procedure ends. Alternatively, the procedure may be iterated until convergence. By restricting our attention to subsets of the columns of Θ, we solve smaller regression problems, which require less data and are less computationally expensive63 than fitting a new model. The deletion-addition procedure is performed after a model divergence is detected, using new transient data collected in an interval of size $t_{\text{update}}$.
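The three-step recovery can be condensed into a short sketch for one column of Ξ, reusing the stridge function sketched in Sec. II; the relative-residual tolerance tol is an assumed hyperparameter.

```python
import numpy as np

def rel_residual(theta, x_dot, xi):
    return np.linalg.norm(x_dot - theta @ xi) / np.linalg.norm(x_dot)

def abrupt_sindy_update(theta, x_dot, xi0, alpha=0.05, gamma=0.1, tol=1e-2):
    # Step 1: parameter variation -- least squares on the known structure.
    active = np.abs(xi0) > 0
    xi = np.zeros_like(xi0)
    xi[active] = np.linalg.lstsq(theta[:, active], x_dot, rcond=None)[0]
    if rel_residual(theta, x_dot, xi) < tol:
        return xi
    # Step 2: deletion -- sparse regression restricted to the active columns.
    xi[active] = stridge(theta[:, active], x_dot, alpha, gamma)
    if rel_residual(theta, x_dot, xi) < tol:
        return xi
    # Step 3: addition -- fit the remaining residual in the inactive columns.
    inactive = ~(np.abs(xi) > 0)
    residual = x_dot - theta @ xi
    xi_add = np.zeros_like(xi)
    xi_add[inactive] = stridge(theta[:, inactive], residual, alpha, gamma)
    return xi + xi_add
```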

In this section, we describe the results of the abrupt-SINDy framework on dynamical systems with abrupt changes, including parameter variation, deletion of terms, and addition of terms. The proposed algorithm is compared against the original SINDy algorithm, which is used to identify a new model from scratch, in terms of data required, computational time, and model accuracy.

In each case, we begin by running a gridsearch algorithm73,74 to identify the main parameters: α, the ridge regression regularization parameter; γ, the thresholding parameter; ndegree, the maximum degree of the polynomial feature transformation; and nfold, the number of cross-validation folds. For scoring, we use the explained variance score and conduct a five-fold cross-validation for each point in the (α, γ, ndegree) parameter grid. Parameters are listed in Table I.

TABLE I.

Parameters for the grid search.

Parameter   Value
α           0, 0.2, 0.4, 0.6, 0.8, 0.95
γ           0.1, 0.2, 0.4
ndegree     2, 3
nfold       5
Seed        42
CV          k-fold
Score       explained variance score

The Lorenz system is a well-studied, and highly simplified, conceptual model for atmospheric circulation75 

$$\dot{x} = \sigma(y - x), \qquad \dot{y} = \rho x - xz - y, \qquad \dot{z} = xy - \beta z, \tag{12}$$

where the parameter ρ represents the heating of the atmosphere, corresponding to the Rayleigh number, σ corresponds to the Prandtl number, and β to the aspect ratio.76 The parameters are set to ρ = 28, β = 8/3, and σ = 10.

In the following, we integrate the system numerically to produce a reference data set. We deliberately change the parameter ρ to ρ = 15 at t = 40 and back to ρ = 28 at t = 80, as shown in Figs. 4 and 5. These parametric changes lead to a bifurcation in the dynamics, and they are detected quickly. The subsequently adapted parameters are accurately identified up to two digits, as shown in Table II. Because we identify the sparse model structure on a normalized library Θ, with zero mean and unit variance, we must de-bias the parameter estimates by computing a least-squares regression onto the sparse model structure in the original unnormalized variables. Otherwise, computing the least-squares regression in the normalized library, as is typically recommended in machine learning, would result in non-physical constant terms in the original unnormalized coordinates.
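The reference data can be reproduced with a piecewise integration; the sketch below uses SciPy's solve_ivp, a sampling interval of 0.004 (consistent with the point counts quoted in Fig. 7), and an assumed final time of 120.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, state, sigma, rho, beta):
    x, y, z = state
    return [sigma * (y - x), rho * x - x * z - y, x * y - beta * z]

# rho switches from 28 to 15 at t = 40 and back to 28 at t = 80.
segments, x0 = [], [1.0, 1.0, 1.0]
for (t0, t1), rho in [((0, 40), 28), ((40, 80), 15), ((80, 120), 28)]:
    t_eval = np.arange(t0, t1, 0.004)
    sol = solve_ivp(lorenz, (t0, t1), x0, t_eval=t_eval,
                    args=(10.0, rho, 8 / 3), rtol=1e-9, atol=1e-9)
    segments.append(sol.y.T)
    x0 = sol.y[:, -1]            # continue the next segment from the last state
data = np.vstack(segments)
```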

FIG. 4.

Time-series of the y coordinate of the Lorenz system. The blue and green segments correspond to the system parameters σ = 10, ρ = 28, β = 8/3. The orange segment between t = 40 and t = 80 corresponds to the modified parameter ρ = 15. The initial condition is x0 = (1, 1, 1).

FIG. 5.

Lorenz system: Colors and parameters as in Fig. 4. In (a)–(c), we show the first, second, and third segments of the trajectory in color with the concatenated trajectory in grey. The system changes from a butterfly attractor to a stable fixed point and back to a butterfly attractor.

TABLE II.

Lorenz system: detection and update times, along with identified equations. The detection time coincides up to the second digit with the true switching time. The rapidly identified model agrees well with the true model structure and parameters. Coefficients are rounded to the second digit.

t_detected   t_update   Equations
0.00         10.0       ẋ = −10.0x + 10.0y;  ẏ = 27.96x − 0.99y − 1.0xz;  ż = −2.67z + 1.0xy
40.01        41.0       ẋ = −10.0x + 10.0y;  ẏ = 15.0x − 1.0y − 1.0xz;  ż = −2.67z + 1.0xy
80.02        81.0       ẋ = −10.0x + 10.0y;  ẏ = 27.98x − 1.0y − 1.0xz;  ż = −2.67z + 1.0xy

Abrupt changes to the system parameters are detected using the prediction horizon from Eq. (10). When the system changes, the prediction horizon should decrease, with a smaller horizon corresponding to a more serious change. Conversely, the inverse time, corresponding to the Lyapunov exponent, should diverge. Figure 6 exhibits this expected behavior. After a change is detected, the model is rapidly recovered, as shown in Table II. It is important to confirm that the updated model accurately represents the structure of the true dynamics. Figure 6 shows the norm of the model coefficient error, $\|\xi - \hat{\xi}\|$, which is a measure of the distance between the estimated and true systems. Except for a short time ($t_{\text{update}} = 1$) after the abrupt change, the identified model closely agrees with the true model.

FIG. 6.

Lorenz system: (a) We show the model accuracy over time. For coefficients, see Table II. For t ≤ 10, no model is available and $\|\xi - \hat{\xi}\| = 1$. At both switch points, t = 40 and t = 80, $t_{\text{update}} = 1$ is needed to update the model. During this interval, a fallback solution, e.g., DMD, could be implemented. Note that the accuracy metric requires knowledge of the ground truth and thus is only available in a hindcast scenario. (b) Evaluation of Eq. (10). At both switch points, t = 40 and t = 80, we quickly detect the divergence of model and measurement. The parameters are $t_{\text{model}} = 10$, $t_{\text{update}} = 1$, $t_{\text{error}} = 0.5$, and Δx = 1.0.


1. Effects of noise and data volume

An important set of practical considerations includes how noise and the amount of data influence the speed of change detection and the accuracy of subsequent model recovery. Both the noise robustness and the amount of data required will change for a new problem, and here we report trends for this specific case. In addition, the amount of data required is also related to the sampling rate, which is the subject of ongoing investigation; in some cases, a higher sampling rate may even degrade model performance due to numerical effects.58

Figure 7 shows the model fit following the abrupt change, comparing both the abrupt-SINDy method, which uses information about the existing model structure, and the standard SINDy method, which re-identifies the model from scratch following a detected change. In this figure, the model quality is shown as a function of the amount of data collected after the change.

FIG. 7.

Lorenz system: We show the model accuracy versus the amount of data used to update (blue ×) or re-fit (orange dot), respectively. Data are collected in the interval $[40, 40+t_{\text{update}}]$ just after the first change of the system dynamics. The number of data points for $t_{\text{update}} = 0.1$ is N = 25; for $t_{\text{update}} = 10$, we have 2500 points. At $t_{\text{update}} \approx 1$, the updating and re-fitting methods become comparable. However, for smaller update times, i.e., less data, the fraction of transient data becomes too small for identifying the exact model from scratch. Updating the model needs less data for the same accuracy, or achieves higher accuracy with the same amount of data.


The abrupt-SINDy model is able to identify more accurate models in a very short amount of time, given by $t_{\text{update}} \approx 0.1$. At this point, the standard SINDy method shows comparable error; however, for even smaller times, the data are no longer sufficient for the conventional method. Since the adaptive method starts near the optimal solution, larger data sets do not degrade the model, which was an unexpected additional advantage.

Figure 8 explores the effect of additive noise on the derivative for the abrupt-SINDy and standard SINDy algorithms. Note that in practice noise will typically corrupt the measurement of x, as in the original SINDy algorithm,40 requiring a denoising derivative;58,59 however, simple additive noise on the derivative is useful to investigate the robustness of the regression procedure. Abrupt-SINDy has considerably higher noise tolerance than the standard algorithm, as it must identify fewer unknown coefficients. In fact, it is able to handle approximately an order of magnitude more noise before failing to identify a model. Generally, increasing the volume of data collection improves the model. The critical point in the abrupt-SINDy curves corresponds to when small but dynamically important terms are mis-identified as a result of an insufficient signal-to-noise ratio. Although the noise and the chaotic signal cannot be easily distinguished at small signal-to-noise ratios, it may be possible to separate them using a spectral analysis, since chaos yields red noise in contrast to the white additive noise.
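The noise experiment can be set up by perturbing the derivative matrix directly; a minimal sketch, with the noise scale set relative to the standard deviation of the signal at an assumed signal-to-noise ratio:

```python
import numpy as np

rng = np.random.default_rng(42)
snr = 10.0                                   # assumed signal-to-noise ratio
noise_scale = X_dot.std(axis=0) / snr
X_dot_noisy = X_dot + noise_scale * rng.standard_normal(X_dot.shape)
```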

FIG. 8.

Lorenz system: We show the noise robustness of the model accuracy. In (a), we use the previous knowledge and update the model; in (b), we make a new fit, re-using only the previously discovered hyperparameters. The curves are parametrized by $t_{\text{update}}$, cf. Fig. 7. The accuracy measure is very noise sensitive, as the distinction between library functions gets lost. At a signal-to-noise ratio of approximately 1, no accurate model can be obtained with either method. At lower noise levels, updating the model achieves higher accuracy (the library is smaller). In both cases, accuracy scales approximately logarithmically with $t_{\text{update}}$.


As a second example, we consider the famous nonlinear Van der Pol oscillator.77 We include additional quadratic nonlinearities αx² and αy² to study the ability of our method to capture structural changes when these terms are added and removed abruptly. This example focuses on the important class of periodic phenomena, in contrast to the chaotic Lorenz dynamics. The modified Van der Pol oscillator is described by the following equations:

$$\dot{x} = y - \alpha y^2, \qquad \dot{y} = \mu(1 - x^2)\,y - x + \alpha x^2, \tag{13}$$

where μ > 0 is a parameter controlling the nonlinear damping, and α parameterizes the additional quadratic nonlinearity. The reference data set is shown in Fig. 9, with μ = 5 and α = 0 for t ∈ [0, 100], which results in a canonical periodic orbit. At t = 100, we introduce a structural change, switching on the quadratic nonlinearity (α = 0.25) and driving the system to a stable fixed point. We also modify the parameter μ, setting it to μ = 7.5. Finally, at t = 200, we switch off the additional nonlinearity (α = 0) and set μ = 6.
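The modified oscillator can be simulated with the same piecewise integration as the Lorenz example; only the right-hand side and the parameter schedule change, and the final time of 300 is an assumption for illustration.

```python
def modified_vdp(t, state, mu, alpha):
    x, y = state
    return [y - alpha * y**2, mu * (1 - x**2) * y - x + alpha * x**2]

# Parameter schedule: (mu, alpha) = (5, 0) -> (7.5, 0.25) at t = 100
# -> (6, 0) at t = 200.
schedule = [((0, 100), (5.0, 0.0)), ((100, 200), (7.5, 0.25)),
            ((200, 300), (6.0, 0.0))]
```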

FIG. 9.

Van der Pol system with parameters μ = 5, α = 0 (blue), μ = 7.5, α = 0.25 (orange), and μ = 6.0, α = 0 (green). (a) Time evolution of the y coordinate. (b) Phase-space trajectory (x, y).


Table III shows the corresponding models recovered using the abrupt-SINDy method. The change is detected using the Lyapunov time criterion defined in Eq. (11), as shown in Fig. 10. Again, the estimated Lyapunov time [Fig. 10(b)] captures the changes in the model, which correspond to peaks in the structural model error [Fig. 10(a)]. While the first and third stages are indeed identified correctly, the term −1.25x is preferred over −x + 0.25x² in the sparse estimate for ẏ in the orange trajectory. However, since both terms look similar near the fixed point at x ≈ 1, this describes the dynamics well. This type of mis-identification often occurs in data mining when features are highly correlated63 and is more related to sparse regression in general than to the proposed abrupt-SINDy. For dynamic system identification, the correct nonlinearity could be resolved by obtaining more transient data, i.e., by perturbing the system through actuation. However, this model may be sufficient for control while a more accurate model is identified.

TABLE III.

Van der Pol system: Summary of the discovered equations. Coefficients are rounded to the second digit.

t_detected   t_update   Equations
0.00         20.01      ẋ = 1.0y;  ẏ = −1.0x + 4.99y − 4.99x²y
106.39       116.00     ẋ = 0.99y − 0.25y²;  ẏ = −1.26x + 7.46y − 7.46x²y
200.12       210.00     ẋ = 1.0y;  ẏ = −1.0x + 5.98y − 5.98x²y
FIG. 10.

Van der Pol system: Evaluation of Eq. (10). Parameters: $t_{\text{model}} = 20$, $t_{\text{update}} = 10$, $t_{\text{error}} = 1$, Δx = 1.5.


In this work, we develop an adaptive nonlinear system identification strategy designed to rapidly recover nonlinear models from limited data following an abrupt change to the system dynamics. The sparse identification of nonlinear dynamics (SINDy) framework is ideal for change detection and model recovery, as it relies on parsimony to select the fewest active terms required to model the dynamics. In our adaptive abrupt-SINDy method, we rely on previously identified models to identify the fewest changes required to recover the model. This modified algorithm is shown to be highly effective at model recovery following an abrupt change, requiring less data, less computation time, and having improved noise robustness over identifying a new model from scratch. The abrupt-SINDy method is demonstrated on several numerical examples exhibiting chaotic dynamics and periodic dynamics, as well as parametric and structural changes, enabling real-time model recovery.

The method has limitations, which may be addressed by several promising directions for improving abrupt-SINDy:

  1. Fallback models: In the current implementation, after a change has been detected, the old model will be used until enough data are collected to identify a new model. The dynamic mode decomposition17 provides an alternative fallback model that may be identified rapidly with even less data. Additionally, instead of relying on a sparse update to the current model, it is sensible to also maintain a library of past models for rapid characterization.78 

  2. Hyperparameterization: In the initial prototype, the hyperparameters Δx and $t_{\text{update}}$ are fixed. Over time, an improved algorithm may learn and adapt optimal hyperparameters.

  3. Comprehensive Lyapunov time estimation: According to Eq. (10), the Lyapunov time T(t|Δx) is estimated for a fixed Δx. Estimating the time for a range of values, i.e., Δx ∈ (0, Δx_max], would be more robust and may provide a richer analysis without requiring additional data. Further investigation is needed for chaotic systems, where the numerical calculation of the Lyapunov exponent may fail to reveal divergence because of simple time averaging. Given the importance of detecting model divergence, this is a particularly important area of future research.

  4. Advanced optimization and objectives: Looking forward, advanced optimization techniques may be used to further improve the adaptation to system changes. Depending on the system, other objectives may be optimized, either by including regularization or in a multi-objective optimization.

The proposed abrupt-SINDy framework is promising for the real-time recovery of nonlinear models following abrupt changes. It will be interesting to compare with other recent algorithms that learn local dynamics for control in response to abrupt changes.79 Future work will be required to demonstrate this method on more sophisticated engineering problems and to incorporate it in controllers. To understand the limitations for practical use, many further studies are needed, and it will be particularly useful to test this method on a real experiment. The abrupt-SINDy modeling framework may also help inform current rapid learning strategies in neural network architectures,2–4 potentially allowing dynamical systems methods to inform rapid training paradigms in deep learning.

M.Q. was supported by a fellowship within the FITweltweit program of the German Academic Exchange Service (DAAD). M.Q. and M.A. acknowledge support by the European Erasmus SME/HPC project (588372-EPP-1-2017-1-IE-EPPKA2-KA). S.L.B. acknowledges support from the ARO and AFOSR Young Investigator Programs (ARO W911NF-17-1-0422 and AFOSR FA9550-18-1-0200). S.L.B. and J.N.K. acknowledge support from DARPA (HR0011-16-C-0016). We would like to thank the anonymous reviewers for their comments which helped to improve this manuscript. We also acknowledge valuable discussions related to abrupt model recovery and programming wisdom with Dennis Bernstein, Karthik Duraisamy, Thomas Isele, Eurika Kaiser, and Hod Lipson.

All code for this work is publicly available on Github: https://github.com/Ohjeah/sparsereg.

1. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, Cambridge, 2016), Vol. 1.
2. L. Fei-Fei, R. Fergus, and P. Perona, IEEE Trans. Pattern Anal. Mach. Intell. 28, 594 (2006).
3. O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra et al., Advances in Neural Information Processing Systems (NIPS Foundation, 2016), pp. 3630–3638.
4. C. B. Delahunt and J. N. Kutz, preprint arXiv:1802.05405 (2018).
5. S. J. Pan and Q. Yang, IEEE Trans. Knowl. Data Eng. 22, 1345 (2010).
6. S. L. Brunton and B. R. Noack, Appl. Mech. Rev. 67, 050801 (2015).
7. C. H. Rankin, Curr. Biol. 14, R617 (2004).
8. J. R. Whitlock, Science 313, 1093 (2006).
9. J. P. Johansen, C. K. Cain, L. E. Ostroff, and J. E. LeDoux, Cell 147, 509 (2011).
10. R. E. Kalman, J. Fluids Eng. 82, 35 (1960).
11. B. L. Ho and R. E. Kalman, in Proceedings of the 3rd Annual Allerton Conference on Circuit and System Theory (1965), pp. 449–459.
12. N. A. Gershenfeld, The Nature of Mathematical Modeling (Cambridge University Press, 1999).
13. J. N. Juang and R. S. Pappa, J. Guid., Control, Dyn. 8, 620 (1985).
14. P. J. Schmid, J. Fluid Mech. 656, 5 (2010).
15. C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D. Henningson, J. Fluid Mech. 641, 115 (2009).
16. J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L. Brunton, and J. N. Kutz, J. Comput. Dyn. 1, 391 (2014).
17. J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor, Dynamic Mode Decomposition (Society for Industrial and Applied Mathematics, 2016).
18. H. Akaike, Ann. Inst. Stat. Math. 21, 243 (1969).
19. P. J. Brockwell and R. A. Davis, Introduction to Time Series and Forecasting, Springer Texts in Statistics (Springer, 1996).
20. R. Gonzalez-Garcia, R. Rico-Martinez, and I. Kevrekidis, Comput. Chem. Eng. 22, S965 (1998).
21. E. Yeung, S. Kundu, and N. Hodas, preprint arXiv:1708.06850 (2017).
22. N. Takeishi, Y. Kawahara, and T. Yairi, Advances in Neural Information Processing Systems (2017), pp. 1130–1140.
23. C. Wehmeyer and F. Noé, J. Chem. Phys. 148, 241703 (2018).
24. A. Mardt, L. Pasquali, H. Wu, and F. Noé, Nat. Commun. 9, 5 (2018).
25. B. Lusch, J. N. Kutz, and S. L. Brunton, preprint arXiv:1712.09707 (2017).
26. B. O. Koopman, Proc. Natl. Acad. Sci. 17, 315 (1931).
27. I. Mezić, Nonlinear Dyn. 41, 309 (2005).
28. M. Lukoševičius and H. Jaeger, Comput. Sci. Rev. 3, 127 (2009).
29. Z. Lu, J. Pathak, B. Hunt, M. Girvan, R. Brockett, and E. Ott, Chaos 27, 041102 (2017).
30. G. B. Dantzig, R. Cottle, and Y. P. Aneja, Mathematical Programming: Essays in Honor of George B. Dantzig (North-Holland, 1985), Vol. 1.
31. J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, 1992), Vol. 1.
32. J. Bongard and H. Lipson, Proc. Natl. Acad. Sci. 104, 9943 (2007).
33. M. Schmidt and H. Lipson, Science 324, 81 (2009).
34. M. D. Schmidt, R. R. Vallabhajosyula, J. W. Jenkins, J. E. Hood, A. S. Soni, J. P. Wikswo, and H. Lipson, Phys. Biol. 8, 055011 (2011).
35. W. G. La Cava and K. Danai, Int. J. Syst. Sci. 47, 249 (2015).
36. W. La Cava, K. Danai, L. Spector, P. Fleming, A. Wright, and M. Lackner, Renewable Energy 87, 892 (2016).
37. M. Quade, M. Abel, K. Shafi, R. K. Niven, and B. R. Noack, Phys. Rev. E 94, 012214 (2016).
38. J. Gout, M. Quade, K. Shafi, R. K. Niven, and M. Abel, Nonlinear Dyn. 91, 1001 (2018).
39. T. Duriez, S. L. Brunton, and B. R. Noack, Machine Learning Control–Taming Nonlinear Dynamics and Turbulence, Fluid Mechanics and Its Applications Vol. 116 (Springer, 2017).
40. S. L. Brunton, J. L. Proctor, and J. N. Kutz, Proc. Natl. Acad. Sci. 113, 3932 (2016).
41. H. Voss, M. Bünner, and M. Abel, Phys. Rev. E 57, 2820 (1998).
42. T. McConaghy, Genetic and Evolutionary Computation (Springer, New York, 2011), pp. 235–260.
43. M. Abel, K. Ahnert, J. Kurths, and S. Mandelj, Phys. Rev. E 71, 015203 (2005).
44. H. U. Voss, P. Kolodner, M. Abel, and J. Kurths, Phys. Rev. Lett. 83, 3422 (1999).
45. M. Abel, Int. J. Bifurcation Chaos 14, 2027 (2004).
46. E. Kaiser, J. N. Kutz, and S. L. Brunton, preprint arXiv:1711.05501 (2017).
47. S. H. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz, Sci. Adv. 3, e1602614 (2017).
48. H. Schaeffer, Proc. R. Soc. A 473, 20160446 (2017).
49. J.-C. Loiseau and S. L. Brunton, J. Fluid Mech. 838, 42 (2018).
50. S. L. Brunton, B. W. Brunton, J. L. Proctor, E. Kaiser, and J. N. Kutz, Nat. Commun. 8, 19 (2017).
51. G. Tran and R. Ward, Multiscale Model. Simul. 15, 1108 (2017).
52. H. Schaeffer and S. G. McCalla, Phys. Rev. E 96, 023302 (2017).
53. S. L. Brunton, J. L. Proctor, and J. N. Kutz, in IFAC NOLCOS (2016), Vol. 49, p. 710.
54. N. M. Mangan, J. N. Kutz, S. L. Brunton, and J. L. Proctor, Proc. R. Soc. A 473, 20170009 (2017).
55. R. Tibshirani, J. R. Stat. Soc.: Ser. B 73, 273 (2011).
56. T. Hastie, J. Friedman, and R. Tibshirani, The Elements of Statistical Learning (Springer, New York, 2001), Vol. 2.
57. G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning (Springer, New York, 2013).
58. K. Ahnert and M. Abel, Comput. Phys. Commun. 177, 764 (2007).
59. R. Chartrand, ISRN Appl. Math. 2011, 1 (2011).
60. W.-X. Wang, R. Yang, Y.-C. Lai, V. Kovanis, and C. Grebogi, Phys. Rev. Lett. 106, 154101 (2011).
61. D. Donoho, IEEE Trans. Inf. Theory 52, 1289 (2006).
62. E. J. Candès, in Proceedings of the International Congress of Mathematics (2006).
63. J. Li, K. Cheng, S. Wang, F. Morstatter, R. P. Trevino, J. Tang, and H. Liu, CSUR 50, 1 (2017).
64. L. Zhang and H. Schaeffer, preprint arXiv:1805.06445 (2018).
65. K. Manohar, B. W. Brunton, J. N. Kutz, and S. L. Brunton, invited for IEEE Control Systems Magazine (2017); e-print arXiv:1701.07569.
66. N. B. Erichson, S. Voronin, S. L. Brunton, and J. N. Kutz, "Randomized matrix decompositions using R," J. Stat. Software (to be published); e-print arXiv:1608.02148.
67. E. Ott, Chaos in Dynamical Systems (Cambridge University Press, 2002).
68. E. Schöll and H. G. Schuster, Handbook of Chaos Control (Wiley-VCH Verlag GmbH & Co. KGaA, 2007).
69. Y.-C. Lai and T. Tél, Transient Chaos (Springer, New York, 2011), Vol. 173.
70. R. Ding and J. Li, Phys. Lett. A 364, 396 (2007).
71. H. Kantz, Phys. Lett. A 185, 77 (1994).
72. A. Pikovsky and A. Politi, Nonlinearity 11, 1049 (1998).
73. A. Abraham, F. Pedregosa, M. Eickenberg, P. Gervais, A. Mueller, J. Kossaifi, A. Gramfort, B. Thirion, and G. Varoquaux, Front. Neuroinf. 8, 14 (2014).
75. E. N. Lorenz, J. Atmos. Sci. 20, 130 (1963).
76. S. H. Strogatz, Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering (Hachette, UK, 2014).
77. B. Van der Pol, Radio Rev. 1, 701 (1920).
78. S. L. Brunton, J. H. Tu, I. Bright, and J. N. Kutz, SIAM J. Appl. Dyn. Syst. 13, 1716 (2014).
79. M. Ornik, A. Israel, and U. Topcu, preprint arXiv:1709.04889 (2017).