We develop and test machine learning techniques for successfully using past state time series data and knowledge of a time-dependent system parameter to predict the evolution of the “climate” associated with the long-term behavior of a non-stationary dynamical system, where the non-stationary dynamical system is itself unknown. By the term climate, we mean the statistical properties of orbits rather than their precise trajectories in time. By the term non-stationary, we refer to systems that are, themselves, varying with time. We show that our methods perform well on test systems predicting both continuous gradual climate evolution and relatively sudden climate changes (which we refer to as “regime transitions”). We consider not only noiseless (i.e., deterministic) non-stationary dynamical systems, but also climate prediction for non-stationary dynamical systems subject to stochastic forcing (i.e., dynamical noise), and we develop a method for handling this latter case. The main conclusion of this paper is that machine learning has great promise as a new and highly effective approach to accomplishing data-driven prediction of non-stationary systems.

Prediction of the behavior of dynamical systems that are themselves changing with time or are influenced by independently time evolving external environmental conditions is an important task that appears in many scientific, technological, medical, sociological, and economic settings. Often, the knowledge of scientific principles, underlying equations, and conditions that would enable construction of models capable of predicting the behavior of such non-stationary systems is lacking or very incomplete. However, one often has available past measurements of the system state as a function of time. Can these kinds of data be used to enable prediction of future dynamical system behavior? In this paper, we address this question via an approach utilizing machine learning and find that it can be extremely effective for this purpose.

Predicting the time evolution of a dynamical system is a central requirement for enabling many applications. Moreover, it is often the case that first principles knowledge of the dynamical system to be predicted is unavailable. One may, however, have access to measured time series data of the previous states of the system. For situations where the system to be predicted is stationary in time, machine learning has proven to be effective in using available time series data to accomplish prediction tasks.1–25 In this stationary context, particularly for systems with chaotic behavior, it is useful to partition the problem of prediction into (i) short-term prediction of the state of the system and (ii) long-term prediction/replication of the statistical/ergodic dynamical behavior of the system. That is, even after short-term prediction fails (e.g., due to the sensitive dependence characteristic of chaos), we may still desire to replicate the statistical properties of the orbits of the real system. We have in mind the situation in which, following an initial transient, the orbit from an initial condition goes to an “attractor,” i.e., a subset of the state space on which the orbit continues to reside, while, perhaps, evolving chaotically. In such a stationary situation, a measurement of the state at a random instant of time can be thought of as corresponding to a sample taken from an underlying probability distribution, which is invariant under the flow of time. For such systems, the ergodic theorem allows us to characterize the long-term statistical properties of the orbits by equating the time average of a state function over a single typical orbit to the state space average of that function taken with respect to an invariant measure.26 If we succeed in predicting trajectories that have the same long-term statistical properties as those produced by trajectories of the actual system, then we say we have succeeded in replicating the “climate” of that system.

In contrast to the situation described in the above paragraph, we are interested in prediction of non-stationary dynamical systems, i.e., systems that themselves vary with time. However, in the case of a non-stationary system, ergodicity no longer holds. Thus, for non-stationary systems, we cannot meaningfully define a representative statistical quantity in the above way,27 and we wish to extend our concept of “climate” to the non-stationary case. To do this, we consider an ensemble of trajectories originating from many randomly chosen initial conditions in the past. At any subsequent given time, after a sufficiently large convergence time has passed since initialization, the ensemble traces out a so-called “snapshot attractor.”27–29 This attractor possesses a probability distribution that (under appropriate general conditions) is independent of the set of initial conditions chosen for the ensemble and thus provides an appropriate measure representing the probability distribution of states on the snapshot attractor.27 [The collection of all snapshot attractors (at every time instant) is referred to, in mathematical and terrestrial climate-related literature, as the pullback attractor of the system.30] This time-dependent snapshot probability distribution can then be used to obtain expectation values of state-dependent functions, thereby defining the time-dependent climate of a non-stationary system.

The goal of this paper will be to devise and test machine learning approaches for the prediction of the future evolution of the climate of non-stationary dynamical systems.

We note that, although machine learning has previously been extensively investigated and applied for the task of short-term state prediction of stationary dynamical systems (especially chaotic systems) (see Refs. 1–3 and 13–22) and has also been shown to be effective for replicating the climate of stationary systems,4,5,23,24 machine learning has previously received little attention as a general approach to the important task of predicting climate evolution in non-stationary dynamical systems. For recent papers on using machine learning for short-term state prediction of non-stationary systems, see Refs. 19, 25, and 31. In the context of our stated goal, an important issue is that of “regime transitions.” Specifically, for different system parameters, stationary dynamical systems can generally exhibit a variety of different types of motions with different climates, and, correspondingly, slowly time varying non-stationary systems can experience regime transitions where, once the system crosses some critical condition, the system’s behavior rapidly becomes qualitatively and quantitatively different. There are a variety of mechanisms for regime transitions ranging from basic bifurcations of periodic orbits (e.g., Hopf, saddle-node, and period doubling bifurcations) to transitions of chaotic attractors (e.g., period doubling cascades,32,33 intermittency,34 and crises35,36). For a very recent work on using machine learning to predict system collapse and chaotic transients associated with crisis transitions, see Ref. 37.

Thus, in addition to slow, temporally continuous climate change, regime transitions are important in many fields. For example, in the context of terrestrial climate, reference is typically made to “tipping points” representing abrupt transitions in the Earth’s climate.38,39 Geological records indicate that the Earth has experienced such transitions in the past; for example, Greenland ice-core records indicate a series of abrupt climatic events in the Last Glacial period,40 and there is growing concern that anthropogenic activities such as greenhouse gas emissions could drive the climate system to a regime transition,41,42 such as the possible shutdown of the thermohaline circulation in the Atlantic Ocean43,44 and the potential transition from a wet to a dry Indian summer monsoon.45 Such regional transitions could also affect the climate of geographically distant locations via teleconnection.46–48 Naturally, both temporally continuous behavior evolution and regime change events are of immense consequence not only for atmospheric climate, but also in many other dynamical contexts, e.g., economics,49,50 ecology,51–62 medicine,63,64 technology,65,66 and sociology.67

The goal of predicting the climate of non-stationary systems from time series state data, in principle, requires some knowledge of the future evolution of the underlying dynamical system that produces the data. Such knowledge might be directly available or derivable from the available past state time series data. The need for this type of knowledge can be seen by considering the example where the underlying system time dependence experiences an abrupt change such that the system after the change is unrelated to the system before the change, thus making predictions of the state changes past the time of the abrupt system change untenable. With this in mind, we restrict our considerations to the case where the underlying system evolution is smooth and slow compared to the time scale (τs) of the state evolution (τns ≫ τs, where τns is the “non-stationarity time scale” over which the underlying system evolves). We emphasize that we are interested in climate prediction over the long time scale τns. In this paper, we consider the scenario where the non-stationarity of the system to be predicted is due to time-dependent system parameters where the time dependence is known, although the system itself is otherwise unknown. Discussion of the importance of this scenario in application is given in Sec. IV A. (For simplicity, in what follows, we will only consider the case where there is one relevant time-dependent system parameter.)

The main conclusion of this paper is that machine learning offers a surprisingly promising avenue for using measurements of past system state time series to accomplish the important general goal of predicting the long-term dynamical behavior (climate) of non-stationary dynamical systems, including prediction of apparently continuous change and of more sudden regime transitions.

In what follows, we use reservoir computing68–70 as our machine learning platform of choice for practical reasons. Reservoir computing is a computationally inexpensive framework since it requires training only its output layer via a simple linear regression. The resulting reduced training time allows for relatively rapid testing of various methodologies and parameter regimes. However, we expect other types of machine learning, e.g., deep learning,71 to also work well for this task via the methods we develop and test in this paper.

Another issue we address is that of predicting the time evolution of a non-stationary dynamical system evolving under the influence of dynamical noise, which is often unavoidable in real situations. We present a simple method that uses the residual error distribution from the training procedure to inject appropriately determined noise into the reservoir prediction scheme to perturb the reservoir output in a way that mimics the effects of the dynamical noise present in the real system, thus enabling prediction of the noise-influenced climate.

The rest of the paper is organized as follows. In Sec. II, we give a brief overview of reservoir computing and of a widely employed basic scheme for using it for prediction of stationary dynamical systems. In Sec. III, we discuss noiseless and noisy non-stationary versions of the logistic map and their behaviors. In Sec. IV A, using the noiseless non-stationary logistic map, we introduce and test our method. In Sec. IV B, we generalize the method of Sec. IV A to include dynamical noise. Sections V and VI test our methods on non-stationary versions of the Lorenz 63 system72 and the Kuramoto–Sivashinsky system,73,74 respectively. Section VII presents conclusions. Referring to Secs. III–VI, we note that the example systems tested include a simple discrete-time map (Sec. III), a low-dimensional continuous-time system (three ordinary differential equations; Sec. V), and a partial differential equation potentially exhibiting spatiotemporal chaos (Sec. VI).

Reservoir computing (RC; e.g., see the review paper68) is a machine learning approach that has proven to be effective in tasks involving dynamical time series data. Although there are various ways of implementing RC,75–80 in this paper, we consider the widely used implementation that utilizes a network of neuron-like nodes with recurrent connections, the “reservoir,” to process temporal/sequential data.69,70 The main idea is to drive the reservoir with an input signal and form a trainable linear combination of the nonlinear responses of the individual network nodes to obtain an output sequence that closely approximates some desired sequence.

For illustrative and background purposes, in this section, we describe the application of reservoir computing to predicting the time evolution of an unknown stationary system. Given a set of measurements of the state of a system from some time in the past t = −T up to some time labeled t = 0, we desire to predict the state of the system for t > 0. Even after prediction of the state of the system fails, we may still desire that the predicted trajectory replicates the climate of the actual system in the long-time limit (Refs. 4 and 5).

We denote the state of the dynamical system (from which the measurements are obtained) at time t by a K dimensional vector u(t) = [u1(t), u2(t), …, uK(t)]^T. If our reservoir consists of N nodes, then the K dimensional input vector at each iteration is coupled into the reservoir via an (N × K) dimensional input-to-reservoir coupling matrix, Win (see Fig. 1). We choose to construct this matrix by randomly selecting one element of each row to be a nonzero number chosen randomly from a uniform distribution on the interval [−χ, χ]. Each reservoir node i has a scalar state ri(t). The state of the reservoir at time t is denoted by the N dimensional vector r(t) = [r1(t), r2(t), …, rN(t)]^T. The reservoir evolves dynamically, given the linearly coupled input and a network adjacency matrix A, according to

$$\mathbf{r}(t+\Delta t) = \tanh\left(A\,\mathbf{r}(t) + W_{\mathrm{in}}\,\mathbf{u}(t)\right), \tag{1}$$

where the hyperbolic tangent function is applied to each element of its vector argument. The directed adjacency matrix A is a randomly generated sparse matrix with a predetermined average number of connections per node, ⟨d⟩. For our task, the desired output time sequence is of the same dimension as the input sequence; therefore, a (K × N) dimensional reservoir-to-output coupling matrix Wout is used to linearly map the reservoir state vector r(t) to the output, denoted ũ, at the next time step, t + Δt. That is, the output is the K-vector Wout r(t) = ũ(t + Δt). These steps are diagrammatically represented in Fig. 1(a). The matrix elements of Wout are determined by the training so that the output of the reservoir computer closely approximates the desired output for −T ≤ t ≤ 0. To do this, we minimize a “cost function,”

$$\sum_{-T \le t \le 0} \left\lVert W_{\mathrm{out}}\,\mathbf{r}(t) - \mathbf{u}(t+\Delta t) \right\rVert^2 + \alpha \left\lVert W_{\mathrm{out}} \right\rVert^2,$$

where we include a Tikhonov regularization parameter α to prevent overfitting by penalizing large output weights. This minimization is a simple linear regression. The set of coefficients of Wout that minimizes this cost function is taken to be the “trained” set of output weights for the network. The choice of α and the other “hyperparameters” [such as the average degree ⟨d⟩ and the spectral radius ρs of A (i.e., the largest eigenvalue magnitude of A)], as well as the maximal strength of the input-to-reservoir coupling χ, are empirically chosen to minimize the discrepancy between a target prediction and the reservoir prediction over a validation dataset (e.g., by performing a grid search over the space of these hyperparameters). Furthermore, we constrain our search by restricting our attention to sets of hyperparameters that satisfy the “echo state condition” (see, e.g., Ref. 69), which requires that, starting from any two different reservoir states r1 and r2, iterations of Eq. (1), after a transient phase, yield the same time series for the reservoir state r(t); i.e., the reservoir eventually “forgets” its initial condition, meaning that its evolution depends only on its input u(t) and is independent of its initial state. If N is sufficiently large, it can be expected that the output ũ(t + Δt) will be extremely close to the desired output u(t + Δt); i.e., the reservoir will give an accurate one time step prediction.
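To make this training step concrete, the following is a minimal Python/NumPy sketch (our own illustrative code, not the implementation used for the results reported here): make_reservoir builds Win and A as described above, and train drives the reservoir open loop with Eq. (1) and fits Wout by Tikhonov-regularized linear regression. The discard of an initial “washout” transient is a standard practice that we assume here; it is not a detail stated above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(N, K, d, rho_s, chi):
    """Random input coupling Win (one nonzero entry per row, uniform on
    [-chi, chi]) and a sparse adjacency matrix A with an average of d
    connections per node, rescaled to spectral radius rho_s."""
    Win = np.zeros((N, K))
    Win[np.arange(N), rng.integers(0, K, size=N)] = rng.uniform(-chi, chi, N)
    A = np.where(rng.random((N, N)) < d / N,
                 rng.uniform(-1.0, 1.0, (N, N)), 0.0)
    A *= rho_s / np.max(np.abs(np.linalg.eigvals(A)))
    return Win, A

def train(U, Win, A, alpha, washout=100):
    """Drive the reservoir open loop, Eq. (1), over the training data
    U[0..T] (shape (T+1, K)) and fit Wout by ridge regression."""
    N = A.shape[0]
    r = np.zeros(N)
    R, Y = [], []
    for n in range(len(U) - 1):
        r = np.tanh(A @ r + Win @ U[n])   # Eq. (1)
        if n >= washout:                  # drop the initial transient
            R.append(r)
            Y.append(U[n + 1])            # one-step-ahead target
    R, Y = np.asarray(R), np.asarray(Y)
    # Normal equations of the regularized cost function above.
    Wout = np.linalg.solve(R.T @ R + alpha * np.eye(N), R.T @ Y).T
    return Wout, r                        # final r seeds the prediction phase
```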

FIG. 1.

Schematic of the reservoir computer. (a) The “open-loop” configuration used for training the output weight matrix, Wout. An input sequence, u(t), is coupled into a reservoir of nodes via a matrix Win. The excited reservoir state r(t) is mapped to an output, u~(t), via a matrix of adjustable weights Wout. (b) The “closed-loop” configuration used for prediction. The reservoir output at one time step is fed as input to the reservoir at the next time step. The dashed red box denotes the “reservoir system,” made up of Win, the reservoir, and Wout.


Once we have trained the reservoir on previous measurements of the system states, using the “open-loop” configuration for −T ≤ t ≤ 0 shown in Fig. 1(a), we predict the system evolution for t > 0 by using the “closed-loop” configuration shown in Fig. 1(b). In this configuration, the output from the reservoir computer at one step is fed back as input at the next iteration. Thus, on the first iteration, the input is u(t) and [neglecting the small difference between u(t) and ũ(t)] the output is ideally u(t + Δt), which, when fed back, yields an output u(t + 2Δt), which, when fed back, yields an output u(t + 3Δt), and so on. However, due to the small deviation between u(t) and ũ(t), errors build up with multiple circuits of the feedback loop, and eventually, the prediction fails. Nevertheless, it is observed that this scheme often produces predictions that are useful for a limited time. One notable point is that, because the reservoir network is “recurrent” (i.e., has directed links forming closed paths cycling between nodes), the reservoir system retains memory of its past states and inputs.
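Continuing the sketch above, the closed-loop prediction of Fig. 1(b) simply feeds the reservoir output back as the next input (again our own illustrative code, with our own function name):

```python
def predict(u_last, r, Win, A, Wout, steps):
    """Closed-loop prediction, Fig. 1(b): the output at one time step is
    fed back as the input at the next time step."""
    u, outputs = u_last, []
    for _ in range(steps):
        r = np.tanh(A @ r + Win @ u)   # same reservoir update, Eq. (1)
        u = Wout @ r                   # u~(t + dt), fed back on the next pass
        outputs.append(u)
    return np.asarray(outputs), r
```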

As mentioned in Sec. I, we are interested both in discrete-time dynamical systems (maps; Secs. III and IV) and in continuous-time dynamical systems (flows; Secs. V and VI). In the case of maps, t is an integer, t = n (n = 0, 1, 2, …), and the reservoir time step is Δt = 1. In the case of flows, we will choose Δt to be small compared to the characteristic time for variation of u(t) so that the predicted discrete outputs, separated by Δt, give a good representation of the continuous-time evolution.

It is important to note (e.g., Sec. VI) that the closed-loop configuration of Fig. 1(b) is itself a stationary dynamical system whose dynamics, through the training of Wout, has essentially been devised so as to mimic the dynamics of the unknown system producing the measurement data (see Ref. 4).

In Sec. IV, we adapt this setup (applicable for the prediction of stationary dynamical evolution processes) to the prediction of non-stationary, possibly stochastic, processes.

In this section, we consider the one-dimensional logistic map that provides a simple platform for formulating, illustrating, and testing the application of RC to the task of predicting the time evolution of a non-stationary discrete-time dynamical system. Because of its simplicity, the logistic map allows us to rapidly test various methodologies and techniques. Furthermore, we will subsequently show (Secs. V and VI) that our results and methods, demonstrated for the simple example of the logistic map, carry over to other, less simple systems. Before we demonstrate the reservoir computer’s ability to predict the non-stationary climate of the logistic map, we briefly review three variations of this system: the stationary logistic map, a non-stationary logistic map, and a stochastic non-stationary logistic map. We demonstrate that the dynamics and regime transitions have different characteristics for each of these three variations of this system.

We will first briefly review the behavior of the stationary logistic map system,

$$x_{n+1} = \lambda x_n (1 - x_n), \tag{2}$$

for different but fixed values of λ. We randomly initiate a trajectory in the interval (0, 1), allow it to evolve under the action of the map for fixed λ, and consider only its long-term behavior (after any transients associated with the choice of initial condition have died away). Plotting the iterations of x in this long-term limit for each value of λ, we generate a bifurcation diagram, as shown in Fig. 2(a). For λ ∈ (1, 3), a typical initial condition x0 ∈ (0, 1) will evolve to a fixed point attractor, x* = 1 − 1/λ. As λ is increased through λ = 3, the stationary point x* bifurcates to a period 2 orbit. As λ is increased further, the attracting orbit undergoes a period doubling cascade32,33 terminating at λ = λ∞ ≈ 3.569. In the range λ∞ < λ < 4, chaos occurs, accompanied by an infinite number of windows of different periodicity; e.g., see Sec. 2.2 in Ref. 81. For this section, in order to conveniently address a variety of regime transition types, we focus on the period 3 window λ ∈ [3.82, 3.87], as shown in Fig. 2(b). As λ increases through a critical value λc1 ≈ 3.828, the period 3 window is initiated from a chaotic situation below λc1 through an intermittency transition34 associated with a period 3 tangent bifurcation. As λ increases beyond λc2, this period 3 orbit undergoes a period doubling cascade, leading to chaos cycling through three bands (with interspersed windows). This three-piece chaotic attractor widens for larger λ until it collides with the unstable period 3 orbit generated at the start of the window. This collision results in a crisis transition35,82 causing an abrupt broadening of the three-piece chaotic attractor into a single-band chaotic attractor, thereby terminating the period 3 window (at λc3).

FIG. 2.

(a) Bifurcation diagram of the logistic map. (b) Zoomed-in view of the period 3 window of the logistic map bifurcation diagram.


In the tests and examples in the rest of this paper, we will often consider situations with windows, like the period three window of Fig. 2(b). We emphasize, however, that we do this not necessarily because we want to treat windows, but rather because, as noted above for the case of Fig. 2(b), windows contain within them a variety of general types of common regime transitions (intermittency transitions,34 period doubling cascades, crises35,82). Thus, windows provide a convenient platform for studying and testing the effectiveness of our methods when regime transitions are present.

Now, we modify the stationary system to create a non-stationary situation where the parameter λ drifts linearly with time,

$$x_{n+1} = \lambda_n x_n (1 - x_n), \tag{3a}$$
$$\lambda_{n+1} = \lambda_n + \gamma. \tag{3b}$$

[Later in this paper, we will consider the case where the linear drift with time, Eq. (3b), is replaced by a nonlinear drift with time.] To discuss the dynamics of this system, we consider the situation where λn varies very slowly (|γ| ≪ λn). For γ = 10⁻⁵, Fig. 3(a) shows a typical orbit evolved under the action of Eq. (3) over the interval of the period 3 window. Notice that the period doubling bifurcation, from period 3 to period 6, looks different, more “abrupt” [see the blow-up in Fig. 3(b)], in this non-stationary case than in the stationary case shown in Fig. 2(b). This can be understood as follows. Just before the bifurcation, the orbit is approximately moving on the period 3 attractor. When λn increases past λc2, the period 3 orbit points become unstable, but the trajectory remains, for a few iterations, very close to the now unstable period 3 orbit; then, after a few more time steps, the trajectory becomes sufficiently displaced from the unstable period 3 orbit (since λn is changing) and is repelled from it, quickly moving to the now stable period 6 orbit. Thus, we obtain the observed delayed, but rapid, transition from a period 3 orbit to a period 6 orbit. As the drift rate of λ increases, the transition from a period 3 orbit to a period 6 orbit is further delayed, and the details of the period doubling cascade become less discernible and are eventually washed out, as seen from Figs. 3(c) and 3(d), which apply for γ = 5 × 10⁻⁵. The main point illustrated in Figs. 3(a)–3(d) is that non-stationarity can qualitatively alter the dynamical behavior of a system, especially during a regime transition.
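For concreteness, a trajectory of Eq. (3) [and of the noisy map, Eq. (4) below, when eps > 0] can be generated with a few lines of Python; the function name, initial condition, and random seed are our own choices, not taken from the paper:

```python
import numpy as np

def nonstationary_logistic(x0, lam0, gamma, n_steps, eps=0.0, seed=0):
    """Iterate x_{n+1} = lam_n x_n (1 - x_n) (+ uniform noise on [-eps, eps]
    when eps > 0), with lam_{n+1} = lam_n + gamma, Eqs. (3) and (4)."""
    rng = np.random.default_rng(seed)
    lam = lam0 + gamma * np.arange(n_steps + 1)   # linear drift, Eq. (3b)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for n in range(n_steps):
        x[n + 1] = lam[n] * x[n] * (1.0 - x[n])   # Eq. (3a)
        if eps > 0:
            x[n + 1] += rng.uniform(-eps, eps)    # Eq. (4a)
    return x, lam

# Sweep the period 3 window as in Fig. 3(a):
x, lam = nonstationary_logistic(x0=0.5, lam0=3.82, gamma=1e-5, n_steps=5000)
```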

FIG. 3.

A trajectory evolving under the non-stationary logistic map (a) given by Eq. (3) for γ = 10⁻⁵, (c) given by Eq. (3) for γ = 5 × 10⁻⁵, and (e) given by Eq. (4) for γ = 10⁻⁵ and ε = 0.002. (b) and (d) show a zoomed-in view of the red boxed areas in (a) and (c), respectively, near the period doubling bifurcation from a period 3 orbit to a period 6 orbit.


Similarly, dynamical noise can also significantly alter the climate and features of regime transitions. To illustrate this, we consider the non-stationary stochastic logistic map,83 given by Eq. (4),

$$x_{n+1} = \lambda_n x_n (1 - x_n) + \varepsilon_n, \tag{4a}$$
$$\lambda_{n+1} = \lambda_n + \gamma, \tag{4b}$$

where εn (the dynamical noise) is a number randomly and independently chosen from a distribution ζ(ε). For simplicity, we will consider the case in which the noise distribution is uniform on the interval [−ε, ε]. As shown in Ref. 84 for the stationary system bifurcation diagram, the dynamical noise acts to blur the fine structure of the period doubling cascade (near λ∞) and appears to cause the period doubling cascade to terminate after a few period doublings. Turning to the non-stationary case, in Fig. 3(e), we have plotted a typical trajectory in the period 3 window evolving under the action of Eq. (4) with a noise of strength ε = 0.002 and λ drifting linearly at a rate γ = 10⁻⁵. We see that the dynamical noise “blurs” the period doubling bifurcation from a period 3 orbit to a period 6 orbit. We also see that, before the occurrence of the internal crisis at the upper λ range of the window, the noisy trajectory intermittently leaves (and returns to) the region of the state space corresponding to the three-piece chaotic attractor of the stationary system. Thus, dynamical noise can cause a significant change in the features of regime transitions, including blurring of the sharp location at which a transition occurs in the noiseless case.
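With the nonstationary_logistic sketch above, a noisy trajectory of the kind shown in Fig. 3(e) corresponds (up to the particular noise realization) to:

```python
# Noisy non-stationary trajectory, Eq. (4), as in Fig. 3(e):
x_noisy, lam = nonstationary_logistic(x0=0.5, lam0=3.82, gamma=1e-5,
                                      n_steps=5000, eps=0.002)
```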

We desire to accurately predict and replicate, from past time series data, the effects of both noiseless and noisy non-stationarity on the climate associated with time evolution of the system.

Now, we turn to our stated goal: given a finite set of measurements of the state of an unknown non-stationary dynamical system from some time in the past t = −T up to some time t = 0, we predict the climate and regime transitions for t > 0. More precisely, when the prediction is performed over an ensemble of trajectories, each initiated from a randomly chosen initial condition, we want our predicted ensemble of trajectories to replicate the non-stationary climate of the actual system. For work previously applying initial condition ensembles to studying and understanding non-stationary dynamics, see Refs. 85–87. First, we demonstrate how naïvely applying RC using the measurements of the previous states of the system as the only input to the reservoir (i.e., following the procedure of Sec. II) fails to accomplish our goal. We will call this setup “Setup 1.” Considering the noiseless system given by Eq. (3) for γ = 10⁻⁵, we train the reservoir in the “open-loop” configuration [Fig. 1(a)] using a set of training data, {xn} for n ≤ 0 (1000 data points), and then allow the reservoir to autonomously predict a trajectory, xn for n > 0, forward in time using the “closed-loop” configuration [Fig. 1(b)]. Table I displays the set of hyperparameters used in all tasks in this paper involving the prediction of the logistic map. Figure 4(a) compares the actual trajectory of the system (black dots) and the reservoir-predicted trajectory (red dots). We see that the reservoir-predicted trajectory does not undergo any of the regime transitions that the actual trajectory undergoes. Instead, the reservoir-predicted trajectory appears to evolve according to a fixed stationary map. Going back to the description of prediction in the stationary case, Sec. II, this is to be expected because the memory of the reservoir is short compared to the timescale for change of λn (i.e., γ⁻¹), and the training produces a single fixed Wout, which, when used in the closed-loop prediction configuration, produces a stationary reservoir dynamical system. As such, we can think of the training procedure of Fig. 1(a), acting on non-stationary training data, as choosing a Wout that in some sense yields the stationary system most consistent with the non-stationary training data for −T < t < 0. Figure 4(b) shows the map traced out by the reservoir-predicted trajectory (red dots) and by the actual trajectory (black dots) from Fig. 4(a).

FIG. 4.

(a) Comparison of the actual (black dots) and the reservoir-predicted (red dots) trajectories of the logistic map for Setup 1. (b) The logistic map for λ = 3.8 (black) and the map obtained from the reservoir-predicted trajectory (red dots) of (a). (c) and (d) compare the actual and the reservoir-predicted trajectories for γ = 10⁻⁵ and γ = 5 × 10⁻⁵, respectively, for Setup 2.

TABLE I.

Hyperparameters for predicting the logistic map.

Reservoir size N                                             500
Average number of connections per node ⟨d⟩
Spectral radius of the reservoir’s adjacency matrix ρs       0.55
Maximal coupling strength of the input to the reservoir χ    0.23
Tikhonov regularization parameter α                          2.15 × 10⁻¹¹

In order to take into account non-stationarity, we must break the stationarity of the closed-loop dynamical system of Fig. 1. Instead of providing only the state of the dynamical system as input to the reservoir, we also provide the reservoir with the value of the driving parameter λn of the system during both the training and the prediction. We call this setup, illustrated in Fig. 5, “Setup 2.” (A similar scheme was very recently independently formulated in the pre-print Ref. 37 for studying crises.) We have coupled this parameter to the reservoir in the same way we couple the measured states of the dynamical system; i.e., each reservoir node is coupled to a single randomly selected element of the state vector u(t) or to the driving parameter. Using Setup 2, we will now demonstrate the successful application of RC to the prediction of the noiseless non-stationary logistic map. With an additional modification to this method, we will later extend its capabilities to successfully predict the stochastic logistic map as well.

FIG. 5.

(a) Open-loop training phase and (b) the closed-loop prediction phase of Setup 2. The rectangular boxes, as for the dashed rectangles in Figs. 1(a) and 1(b), contain the RC system consisting of the input layer (Win), the reservoir, and the output layer (Wout).

FIG. 5.

(a) Open-loop training phase and (b) the closed-loop prediction phase of Setup 2. The rectangular boxes, as for the dashed rectangles in Figs. 1(a) and 1(b), contain the RC system consisting of the input layer (Win), the reservoir, and the output layer (Wout).

Close modal

We assume that we have a set of measurements of the state of the system, as well as knowledge of the time variation of the parameter, {xn, λn} for n = −T, …, 0, from some time in the past n = −T to the present n = 0, and we desire to predict the future climate of this non-stationary system. We note that the task of prediction where non-stationarity is due to time variation of a known parameter is often of great practical interest. For instance, suppose we are trying to predict how the Earth’s climate will change over time, and we have a set of measurements that describe the climate in the past as well as of relevant quantities that we think may directly influence the dynamical behavior of the terrestrial climate system, such as atmospheric CO2 concentration.41,42 We may be interested in forecasting how the climate would change in the future for different rates of increase of CO2 concentration in the atmosphere. In that case, we may treat the CO2 concentration level as a known time-dependent driving parameter of the system. With this in mind, we train the reservoir, as shown in Fig. 5(a), and then we employ the “closed-loop” configuration of Fig. 5(b) to predict the future states of the system. For the case with γ = 10⁻⁵ and ε = 0, the results are shown in Fig. 4(c). The black dots represent the actual trajectory evolving under Eqs. (3a) and (3b), while the red dots represent the trajectory predicted by the reservoir computer. The reservoir computer was trained on a portion of the actual trajectory corresponding to the interval λ ∈ [3.81, 3.82] (T = 1000 data points). Although the reservoir quickly loses the ability to predict the precise time evolution of the true system orbit, as expected for a chaotic orbit, we see that it was able to replicate and predict the climate, including the various regime transitions: the intermittency transition near λ ≈ 3.828 from a chaotic orbit to a stable period 3 orbit, the cascade of period doubling bifurcations starting at λ ≈ 3.845 leading to chaos near λ = 3.857, and the crisis transition at the upper boundary of the period 3 window (λc3 ≈ 3.856). Notice that the reservoir-predicted trajectory was able to accurately capture the effects of non-stationarity, for example, in the sudden transition from a period 3 orbit to a period 6 orbit discussed previously. We have also considered the reverse bifurcations (obtained as λn decreases with time), and they are well-predicted by the machine learning method; we omit these figures here but will show the corresponding figures for the next example (the Lorenz system, Sec. V). Figure 4(d) shows a similar result, but here, the system is drifting at a faster rate (γ = 5 × 10⁻⁵). We see that this faster drift considerably alters the details of the transition (e.g., we cannot clearly see a period doubling cascade), while the reservoir, this time trained on T = 250 data points (over a similar range of λ as for the previous case), again accurately mimics the dynamics.
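The following sketch illustrates Setup 2 for the scalar logistic map: the reservoir input is the two-vector [xn, λn] (so Win from the earlier make_reservoir sketch is built with K = 2), the measured xn drives the reservoir during training, and in the closed loop the predicted state is fed back while λn is supplied from its known schedule. The function names are ours, and this is a sketch of the scheme of Fig. 5, not the authors’ code:

```python
def train_setup2(X, Lam, Win, A, alpha, washout=100):
    """Open-loop training of Fig. 5(a): state and drifting parameter are
    both fed to the reservoir; only the state is a regression target."""
    N = A.shape[0]
    r = np.zeros(N)
    R, Y = [], []
    for n in range(len(X) - 1):
        r = np.tanh(A @ r + Win @ np.array([X[n], Lam[n]]))
        if n >= washout:
            R.append(r)
            Y.append([X[n + 1]])
    R, Y = np.asarray(R), np.asarray(Y)
    Wout = np.linalg.solve(R.T @ R + alpha * np.eye(N), R.T @ Y).T
    return Wout, r

def predict_setup2(x_last, lam_future, r, Win, A, Wout):
    """Closed-loop prediction of Fig. 5(b): x~ is fed back; lam_n is known."""
    x, outputs = x_last, []
    for lam in lam_future:
        r = np.tanh(A @ r + Win @ np.array([x, lam]))
        x = (Wout @ r)[0]          # scalar output x~_{n+1}
        outputs.append(x)
    return np.asarray(outputs)
```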

We can more quantitatively verify that the long-term dynamical behavior predicted by the reservoir computer is indeed characteristic of that of the true system as follows. We initiate a trajectory xn from an initial condition randomly selected from the interval (0, 1). We train the reservoir over a portion of the actual trajectory and then predict forward in time. We repeat this many times, each time using a trajectory initiated from a different randomly chosen initial condition. [The number of iterations of this step is made large enough that Γ (defined below) appears to be well converged; i.e., Γ does not change appreciably when the number of iterates is increased.] After prediction has started, for each of the actual and predicted trajectories, we collect those points in the trajectory that are contained in the segment of the trajectory where λn ∈ [λ, λ + Δλ] for a predetermined small Δλ and a set of λ values. For each such interval, we then record the orbit points for the actual and predicted trajectories, from which we approximate the probability distribution functions of x on the intervals [λ, λ + Δλ]. This allows us to compare the x-distributions from the true system and from the RC prediction for different λ values. To assist in doing this, we define the distance between the two probability distributions as

$$\Gamma(\lambda) = \frac{2}{\Delta x} \int \left| f_a(x;\lambda) - f_p(x;\lambda) \right| dx, \tag{5}$$

where fa(x;λ) is the approximation to the cumulative probability distribution of x over a small interval [λ, λ + Δλ] obtained from the ensemble of actual trajectories, fp(x;λ) is that obtained from the ensemble of reservoir-predicted trajectories, and Δx = xmax − xmin is the range in x of the attracting set. For each portion of the trajectory of the system for which we calculate Γ, the corresponding Δx is determined by taking the difference between the minimum and maximum values of the map points, xmax − xmin, over the entire ensemble of trajectories, which provides a good estimate of the size of the attractor over that portion of the trajectory. Without the normalization factor of 2/Δx, this metric is called the Wasserstein metric,88 a notion of distance between two probability distributions. We choose to use this metric because it accounts for the underlying metric space from which the distributions are drawn and also because, since it uses cumulative distributions rather than histograms, it does not depend on an (arbitrary) choice of histogram binning width. This latter property is of particular importance in the analysis we perform. As an example, consider the situation of predicting an orbit in the periodic regime. If the reservoir-predicted orbit accurately captures the periodicity of the true orbit, but the predicted orbit values are slightly displaced from the true orbit values, then depending on the choice of the histogram binning width, a statistical distance that directly compares the predicted and true probability distributions (e.g., the Kullback–Leibler divergence89) may suggest that the prediction was either very good or very bad. This arbitrariness is absent when comparing the underlying cumulative probability distributions using the Wasserstein metric. An intuitive interpretation of the Wasserstein metric is as the minimum “work” needed to transform one probability distribution into the other: imagine that the distributions are approximated by histograms with very narrow bins, each bin containing very many samples, and that the transformation from one distribution to the other is accomplished by shifting samples to different bins; the work is then the average over all samples of their x-length shifts. We include the normalization factor 2/Δx in our definition of Γ because, in general, the integral has units that are the same as those of x and because we wish a small value of the metric Γ to have meaning independent of Δx [e.g., whether we are considering the logistic map or the Lorenz system (Sec. V)]. Figure 6(a) shows Γ(λ), calculated using an ensemble of 2000 randomly initiated trajectories, at locations throughout the period 3 window for the case with γ = 10⁻⁵. We see that there are two instances where Γ(λ) peaks sharply. The first spike, near λ ≈ 3.829, occurs when the actual system undergoes a transition from chaos to period 3 and the reservoir fails to predict the location of this transition. Note, however, that the narrow width of this spike indicates that this error in the transition point is fairly small. The second spike occurs when the reservoir fails to predict exactly when the internal crisis leads to the abrupt broadening of the three-piece chaotic attracting set into a single-band chaotic set near λ ≈ 3.860. Figure 6(b) shows plots of cumulative probability distributions for the true system (solid curves) and for the RC prediction (dashed curves) at several values of λ indicated in Fig. 6(a) (corresponding to λ1 = 3.825, λ2 = 3.845, λ3 = 3.865, and Δλ = 0.001). For those cases, we calculated Γ(λ1) = 0.0211, Γ(λ2) = 0.0064, and Γ(λ3) = 0.0245. Excellent climate predictions are indicated by the close agreement of the solid and dashed curves in Fig. 6(b) and the small values of Γ in Fig. 6(a).
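The computation of Γ from finite samples can be sketched as follows (assuming NumPy; the function name and the use of the pooled sample range for Δx are our choices), evaluating Eq. (5) via the piecewise-constant empirical cumulative distributions:

```python
import numpy as np

def gamma_distance(samples_a, samples_p):
    """Normalized Wasserstein-1 distance of Eq. (5) between the empirical
    x-distributions of the actual and predicted ensembles in one lambda bin."""
    a = np.sort(np.asarray(samples_a, dtype=float))
    p = np.sort(np.asarray(samples_p, dtype=float))
    grid = np.sort(np.concatenate([a, p]))      # jump points of either CDF
    Fa = np.searchsorted(a, grid, side="right") / a.size   # f_a on the grid
    Fp = np.searchsorted(p, grid, side="right") / p.size   # f_p on the grid
    w1 = np.sum(np.abs(Fa - Fp)[:-1] * np.diff(grid))      # int |f_a - f_p| dx
    dx = grid[-1] - grid[0]                     # Delta x = x_max - x_min
    return 2.0 * w1 / dx
```

For example, gamma_distance(x_true_bin, x_pred_bin) applied to the orbit points collected in one [λ, λ + Δλ] bin yields the corresponding value of Γ(λ).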

FIG. 6.

(a) Γ(λ) for a set of λ in the period 3 window and Δλ=0.001. (b) Comparison of the approximate cumulative probability distributions obtained from real trajectories (solid black, purple, and green curves) and predicted trajectories (dashed red, yellow, and blue curves) for λ1=3.825, λ2=3.845, and λ3=3.865 [vertical dashed lines in (a)]. The numerical approximations to fa(x,λ) and fp(x,λ) are taken as the fraction of sample values less than x. The agreement between the actual and predicted cumulative distributions indicates that the reservoir-predicted trajectories accurately represent the dynamics of the different regimes (i.e., the pre-transition chaos, the periodic orbit, and the post-transition chaos).


We now consider the case where the time evolution of λn is not linear,

$$x_{n+1} = \lambda_n x_n (1 - x_n), \tag{6a}$$
$$\lambda_{n+1} = \lambda_0 + \lambda_1 \sin(\theta_n) + m\,(\theta_n/\gamma), \tag{6b}$$
$$\theta_{n+1} = \theta_n + \gamma. \tag{6c}$$

Figure 7(a) shows how the slowly changing parameter λn varies in time for θ0 = π/8, γ = 0.001, λ0 = 3.79, λ1 = 0.03, and m = 10⁻⁵ [note that for this case, 0 < m/γ < λ1, and the variation of λn with time n is non-monotonic]. Figure 7(b) compares a trajectory evolving under the action of this map to one predicted by the reservoir computer, again using the method illustrated in Fig. 5. Here, we have trained the reservoir computer on a set of data (T = 1000) where λ is changing almost linearly [see Figs. 7(a) and 7(b)]. After this training segment, we see that λn deviates more strongly from linear behavior, and remarkably, the reservoir computer can predict the corresponding changes to the nature of the dynamics of the trajectory. That is, the reservoir computer is able to predict regime transitions and dynamics to which it was not exposed during its training. After training, λ initially continues to increase through λc1, and the dynamics experience a transition from chaotic to periodic via a tangent bifurcation; shortly after, λ begins to decrease, and the system undergoes an intermittency transition in the reverse direction (i.e., from periodic to chaotic) as λ decreases below λc1. After this reverse transition, λ rapidly increases and traverses the range of the period 3 window, which results in an intermittency transition from chaotic below λc1 to periodic above it, then an abbreviated period doubling cascade starting near λc2, and a crisis transition from a three-piece chaotic attractor to a single-band chaotic attractor near λc3. The reservoir computer successfully predicts the multiple forward and backward regime transitions experienced by the actual system in this window, as shown in Figs. 7(b) and 7(c). Figure 7(d) shows Γ(λ), obtained by using an ensemble of 2000 randomly initiated trajectories and the corresponding reservoir predictions, over the entire prediction interval. Once again, we see local maxima in Γ(λ) at points near a regime transition, and, as before, this can be attributed to the reservoir predicting a transition slightly earlier or later than it actually occurs. Figure 7(c) explicitly compares fa(x;λ) and fp(x;λ) for θ1 = π, θ2 = 1.5π, θ3 = 2.25π, and Δθ = 0.05 (λ1 = 3.8253, λ2 = 3.8111, and λ3 = 3.8858, respectively). The calculated errors between the distributions are small [Γ(θ1) = 0.0195, Γ(θ2) = 0.0342, and Γ(θ3) = 0.0390], indicating good prediction of the climate at these points.
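For reference, the λ schedule of Fig. 7(a) follows directly from Eqs. (6b) and (6c) with the quoted parameter values (the 8000-step horizon below is our own choice, made to span the plotted range of θ):

```python
import numpy as np

# Lambda schedule of Eq. (6), as in Fig. 7(a)
theta0, gamma, lam0, lam1, m = np.pi / 8, 1e-3, 3.79, 0.03, 1e-5
n = np.arange(8000)
theta = theta0 + gamma * n                                # Eq. (6c)
lam = lam0 + lam1 * np.sin(theta) + m * (theta / gamma)   # Eq. (6b)
```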

FIG. 7.

(a) Time dependence of λ. (b) Comparison of an actual trajectory of the system (black dots) and the corresponding predicted trajectory (red dots). (c) Comparison of the approximate cumulative probability distributions obtained from real trajectories (solid black, purple, and green curves) and predicted trajectories (dashed red, yellow, and blue curves) for θ1=π, θ2=1.5π, and θ3=2.25π (λ1=3.8253, λ2=3.8111, and λ3=3.8858, respectively) corresponding to the vertical green dashed lines in (d). (d) Γ(θ) over the prediction interval.


Next, we consider the noisy (or stochastic) logistic map, given by Eq. (4). Once again, we assume that we have a set of measurements of xn and knowledge of the time variation of the parameter, {xn, λn} for n = −T, …, 0, from some point in the past (n = −T) up to the present (n = 0), and we wish to predict the future trajectory of the system (n > 0). Applying the method of Sec. IV A (Setup 2, Fig. 5) leads to predicted trajectories that do not accurately capture the noisy dynamics of the true system. Figure 8 shows this result for two different noise levels (λ is taken to drift linearly with γ = 10⁻⁵).

FIG. 8.

A comparison of trajectories evolving under the noisy logistic map (black dots) and those predicted by the reservoir (red dots that overlay the black dots) for two noise levels (a) ε=0.0005 and (b) ε=0.002.


Note, in particular, the lack of spread in the red dots (as opposed to the black dots) in the period 3 range of λ, as well as the actual system’s (black dots) intermittent bursting away from the three chaotic attractor pieces preceding the interior crisis at the upper-λ boundary of the window (a crisis which, as seen from the red dots, occurs abruptly in the prediction, without being preceded by bursting).

We see that despite training the reservoir computer using a dataset obtained from a noisy trajectory, in agreement with Ref. 90, we obtain a prediction that appears to approximate a noiseless trajectory. This is because the linear regression used in the training (see Sec. II) fits the Wout elements to many data pairs (xn, xn+1), thus averaging out the dynamical noise component (which has zero mean). That is, at time step n, if the input to the trained reservoir is xn, then its output is an approximation to λn xn(1 − xn), the result of applying the noiseless map to xn. To restore the effect of dynamical noise on the climate, we begin by representing the true noisy orbit by introducing an intermediate variable vn,

$$v_{n+1} = \lambda_n x_n (1 - x_n) \tag{7}$$

in terms of which, by Eq. (4a), the true noisy value of xn+1 is xn+1 = vn+1 + εn, where εn (the noise) is a random number with probability distribution ζ(εn). This can be diagrammatically represented as shown in Fig. 9(a), where an input xn is mapped according to the noiseless logistic map to vn+1. Based on this, we modify our previous non-stationary reservoir prediction scheme [Fig. 5(b)] by adding a stochastic noise component, as illustrated in Fig. 9(b), where ṽn+1 is the reservoir approximation to vn+1 and ε̃n is a random independent variable with probability distribution ζ̃(ε̃n), where ζ̃ is our approximation to the actual noise probability distribution ζ. Given the training data time series {xn} used to train the RC system, we obtain ζ̃, our approximation to ζ, as follows. Inputting λn and xn to the trained RC system, we obtain the corresponding ṽn+1 for each n in the training time interval. Next, we determine ε̃n via ε̃n = xn+1 − ṽn+1, where xn+1 comes from the training data. We then make a histogram of this collection {ε̃n} of values and take this histogram to be our approximate noise probability distribution ζ̃.
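A sketch of this residual-collection step (our own naming, reusing the Setup 2 reservoir sketch above): drive the trained reservoir open loop over the training data and record ε̃n = xn+1 − ṽn+1. Rather than explicitly binning a histogram, the sketch keeps the residual sample itself; drawing from it directly is equivalent, in the narrow-bin limit, to sampling the histogram ζ̃:

```python
def noise_residuals(X, Lam, Win, A, Wout, washout=100):
    """Residuals eps~_n = x_{n+1} - v~_{n+1} over the training interval;
    their empirical distribution approximates the noise distribution zeta."""
    r = np.zeros(A.shape[0])
    res = []
    for n in range(len(X) - 1):
        r = np.tanh(A @ r + Win @ np.array([X[n], Lam[n]]))
        if n >= washout:
            v_next = (Wout @ r)[0]   # reservoir estimate of the noiseless map
            res.append(X[n + 1] - v_next)
    return np.asarray(res)
```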

FIG. 9.

(a) Diagrammatic representation of the action of the noisy logistic map, using the intermediate variable vn. (b) Diagram depicting how to approximate the dynamics of the noisy logistic map using a trained reservoir.


Once we have determined a suitable ζ̃(ε̃), we use it to generate samples ε̃n for n in the prediction time range for use in the prediction phase, Fig. 9(b). Figure 10 shows results for two different noise levels with a linearly drifting λn (γ = 10⁻⁵), again demonstrating good climate prediction.
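The noise-injected closed loop of Fig. 9(b) then differs from the predict_setup2 sketch above only by the residual sample added to the output at each step (again an illustrative sketch, with our own names):

```python
def predict_with_noise(x_last, lam_future, r, Win, A, Wout, residuals, seed=1):
    """Closed-loop prediction with noise injection, Fig. 9(b):
    x~_{n+1} = v~_{n+1} + eps~_n, with eps~_n drawn from zeta~."""
    rng = np.random.default_rng(seed)
    x, outputs = x_last, []
    for lam in lam_future:
        r = np.tanh(A @ r + Win @ np.array([x, lam]))
        x = (Wout @ r)[0] + rng.choice(residuals)   # inject approximate noise
        outputs.append(x)
    return np.asarray(outputs)
```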

FIG. 10.

(a) and (b) Comparison of trajectories evolving under the noisy logistic map (black dots) and those predicted by the reservoir (red dots that overlay the black dots) operating under the modified scheme with noise-injection [Fig. 9(b)] for two noise levels in the actual system (a) ε=0.0005 and (b) ε=0.002. (c) Γ(λ) over the prediction interval. (d) Approximate cumulative probability distributions from actual trajectories (solid curves) and predicted trajectories (dashed curves) for λ1=3.8325, λ2=3.845, and λ3=3.865 and Δλ=0.001 for ε=0.002.


Since εn = xn+1 − vn+1 and ε̃n = xn+1 − ṽn+1, the accuracy of our numerical approximation of the distribution ζ by ζ̃ depends on the accuracy of the reservoir’s approximation ṽn+1 to the actual, noiseless system response vn+1. In the Appendix, we discuss this issue further.

Having shown the reservoir computer’s ability to predict climate evolution and regime transitions in a simple discrete-time one-dimensional map, we now implement this approach for the continuous-time three-dimensional Lorenz system,72 

$$\frac{dx}{dt} = \sigma(y - x), \tag{8a}$$
$$\frac{dy}{dt} = x(\rho - z) - y, \tag{8b}$$
$$\frac{dz}{dt} = xy - \beta z. \tag{8c}$$

For certain choices of the parameters σ, ρ, and β, typical initial conditions on the attractor of this system give rise to chaotic trajectories. Previous work shows the ability of reservoir computing to predict the short-term trajectories, as well as the long-term climate, of this system when the parameters are fixed.1,3–5,18 Here, we expand on those findings to show that we can predict the long-term dynamical behavior of this system when its parameters are drifting, making the trajectories statistically non-stationary.

We consider time varying ρ in Eq. (8b) while fixing σ = 10 and β = 8/3. To display how the system climate evolves as ρ varies, we take a Poincaré surface of section specified by dz/dt = xy − βz = 0 and d²z/dt² < 0 (i.e., we consider maxima of z). We first construct a stationary system bifurcation diagram as follows. For each ρn = nΔρ, with Δρ = 0.01, in the interval [25, 200], we allow a trajectory to evolve under this system and observe its stationary behavior (i.e., with ρn constant in time). Figure 11(a) shows the resulting plot of the z-coordinate maxima, zm, vs ρ, where zm ≡ z(tm) and tm denotes the time of the mth piercing of the surface of section [x(tm)y(tm) = βz(tm) with d²z/dt² < 0 at t = tm]. In applying RC to predict the time evolution of this system, we restrict our attention to ρ in the range [140, 170]. In this range [see Fig. 11(b)], there are eight types of regime transitions, four as ρ increases and four corresponding reverse transitions as ρ decreases. The transitions as ρ increases are (1) the crisis near ρ ≈ 142 where the one-piece chaotic attractor splits into a two-piece chaotic attractor, (2) a cascade of pairwise chaotic band splittings beginning near ρ ≈ 145, (3) a period halving cascade beginning near ρ ≈ 146.5, and (4) an intermittency transition from a period 2 orbit to chaos near ρ ≈ 166.
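Such bifurcation data can be generated with a short integrator sketch (ours, not the paper’s code): a fixed-step RK4 scheme with ρ frozen within each step, which is adequate for the slow drifts considered here, followed by extraction of the discrete local maxima of z that approximate the surface-of-section piercings:

```python
import numpy as np

def lorenz_drift(u0, rho0, gamma, t_end, dt=0.01, sigma=10.0, beta=8.0 / 3.0):
    """Integrate Eqs. (8) with rho(t) = rho0 + gamma*t and return the
    trajectory and the z maxima (surface-of-section piercings)."""
    def f(u, rho):
        x, y, z = u
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

    steps = int(round(t_end / dt))
    traj = np.empty((steps + 1, 3))
    traj[0] = u0
    for i in range(steps):
        rho = rho0 + gamma * i * dt           # rho frozen over one RK4 step
        k1 = f(traj[i], rho)
        k2 = f(traj[i] + 0.5 * dt * k1, rho)
        k3 = f(traj[i] + 0.5 * dt * k2, rho)
        k4 = f(traj[i] + dt * k3, rho)
        traj[i + 1] = traj[i] + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    z = traj[:, 2]
    is_max = (z[1:-1] > z[:-2]) & (z[1:-1] > z[2:])   # discrete local maxima
    return traj, z[1:-1][is_max]
```

Setting gamma=0.0 recovers the stationary behavior used for the bifurcation diagram of Fig. 11(a).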

FIG. 11.

(a) Lorenz system bifurcation diagram for the interval ρ ∈ [25, 200]. (b) A zoomed-in view of the window in the dotted red oval of (a): (1) indicates the crisis transition from a one-piece chaotic attractor to a two-piece chaotic attractor, (2) (and the translucent yellow region) indicates the region of pairwise chaotic band splittings, (3) (and the translucent blue region) shows the region of a period halving cascade, and (4) indicates the intermittency transition from a period 2 orbit to chaos.


We first demonstrate the reservoir’s ability to predict the four transitions as ρ increases linearly according to ρ(t) = ρ0 + γt, where γ > 0 and ρ0 is an initial value of ρ. We chose γ = 0.1. As before with the logistic map (Fig. 5), we assume that we have available to us a set of measurements of the system’s state as well as knowledge of the time variation of the parameter ρ, {u(t), ρ(t)} for −T ≤ t ≤ 0, from some point in the past t = −T (taken as T = 170 for this example) up to the “present” t = 0 in increments of Δt dictated by the sampling rate of the measurement process. We will use Δt = 0.01 in the examples that follow. The state of the system is represented by u(t) = [x(t), y(t), z(t)]^T. We train the reservoir using our measurement dataset and then predict forward for the duration of time it takes the system to traverse the interval of ρ that is of interest to us. We normalize each input to the reservoir, including the parameter ρ, by dividing it by its root-mean-square value over the training interval so that we do not introduce a large bias into the activation function of the reservoir nodes. Tables II and III list the choices of RC hyperparameters that we use for all prediction tasks in this paper involving the noiseless (noisy) Lorenz system. We then use the sequence {zm} of resulting z maxima to display the results. Specifically, we plot the difference between consecutive z maxima, zm+1 − zm, vs ρ in Figs. 12(a) and 12(b) for the trained reservoir (red dots) and the true orbit (black dots). We see that the reservoir was able to accurately capture all the transitions (for increasing ρ) discussed above, including the period halving cascade and the abrupt transition to chaos near ρ ≈ 169. To more quantitatively verify that the reservoir predicts trajectories that mimic the long-term dynamical properties of the true system, we compare the cumulative distributions of z maxima for chosen intervals [ρ, ρ + Δρ], where Δρ = 0.5, obtained from an ensemble of 2000 randomly initiated trajectories for both the true system, Eqs. (8), and the corresponding reservoir predictions [see Figs. 12(c) and 12(d)]. Figure 12(d) shows a plot of Γ(ρ) for different values of ρ in the prediction window, showing that this quantity is small throughout most of the interval. There is, however, a large spike near ρ = 167, which corresponds to the transition from a period 2 orbit to a chaotic orbit. This spike occurs because, as was the case for the logistic map, the reservoir does not always get the exact location of the transition correct. However, the predicted transition point is still fairly close to the actual transition point. This can be seen from the immediate drop in Γ(ρ) after the spike [and from Figs. 12(a) and 12(b)]. In Fig. 12(c), we show the cumulative probability distributions for ρ1 = 139, ρ2 = 149, ρ3 = 159, and ρ4 = 169.4. The corresponding errors in the distributions were Γ(ρ1) = 0.001, Γ(ρ2) = 0.0021, Γ(ρ3) = 0.0050, and Γ(ρ4) = 0.0058. These small values confirm that the reservoir has accurately replicated the non-stationary climate of the system at these ρ values.
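The RMS normalization mentioned above amounts to the following (a sketch assuming the training inputs, including the ρ channel, are stacked into a (T, 4) array; the returned scale factors must be reused during prediction):

```python
import numpy as np

def rms_normalize(inputs):
    """Divide each input channel (x, y, z, and rho) by its root-mean-square
    value over the training interval, so that no channel biases the tanh
    activations of the reservoir nodes. Reuse `scale` at prediction time."""
    scale = np.sqrt(np.mean(inputs ** 2, axis=0))   # inputs: shape (T, 4)
    return inputs / scale, scale
```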

FIG. 12.

(a) $(z_{m+1} - z_m)$ vs ρ for the true system (black dots) compared with that predicted by the reservoir (red dots). (b) Zoomed-in view of the region enclosed by the dashed blue oval in (a). (c) Cumulative probability distributions of z maxima for the true trajectories (solid curves) and the reservoir-predicted trajectories (dashed curves) for $\rho_1 = 139$, $\rho_2 = 149$, $\rho_3 = 159$, $\rho_4 = 169.4$, and $\Delta\rho = 0.5$. (d) Γ(ρ) over the prediction interval. The vertical dashed lines indicate the points at which the cumulative probability distributions are shown in (c).

TABLE II.

Hyperparameters for predicting the noiseless Lorenz system.

Reservoir size N: 1000
Average number of connections per node ⟨d⟩:
Spectral radius of the reservoir’s adjacency matrix ρs: 0.85
Maximal coupling strength of the input to the reservoir χ: 0.6775
Tikhonov regularization parameter α: 2.15 × 10⁻¹¹
Numerical integration time step = RC time step Δt: 0.01
TABLE III.

Hyperparameters for predicting the noisy Lorenz system.

Reservoir size N: 1000
Average number of connections per node ⟨d⟩:
Spectral radius of the reservoir’s adjacency matrix ρs: 0.85
Maximal coupling strength of the input to the reservoir χ: 0.4
Tikhonov regularization parameter α: 1.29 × 10⁻¹⁰
Numerical integration time step = RC time step Δt: 0.01

Next, we demonstrate the reservoir’s ability to predict the transitions in the opposite direction (i.e., for ρ decreasing in time); specifically, we take $\rho(t) = \rho_0 + \gamma t$ with $\gamma = -0.1$ and $\rho_0$ above the window. We also add white noise $\varepsilon_z(t)$ to the right-hand side of the Lorenz equation for z(t), Eq. (8c), implemented by adding a random number sampled from a uniform distribution over the interval $[-0.05, 0.05]$ at each $\Delta t = 0.01$ time step. We use the method developed in Sec. III to approximate a distribution $\tilde\zeta(\tilde\varepsilon_z)$ from which we can sample $\tilde\varepsilon_z$ and re-inject approximate noise [Fig. 9(b)] into our prediction scheme to replicate the effects of the dynamical noise present in the actual system. In Fig. 13, we show the results of the reservoir prediction of this noisy non-stationary process. Figure 13(a) and its zoomed-in view, Fig. 13(b), compare plots of $z_{m+1} - z_m$ vs ρ for a true trajectory and for that predicted by the reservoir. We see that the reservoir is able to anticipate the intermittency-type regime transition from chaos to a period two orbit at $\rho \approx 165$, the noisy period doubling starting near $\rho \approx 154$, and the noisy pairwise band-merging near $\rho \approx 145$ (as well as the noise-induced intermittency-like behavior just prior to the band-merging). Figure 13(d) shows Γ(ρ) for various values of ρ in the prediction window. Again, we see a spike near $\rho \approx 165$, which is due to the reservoir’s slightly inaccurate prediction of when the intermittency transition initiating the start of the window occurs. There is also a growing discrepancy between the reservoir-predicted cumulative probability distribution and that obtained from actual trajectories toward the end of the prediction window, due to error in the reservoir’s prediction of when the specific period doubling bifurcations and the transition to chaos occur. Even so, although the predictions of these transitions are shifted from their true locations, their existence and occurrence are still well-anticipated. Figure 13(c) compares the approximate cumulative distributions at $\rho_1 = 139$, $\rho_2 = 149$, $\rho_3 = 159$, and $\rho_4 = 168.5$ as obtained from the true trajectories and as predicted by the reservoir for 2000 randomly initialized trajectories. The calculated errors at these points are small, $\Gamma(\rho_1) = 0.0123$, $\Gamma(\rho_2) = 0.0077$, $\Gamma(\rho_3) = 0.0025$, and $\Gamma(\rho_4) = 0.0008$, once again indicating that the reservoir has satisfactorily replicated the dynamics of the different regimes.
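As a concrete illustration of the noise re-injection step, here is a minimal sketch (our own variable names; the residual data are placeholders) that builds a histogram approximation of $\tilde\zeta$ from training residuals and then draws samples $\tilde\varepsilon$ from it, one of which would be added to the reservoir output at each closed-loop prediction step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder training residuals; in practice these are the differences between
# the noisy measured states and the reservoir outputs from the training phase.
residuals = rng.uniform(-0.05, 0.05, size=10_000)

# Histogram approximation of the estimated noise distribution zeta~.
counts, edges = np.histogram(residuals, bins=50)

def sample_noise(counts, edges, rng, size=1):
    """Draw samples of the approximate noise from the histogram of zeta~."""
    probs = counts / counts.sum()
    bins = rng.choice(len(counts), p=probs, size=size)
    # place each sample uniformly within its selected bin
    return rng.uniform(edges[bins], edges[bins + 1])

# During closed-loop prediction, one such sample is added to the reservoir
# output at each time step before it is fed back as the next input.
eps_tilde = sample_noise(counts, edges, rng, size=5)
```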

FIG. 13.

(a) $(z_{m+1} - z_m)$ vs ρ from a numerical solution of the Lorenz equations (black dots) compared with the reservoir prediction (red dots that overlay the black dots). (b) Zoomed-in view of the region enclosed by the dashed blue oval in (a). (c) Cumulative probability distributions of $z_m$ from an ensemble of true (solid curves) and reservoir-predicted trajectories (dashed curves) for $\rho_1 = 139$, $\rho_2 = 149$, $\rho_3 = 159$, $\rho_4 = 168.5$, and $\Delta\rho = 0.5$. (d) Γ(ρ) vs ρ over the prediction interval [for the case with ρ(t) decreasing in time]. The vertical dashed lines indicate the ρ values for which the cumulative probability distributions are shown in (c). (e) Γ(ρ) for the case of the same noisy Lorenz system but with ρ(t) increasing with time (γ = 0.1).


For comparison, Fig. 13(e) shows Γ(ρ) for the same noisy Lorenz system but with $\rho_0$ below the window and ρ(t) increasing with time, i.e., $\gamma = +0.1$. This plot differs from the previous case [Fig. 13(d)] with ρ(t) decreasing in that there is no spike in Γ near the start of the prediction window. We do, however, see a relatively larger error in the reservoir’s prediction of the transition from a period 2 orbit to chaos.

We now revisit the noiseless Lorenz system [Eq. (6) and Fig. 7 from Sec. III], but for a case in which the parameter variation is non-monotonic,

$$\rho(t) = \rho_0 + \rho_1 \sin(\theta) + \mu t, \tag{9a}$$
$$\theta(t) = \gamma t, \tag{9b}$$

where we set $\rho_0 = 120$, $\rho_1 = 10$, $\gamma = 0.02$, and $\mu = 0.1$. Figure 14(a) shows the time variation of ρ(t). Note that the training is over an interval of time during which ρ(t) is approximately linear and increasing. The prediction is then carried out as ρ(t) changes highly nonlinearly and, furthermore, contains an interval where ρ(t) decreases with time, $d\rho/dt < 0$. Figure 14(b) and the blow-up in Fig. 14(c) compare $z_{m+1} - z_m$ vs time for a true trajectory evolving according to the Lorenz system with that from the reservoir prediction. Figure 14(d) shows the values of Γ(θ) over the prediction interval. As with the previous cases, we see a spike near the intermittency transition from the period 2 orbit to chaos near $\theta \approx 2.67\pi$ (which corresponds to $\rho \approx 165$), which is due to the reservoir’s slightly inaccurate prediction of when the transition occurs. The approximate cumulative distributions, computed over small intervals $\Delta\theta = 0.05$, are obtained at three times $\theta_1 = \pi$, $\theta_2 = 1.8\pi$, and $\theta_3 = 2.9\pi$ [shown in Fig. 14(e)]. At these θ values, the calculated errors between the distributions obtained from the actual orbits and those obtained from the reservoir predictions are small, $\Gamma(\theta_1) = 0.0006$, $\Gamma(\theta_2) = 0.0014$, and $\Gamma(\theta_3) = 0.0084$. We note that, although the reservoir was trained over an interval during which ρ increased approximately linearly, it was still able to predict the transitions and climate as ρ changed nonlinearly. In particular, the reservoir was able to accurately capture the transition from the two-piece chaotic set to a single-band chaotic set near $\theta \approx 1.75\pi$. After this point, ρ increases rapidly, leading to a transition from a period 2 orbit to chaos, which the reservoir is able to accurately predict. To summarize, our results for the parameter time variation, Eq. (9), show that the reservoir was able to replicate the long-term dynamical properties of the non-stationary Lorenz system when it was trained on a set of measurements during which the driving parameter ρ varied differently than it did during prediction.
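For reference, the drive of Eq. (9) is straightforward to generate on the sampling grid; a minimal sketch follows (the time span chosen here is ours, for illustration only).

```python
import numpy as np

dt = 0.01                                    # sampling step used in this section
t = np.arange(0.0, 400.0, dt)                # illustrative time span (our choice)
rho0, rho1, gamma, mu = 120.0, 10.0, 0.02, 0.1

theta = gamma * t                            # Eq. (9b)
rho = rho0 + rho1 * np.sin(theta) + mu * t   # Eq. (9a)
```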

FIG. 14.

(a) ρ(t) as a function of t. (b) $(z_{m+1} - z_m)$ vs γt for the true trajectory (black dots) and for the reservoir prediction (red dots that overlay the black dots). (c) A zoomed-in view of the oval region indicated in (b). (d) Γ(θ) over the prediction interval. The vertical dashed lines indicate the points at which the cumulative probability distributions are shown in (e). (e) Cumulative distributions of the map obtained from the real trajectory (solid curves) and from the reservoir-predicted trajectory (dashed curves) at $\theta_1 = \pi$, $\theta_2 = 1.8\pi$, and $\theta_3 = 2.9\pi$ with $\Delta\theta = 0.05$.


We consider a scalar field w(x,t) in one spatial variable x and time t, with periodic boundary conditions w(x,t) = w(x+2π,t) and with w(x,t) satisfying the noisy, possibly non-stationary, Kuramoto–Sivashinsky (KS) equation,

$$\frac{\partial w(x,t)}{\partial t} + 2 w(x,t) \frac{\partial w(x,t)}{\partial x} + \frac{\partial^2 w(x,t)}{\partial x^2} + \kappa \frac{\partial^4 w(x,t)}{\partial x^4} = \xi(x,t), \tag{10}$$

where ξ(x,t) represents spatially dependent noise that is white in time, and the parameter κ can be either constant in time, in which case we write $\kappa = \kappa_0$, or varying with time [$\kappa = \kappa(t)$]. Furthermore, for simplicity, following Christiansen et al.,91 we consider solutions w(x,t) having odd symmetry about x = 0, $w(-x,t) = -w(x,t)$, in which case we can expand w(x,t) in a Fourier series of the form

$$w(x,t) = i \sum_{p=-\infty}^{+\infty} a_p(t)\, e^{ipx}, \tag{11}$$

where $a_{-p}(t) = -a_p(t)$, $a_p(t)$ is real, and $a_0(t) = 0$. (In the general, non-symmetric case, the Fourier coefficients, rather than being purely imaginary, $ia_p$, would have real and imaginary parts.) In order that the noise not break this convenient symmetry, we take the noise to also have the same symmetry, in which case its Fourier representation has the same form as Eq. (11),

$$\xi(x,t) = i \sum_{p=-\infty}^{+\infty} \xi_p(t)\, e^{ipx}. \tag{12}$$

Hence, the $\xi_p(t)$ are real white noise terms that are uncorrelated for $p \ne p'$ and delta-correlated in time, $\langle \xi_p(t)\, \xi_{p'}(t') \rangle = (\text{constant}) \times \delta(t - t')$, where the angle brackets denote an ensemble average.

Substituting (11) and (12) into (10), we obtain an infinite, coupled set of first-order ordinary differential equations,91

$$\frac{da_p(t)}{dt} = (p^2 - \kappa p^4)\, a_p(t) - p \sum_{q=-\infty}^{\infty} a_q(t)\, a_{p-q}(t) + \xi_p(t). \tag{13}$$

We solve Eq. (13) numerically using the fourth-order Runge–Kutta method with a time step of 0.0001 and a truncation approximation at $p_{max} = 16$ (i.e., we set $a_p = 0$ for $|p| > p_{max}$). Figure 15 shows an example of a stationary system solution for w(x,t) with $\kappa_0 = 0.02975$, where time is plotted horizontally, x is plotted vertically, and the value of w is color-coded.
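For concreteness, a minimal sketch of this truncated Galerkin integration is given below (noiseless case; variable and function names are ours, and the straightforward double loop favors clarity over speed). It enforces the odd symmetry $a_{-p} = -a_p$, $a_0 = 0$, truncates at $p_{max} = 16$, and advances Eq. (13) with a classical fourth-order Runge–Kutta step.

```python
import numpy as np

PMAX = 16          # truncation order p_max
KAPPA = 0.02975    # stationary parameter kappa_0
DT = 1e-4          # integration time step

def a_of(a, k):
    """Coefficient a_k with odd symmetry: a_{-k} = -a_k and a_0 = 0."""
    if k == 0:
        return 0.0
    return a[k - 1] if k > 0 else -a[-k - 1]

def rhs(a, kappa):
    """Right-hand side of the truncated mode equations, Eq. (13), noiseless case."""
    dadt = np.empty(PMAX)
    for p in range(1, PMAX + 1):
        conv = sum(a_of(a, q) * a_of(a, p - q) for q in range(p - PMAX, PMAX + 1))
        dadt[p - 1] = (p**2 - kappa * p**4) * a_of(a, p) - p * conv
    return dadt

def rk4_step(a, kappa, dt=DT):
    """One classical fourth-order Runge-Kutta step for the mode vector a."""
    k1 = rhs(a, kappa)
    k2 = rhs(a + 0.5 * dt * k1, kappa)
    k3 = rhs(a + 0.5 * dt * k2, kappa)
    k4 = rhs(a + dt * k3, kappa)
    return a + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

a = 1e-3 * np.ones(PMAX)   # small arbitrary initial condition (our choice)
for _ in range(1000):
    a = rk4_step(a, KAPPA)
```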

FIG. 15.

A solution, w(x,t), of the KS equation for $\kappa_0 = 0.02975$. The vertical axis shows x, the horizontal shows t (in seconds), and the values of w(x,t) are color-coded.


Considering the noiseless [$\xi(x,t) = 0$], stationary ($\kappa = \kappa_0$) version of Eq. (10), again following Ref. 91, we use a Poincaré map with surface of section $a_1 = 0$ to obtain a stationary system bifurcation diagram showing the various regime transitions and bifurcations. This is shown in Fig. 16(a), which plots the values of $a_6(t_n)$ at many surface of section crossings ($t = t_n$) vs $\kappa_0$ for $\kappa_0$ in the range $0.0296 < \kappa_0 < 0.02989$. In this range of $\kappa_0$, we see chaotic dynamics, interspersed with periodic windows of different periodicities (e.g., the period 5 window near $\kappa_0 \approx 0.02985$). We also see that the size of the attractor is larger for smaller values of $\kappa_0$.
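A minimal sketch of the surface-of-section construction is given below (the function is ours; the paper does not specify the crossing direction or the interpolation, so both are assumptions): it detects sign changes of $a_1$ along a sampled trajectory and linearly interpolates $a_6$ to the crossing.

```python
import numpy as np

def section_values(a1, a6):
    """a6 at upward crossings of the Poincare surface of section a1 = 0.

    a1, a6: equally sampled time series of the two Fourier coefficients.
    Upward crossings and linear interpolation are our own choices here."""
    s = np.sign(a1)
    idx = np.where((s[:-1] < 0) & (s[1:] >= 0))[0]
    frac = -a1[idx] / (a1[idx + 1] - a1[idx])   # fraction of a step to the crossing
    return a6[idx] + frac * (a6[idx + 1] - a6[idx])
```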

FIG. 16.

(a) Bifurcation diagram of the stationary Kuramoto–Sivashinsky system [Eq. (10)]. (b) Comparison of the actual $a_6(t_n)$ (black dots) for a numerically integrated trajectory of Eq. (13) and the corresponding reservoir-predicted $a_6(t_n)$ (red dots overlaying the black dots) for a linear drift in the parameter κ. (c) Γ(κ) over the period 5 prediction window. The dashed vertical green lines indicate the points at which the cumulative probability distributions are shown in (d). (d) Cumulative probability distributions of the map obtained from the numerical solutions of Eq. (13) (solid curves) and from the reservoir-predicted trajectories (dashed curves) at $\kappa_1 = 0.0298400$, $\kappa_2 = 0.0298500$, $\kappa_3 = 0.0298565$, and $\Delta\kappa = 5\times10^{-7}$. (e) $a_6(t_n)$ vs κ for a numerically determined trajectory of Eq. (13) (black dots) and that predicted by the trained reservoir (red dots). (f) Γ(κ) over the prediction window for the case in (e).


Applying our RC prediction method for a non-stationary, noiseless case,

$$\kappa(t) = \kappa(0) + \gamma t, \tag{14}$$

with parameters given in Table IV, we obtain the results in Fig. 16(b), comparing the RC prediction (red dots) with the numerical solution of the KS equation (black dots). Here, we set $\gamma = 5.5\times10^{-8}$ and $\kappa(0) = 0.029825$. In this case, the (relatively) fast drift of the parameter κ results in the loss of fine detail in the regime transitions in this period 5 window (e.g., we cannot discern a period doubling cascade in each of the five branches near the lower κ range of the window). Instead, we see a sudden transition from chaos below $\kappa \approx 0.029844$ to a period 5 orbit above it and, similarly, an abrupt transition from the period 5 orbit below $\kappa \approx 0.029854$ to chaos above it. We see from Fig. 16(b) that the reservoir system is able to capture the transitions from chaos to periodic motion and then back to chaos. The red dots represent the reservoir-predicted surface of section crossings of the $a_6$-coordinate, $a_6(t_n)$, and the black dots represent $a_6(t_n)$ obtained from a numerical solution of Eq. (13). Figure 16(c) shows a plot of Γ(κ) over this period 5 prediction window, calculated using an ensemble of 1000 randomly selected initial conditions and for $\Delta\kappa = 5\times10^{-7}$. As before, we see that the value of Γ is small throughout the window, indicating that the reservoir-predicted trajectories adequately replicate the non-stationary climate of this system. The spikes in Γ near $\kappa \approx 0.029844$ and $\kappa \approx 0.029853$ are due to the reservoir’s slightly inaccurate prediction of the exact location of the transition from chaotic to periodic motion and from periodic back to chaotic motion, respectively. The narrow width of these spikes indicates that this error is small. Figure 16(d) shows the cumulative probability distributions obtained from the numerically integrated solutions of Eq. (13) (solid curves) and from the reservoir-predicted trajectories (dashed curves) at $\kappa_1 = 0.0298400$, $\kappa_2 = 0.0298500$, and $\kappa_3 = 0.0298565$. The corresponding values of Γ are $\Gamma(\kappa_1) = 0.0026$, $\Gamma(\kappa_2) = 0.0038$, and $\Gamma(\kappa_3) = 0.0038$. The agreement between the curves and the small values of Γ show that the reservoir-predicted trajectories accurately capture the dynamics of the different regimes.

TABLE IV.

Hyperparameters for predicting the noiseless Kuramoto–Sivashinsky system.

Reservoir size N: 2000
Average number of connections per node ⟨d⟩:
Spectral radius of the reservoir’s adjacency matrix ρs: 0.45
Maximal coupling strength of the input to the reservoir χ: 0.6775
Tikhonov regularization parameter α: 1.67 × 10⁻⁷
Numerical integration time step for Eq. (13) Δt: 0.0001
RC time step ΔtRC: 0.005

To ensure reservoir stability during the closed-loop prediction phase for the non-stationary, noiseless case, we found it necessary to insert a small noise $\delta u(t)$, sampled from a uniform distribution over the interval $[-\delta u, \delta u]$, into the reservoir update equation [Eq. (1)] at each time step,

$$r(t + \Delta t) = \tanh\big(A r(t) + W_{in}\,(u(t) + \delta u(t))\big). \tag{15}$$

For the above example, we used $\delta u = 9\times10^{-5}$. In addition, we used a “multi-pass” training scheme to obtain the above results, as follows. The reservoir is run in the open-loop training configuration using Eq. (15), the training data $\{u(t), \kappa(t)\}_{t=-T}^{t=0}$, and a realization of the input noise process $\{\delta u_1(t)\}_{t=-T}^{t=0}$. The reservoir states $\{r(t)\}$ are collected. After re-initializing the reservoir to some starting value $r(0)$, it is again run in the open-loop configuration using Eq. (15), the same training data, and a different realization of the input noise process $\{\delta u_2(t)\}_{t=-T}^{t=0}$. The reservoir states are collected again. This procedure is repeated a predetermined number of times (ten in this case), and a linear regression is performed on the collection of all resulting reservoir states to determine the output weight matrix $W_{out}$, which maps the collection of reservoir states $\{r(t)\}$ to $\{u(t)\}_{t=-T}^{t=0}$. (The closed-loop prediction phase is performed in the usual way, i.e., without adding any noise to the reservoir update equation.) This method uses a single set of training data and multiple realizations of an input noise process to allow for the determination of a more robust $W_{out}$. (A detailed exposition and analysis of noise stabilization of machine learning prediction will be the subject of a future publication.)
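The following sketch summarizes this multi-pass scheme (function and variable names are ours; the one-step-ahead target, the omission of a washout/transient discard, and the specific regression form are simplifying assumptions): the same training data are passed through the reservoir several times with independent input-noise realizations, and a single Tikhonov-regularized regression over the pooled states yields $W_{out}$.

```python
import numpy as np

rng = np.random.default_rng(1)

def multipass_train(A, W_in, U, n_passes=10, delta_u=9e-5, alpha=1e-7):
    """Multi-pass open-loop training with input-noise stabilization.

    A: (N, N) reservoir adjacency matrix; W_in: (N, D) input coupling;
    U: (T, D) training inputs (state components plus the known parameter)."""
    N, T = A.shape[0], U.shape[0]
    R_all, Y_all = [], []
    for _ in range(n_passes):
        r = np.zeros(N)                                 # re-initialize the reservoir
        for t in range(T - 1):
            noise = rng.uniform(-delta_u, delta_u, size=U.shape[1])
            r = np.tanh(A @ r + W_in @ (U[t] + noise))  # Eq. (15)
            R_all.append(r)
            Y_all.append(U[t + 1])                      # one-step-ahead target
    R, Y = np.array(R_all), np.array(Y_all)
    # Tikhonov-regularized regression over the pooled states from all passes
    W_out = np.linalg.solve(R.T @ R + alpha * np.eye(N), R.T @ Y).T
    return W_out
```

Pooling the states from all passes means the regression must fit targets that are consistent across many noise realizations, which is what makes the resulting $W_{out}$ more robust.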

Next, we apply our RC prediction method to the noisy, non-stationary case with $\gamma = -5\times10^{-7}$, where, as in Sec. V, the noise ($\xi_p$) is numerically implemented by adding a random number at each iteration of our numerical solution of Eq. (13) for $a_p$, $0 < p \le p_{max}$, with the random number, independent for each value of p, uniformly chosen in the range $[-5\times10^{-5}, 5\times10^{-5}]$. Table V lists the hyperparameters used for this task. As with the previous case, we use the multi-pass training scheme with ten passes (i.e., we run the reservoir in the open-loop configuration ten times using the same set of training data and a different noise realization each time), with the noise sampled from the same distribution as before. We consider the range $\kappa \in [0.02960, 0.02988]$, in which, due to the rate of non-stationarity and the added dynamical noise, we cannot discern the periodic windows [see the black dots in Fig. 16(e) representing the $a_6(t_n)$ obtained from a numerical solution of Eq. (13)] that are visible in the stationary bifurcation diagram of Fig. 16(a). Here, we see that the size of the attractor gradually increases with decreasing κ. The reservoir-predicted trajectory [represented by the red dots in Fig. 16(e)] accurately captures this gradual increase in the size of the attractor. Figure 16(f) shows Γ(κ), obtained from 5000 randomly initialized trajectories, over the prediction window. From the small value of Γ over the prediction interval, we see that the reservoir-predicted trajectory also accurately captures the κ-dependent density variation of $a_6(t_n)$. This indicates that the non-stationary climate of this system is well-replicated by the reservoir-predicted trajectories.

TABLE V.

Hyperparameters for predicting the noisy Kuramoto–Sivashinsky system.

Reservoir size N: 2000
Average number of connections per node ⟨d⟩:
Spectral radius of the reservoir’s adjacency matrix ρs: 0.65
Maximal coupling strength of the input to the reservoir χ: 0.9
Tikhonov regularization parameter α: 7.74 × 10⁻¹⁰
Numerical integration time step for Eq. (13) Δt: 0.0001
RC time step ΔtRC: 0.005

In this paper, we have formulated and tested a machine learning method for prediction of the behavior, or “climate” (defined in Sec. I), of non-stationary dynamical systems (both noiseless and noisy) from time series data of the state of the system and knowledge of the time dependence of a system parameter responsible for the non-stationarity, but with no further knowledge of the nature of the system to be predicted. The methods introduced in Sec. IV are based on a previously formulated machine learning feedback prediction scheme for the dynamics of stationary systems (Sec. II), which we modify in order to adapt it for the purpose of climate prediction of noiseless non-stationary dynamical systems (Sec. IV A) and of noisy non-stationary dynamical systems (Sec. IV B). In the case of prediction for noiseless non-stationary systems, the modified prediction method (Fig. 5) is the stationary system prediction method (Fig. 1) altered to include input of the known time-dependent parameter {this input applies for both the training phase [Fig. 5(a)] and the prediction phase [Fig. 5(b)]}. In the case of noisy non-stationary systems, the prediction method (Fig. 9) contains a further alteration in which dynamical noise approximating that of the unknown system to be predicted is injected at the output side of the machine learning system; we describe how to obtain from the available time series data the required approximate probability distribution of the actual noise.

These methods are tested on non-stationary versions of three systems: a simple discrete-time map (the logistic map, Sec. IV), a continuous-time set of three autonomous ordinary differential equations (the Lorenz 63 equations, Sec. V), and a spatiotemporal partial differential equation (the Kuramoto–Sivashinsky equation, Sec. VI). Evaluations of the resulting tests are made by comparing orbit characteristics of the actual systems with those predicted, using plots of orbits, cumulative probability distributions, and the Wasserstein metric for the distance between probability distributions.

Some dynamical orbit features whose predictions are tested include the following:

  • Apparently continuous evolution of the distribution of chaotic orbit densities in chaotic regions without apparent regime transitions.

  • Non-stationarity-induced phenomena, such as the apparent delay and increased abruptness of period doublings [Fig. 3(b)].

  • The effects of dynamical noise, e.g., the apparent blurring of features present in stationary system bifurcation diagrams and the intermittent bursting out of the small multiple-piece chaotic attractors preceding interior crisis transitions [Figs. 8 and 10(b)].

  • Various types of regime transitions, including the forward and backward versions of period doubling cascades of periodic orbits followed by pairwise splitting cascades of chaotic attractors, intermittency transitions between periodicity and chaos, and crisis transitions.

Results of our tests show that the overall pattern of future climate variation with time is extremely well-predicted and that the quantitative agreement between the predicted and actual behavior is also remarkably good, although there are some notable small to moderate shifts in the predicted occurrences of regime transitions. Based on these results, we expect that machine learning will be of great future utility as a means of predicting the statistical behavior (climate) of unknown, non-stationary dynamical systems from time series data. Resolution of the question of how, and whether, this expectation is realized for many potential practical applications (see Sec. I) awaits further study.

We thank Brian Hunt for insightful comments, suggestions, and advice. The authors acknowledge the University of Maryland supercomputing resources (http://hpcc.umd.edu) made available for conducting the research reported in this paper. This work was supported, in part, by the National Science Foundation (NSF) under Grant Nos. DMS-1813027 and DGE-1632976 and by the Defense Advanced Research Projects Agency (DARPA) grant (No. W31P4Q-20-C-0077).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

As shown in Fig. 9, to approximately replicate the effects of uncorrelated stochastic forcing, we inject an uncorrelated noise term $\tilde\varepsilon_n$ into the reservoir output at each time step. We desire that the distribution $\tilde\zeta$ of the injected noise $\tilde\varepsilon_n$ be a reasonable approximation to the distribution ζ of the actual noise $\varepsilon_n$. To obtain this estimate, during the training, we take the difference between the reservoir output $\tilde{v}_{n+1}$ and the actual noisy system trajectory $x_{n+1}$ as our noise estimate, $\tilde\varepsilon_n = x_{n+1} - \tilde{v}_{n+1}$. If the reservoir output $\tilde{v}_{n+1}$ were the actual noiseless output $v_{n+1}$, then $\tilde\varepsilon_n = \varepsilon_n$, but the reservoir makes some error $\Delta v_{n+1} = v_{n+1} - \tilde{v}_{n+1}$, and $\tilde\varepsilon_n$ is only an approximation of the true noise; i.e., $\tilde\varepsilon_n = \varepsilon_n + \Delta v_{n+1}$. Thus, to the extent that the reservoir error $\Delta v_{n+1}$ is small compared to the actual noise $\varepsilon_n$, our approximation will be good. Noting that $\varepsilon_n$ is an uncorrelated noise, while both $v_{n+1}$ and $\tilde{v}_{n+1}$ come from deterministic mappings of a state $x_n$ that is independent of $\varepsilon_n$ (where these mappings are the mapping of the system to be predicted and its reservoir approximation), we see that the reservoir error $\Delta v_{n+1}$ is uncorrelated with $\varepsilon_n$. Letting $\zeta_\Delta(\Delta v)$ denote the distribution function of $\Delta v$ computed from the time series $\Delta v_n$, by virtue of $\tilde\varepsilon_n = \varepsilon_n + \Delta v_{n+1}$, $\tilde\zeta$ is given by the convolution

$$\tilde{\zeta}(\tilde{\varepsilon}) = \int \zeta(\varepsilon)\, \zeta_\Delta(\tilde{\varepsilon} - \varepsilon)\, d\varepsilon.$$

Roughly speaking, the approximation $\tilde\zeta(\tilde\varepsilon)$ is obtained by broadening $\zeta(\varepsilon)$ by the width of $\zeta_\Delta$, which, if small compared to the width of ζ, may be expected to yield a good approximation of ζ. Using a flat-top distribution for $\zeta(\varepsilon)$ (as in our previous numerical experiments), we demonstrate this for the non-stationary logistic map for two noise levels [widths of $\zeta(\varepsilon)$] of 0.0005 and 0.002, with $\gamma = 10^{-5}$, as follows. An ensemble of randomly chosen initial conditions is evolved by the actual orbit $x_n$, obtained using the logistic map with noise $\varepsilon_n$, and the trained reservoir is used to obtain the time series $\tilde\varepsilon_n = x_{n+1} - \tilde{v}_{n+1}$, where $\tilde{v}_{n+1}$ is the reservoir output when its input is $x_n$. In addition, we also use our determination of the actual orbit $x_n$ to obtain the time series $v_n$ via Eq. (7). We then use these time series to form histogram approximations to ζ, $\tilde\zeta$, and $\zeta_\Delta$. Figures 17(a) and 17(b) show numerical results for the distributions $\zeta_\Delta$ (green lines), ζ (blue lines), and $\tilde\zeta$ (black lines). Figure 17(a) shows a logistic map case where the dynamical noise level is substantially larger than the reservoir error (the width of $\zeta_\Delta$), and $\tilde\zeta$ provides a good approximation to ζ. Figure 17(b) provides an example where the reservoir error is of the same order as the noise, leading $\tilde\zeta$ to noticeably deviate from ζ, although we note that the reservoir training error could be decreased by using a larger reservoir with well-optimized hyperparameters so that $\tilde\zeta$ better approximates ζ.
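The relationship $\tilde\zeta = \zeta * \zeta_\Delta$ can be checked numerically with a short sketch (the stand-in residual data and the bin choices below are ours): histogram the three time series on a common grid and compare the discrete convolution of ζ and $\zeta_\Delta$ with the histogram of $\tilde\zeta$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in time series: flat-top true noise eps_n and a smaller reservoir error dv_n.
eps = rng.uniform(-0.002, 0.002, size=200_000)   # samples of zeta
dv = rng.normal(0.0, 2e-4, size=200_000)         # samples of zeta_Delta
eps_tilde = eps + dv                             # samples of zeta~

bins = np.linspace(-0.004, 0.004, 161)           # common, zero-centered grid
db = bins[1] - bins[0]
zeta, _ = np.histogram(eps, bins=bins, density=True)
zeta_delta, _ = np.histogram(dv, bins=bins, density=True)
zeta_tilde, _ = np.histogram(eps_tilde, bins=bins, density=True)

# The discrete convolution of zeta with zeta_Delta approximates zeta~
# (up to binning and half-bin alignment error on this grid).
zeta_conv = np.convolve(zeta, zeta_delta, mode="same") * db
print(np.max(np.abs(zeta_conv - zeta_tilde)))
```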

FIG. 17.

(a) and (b) Comparisons of $\tilde\zeta$ (black lines), ζ (blue lines), and $\zeta_\Delta$ (green lines) for noise levels [widths of $\zeta(\varepsilon)$] of 0.002 and 0.0005, respectively.

1. H. Jaeger and H. Haas, “Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication,” Science 304, 78 (2004).
2. J. Pathak, B. R. Hunt, M. Girvan, Z. Lu, and E. Ott, “Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach,” Phys. Rev. Lett. 120, 024102 (2018).
3. J. Pathak, A. Wikner, R. Fussel, S. Chandra, B. Hunt, M. Girvan, and E. Ott, “Hybrid forecasting of chaotic processes: Using machine learning in conjunction with a knowledge based model,” Chaos 28, 041101 (2018).
4. J. Pathak, Z. Lu, B. R. Hunt, M. Girvan, and E. Ott, “Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data,” Chaos 27, 121102 (2017).
5. Z. Lu, B. R. Hunt, and E. Ott, “Attractor reconstruction by machine learning,” Chaos 28, 061104 (2018).
6. Z. Shi and M. Han, “Support vector echo-state machine for chaotic time series prediction,” IEEE Trans. Neural Netw. 18, 359 (2007).
7. S. Mukherjee, E. Osuna, and F. Girosi, “Nonlinear prediction of chaotic time series using support vector machines,” in Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Signal Processing Society Workshop (IEEE, 1997), pp. 511–520.
8. Q. Ma, Q. Zheng, H. Peng, T. Zhong, and L. Xu, “Chaotic time series prediction based on evolving recurrent neural networks,” in 2007 International Conference on Machine Learning and Cybernetics (IEEE, 2007), Vol. 6, pp. 3496–3500.
9. R. Chandra, Y.-S. Ong, and C.-K. Goh, “Co-evolutionary multi-task learning with predictive recurrence for multi-step chaotic time series prediction,” Neurocomputing 243, 21–34 (2017).
10. D. Li, M. Han, and J. Wang, “Chaotic time series prediction based on a novel robust echo state network,” IEEE Trans. Neural Netw. Learn. Syst. 23, 787–799 (2012).
11. M. Sangiorgio and F. Dercole, “Robustness of LSTM neural networks for multi-step forecasting of chaotic time series,” Chaos, Solitons Fractals 139, 110045 (2020).
12. Z. Tian, “Echo state network based on improved fruit fly optimization algorithm for chaotic time series prediction,” J. Ambient Intell. Humaniz. Comput. (published online).
13. A. Vaughan and S. V. Bohac, “Real-time, adaptive machine learning for non-stationary, near chaotic gasoline engine combustion time series,” Neural Netw. 70, 18–26 (2015).
14. A. Griffith, A. Pomerance, and D. J. Gauthier, “Forecasting chaotic systems with very low connectivity reservoir computers,” Chaos 29, 123108 (2019).
15. G. Shen, J. Kurths, and Y. Yuan, “Sequence-to-sequence prediction of spatiotemporal systems,” Chaos 30, 023102 (2020).
16. R. Follmann and J. E. Rosa, “Predicting slow and fast neuronal dynamics with machine learning,” Chaos 29, 113119 (2019).
17. S. Herzog, F. Wörgötter, and U. Parlitz, “Convolutional autoencoder and conditional random fields hybrid for predicting spatial-temporal chaos,” Chaos 29, 123116 (2019).
18. A. Chattopadhyay, P. Hassanzadeh, and D. Subramanian, “Data-driven predictions of a multiscale Lorenz 96 chaotic system using machine-learning methods: Reservoir computing, artificial neural network, and long short-term memory network,” Nonlinear Process. Geophys. 27, 373–389 (2020).
19. D. Chen and W. Han, “Prediction of multivariate chaotic time series via radial basis function neural networks,” Complexity 18, 55 (2013).
20. W. Huang, Y. Li, and Y. Huang, “Deep hybrid neural network and improved differential neuroevolution for chaotic time series prediction,” IEEE Access 8, 159552–159565 (2020).
21. H. Fan, J. Jiang, C. Zhang, X. Wang, and Y.-C. Lai, “Long-term prediction of chaotic systems with machine learning,” Phys. Rev. Res. 2, 012080 (2020).
22. A. Faqih, A. P. Lianto, and B. Kusumoputro, “Mackey-Glass chaotic time series prediction using modified RBF neural networks,” in Proceedings of the 2nd International Conference on Software Engineering and Information Management (ICSIM 2019) (Association for Computing Machinery, New York, 2019), pp. 7–11.
23. P. Antonik, M. Gulina, J. Pauwels, and S. Massar, “Using a reservoir computer to learn chaotic attractors, with applications in chaos synchronization and cryptography,” Phys. Rev. E 98, 012215 (2018).
24. D. Nguyen, S. Ouala, L. Drumetz, and R. Fablet, “EM-like learning chaotic dynamics from noisy and partial observations,” arXiv:1903.10335 (2019).
25. F. Z. Xing, E. Cambria, and X. Zou, “Predicting evolving chaotic time series with fuzzy neural networks,” in 2017 International Joint Conference on Neural Networks (IJCNN) (IEEE Press, 2017), pp. 3176–3183.
26. J. P. Eckmann and D. Ruelle, “Ergodic theory of chaos and strange attractors,” Rev. Mod. Phys. 57, 617 (1985).
27. G. Drótos, T. Bódai, and T. Tél, “Quantifying nonergodicity in nonautonomous dissipative dynamical systems: An application to climate change,” Phys. Rev. E 94, 022214 (2016).
28. G. Drótos, T. Bódai, and T. Tél, “Probabilistic concepts in a changing climate: A snapshot attractor picture,” J. Clim. 28, 3275–3288 (2015).
29. F. J. Romeiras, C. Grebogi, and E. Ott, “Multifractal properties of snapshot attractors of random maps,” Phys. Rev. A 41, 784–799 (1990).
30. M. D. Chekroun, E. Simonnet, and M. Ghil, “Stochastic climate dynamics: Random attractors and time-dependent invariant measures,” Phys. D 240, 1685–1700 (2011).
31. S. H. Lim, L. Theo Giorgini, W. Moon, and J. S. Wettlaufer, “Predicting critical transitions in multiscale dynamical systems using reservoir computing,” Chaos 30, 123126 (2020).
32. M. J. Feigenbaum, “Quantitative universality for a class of nonlinear transformations,” J. Stat. Phys. 19, 25–52 (1978).
33. M. J. Feigenbaum, “Universal behavior in nonlinear systems,” Phys. D 7, 16–39 (1983).
34. Y. Pomeau and P. M. Manneville, “Intermittent transition to turbulence in dissipative dynamical systems,” Commun. Math. Phys. 74, 189–197 (1980).
35. C. Grebogi, E. Ott, and J. A. Yorke, “Crises, sudden changes in chaotic attractors, and transient chaos,” Phys. D 7, 181–200 (1983).
36. C. Grebogi, E. Ott, and J. A. Yorke, “Critical exponents of chaotic transients in nonlinear dynamical systems,” Phys. Rev. Lett. 57, 1284 (1986).
37. L.-W. Kong, H.-W. Fan, C. Grebogi, and Y.-C. Lai, “Machine learning prediction of critical transition and system collapse,” arXiv:2012.01545 (2019).
38. T. M. Lenton, H. Held, E. Kriegler, J. W. Wall, W. Lucht, S. Rahmstorf, and H. J. Schellnhuber, “Tipping elements in the Earth’s climate system,” Proc. Natl. Acad. Sci. U.S.A. 105, 1786–1793 (2008).
39. P. Ashwin and A. S. von der Heydt, “Extreme sensitivity and climate tipping points,” J. Stat. Phys. 179, 1531–1552 (2020).
40. S. O. Rasmussen, M. Bigler, S. P. Blockley, T. Blunier, S. L. Buchardt, H. B. Clausen, I. Cvijanovic, D. Dahl-Jensen, S. J. Johnsen, H. Fischer, V. Gkinis, M. Guillevic, W. Z. Hoek, J. J. Lowe, J. B. Pedro, T. Popp, I. K. Seierstad, J. P. Steffensen, A. M. Svensson, P. Vallelonga, B. M. Vinther, M. J. Walker, J. J. Wheatley, and M. Winstrup, “A stratigraphic framework for abrupt climatic changes during the Last Glacial period based on three synchronized Greenland ice-core records: Refining and extending the INTIMATE event stratigraphy,” Quat. Sci. Rev. 106, 14–28 (2014).
41. R. T. Pierrehumbert, “Warming the world,” Nature 432, 677 (2004).
42. A. Zhang, G. Knorr, G. Lohmann, and S. Barker, “Abrupt North Atlantic circulation changes in response to gradual CO2 forcing in a glacial climate state,” Nat. Geosci. 10, 518–523 (2017).
43. Q. Schiermeier, “A sea change,” Nature 439, 256–260 (2006).
44. T. F. Stocker and D. G. Wright, “Rapid transitions of the ocean’s deep circulation induced by changes in surface water fluxes,” Nature 315, 729–731 (1991).
45. K. Zickfeld, B. Knopf, V. Petoukhov, and H. J. Schellnhuber, “Is the Indian summer monsoon stable against global change?,” Geophys. Res. Lett. 32, L15707, https://doi.org/10.1029/2005GL022771 (2005).
46. Z. Liu and M. Alexander, “Atmospheric bridge, oceanic tunnel, and global climate teleconnections,” Rev. Geophys. 45, RG2005, https://doi.org/10.1029/2005RG000172 (2007).
47. A. Angstrom, “Teleconnections of climate changes in present times,” Geogr. Ann. 17, 242–258 (1935).
48. J. Kurths, A. Agarwal, R. Shukla, N. Marwan, M. Rathinasamy, L. Caesar, R. Krishnan, and B. Merz, “Unravelling the spatial diversity of Indian precipitation teleconnections via a non-linear multi-scale approach,” Nonlinear Process. Geophys. 26, 251–266 (2019).
49. T. Lenton and J. Ciscar, “Integrating tipping points into climate impact assessments,” Clim. Change 117, 585–597 (2013).
50. D. Lemoine and C. P. Traeger, “Economics of tipping the climate dominoes,” Nat. Clim. Change 6, 514–519 (2016).
51. A. Hastings and D. B. Wysham, “Regime shifts in ecological systems can occur with no warning,” Ecol. Lett. 13, 464–472 (2010).
52. C. Folke, S. Carpenter, B. Walker, M. Scheffer, T. Elmqvist, L. Gunderson, and C. Holling, “Regime shifts, resilience, and biodiversity in ecosystem management,” Annu. Rev. Ecol. Evol. Syst. 35, 557–581 (2004).
53. T. Amemiya, T. Enomoto, A. G. Rossberg, T. Yamamoto, Y. Inamori, and K. Itoh, “Stability and dynamical behavior in a lake-model and implications for regime shifts in real lakes,” Ecol. Modell. 206, 54–62 (2007).
54. M. Scheffer, S. H. Hosper, M.-L. Meijer, B. Moss, and E. Jeppesen, “Alternative equilibria in shallow lakes,” Trends Ecol. Evol. 8, 275–279 (1993).
55. B. deYoung, M. Barange, G. Beaugrand, R. Harris, R. I. Perry, M. Scheffer, and F. Werner, “Regime shifts in marine ecosystems: Detection, prediction and management,” Trends Ecol. Evol. 23, 402–409 (2008).
56. P. J. Mumby, A. Hastings, and H. J. Edwards, “Thresholds and the resilience of Caribbean coral reefs,” Nature 450, 98–101 (2007).
57. V. Brovkin, M. Claussen, V. Petoukhov, and A. Ganopolski, “On the stability of the atmosphere-vegetation system in the Sahara/Sahel region,” J. Geophys. Res. 103, 31613–31624, https://doi.org/10.1029/1998JD200006 (1998).
58. Y. R. Zelnik, S. Kinast, H. Yizhaq, G. Bel, and E. Meron, “Regime shifts in models of dryland vegetation,” Philos. Trans. R. Soc. A 371, 20120358 (2013).
59. S. Ellner and P. Turchin, “Chaos in a noisy world: New methods and evidence from time-series analysis,” Am. Nat. 45, 343–347 (1995).
60. J. M. Drake and B. D. Griffen, “Early warning signals of extinction in deteriorating environment,” Nature 467, 456–459 (2010).
61. M. Krkošek and J. M. Drake, “On signals of phase transitions in salmon population dynamics,” Proc. R. Soc. B: Biol. Sci. 281, 20133221 (2014).
62. I. V. Telesh, H. Schubert, K. D. Joehnk, R. Heerkloss, R. Schumann, M. Feike, A. Schoor, and S. O. Skarlato, “Chaos theory discloses triggers and drivers of plankton dynamics in stable environment,” Sci. Rep. 9, 20351 (2019).
63. C. Trefois, P. M. Antony, J. Goncalves, A. Skupin, and R. Balling, “Critical transitions in chronic disease: Transferring concepts from ecology to systems medicine,” Curr. Opin. Biotechnol. 34, 48–55 (2015).
64. B. Litt, R. Esteller, J. Echauz, M. D’Alessandro, R. Shor, T. Henry, P. Pennell, C. Epstein, R. Bakay, M. Dichter, and G. Vachtsevanos, “Epileptic seizures may begin hours in advance of clinical onset: A report of five patients,” Neuron 30, 51–64 (2001).
65. E. Gopalakrishnan, Y. Sharma, T. John et al., “Early warning signals for critical transitions in a thermoacoustic system,” Sci. Rep. 6, 35310 (2016).
66. I. Dobson, “Observations on the geometry of saddle node bifurcation and voltage collapse in electrical power systems,” IEEE Trans. Circuits Syst. I Fundam. Theory Appl. 39, 240–243 (1992).
67. C. Kuehn, E. A. Martens, and D. M. Romero, “Critical transitions in social network activity,” J. Complex Netw. 2, 141–152 (2014).
68. M. Lukosevicius and H. Jaeger, “Reservoir computing approaches to recurrent neural network training,” Comput. Sci. Rev. 3, 127–149 (2009).
69. H. Jaeger, “The ‘echo state’ approach to analyzing and training recurrent neural networks—With an erratum note,” GMD Technical Report No. 148, German National Research Center for Information Technology, Bonn, Germany, 2001.
70. W. Maass, T. Natschlager, and H. Markram, “Real-time computing without stable states: A new framework for neural computation based on perturbations,” Neural Comput. 14, 2531–2560 (2002).
71. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016).
72. E. Lorenz, “Deterministic nonperiodic flow,” J. Atmos. Sci. 20, 130 (1963).
73. Y. Kuramoto, “Diffusion-induced chaos in reaction systems,” Prog. Theor. Phys. Suppl. 64, 346–367 (1978).
74. G. I. Sivashinsky, “On flame propagation under conditions of stoichiometry,” SIAM J. Appl. Math. 39, 67–82 (1980).
75. G. Tanaka, T. Yamane, J. B. Héroux, R. Nakane, N. Kanazawa, S. Takeda, H. Numata, D. Nakano, and A. Hirose, “Recent advances in physical reservoir computing: A review,” Neural Netw. 115, 100–123 (2019).
76. B. Penkovsky, L. Larger, and D. Brunner, “Efficient design of hardware-enabled reservoir computing in FPGAs,” J. Appl. Phys. 124, 162101 (2018).
77. Y. K. Chembo, “Machine learning based on reservoir computing with time-delayed optoelectronic and photonic systems,” Chaos 30, 013111 (2020).
78. C. Fernando and S. Sojakka, “Pattern recognition in a bucket,” in Advances in Artificial Life, edited by W. Banzhaf, J. Ziegler, T. Christaller, P. Dittrich, and J. T. Kim (Springer, Berlin, 2003), pp. 588–597.
79. J. C. Coulombe, M. C. York, and J. Sylvestre, “Computing with networks of nonlinear mechanical oscillators,” PLoS One 12, 0178663 (2017).
80. K. Harkhoe and G. Van der Sande, “Delay-based reservoir computing using multimode semiconductor lasers: Exploiting the rich carrier dynamics,” IEEE J. Sel. Top. Quantum Electron. 25, 1502909 (2019).
81. E. Ott, Chaos in Dynamical Systems (Cambridge University Press, 2002).
82. C. Grebogi, E. Ott, and J. Yorke, “Chaotic attractors in crisis,” Phys. Rev. Lett. 48, 1507 (1982).
83. K. Erguler and M. Stumpf, “Statistical interpretations of the interplay between noise and chaos in the stochastic logistic map,” Math. Biosci. 216, 90–99 (2008).
84. B. Shraiman, C. Wayne, and P. Martin, “Scaling theory for noisy period-doubling transitions to chaos,” Phys. Rev. Lett. 46, 935 (1981).
85. F. Ledrappier and L. S. Young, “Dimension formula for random transformations,” Commun. Math. Phys. 117, 529–548 (1988).
86. L. Yu, E. Ott, and Q. Chen, “Transition to chaos for random dynamical systems,” Phys. Rev. Lett. 65, 2935–2938 (1990).
87. T. Tél, T. Bódai, G. Drótos, T. Haszpra, M. Herein, B. Kaszás, M. Vincze et al., “The theory of parallel climate realizations,” J. Stat. Phys. 179, 1496–1530 (2020).
88. S. S. Vallender, “Calculation of the Wasserstein distance between probability distributions on the line,” Theory Probab. Appl. 18, 784–786 (1973).
89. S. Kullback and R. A. Leibler, “On information and sufficiency,” Ann. Math. Stat. 22, 79–86 (1951).
90. S. Chandra, B. R. Hunt, R. Roy, and E. Ott, “Using machine learning to uncover the deterministic dynamics of systems subject to dynamical noise” (to be published).
91. F. Christiansen, P. Cvitanovic, and V. Putkaradze, “Spatiotemporal chaos in terms of unstable recurrent patterns,” Nonlinearity 10, 55–70 (1997).