We propose a new approach to dynamical system forecasting called data-informed reservoir computing (DI-RC) that, while being based solely on data, yields increased accuracy and reduced computational cost and mitigates the tedious hyper-parameter optimization of the reservoir computer (RC). Our DI-RC approach is based on the recently proposed hybrid setup in which a knowledge-based model is combined with a machine learning prediction system, but it replaces the knowledge-based component with a data-driven model discovery technique. As a result, our approach can be used when a suitable knowledge-based model is not available. We demonstrate our approach using a delay-based RC as the machine learning component in conjunction with sparse identification of nonlinear dynamical systems for the data-driven model component. We test the performance on two example systems: the Lorenz system and the Kuramoto–Sivashinsky system. Our results indicate that the proposed technique can improve the time-series forecasting capabilities compared with both approaches applied individually, while remaining computationally cheap. The benefit of our proposed approach, compared with pure RC, is most pronounced when the reservoir parameters are not optimized, thereby reducing the need for hyperparameter optimization.

We present a data-driven method that can be used for forecasting very complex time-dependent processes. Our proposed machine learning method is based on the well-known concept of reservoir computing, where the innovative idea is to improve the performance and reliability of the prediction algorithm by an additional data-based model-inference stage. This hybrid data-informed reservoir computer (DI-RC) yields increased accuracy and reduced computational cost and mitigates tedious hyper-parameter optimization of the reservoir computer. The performance of the proposed method is tested on two prototypical dynamical systems, namely, the chaotic attractor of the Lorenz-63 system and the spatiotemporal dynamics of the Kuramoto–Sivashinsky equation. It is shown that the spectral properties of the time-dependent dynamics and the long time evolution up to several Lyapunov times are reproduced by our DI-RC method.

Predicting the evolution of dynamical systems is a problem of interest in many fields, ranging from predicting the evolution of a flame front,1,2 or the wind speed over an airplane wing,3,4 to forecasting the terrestrial weather or climate evolution.5–7 Classical approaches often rely on models derived from basic physical principles, whereas recent advances in machine learning have made it possible to predict the evolution of an unknown dynamical system based solely on past measured time series data.8 Alsharef et al.9 provide an overview of the different machine learning methods used for time-series prediction over the last few decades, ranging from linear models to automated machine learning frameworks. Methods based on neural networks trained via backpropagation, for example, the recently studied spatiotemporal transformer neural network10 or deep network structures,11,12 are computationally expensive, while approaches based on reservoir computing (introduced as echo-state networks13 or liquid state machines14) are computationally much less demanding. Echo-state networks have proven to perform well on forecasting tasks;15 however, the performance is sensitive to the network hyper-parameters, the optimization of which is a complex and costly task. Various methods have been studied to solve the hyper-parameter optimization problem, see Refs. 9 and 16 and references therein; however, these all mitigate one of the main benefits of RC, which is the reduced computational cost compared with other machine learning approaches.

It has been shown that combining data-driven machine learning approaches with an available imperfect knowledge-based model can significantly increase the time-horizon (valid prediction time) over which the system can be reliably predicted when compared to the case where the methods are applied individually.17–22 These hybrid approaches are appropriate when suitable (but imperfect) knowledge-based and data-based models are available, but neither fulfills the desired prediction task to a satisfactory level on its own. However, in many real situations, knowledge-based models of sufficient quality do not exist. For such cases, we propose a new approach, which is purely data-driven, called data-informed reservoir computing (DI-RC). The basic idea of DI-RC is the hybridization of a data-based machine learning technique, in this case reservoir computing, with a data-driven approach for finding a model representation of a dynamical system. In this way, the complementary capabilities of two different machine learning based agents can be combined and used on the same data, yielding a better performance than each agent on its own. For the model representation agent, we use sparse identification of nonlinear dynamical systems (SINDy).23 SINDy has been demonstrated to find equations of complex dynamical systems, such as the Lorenz system, the Kuramoto–Sivashinsky equation, or the shear flow around a cylinder, but it shows decreased performance for very short training data sets or coarse discretization/measurements of the system.23

For the reservoir computing part of our DI-RC, we consider two configurations: a delay-based reservoir consisting of a single nonlinear dynamical node subject to a delayed feedback signal24–27 and a sparsely connected network of tanh functions, for example, as used in Refs. 28 and 29. We show that the DI-RC approach can improve the valid prediction time of chaotic systems, stabilize the prediction, and reduce the cost of time-consuming hyperparameter optimization. We see benefits of our approach especially for small reservoirs or for short training lengths. We also find through the analysis of the power spectrum that DI-RC captures the characteristics of the true attractor with higher accuracy than the two approaches used individually.

The model-free prediction of non-linear dynamical systems is an active field of research and has been realized using recurrent neural networks,30 echo-state networks,28,31–33 and nonlinear vector regression methods.34–36 For echo-state networks, the approximation property has been shown analytically.37,38 Nevertheless, in all these cases, a tedious optimization procedure is necessary to prepare the network or the nonlinear feature vector, and it has been shown that the best performing echo-state network does a direct polynomial regression in an implicit and computationally costly manner.33 With our approach, we want to present a purely data-based method that shows very good performance without the need for a time-consuming parameter optimization. A diagrammatic representation of our approach is shown in Fig. 1.

FIG. 1.

Sketch illustrating the DI-RC setup. The training data are utilized to train a SINDy agent and a reservoir computer. Improved prediction performance is achieved upon combining the trained SINDy model and the data for training the reservoir.


This paper is structured as follows: Sec. II contains descriptions of reservoir computing, SINDy, and our proposed DI-RC approach. Section III introduces the benchmark tasks and the metrics we use to assess our results before we present our results in Sec. IV.

Reservoir computing is an architecture that maps an input into high-dimensional latent space via a fixed dynamical system (the reservoir), followed by mapping from the latent space to the output. The key feature of reservoir computing is that training consists solely of adjusting parameters of the output mapping layer. There are several ways of implementing this architecture. We focus on an implementation via a nonlinear dynamical node under the influence of a delayed-feedback signal to illustrate the DI-RC approach. However, we emphasize that the DI-RC approach is not dependent on the particular RC implementation and show results for an echo-state network13 in the supplementary material.

In both RC implementations, the training phase consists of feeding an input training sequence u(j) of length N_tr into the reservoir and collecting a number N_v of responses to each input step j into a state matrix S. The dimension of the state matrix is N_tr × N_v, with entries S_{j,k} for k ∈ [0, N_v).

A linear output layer T̂_out is trained to approximate the given target y(t) as
\hat{T}_{out}\!\left[ \mathbf{r}(t), W_{out} \right] = \mathbf{r}^T(t)\, W_{out} \approx y(t),
(1)
where W_out is a set of adjustable readout weights. The optimization of W_out depends on the chosen metric and optimization method. In this paper, we use the well-known L² norm,
W_{out} = \min_{W_{out}} \left( \left\| S W_{out} - Y \right\|_2^2 + \lambda \left\| W_{out} \right\|_2^2 \right),
(2)
where λ‖W_out‖₂² is the Tikhonov regularization term that suppresses overfitting.39 For all simulations, we chose λ = 10^-6. Y is the target vector. W_out and Y are matrices if the target has more than one dimension and vectors if the target is a scalar. S is the state matrix that contains the reservoir states r(t),
S = \begin{bmatrix} \mathbf{r}^T(t_1) \\ \mathbf{r}^T(t_2) \\ \vdots \\ \mathbf{r}^T(t_{N_{tr}}) \end{bmatrix}
  = \begin{bmatrix} r_1(t_1) & r_2(t_1) & \cdots & r_{N_v}(t_1) \\ r_1(t_2) & r_2(t_2) & \cdots & r_{N_v}(t_2) \\ \vdots & \vdots & \ddots & \vdots \\ r_1(t_{N_{tr}}) & r_2(t_{N_{tr}}) & \cdots & r_{N_v}(t_{N_{tr}}) \end{bmatrix},
(3)
where [t_1, t_2, …, t_{N_tr}] denotes the times t ∈ [−T, …, 0].

For time-series prediction tasks, the target is set to y(t) = u(t + Δt), and the reservoir computer is trained to make a one-step-ahead prediction r^T(t) W_out ≈ u(t + Δt).
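For concreteness, a minimal sketch of this readout training in Python is given below. It assumes the state matrix S and the target Y are already available as NumPy arrays; the function names are illustrative and not part of any specific library.

```python
import numpy as np

def train_readout(S, Y, lam=1e-6):
    """Ridge-regression readout of Eq. (2): W_out = argmin ||S W - Y||^2 + lam ||W||^2."""
    n_features = S.shape[1]
    # Closed-form Tikhonov-regularized least-squares solution.
    return np.linalg.solve(S.T @ S + lam * np.eye(n_features), S.T @ Y)

def one_step_prediction(r, W_out):
    """One-step-ahead prediction r^T(t) W_out ~ u(t + dt) for a single reservoir state r."""
    return r @ W_out
```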

The delay-based reservoir computing approach was introduced by Appeltant et al.24 and first realized for different photonic systems in Refs. 25–27. Here, the reservoir is formed by a single nonlinear dynamical node under the influence of a delayed-feedback signal, and the high-dimensional mapping from the latent space to the output is achieved via time-multiplexing. Reservoir computing setups of this type have shown promising prediction performances with simple hardware implementations40–44 (see also a recent review45), while theoretical analysis has provided deep insight into the configuration and optimization of delay-based reservoir computing.46–54 A dynamical system under the influence of delayed feedback and reservoir inputs can be written as ṙ(t) = F[r(t), r(t−τ), I(t)], where r(t) is the system state, r(t−τ) is the system state delayed by the time τ, I(t) is a time-dependent input, and F is a linear or nonlinear transformation. The input data are applied piece-wise constant over an input time interval T at times t ∈ [t_{m−1}, t_m), with t_m = mT, m = 1, …, N_tr.

To get a high-dimensional response, a mask function T_in is used for time-multiplexing. It is applied at every input time interval T and is T-periodic and piece-wise constant on N_v intervals of length θ (θ = T/N_v). The values of T_in in this paper are drawn i.i.d. from a uniform distribution on [0, 1], but other masks have also been studied.55,56

The reservoir states r(t) are measured at the end of each of these intervals and are called virtual nodes. These virtual nodes take the role of the reservoir nodes of the classical reservoir computer approach. The timescale θ is a free parameter and can be chosen according to the specific task and reservoir setup.24,57 The piece-wise constant values of T_in correspond to input weights in the classic approach. The input I(t) to the dynamical system is given by I(t) = T_in(t) u(t), where T_in(t) is a single scalar value taken from the input weights T_in at the time t. The responses r(t) of the reservoir are, thus, distributed over the input interval T. The nth node response to the mth input is given at r(mT + nθ), where t = 0 denotes the time of the system state r(0) at the start of the reservoir training.
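A minimal sketch of this time-multiplexing step (for a scalar input sample) is shown below; the helper names and the choice of integration step are illustrative assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def make_mask(n_virtual):
    """Piecewise-constant mask T_in: one i.i.d. value from U[0, 1] per virtual node."""
    return rng.uniform(0.0, 1.0, size=n_virtual)

def masked_input(u_step, mask, theta, dt):
    """Sample-and-hold input I(t) = T_in(t) * u over one clock cycle T = n_virtual * theta.

    u_step : scalar input sample u(t_m), held constant over the interval T
    theta  : duration of one virtual node (assumed to be a multiple of the step dt)
    """
    steps_per_node = int(round(theta / dt))
    # Repeat each mask value for the duration theta of its virtual node.
    return np.repeat(mask * u_step, steps_per_node)
```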

For the delay-based reservoir computer, we use a Stuart–Landau oscillator under the influence of a delayed-feedback signal, which models any dynamical system close to a Hopf bifurcation subject to delayed feedback. This bifurcation occurs frequently in photonics, so the model covers a broad range of photonic setups. The equation of motion is given by
\frac{d}{dt} Z(t) = \left[ p + \eta I(t) + i \omega + \gamma |Z(t)|^2 \right] Z(t) + \kappa e^{i \phi} Z(t - \tau),
(4)
where Z(t) and Z(t−τ) are complex quantities describing the state of the system at time t and at the delayed time t−τ, respectively. p, ω, and γ are control parameters modeling the linear rate of change, the local oscillation frequency, and the nonlinearity, respectively. κ and ϕ are the feedback strength and the phase shift, while η and I(t) are the reservoir input strength and the input data, respectively. We set ω = 0, γ = −0.1, κ = 0.1, ϕ = 0, and τ = 1.41 T (see also Refs. 49, 57, and 58), while p and η are chosen depending on the specific task. We do not perform an extensive hyper-parameter optimization. The reservoir states needed to fill the state matrix defined in Eq. (3) are given by the squared amplitude r(t) = |Z(t)|².

Similar to the classical reservoir computer, training consists of linearly regressing the reservoir states against the target output as described in Eq. (2). Prediction is then performed in the closed-loop configuration. While deploying this closed-loop configuration, we add noise ξ to the inputs of the reservoir, drawn i.i.d. from a normal distribution with zero mean and a standard deviation of 1, i.e., ξ ~ N(0, 1). We scale the noise with a noise strength parameter ε. Figure 2(a) depicts a flow chart of the whole process.
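As an illustration, a simple integration of Eq. (4) driven by the masked input signal could look as follows. The explicit Euler scheme, the initialization near the input-free steady state, and the step size are assumptions of the sketch, not the integrator behind the reported results; the closed-loop noise term ε·ξ described above is omitted here.

```python
import numpy as np

def run_reservoir(I, dt, tau, p=0.05, eta=0.01, omega=0.0, gamma=-0.1,
                  kappa=0.1, phi=0.0):
    """Euler integration of the Stuart-Landau reservoir of Eq. (4).

    I   : masked input signal, one value per integration step dt
    tau : feedback delay time (handled via a delayed-index lookup)
    Returns the reservoir response r(t) = |Z(t)|^2 at every integration step.
    """
    delay_steps = int(round(tau / dt))
    Z = np.zeros(len(I) + 1, dtype=complex)
    # Start near the input-free steady state (Z = 0 is also a fixed point and would never respond).
    Z[0] = np.sqrt(p / abs(gamma))
    for n in range(len(I)):
        Z_delayed = Z[n - delay_steps] if n >= delay_steps else 0.0
        dZ = (p + eta * I[n] + 1j * omega + gamma * abs(Z[n]) ** 2) * Z[n] \
             + kappa * np.exp(1j * phi) * Z_delayed
        Z[n + 1] = Z[n] + dt * dZ
    return np.abs(Z[1:]) ** 2
```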

FIG. 2.

Flow diagrams for an autonomous operation of a reservoir computer (a) and SINDy (b). (a) Data are fed into the reservoir via a random fixed input layer and then processed by the reservoir. The switch is set in the vertical position for training the output layer on the target via ridge regression [Eq. (2)]. (b) Data are given to SINDy to find a model representation on which an additional linear regression output layer is applied. For both schemes, autonomous prediction is achieved via setting the switch to the horizontal position.


SINDy, introduced by Kutz and co-workers,23 aims to find a dynamical model for a system via either a lasso regression or an iteratively thresholded ridge regression on some given measurements. Note that all implementations of SINDy in this paper used the open-source Python library pySINDy.59 

The goal of the SINDy method is to use measured data to obtain a model, which accurately emulates the dynamics of the system of interest. The dynamical system, which is to be obtained via SINDy, can generically be described by
\frac{d}{dt} \mathbf{x}(t) = \mathbf{f}(\mathbf{x}(t)),
(5)
where x(t) is the dynamical system state and f(x(t)) models any linear or nonlinear transformation of x(t). We assume that we have data points at time points t ∈ [−T, …, 0], which we denote [t_1, t_2, …, t_M] and concatenate into a vector
X^T = \left( \mathbf{x}^T(t_1), \mathbf{x}^T(t_2), \ldots, \mathbf{x}^T(t_M) \right).
(6)
The same is done for the vector Ẋ of derivatives at every time point. The derivatives may come from measurements or from an approximation based on the measured system states x(t). Now, SINDy approximates the function f(x(t)) as a linear combination of simple linear or nonlinear transformations of x(t), which form the feature library Θ(X). The library can contain polynomial, trigonometric, or other functions and can be generically written as
\Theta(X) = \begin{bmatrix} | & | & | & | & | \\ 1 & X & X^{P_2} & \sin(X) & \cos(X) \\ | & | & | & | & | \end{bmatrix},
(7)
where the transformations are performed element-wise on X. The choice of the feature library itself is crucial to the performance and has to be adapted to the problem. To approximate the dynamics of the system, the last step is to find a vector of appropriate parameters W such that Θ(x(t)) W ≈ f(x(t)), which can again be done via minimization of a given metric. The pySINDy implementation deployed in this paper uses an iteratively thresholded ridge regression optimization problem,
W = \min_{W} \left\| \Theta(X) W - \mathbf{f}(X) \right\|_2^2 + \alpha \left\| W \right\|_2^2.
(8)
Ridge regularization via α‖W‖₂² penalizes large weights. SINDy additionally has the threshold parameter σ, which is the minimum magnitude for a coefficient in the weight vector; coefficients with magnitude below the threshold are set to zero. PySINDy iteratively performs least squares and masks out elements of the weight array W that are too small; see Ref. 59 for more information. If the basis in Eq. (7) spans the full phase space of the dynamical system, SINDy can find f(x(t)) via Eq. (8). An imperfect basis will only approximate a subspace of the dynamical system. We additionally post-process the output of SINDy via a ridge regression to make the comparisons to the reservoir computing and DI-RC approaches more consistent. After training the SINDy model, we numerically integrate the differential equation ẋ(t) = Θ(x(t)) W (this is the model representation) forward from an initial condition to obtain a prediction. A flow diagram of SINDy is shown in Fig. 2(b).
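As an illustration of this workflow, a minimal pySINDy sketch for the Lorenz setting used later in this paper could look as follows. The training length, the initial condition, and the threshold value are illustrative choices for the sketch, not the exact settings behind the reported results.

```python
import numpy as np
from scipy.integrate import solve_ivp
import pysindy as ps

# Generate training data from the Lorenz 63 system, sampled at dt_samp = 0.1.
def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

dt_samp = 0.1
t_train = np.arange(0.0, 100.0, dt_samp)
X = solve_ivp(lorenz, (0.0, 100.0), [1.0, 1.0, 1.0], t_eval=t_train,
              rtol=1e-9, atol=1e-9).y.T

# Feature library of all monomials up to order two and iteratively
# thresholded ridge regression (STLSQ); derivatives are estimated
# internally by finite differences from the sampled states.
model = ps.SINDy(optimizer=ps.STLSQ(threshold=0.1, alpha=1e-6),
                 feature_library=ps.PolynomialLibrary(degree=2))
model.fit(X, t=dt_samp)
model.print()                                   # discovered model representation

# Integrate the discovered model forward from the last training state.
prediction = model.simulate(X[-1], t=np.arange(0.0, 10.0, dt_samp))
```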
In our proposed data-informed hybrid scheme (DI-RC), we combine the two data-driven techniques mentioned above, i.e., RC and SINDy, into a single machine learning agent. Given a data set of u(t) for −T < t < 0, we first train SINDy to find a model representation K. Given an initial condition, the model K is integrated forward to produce a Δt prediction, i.e., K[u(t)] = ∫_t^{t+Δt} Θ(u(t)) W dt. Subsequently, the reservoir computer is fed with the same data as SINDy, and a reservoir state r(t) is obtained. The one-step forward prediction of the SINDy model and the reservoir state are then concatenated into a single state vector that is passed to the output layer. This can be concisely written as
\hat{T}_{out}\!\left[ \mathbf{r}(t), K[u(t)], W_{out} \right] \approx u(t + \Delta t),
(9)
where W_out are the readout weights that we learn via ridge regression. After training, DI-RC can be operated in a closed-loop configuration for prediction (see the flow diagram in Fig. 3).
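A sketch of this combination step is given below, reusing the readout training from the sketch above. The helper names are illustrative assumptions; the SINDy one-step forecast is obtained by integrating the discovered model over one sampling step.

```python
import numpy as np

def di_rc_state_matrix(R, U, sindy_model, dt_samp):
    """Concatenate reservoir states r(t) with SINDy one-step forecasts K[u(t)], cf. Eq. (9).

    R : (N_tr, N_v) reservoir state matrix
    U : (N_tr, dim) input samples u(t) fed to both agents
    """
    # One-step-ahead forecast K[u(t)] of the trained SINDy model for every sample.
    K = np.array([sindy_model.simulate(u, t=np.array([0.0, dt_samp]))[-1] for u in U])
    return np.hstack([R, K])

# Training then amounts to ridge regression of the concatenated states against
# the target u(t + dt_samp), e.g. W_out = train_readout(di_rc_state_matrix(...), Y).
```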
FIG. 3.

Flow diagram of DI-RC. The time-series data are given to SINDy, which is trained to find a model representation. The same data enter the reservoir. The output of the reservoir and the SINDy model are combined in the output layer on which a linear regression is performed. When closing the loop, an autonomous dynamical system is constructed that mimics the dynamics of the given data.


The predictive performance of the reservoir computer, SINDy, and our DI-RC approach is tested on the following dynamical systems with complex dynamics:

The well-known Lorenz 63 system60 is described by the three coordinates:
\dot{x} = \sigma (y - x),
(10)
\dot{y} = x (\rho - z) - y,
(11)
\dot{z} = x y - \beta z.
(12)
We set σ = 10, ρ = 28, and β = 8/3 in order to be on the chaotic attractor and integrate numerically with a fourth-order Runge–Kutta solver and a step size of Δt_sim = 0.01. We sample the system with a step size of Δt_samp = 0.1 to produce the input data u(t). All three discretized dynamical variables are provided as input to the reservoir, and the reservoir is trained to predict all three variables one step Δt_samp into the future. It is noted that the choice of Δt_samp changes the difficulty of the task; with Δt_samp = 0.01, which we use for comparison later on, the prediction task is much simpler. In general, boundaries for the step size exist that depend on the chaotic attractor.61 A coarse-grained sampling of the chaotic attractor can be seen in Fig. 4. The dynamics on the attractor is known to have a Lyapunov exponent of Λ_L ≈ 0.91.
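A minimal sketch of this data generation (fourth-order Runge–Kutta with Δt_sim = 0.01, subsampled at Δt_samp = 0.1) is given below; the initial condition and trajectory length are illustrative.

```python
import numpy as np

def lorenz_rhs(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

dt_sim, dt_samp, n_samples = 0.01, 0.1, 2000
stride = int(round(dt_samp / dt_sim))
s = np.array([1.0, 1.0, 1.0])
samples = []
for i in range(n_samples * stride):
    s = rk4_step(lorenz_rhs, s, dt_sim)
    if (i + 1) % stride == 0:     # keep every 10th integration step
        samples.append(s)
u = np.array(samples)             # input data u(t) with sampling step dt_samp = 0.1
```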
FIG. 4.

Distribution of valid prediction times t V for the autonomous Lorenz task performed with the DI-RC setup in phase space. The color of each dot represents t V while its position represents the initial condition on the Lorenz attractor.

The inhomogeneous Kuramoto–Sivashinsky equation1,2 is a partial differential equation that can exhibit chaotic behavior and is described by
\dot{u} = -\left( 1 + \mu \cos\!\left( \frac{2 \pi x}{\lambda_{inh}} \right) \right) u_{xx} - u_{xxxx} - u u_x,
(13)
where u is a spatially one-dimensional state variable on the periodic domain x ∈ [0, 60], u̇ denotes the time derivative, and u_xx and u_xxxx are the second and fourth spatial derivatives, respectively. The non-constant coefficient in front of u_xx introduces a spatial inhomogeneity if μ ≠ 0. We choose λ_inh = 5, such that the spatial period L = 60 is an integer multiple of λ_inh. The spatiotemporal evolution for the homogeneous case can be seen in the top row of Fig. 7. We simulate the system with a step size of Δt_sim = 0.25 using an exponential time-differencing fourth-order Runge–Kutta solver.62 Our integration time step to produce the input data u(t) is also our sampling time step (Δt_sim = Δt_samp). We calculate the largest Lyapunov exponent of the Kuramoto–Sivashinsky system to be Λ_K ≈ 0.079 for the homogeneous case.
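For illustration, the right-hand side of Eq. (13) can be evaluated pseudo-spectrally on the periodic domain as sketched below. In the paper, this system is advanced with the exponential time-differencing Runge–Kutta scheme of Ref. 62, which is omitted here for brevity; the grid resolution is an assumption of the sketch.

```python
import numpy as np

L, N = 60.0, 128                                  # periodic domain length and grid points
x = np.linspace(0.0, L, N, endpoint=False)
k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)      # spectral wavenumbers

def ks_rhs(u, mu=0.0, lam_inh=5.0):
    """Right-hand side of the (in)homogeneous Kuramoto-Sivashinsky equation, Eq. (13)."""
    u_hat = np.fft.fft(u)
    u_x = np.real(np.fft.ifft(1j * k * u_hat))        # first spatial derivative
    u_xx = np.real(np.fft.ifft(-(k ** 2) * u_hat))    # second spatial derivative
    u_xxxx = np.real(np.fft.ifft((k ** 4) * u_hat))   # fourth spatial derivative
    coeff = 1.0 + mu * np.cos(2.0 * np.pi * x / lam_inh)
    return -coeff * u_xx - u_xxxx - u * u_x
```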

Throughout the paper, all SINDy derived models for the Lorenz 63 and Kuramoto–Sivashinsky are numerically integrated via the same method and with the same integration time step. All the parameters used for the numeric simulations are summarized in Table I.

TABLE I.

Parameters, descriptions, and values if not stated otherwise. If two numbers in “Value” are given, they correspond to the setup for the Lorenz and Kuramoto–Sivashinsky task, respectively.

Parameter   Description                                 Value
N_v         Reservoir size of delay-based RC            Varied
N_tr        Training length                             Varied
T           Clock cycle                                 N_v · θ
θ           Time distance between two virtual nodes     1 / 0.5
λ           Regularization parameter                    10^-6
α           Regularization parameter for SINDy          10^-6
p           Local linear response of delay-based RC     0.05 / 0.1
η           Input strength of delay-based RC            0.01 / 0.1
ω           Rotation of delay-based RC                  0
γ           Nonlinearity of delay-based RC              -0.1
κ           Feedback rate of delay-based RC             0.1
ϕ           Feedback phase of delay-based RC            0
τ           Delay time of delay-based RC                1.41 · T
Δt_samp     Sampling step                               0.1 / 0.25
Λ_L         Lyapunov exponent Lorenz 63                 0.91
Λ_K         Lyapunov exponent Kuramoto–Sivashinsky      0.079
t_V         Valid prediction time                       Measured
δ_M         Closed-loop map error                       Measured

To quantify the quality of the time-series prediction, we use two different metrics, the closed-loop map error and the valid prediction time, which are explained in the following.

The long-term prediction quality in the closed-loop configuration is measured via the closed-loop map error δ_M,
\delta_M = \frac{\left\| \left[ x_i^P - x_{i-1}^P \right] - \left[ x_i^T(x_{i-1}^P) - x_{i-1}^P \right] \right\|^2}{\left\| x_i^T - x_{i-1}^T \right\|^2},
(14)
where x_i^P and x_i^T are the predicted and true system states at the ith time step, respectively, and x_i^T(x_{i−1}^P) is the state at the ith time step given x_{i−1}^P as initial condition to the true system. After training, we close the loop as shown in Figs. 2 and 3 and compute closed-loop map errors according to Eq. (14) over a long trajectory of 20 000 steps. This metric measures the difference in the integrated short-term flow over the reconstructed attractor. If the map error is closer to 1 than to 0, the reconstructed attractor is not similar to the true one; thus, the long-term behavior of the autonomous system differs significantly. The map error is averaged over 1000 different random initial conditions, each with its own training phase.
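One way to evaluate Eq. (14) numerically is sketched below, assuming a function true_step(x) that advances the true system by one sampling step (e.g., the Runge–Kutta sketch above) and a closed-loop prediction trajectory; the per-step errors are averaged over the trajectory as described in the text, and all names are illustrative.

```python
import numpy as np

def map_error(x_pred, x_true, true_step):
    """Closed-loop map error, Eq. (14), averaged over a long closed-loop trajectory.

    x_pred    : (N, dim) predicted trajectory in closed-loop operation
    x_true    : (N, dim) true trajectory used for the normalization
    true_step : callable advancing the true system by one sampling step
    """
    pred_incr = x_pred[1:] - x_pred[:-1]                               # predicted one-step flow
    true_incr = np.array([true_step(s) for s in x_pred[:-1]]) - x_pred[:-1]
    num = np.mean(np.sum((pred_incr - true_incr) ** 2, axis=1))
    den = np.mean(np.sum((x_true[1:] - x_true[:-1]) ** 2, axis=1))
    return num / den
```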
Complementary to the map error δ_M, the valid prediction time t_V captures the short-term prediction quality. It quantifies the agent's capability to stay on the true trajectory. We define the prediction error as
\delta_V(t) = \frac{\left\| x^T(t) - x^P(t) \right\|^2}{\left\langle \left\| x^T(t) - \langle x^T(t) \rangle \right\|^2 \right\rangle},
(15)
where x^T(t) and x^P(t) are the true and predicted system states, respectively, ⟨x^T(t)⟩ denotes the time-average of the argument up to time t, and ‖·‖ denotes the Euclidean norm. The time t_V at which δ_V(t) first exceeds a threshold, which we chose to be 0.4, is the valid prediction time,
t_V = \max \left\{ t \,\middle|\, \delta_V(t) < 0.4 \right\}.
(16)
We measure t_V in Lyapunov times of the true system, which captures the characteristic difficulty of the given task. Furthermore, t_V is averaged over 1000 different random initial conditions (ICs) that have been chosen to generate the Lorenz task (each with its own training phase). To visualize the strong dependence of t_V on the IC, Fig. 4 shows the chosen IC coordinates in the Lorenz phase space as circles whose colors represent t_V. On average, ICs close to the unstable origin lead to smaller t_V. This could be due to the properties of the Lorenz attractor, which has been shown to have a larger instantaneous dimension at the origin and at the edge.63
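A sketch of how t_V can be evaluated from a pair of true and predicted trajectories is given below; the running-average normalization follows the reconstruction of Eq. (15) above and is therefore only illustrative.

```python
import numpy as np

def valid_time(x_true, x_pred, dt_samp, lyapunov_exp, threshold=0.4):
    """Valid prediction time t_V, Eqs. (15) and (16), returned in Lyapunov times."""
    n = np.arange(1, len(x_true) + 1)
    run_mean = np.cumsum(x_true, axis=0) / n[:, None]                  # <x^T(t)> up to time t
    norm = np.cumsum(np.sum((x_true - run_mean) ** 2, axis=1)) / n     # running normalization
    err = np.sum((x_true - x_pred) ** 2, axis=1)
    delta_v = err[1:] / norm[1:]                                       # skip t = 0 (zero variance)
    exceeded = np.flatnonzero(delta_v >= threshold)
    steps = exceeded[0] + 1 if exceeded.size else len(x_true)
    return steps * dt_samp * lyapunov_exp
```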

In previous work, we used the normalized root mean square error (NRMSE) as a benchmark metric for evaluating one-step-ahead predictions in time-series forecasting studies (open-loop configuration), e.g., in Refs. 50, 57, and 64. The NRMSE is a good measure for quantifying the regression quality of one-step-ahead prediction (open-loop) approaches. However, in closed-loop setups, the NRMSE is not a good measure, as it strongly depends on the evaluation time interval and on the specific dynamics of the trajectories. We, thus, focus on the valid prediction time t_V and the map error for our closed-loop setup (see Fig. S1 in the supplementary material, where the open-loop NRMSE of one-step-ahead prediction is compared to the valid prediction time in closed-loop operation using the same reservoir).

We now present our results for the hybrid DI-RC approach and compare it to the performance of the two solitary components. For the reservoir computer, we use a Stuart–Landau oscillator with delayed feedback (see Sec. II), while results for an echo-state reservoir are presented in the supplementary material. For the Lorenz system, we set the SINDy feature basis to all monomials up to order two, and for the Kuramoto–Sivashinsky equation, we use all combinations of two functions with derivatives up to fourth order and a constant bias. The hyperparameters of the reservoir were chosen from a region of parameter space where adequate performance can be expected, i.e., such that the system without input is in a steady state, the mask step length is of the order of the system response time, and the feedback delay time is not resonant with the input clock time T. The pump parameter p was chosen such that the system was far from criticality, where stronger damping of the dominant eigenvalues means that the system has shorter memory.50 However, apart from fulfilling these prerequisites, the reservoir hyperparameters were not optimized.

Figure 5 shows the results for the Lorenz 63 task computed for the RC (cyan), the SINDy (black), and the DI-RC (red) approach. All metrics were calculated for 1000 different initial conditions (ICs) of the true Lorenz system and of the randomly chosen mask function T_in. The symbols in Fig. 5 depict the median of these 1000 simulations, while the errorbars give the interval from the 25% quantile to the 75% quantile. Please note that the errorbars are quite large due to the large number of ICs that we use, e.g., compared to the results in Ref. 21, where only 20 different ICs are averaged. The statistical variation with the IC can be seen in Fig. S1 of the supplementary material. A much smaller deviation is found when different masks are used for a fixed IC. Figure 5(a) shows a scan over the reservoir size N_v for large training data sets, while Fig. 5(b) shows the performance vs the training data length for a small reservoir of fixed size N_v = 100. Looking at the valid prediction time [upper row in Fig. 5(a)], we can see that the SINDy model's short-term prediction performance is limited to less than one Lyapunov time. The RC, instead, shows valid prediction times of more than three Lyapunov times (if the training length is sufficiently long and the reservoir size is increased to around 1000 nodes). Combining both schemes in the hybrid DI-RC yields improved performance compared with the individual approaches, particularly when the reservoir is comparably small with around 100 nodes. Both the RC approach and DI-RC tend toward poor performance if the training data set is too small, probably because the training set then contains too few examples from the Lorenz attractor. However, both approaches make better short-term predictions than SINDy.

FIG. 5.

Lorenz 63 task with a delay-based reservoir: Valid prediction time (top row) and map error (bottom row) plotted over the reservoir size N_v (a) and the training length N_tr (b). Black, cyan, and red symbols show solitary SINDy, solitary RC, and our hybrid DI-RC performance, respectively. The RC is a Stuart–Landau oscillator with feedback (Sec. II) with p = 0.05, γ = −0.1, η = 0.01, κ = 0.1, τ = 1.41 T, T = N_v θ, λ = 10^-6, and θ = 1; (a) N_tr = 1000 and (b) N_v = 100.


Similar trends can be found for the long-term behavior, which is measured via the map error (lower row in Fig. 5). The SINDy model uniformly exhibits a high map error of approximately 0.2 for all training lengths tested. The RC approach achieves better performance depending on the training length and reservoir size (the lowest map error is on the order of 10^-2 for reservoir sizes greater than approximately 300 nodes). On the other hand, DI-RC achieves a map error of 10^-2 more uniformly across a range of reservoir sizes (even 100 nodes) and training lengths (already at around 200 training time units).

We note that the performance of the SINDy model depends strongly on the sampling time step of the time-series data. A SINDy model trained on time-series data measured from a Lorenz 63 system with Δt_samp = 0.01 returns model coefficients very close to those of the true system (a deviation of at most 10%, i.e., ε < 0.1); however, if trained on a time series sampled at Δt_samp = 0.1, as done in this paper, it returns model coefficients very different from those of the true system (ε > 0.5). As an example, we give the equations found by SINDy (training length of 1000 system time units and Δt_samp = 0.1) below,
\dot{x} = -17.041\, x + 14.531\, y + 0.266\, x z - 0.204\, y z,
(17)
\dot{y} = 0.14 + 8.115\, x + 5.218\, y - 0.428\, x z - 0.108\, y z,
(18)
\dot{z} = 13.833 + 1.793\, z + 1.651\, x^2 - 0.996\, x y + 0.508\, y^2 - 0.176\, z^2.
(19)
Looking at Eqs. (17)–(19) and comparing them with the real Lorenz 63 equations (10)–(12) indicates a SINDy model error of more than 50%. Nevertheless, these results are still able to support the reservoir computer prediction, as DI-RC significantly improves on all aspects. Interestingly, the model deviates much more from the true system than the imperfect models usually chosen in the hybrid approach21,65 and is still very efficient in improving the DI-RC performance. Table II exemplifies this by showing the median valid prediction time t_V^M for DI-RC and for an imperfect model with ε = 0.05 as used in Ref. 21, for two different sampling steps Δ_samp. Only small reductions of t_V^M are found for our DI-RC at the larger sampling step, even though the deviation of the SINDy model representation [Eqs. (17)–(19)] is, with ε > 0.5, much larger than the ε = 0.05 deviation from the true Lorenz system in the imperfect model approach.
TABLE II.

Comparison of the performance of our DI-RC and the imperfect model introduced in Ref. 21 for two different model imperfections ε and two different discretizations Δ_samp of the Lorenz attractor. The median of the valid prediction time t_V^M is given in Lyapunov times, and the values of the 33% and 66% quantiles are given in parentheses. The reservoir is the same as in Fig. 5 with N_tr = 1000 and N_v = 500.

DI-RC              DI-RC               Imp. model (Ref. 21)   Imp. model (Ref. 21)
Δ_samp = 0.1       Δ_samp = 0.01       Δ_samp = 0.1           Δ_samp = 0.01
ε > 0.5            ε < 0.1             ε = 0.05               ε = 0.05
t_V^M = 3.1        t_V^M = 4.05        t_V^M = 5.1            t_V^M = 5.35
(0.6 − 7.9)        (1.1 − 9.3)         (1.4 − 8.7)            (1.86 − 10.45)

To emphasize the benefits of using DI-RC, a normalized power spectral density (PSD) and a stability analysis were performed for all three approaches for one example parameter setup. The results for the z-variable of the Lorenz 63 system are plotted in Fig. 6. The true system is shown in blue, the SINDy approach in dotted black, the RC approach in dashed cyan, and DI-RC in dashed red. The results indicate that SINDy produces a shifted PSD, underestimating and thereby missing the main frequency of the Lorenz 63 attractor. It also fails to reproduce the higher frequencies between 1.5 and 4. The RC prediction matches the correct frequencies but overestimates the valleys around the peaks. DI-RC, on the other hand, fits the main frequency while also reproducing the contributions of the higher frequencies. From the PSD, we can therefore conclude that DI-RC captures the characteristics of the data set attractor better than the other two approaches.
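How exactly the PSD is estimated is not spelled out in the text; one simple possibility, used here only as an illustration, is Welch's method applied to a long autonomous trajectory of the z-variable.

```python
import numpy as np
from scipy.signal import welch

def normalized_psd(z, dt_samp):
    """Normalized power spectral density of a scalar time series, e.g. the Lorenz z-variable."""
    freq, psd = welch(z - np.mean(z), fs=1.0 / dt_samp,
                      nperseg=min(len(z), 4096))
    return freq, psd / np.max(psd)
```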

FIG. 6.

Lorenz 63 normalized power spectral density (PSD) for the true Lorenz system (blue) and all three data-driven approaches for the z-variable. The inset depicts the fraction of unstable trajectories for 1000 different random initial conditions. The same reservoir as in Fig. 5 with reservoir size N_v = 100 and training length N_tr = 100.


The long-term stability of the predictions is visualized in the inset of Fig. 6. On the x axis, we have plotted the training length in system time units, while the y axis shows the percentage of unstable predictions over a testing period of 5000 time units, averaged over 1000 different initial conditions. By “unstable,” we mean any prediction trajectory that diverged to large unphysical values. The cyan line again depicts the RC, for which we see highly unstable behavior particularly for short training lengths. As the training length is increased, the fraction of unstable trajectories decreases. The SINDy model and DI-RC, on the other hand, produce stable predictions for all training lengths. We attribute this to SINDy's small feature space (in this case made up of all state-variable monomials of up to order 2). These results indicate that adding a data-driven approach like SINDy can help to improve predictions when the training data sets are small.

The results of testing the models on the Lorenz 63 system suggest that DI-RC outperforms both the SINDy model and the RC. The improvement is especially marked when only a short length of training data is available and/or a small RC is used, which could reduce computational costs by allowing smaller reservoirs. We observe a longer valid prediction time as well as a higher fraction of stable predictions when using DI-RC.

We start by analyzing the spatially homogeneous Kuramoto–Sivashinsky equation (μ = 0). An example of the spatiotemporal dynamics and the respective predictions of our data-driven models is shown in the four panels of Fig. 7. The three lower panels give the difference from this true time series for the RC, the SINDy, and the DI-RC approach, in order of decreasing error. As can be seen by the increasing amount of green in these panels, DI-RC improves the prediction performance, reaching about three Lyapunov times of valid prediction time, i.e., t_V ≈ 3.

FIG. 7.

Time evolution of the homogeneous Kuramoto–Sivashinsky system: the top panel shows the true trajectory, while the prediction differences to the true time series for the RC, the SINDy, and the DI-RC approach are plotted below. Parameters: N_r = 2000, p = 0.1, γ = −0.1, η = 0.1, κ = 0.1, τ = 1.41 T, T = N_r θ, λ = 10^-6, and θ = 0.5 for the RC.

For a more complete analysis, similar to what we have done for the Lorenz 63 task in Fig. 5, the performance of the SINDy model, the RC, and DI-RC is shown in Fig. 8 for increasing training length and reservoir size. Looking first at the RC (cyan curve) and SINDy (black curve) approaches, we see that their respective valid prediction times do not exceed 0.5 Lyapunov times. We observe that the trained SINDy model's coefficients tend to be slightly larger in magnitude than those of the real system. An example output of SINDy with a training length of 500 system time units is given by
\dot{u} = 0.05\, u - 1.01\, u_{xx} - 1.04\, u_{xxxx} - 0.99\, u u_x - 0.091\, u u_{xx}.
(20)
The differences between the SINDy model representation given in Eq. (20) and the real system [Eq. (13)] are evidently sufficient to result in poor performance of the pure SINDy approach. An explanation could be a badly chosen regularization parameter for SINDy. Nevertheless, the aim of this paper is not to perform time-consuming hyperparameter optimization but, on the contrary, to show that such optimizations can be avoided by using data-driven approaches in conjunction. In comparison, DI-RC can predict up to two Lyapunov times (median) and up to 1.5 Lyapunov times when the reservoir size is small. The long-term prediction behavior shows results similar to the Lorenz 63 benchmark. For small reservoirs and for small or large data sets, the RC approach tends to be uniformly unstable, reaching unphysical values outside of the graph limits, as can be seen in the lower right plot. The map errors of the solitary SINDy approach tend to be uniformly worse than those of DI-RC. The only regime where this is not true is for very short training lengths, where the SINDy model performance comes close to that of DI-RC. For very large training data sets, the overall performance of DI-RC is significantly better than that of the two approaches used individually.
FIG. 8.

Performance for the homogeneous Kuramoto–Sivashinsky task: Valid prediction time (upper row) and map error (lower row) plotted over the number of nodes N_v (a) and the training length (b). Black, cyan, and red symbols show solitary SINDy, solitary RC, and our hybrid DI-RC performance, respectively. The same reservoir as in Fig. 5 but with p = 0.1, η = 0.1, and θ = 0.5: (a) N_tr = 1000 and (b) N_v = 200.


As we are running the three prediction schemes in an autonomous setup, it is insightful to look at the PSD to assess how well they emulate the chaotic dynamics of the Kuramoto–Sivashinsky system. The results are shown in Fig. 9. It is obvious that the SINDy model (dotted black curve) behaves differently from the true system (PSD shown as a solid blue curve). The SINDy model does not capture the low-frequency behavior and instead emphasizes a frequency range that is not prominent in the real system. The solitary RC (cyan) tends to do a good job of capturing the qualitative behavior but overestimates the higher frequencies. DI-RC (red), on the other hand, matches the PSD very closely. The inset in Fig. 9 depicts the fraction of trajectories that are unstable. The solitary RC (cyan) tends to be unstable for most of the training lengths tested, while the solitary SINDy (black) approach has some unstable predictions for short training lengths, which become stable as the size of the training data set is increased. DI-RC (red) produces stable predictions for all training lengths tested.

FIG. 9.

Homogeneous Kuramoto–Sivashinsky normalized PSD for the true system and all three data-driven approaches. The inset depicts the number of unstable trajectories for 1000 different random initial conditions. The same reservoir as in Fig. 8 with reservoir size N_v = 200 and training length N_tr = 400.

So far, we have seen that DI-RC makes very good predictions even for small reservoir sizes and training lengths. One critical aspect we have not yet considered is the case where the chosen feature basis of the SINDy model does not contain the features of the true model. Recently, Zhang and Cornelius36 have shown that such basis imperfections can give rise to a catch-22 problem in the nonlinear vector regression approach.66 To address this question, we therefore consider the spatially inhomogeneous Kuramoto–Sivashinsky equation and restrict the feature space of the SINDy model such that it cannot fully capture the spatial inhomogeneity. An example model representation for μ = 0.35, obtained from a SINDy model trained on an interval of 1000 time units of the inhomogeneous Kuramoto–Sivashinsky system, is given by
\dot{u} = 0.04\, u - 0.92\, u_{xx} - 0.95\, u_{xxxx} - 0.87\, u u_x - 0.02\, u^2 u_x - 0.02\, u^2 u_{xx} - 0.02\, u u_{xxx} - 0.01\, u^2 u_{xxx}.
(21)
Equation (21) shows that SINDy underestimates the real coefficients [Eq. (13)], while struggling with the nonconstant coefficient. A few small additional features that do not appear in the original equation (13) compensate for this behavior.

Our simulations generally suggest that adding inhomogeneity decreases the performance of SINDy, as the SINDy feature basis is not able to span the dynamical system. With decreasing SINDy performance, the benefit of using DI-RC is also lost. We visualize this in Fig. 10, where the valid prediction times are plotted as a function of the reservoir size for increasing strength of the inhomogeneity μ (from top to bottom). The greater μ, the smaller the benefit from SINDy and, with it, from DI-RC. Our results indicate that DI-RC should only be used if the solitary approaches can contribute to the performance on their own.

FIG. 10.

Inhomogeneous Kuramoto–Sivashinsky task with an echo-state reservoir: Valid prediction time as a function of reservoir size for the RC (cyan), SINDy (black), and DI-RC (red). From top to bottom, three different inhomogeneity strengths μ are chosen. The parameters for the echo-state RC are C = 3, r_spec = 0.4, η = 0.9, λ = 10^-7, α = 1, b = 1, and ε = 10^-3.7 (see the supplementary material for details on the echo-state implementation).


In this paper, we have presented a completely data-driven hybrid approach (DI-RC), which combines a class of recurrent neural networks called reservoir computing with sparse identification of nonlinear dynamical systems (SINDy). Our results indicate that the addition of SINDy can stabilize the reservoir computing predictions while also enabling higher accuracy in short- and long-term predictions when the two are used in conjunction. The advantages of this setup are threefold. First, with this method, a data-driven extension of the hybrid approach is achieved that automates the time-consuming derivation of imperfect models. Second, a hardware implementation of RC can be improved via the simple addition of a SINDy model, stabilizing its prediction and reducing the computational cost. Third, a costly optimization of the RC can be mitigated, which is computationally expensive and, in the case of hardware implementations, sometimes impossible.

It is also worth noting that for our DI-RC, the specific methods SINDy and RC were used as examples and could be replaced with other data-driven approaches, such as LANDO67 for model discovery and some form of recurrent neural networks substituting the RC.

We present data about the statistical variations of the valid prediction times of our DI-RC simulations when different masks or different initial conditions are used. Furthermore, our DI-RC results for the echo-state reservoir computer are presented.

The authors would like to thank Michelle Girvan, Andrew Pomerance, Brian Hunt, and Edward Ott for fruitful discussions. This study was funded by the “Deutsche Forschungsgemeinschaft” (DFG) in the framework of SFB910 (Project B9) and Project No. 411803875.

The authors have no conflicts to disclose.

Felix Köster: Conceptualization (equal); Data curation (lead); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (lead); Writing – review & editing (equal). Dhruvit Patel: Methodology (supporting); Software (supporting); Visualization (supporting). Alexander Wikner: Methodology (supporting); Software (supporting); Visualization (supporting). Lina Jaurigue: Investigation (equal); Methodology (equal); Validation (equal); Writing – review & editing (equal). Kathy Lüdge: Conceptualization (lead); Formal analysis (equal); Funding acquisition (lead); Methodology (lead); Project administration (lead); Resources (lead); Software (supporting); Supervision (lead); Validation (equal); Visualization (equal); Writing – original draft (supporting); Writing – review & editing (lead).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

1. Y. Kuramoto and T. Tsuzuki, “Persistent propagation of concentration waves in dissipative media far from thermal equilibrium,” Prog. Theor. Phys. 55, 356–369 (1976).
2. G. I. Sivashinsky, “Nonlinear analysis of hydrodynamic instability in laminar flames—I. Derivation of basic equations,” Acta Astronaut. 4, 1177–1206 (1977).
3. E. Musa, M. Abdulla, and O. Mohamed, “Numerical simulation of the flow around a light aircraft wing,” Int. Conf. Appl. Mech. Mechan. Eng. 15, 1–13 (2012).
4. M. Rashaduddin and A. Waheedullah, “A study on airflow over a plane,” Int. J. Sci. Eng. Techn. 5, 9 (2017).
5. P. Neumann, P. Düben, P. Adamidis, P. Bauer, M. Brück, L. Kornblueh, D. Klocke, B. Stevens, N. Wedi, and J. Biercamp, “Assessing the scales in numerical weather and climate predictions: Will exascale be the rescue?,” Philos. Trans. R. Soc. A 377, 20180148 (2019).
6. T. Arcomano, I. Szunyogh, A. Wikner, J. Pathak, B. R. Hunt, and E. Ott, “A hybrid approach to atmospheric modeling that combines machine learning with a physics-based numerical model,” J. Adv. Model. Earth Syst. 14, e2021MS002712 (2022).
7. F. Hassanibesheli, J. Kurths, and N. Boers, “Long-term ENSO prediction with echo-state networks,” Environ. Res.: Clim. 1, 011002 (2022).
8. G. E. P. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, 5th ed. (Wiley, Englewood Cliffs, NJ, 2015).
9. A. Alsharef, K. Aggarwal, Sonia, M. Kumar, and A. Mishra, “Review of ML and AutoML solutions to forecast time-series data,” Arch. Comput. Methods Eng. 29, 5297–5311 (2022).
10. Y. You, L. Zhang, P. Tao, S. Liu, and L. Chen, “Spatiotemporal transformer neural network for time-series forecasting,” Entropy 24, 1651 (2022).
11. Z. Wu, Q. Li, and H. Zhang, “Chain-structure echo state network with stochastic optimization: Methodology and application,” IEEE Trans. Neural Netw. Learn. Syst. 33, 1974–1985 (2022).
12. S. Zhong, X. Xie, L. Lin, and F. Wang, “Genetic algorithm optimized double-reservoir echo state network for multi-regime time series prediction,” Neurocomputing 238, 191–204 (2017).
13. H. Jaeger, “The ‘echo state’ approach to analysing and training recurrent neural networks,” GMD Report No. 148, German National Research Institute for Computer Science (GMD), 2001.
14. W. Maass, T. Natschläger, and H. Markram, “Real-time computing without stable states: A new framework for neural computation based on perturbations,” Neural Comput. 14, 2531–2560 (2002).
15. D. Brunner, M. C. Soriano, and G. Van der Sande, Photonic Reservoir Computing, Optical Recurrent Neural Networks, edited by D. Brunner, M. C. Soriano, and G. Van der Sande (De Gruyter, Berlin, 2019).
16. T.-C. Chen, S. G. Penny, T. A. Smith, and J. A. Platt, “Next generation reservoir computing: An empirical data-driven expression of dynamical equations in time-stepping form,” arXiv:2201.05193 (2022).
17. M. E. Levin and A. M. Stuart, “A framework for machine learning of model error in dynamical systems,” Commun. Am. Math. Soc. 2, 283 (2022).
18. P. C. Fu and J. P. Barford, “A hybrid neural network-first principles approach for modelling of cell metabolism,” Comput. Chem. Eng. 20, 951–958 (1996).
19. J. Pathak, Z. Lu, B. R. Hunt, M. Girvan, and E. Ott, “Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data,” Chaos 27, 121102 (2017).
20. J. Pathak, B. R. Hunt, M. Girvan, Z. Lu, and E. Ott, “Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach,” Phys. Rev. Lett. 120, 024102 (2018).
21. J. Pathak, A. Wikner, R. Fussell, S. Chandra, B. R. Hunt, M. Girvan, and E. Ott, “Hybrid forecasting of chaotic processes: Using machine learning in conjunction with a knowledge-based model,” Chaos 28, 041101 (2018).
22. P. R. Vlachas, W. Byeon, Z. Y. Wan, T. P. Sapsis, and P. Koumoutsakos, “Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks,” Proc. R. Soc. A 474, 20170844 (2018).
23. S. L. Brunton, J. L. Proctor, and J. N. Kutz, “Discovering governing equations from data by sparse identification of nonlinear dynamical systems,” Proc. Natl. Acad. Sci. U.S.A. 113, 3932–3937 (2016).
24. L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011).
25. Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, “Optoelectronic reservoir computing,” Sci. Rep. 2, 287 (2012).
26. L. Larger, M. C. Soriano, D. Brunner, L. Appeltant, J. M. Gutierrez, L. Pesquera, C. R. Mirasso, and I. Fischer, “Photonic information processing beyond Turing: An optoelectronic implementation of reservoir computing,” Opt. Express 20, 3241–3249 (2012).
27. D. Brunner, M. C. Soriano, C. R. Mirasso, and I. Fischer, “Parallel photonic information processing at gigabyte per second data rates using transient states,” Nat. Commun. 4, 1364 (2013).
28. M. Roy, S. Mandal, C. Hens, A. Prasad, N. V. Kuznetsov, and M. Dev Shrimali, “Model-free prediction of multistability using echo state network,” Chaos 32, 101104 (2022).
29. T. L. Carroll, “Optimizing memory in reservoir computers,” Chaos 32, 023123 (2022).
30. J. Z. Kim, Z. Lu, E. Nozari, G. J. Pappas, and D. S. Bassett, “Teaching recurrent neural networks to infer global temporal structure from local examples,” Nat. Mach. Intell. 3, 316–323 (2021).
31. P. Antonik, M. Gulina, J. Pauwels, and S. Massar, “Using a reservoir computer to learn chaotic attractors, with applications to chaos synchronization and cryptography,” Phys. Rev. E 98, 012215 (2018).
32. J. Liu, J. G. Zhang, and Y. C. Wang, “Secure communication via chaotic synchronization based on reservoir computing,” IEEE Trans. Neural Netw. Learn. Syst. (published online).
33. R. Pyle, N. Jovanovic, D. Subramanian, K. V. Palem, and A. B. Patel, “Domain-driven models yield better predictions at lower cost than reservoir computers in Lorenz systems,” Philos. Trans. R. Soc. A 379, 20200246 (2021).
34. D. J. Gauthier, I. Fischer, and A. Röhm, “Learning unseen coexisting attractors,” Chaos 32, 113107 (2022).
35. W. A. S. Barbosa and D. J. Gauthier, “Learning spatiotemporal chaos using next-generation reservoir computing,” Chaos 32, 093137 (2022).
36. Y. Zhang and S. P. Cornelius, “A catch-22 of reservoir computing,” arXiv:2210.10211 (2022).
37. A. G. Hart, J. L. Hook, and J. H. Dawes, “Echo state networks trained by Tikhonov least squares are L2(μ) approximators of ergodic dynamical systems,” Physica D 421, 132882 (2021).
38. L. Grigoryeva, A. G. Hart, and J.-P. Ortega, “Learning strange attractors with reservoir systems,” arXiv:2108.05024 (2021).
39. A. N. Tikhonov and V. I. Arsenin, Solutions of Ill-Posed Problems, Scripta Series in Mathematics (Winston; distributed solely by Halsted Press, Washington, DC/New York, 1977).
40. L. Keuninckx, J. Danckaert, and G. Van der Sande, “Real-time audio processing with a cascade of discrete-time delay line-based reservoir computers,” Cogn. Comput. 9, 315–326 (2017).
41. L. Larger, A. Baylón-Fuentes, R. Martinenghi, V. S. Udaltsov, Y. K. Chembo, and M. Jacquot, “High-speed photonic reservoir computing using a time-delay-based architecture: Million words per second classification,” Phys. Rev. X 7, 011015 (2017).
42. K. Takano, C. Sugano, M. Inubushi, K. Yoshimura, S. Sunada, K. Kanno, and A. Uchida, “Compact reservoir computing with a photonic integrated circuit,” Opt. Express 26, 29424–29439 (2018).
43. J. Bueno, J. Robertson, M. Hejda, and A. Hurtado, “Comprehensive performance analysis of a VCSEL-based photonic reservoir computer,” IEEE Photon. Technol. Lett. 33, 920–923 (2021).
44. K. Kanno, A. A. Haya, and A. Uchida, “Reservoir computing based on an external-cavity semiconductor laser with optical feedback modulation,” Opt. Express 30, 34218–34238 (2022).
45. Y. K. Chembo, “Machine learning based on reservoir computing with time-delayed optoelectronic and photonic systems,” Chaos 30, 013111 (2020).
46. J. Dambre, D. Verstraeten, B. Schrauwen, and S. Massar, “Information processing capacity of dynamical systems,” Sci. Rep. 2, 514 (2012).
47. L. Grigoryeva, J. Henriques, L. Larger, and J.-P. Ortega, “Optimal nonlinear information processing capacity in delay-based reservoir computers,” Sci. Rep. 5, 12858 (2015).
48. S. Ortín and L. Pesquera, “Reservoir computing with an ensemble of time-delay reservoirs,” Cognit. Comput. 9, 327–336 (2017).
49. A. Röhm, L. C. Jaurigue, and K. Lüdge, “Reservoir computing using laser networks,” IEEE J. Sel. Top. Quantum Electron. 26, 7700108 (2019).
50. F. Köster, S. Yanchuk, and K. Lüdge, “Insight into delay based reservoir computing via eigenvalue analysis,” J. Phys.: Photonics 3, 024011 (2021).
51. F. Köster, S. Yanchuk, and K. Lüdge, “Master memory function for delay-based reservoir computers with single-variable dynamics,” IEEE Trans. Neural Netw. Learn. Syst. (published online).
52. L. C. Jaurigue, E. Robertson, J. Wolters, and K. Lüdge, “Photonic reservoir computing with non-linear memory cells: Interplay between topology, delay and delayed input,” Proc. SPIE 12204, 1220408 (2022).
53. T. L. Carroll and J. D. Hart, “Time shifts to reduce the size of reservoir computers,” Chaos 32, 083122 (2022).
54. T. Hülser, F. Köster, K. Lüdge, and L. C. Jaurigue, “Deriving task specific performance from the information processing capacity of a reservoir computer,” Nanophotonics 12, 937 (2023).
55. J. Nakayama, K. Kanno, and A. Uchida, “Laser dynamical reservoir computing with consistency: An approach of a chaos mask signal,” Opt. Express 24, 8679–8692 (2016).
56. Y. Kuriki, J. Nakayama, K. Takano, and A. Uchida, “Impact of input mask signals on delay-based photonic reservoir computing with semiconductor lasers,” Opt. Express 26, 5777–5788 (2018).
57. T. Hülser, F. Köster, L. C. Jaurigue, and K. Lüdge, “Role of delay-times in delay-based photonic reservoir computing,” Opt. Mater. Express 12, 1214–1231 (2022).
58. F. Köster, D. Ehlert, and K. Lüdge, “Limitations of the recall capabilities in delay based reservoir computing systems,” Cogn. Comput. (2020).
59. B. M. de Silva, K. Champion, M. Quade, J.-C. Loiseau, J. N. Kutz, and S. L. Brunton, “PySINDy: A Python package for the sparse identification of nonlinear dynamics from data,” arXiv:2004.08424 (2020).
60. E. N. Lorenz, “Deterministic nonperiodic flow,” J. Atmos. Sci. 20, 130 (1963).
61. K. Tsuchiyama, A. Röhm, T. Mihana, R. Horisaki, and M. Naruse, “Effect of temporal resolution on the reproduction of chaotic dynamics via reservoir computing,” arXiv:2302.10761 (2023).
62. A.-K. Kassam and L. N. Trefethen, “Fourth-order time-stepping for stiff PDEs,” SIAM J. Sci. Comput. 26, 1214–1233 (2005).
63. T. Alberti, D. Faranda, V. Lucarini, R. V. Donner, B. Dubrulle, and F. Daviaud, “Scale dependence of fractal dimension in deterministic and stochastic Lorenz-63 systems,” Chaos 33, 023144 (2023).
64. L. C. Jaurigue, E. Robertson, J. Wolters, and K. Lüdge, “Reservoir computing with delayed input for fast and easy optimization,” Entropy 23, 1560 (2021).
65. A. Wikner, J. Pathak, B. R. Hunt, I. Szunyogh, M. Girvan, and E. Ott, “Using data assimilation to train a hybrid forecast system that combines machine-learning and knowledge-based components,” Chaos 31, 053114 (2021).
66. D. J. Gauthier, E. M. Bollt, A. Griffith, and W. A. S. Barbosa, “Next generation reservoir computing,” Nat. Commun. 12, 5564 (2021).
67. P. J. Baddoo, B. Herrmann, B. J. McKeon, and S. L. Brunton, “Kernel learning for robust dynamic mode decomposition: Linear and nonlinear disambiguation optimization (LANDO),” Proc. R. Soc. A 478, 20210830 (2022).