We propose a new approach to dynamical system forecasting called data-informed reservoir computing (DI-RC) that, while being based solely on data, yields increased accuracy, reduces computational cost, and mitigates tedious hyperparameter optimization of the reservoir computer (RC). Our DI-RC approach is based on the recently proposed hybrid setup in which a knowledge-based model is combined with a machine learning prediction system, but it replaces the knowledge-based component with a data-driven model discovery technique. As a result, our approach can be chosen when a suitable knowledge-based model is not available. We demonstrate our approach using a delay-based RC as the machine learning component in conjunction with sparse identification of nonlinear dynamical systems for the data-driven model component. We test the performance on two example systems: the Lorenz system and the Kuramoto–Sivashinsky system. Our results indicate that the proposed technique can improve the time-series forecasting capabilities compared with both approaches applied individually, while remaining computationally cheap. The benefit of our proposed approach, compared with pure RC, is most pronounced when the reservoir parameters are not optimized, thereby reducing the need for hyperparameter optimization.

We present a data-driven method that can be used for forecasting very complex time-dependent processes. Our proposed machine learning method is based on the well-known concept of reservoir computing, where the innovative idea is to improve the performance and reliability of the prediction algorithm by an additional data-based model-inference stage. This hybrid data-informed reservoir computer (DI-RC) yields increased accuracy and reduced computational cost and mitigates tedious hyperparameter optimization of the reservoir computer. The performance of the proposed method is tested on two prototypical dynamical systems, namely, the chaotic attractor of the Lorenz-63 system and the spatiotemporal dynamics of the Kuramoto–Sivashinsky equation. It is shown that the spectral properties of the time-dependent dynamics and the long-time evolution up to several Lyapunov times are reproduced by our DI-RC method.

## I. INTRODUCTION

Predicting the evolution of dynamical systems is a problem of interest in many fields, ranging from predicting the evolution of a flame front,^{1,2} or the wind speed over an airplane wing,^{3,4} to forecasting the terrestrial weather or climate evolution.^{5–7} Classical approaches often rely on models derived from basic physical principles, whereas recent advances in machine learning have made it possible to predict the evolution of an unknown dynamical system based solely on past measured time series data.^{8} Alsharef *et al.*^{9} provide an overview of the different machine learning methods used for time-series prediction during the last few decades, ranging from linear models to automated machine learning frameworks. Methods based on neural networks trained via backpropagation, for example, the recently studied spatiotemporal transformer neural network^{10} or deep network structures,^{11,12} are computationally expensive, while approaches based on reservoir computing (introduced as echo-state networks^{13} or liquid state machines^{14}) are computationally much less demanding. Echo-state networks have proven to perform well on forecasting tasks;^{15} however, the performance is sensitive to the network hyperparameters, the optimization of which is a complex and costly task. Various methods have been studied to solve the hyperparameter optimization problem, see Refs. 9 and 16 and references therein; however, these all mitigate one of the main benefits of RC, which is the reduced computational cost compared with other machine learning approaches.

It has been shown that combining data-driven machine learning approaches with an available imperfect knowledge-based model can significantly increase the time horizon (valid prediction time) over which the system can be reliably predicted when compared to the case where the methods are applied individually.^{17–22} These hybrid approaches are appropriate when suitable (but imperfect) knowledge-based and data-based models are available, but neither fulfills the desired prediction task to a satisfactory level on its own. However, in many real situations, knowledge-based models of sufficient quality do not exist. For such cases, we propose a new approach, which is purely data-driven, called data-informed reservoir computing (DI-RC). The basic idea of DI-RC is the hybridization of a data-based machine learning technique, in this case reservoir computing, with a data-driven approach to find a model representation of a dynamical system. This way, the complementary capabilities of two different machine-learning-based agents can be combined and used on the same data, yielding a better performance than each solitary agent. For the model representation agent, we use sparse identification of nonlinear dynamical systems (SINDy).^{23} SINDy has been demonstrated to find equations of complex dynamical systems, such as the Lorenz system, the Kuramoto–Sivashinsky equation, or the shear flow around a cylinder, but it shows decreased performance for very short training data sets or coarse discretization/measurements of the system.^{23}

For the reservoir computing part of our DI-RC, we consider two configurations: a delay-based reservoir consisting of a single nonlinear dynamical node subject to a delayed feedback signal^{24–27} and a sparsely connected network of $\tanh$ functions, for example, as used in Refs. 28 and 29. We show that the DI-RC approach can improve the valid prediction time of chaotic systems, stabilize the prediction, and reduce the cost of time-consuming hyperparameter optimization. We see benefits to our approach especially for small reservoirs or for short training lengths. We also find through the analysis of the power spectrum that DI-RC captures the characteristics of the true attractor with higher accuracy than the two approaches used individually.

The model-free prediction of nonlinear dynamical systems is an active field of research and has been realized using recurrent neural networks,^{30} echo-state networks,^{28,31–33} and nonlinear vector regression methods.^{34–36} For echo-state networks, the approximation property has been shown analytically.^{37,38} Nevertheless, in all these cases, a tedious optimization procedure is necessary to prepare the network or the nonlinear feature vector, and it has been shown that the best-performing echo-state network does a direct polynomial regression in an implicit and computationally costly manner.^{33} With our approach, we want to present a purely data-based method that shows very good performance without the need for a time-consuming parameter optimization. A diagrammatic representation of our approach is shown in Fig. 1.

## II. DATA-INFORMED PREDICTION METHODS

### A. Reservoir computing

Reservoir computing is an architecture that maps an input into high-dimensional latent space via a fixed dynamical system (the reservoir), followed by mapping from the latent space to the output. The key feature of reservoir computing is that training consists solely of adjusting parameters of the output mapping layer. There are several ways of implementing this architecture. We focus on an implementation via a nonlinear dynamical node under the influence of a delayed-feedback signal to illustrate the DI-RC approach. However, we emphasize that the DI-RC approach is not dependent on the particular RC implementation and show results for an echo-state network^{13} in the supplementary material.

In both RC implementations, the training phase consists of feeding an input training sequence $u(j)$ of length $N_{tr}$ into the reservoir and collecting a number $N_v$ of responses to each input step $j$ into a state matrix $S$. The dimension of the state matrix is $N_{tr} \times N_v$, with entries $S_{j,k}$ for $k \in [0, N_v)$.

The output weights are obtained via ridge regression,^{39}
$$W_{out} = \left(S^T S + \lambda \mathbb{1}\right)^{-1} S^T Y,$$
where $\lambda$ is the regularization parameter; for all simulations, we chose $\lambda = 10^{-6}$. $Y$ is the target vector. $W_{out}$ and $Y$ are matrices if the target consists of more than one dimension and vectors if the target is a scalar. $S$ is the state matrix that contains the reservoir states $r(t)$.

For time-series prediction tasks, the target is set to $y(t) = u(t + \Delta t)$, and the reservoir computer is trained to make a one-step-ahead prediction $r^T(t) \cdot W_{out} \approx u(t + \Delta t)$.
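As an illustration, the training step above can be sketched as a ridge regression on a toy state matrix (the matrix sizes and random states are illustrative placeholders, not the setup used in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N_tr input steps, N_v reservoir responses per step.
n_tr, n_v = 500, 50
S = rng.standard_normal((n_tr, n_v))   # state matrix S (N_tr x N_v)
u = rng.standard_normal(n_tr + 1)      # scalar input time series
Y = u[1:]                              # one-step-ahead targets y(t) = u(t + dt)

# Ridge regression for the output weights:
# W_out = (S^T S + lambda * 1)^(-1) S^T Y
lam = 1e-6
W_out = np.linalg.solve(S.T @ S + lam * np.eye(n_v), S.T @ Y)

# One-step-ahead prediction: r^T(t) . W_out ~ u(t + dt)
u_pred = S @ W_out
```

In the closed-loop (autonomous) mode discussed below, `u_pred` would be fed back as the next reservoir input instead of the true data.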

### B. Delay-based reservoir computing

The delay-based reservoir computing approach was introduced by Appeltant *et al.*^{24} and first realized for different photonic systems in Refs. 25–27. Here, the reservoir is formed by a single nonlinear dynamical node under the influence of a delayed-feedback signal, and the high-dimensional mapping from the latent space to the output is achieved via time-multiplexing. Reservoir computing setups of this type have shown promising prediction performance with simple hardware implementations^{40–44} (see also a recent review^{45}), while theoretical analysis has provided deep insight into the configuration and optimization of delay-based reservoir computing.^{46–54} A dynamical system under the influence of delayed feedback and reservoir inputs can be written as $\dot{r}(t) = F\left[r(t), r(t-\tau), I(t)\right]$, where $r(t)$ is the system state, $r(t-\tau)$ is the system state delayed by the time $\tau$, $I(t)$ is a time-dependent input, and $F$ is a linear or nonlinear transformation. The input data are applied piece-wise constant over an input time interval $T$ at times $t \in [t_{m-1}, t_m)$, with $t_m = mT$, $m = 1, \dots, N_{tr}$.

To get a high-dimensional response, a mask function $T_{in}$ is used for time-multiplexing. It is applied at every input time interval $T$ and is $T$-periodic and piece-wise constant on $N_v$ intervals of length $\theta$ ($\theta = T/N_v$). The values of $T_{in}$ in this paper are drawn i.i.d. from a uniform distribution on $[0,1]$, but other masks have also been studied.^{55,56}

The reservoir states $r(t)$ are measured at the end of each of these intervals and are called virtual nodes. These virtual nodes take the role of the reservoir nodes in the classical reservoir computer approach. The timescale $\theta$ is a free parameter and can be chosen according to the specific task and reservoir setup.^{24,57} The piece-wise constant values of $T_{in}$ correspond to input weights in the classic approach. The input $I(t)$ to the dynamical system is given by $I(t) = T_{in}(t)u(t)$, where $T_{in}(t)$ is now a single scalar value taken from the input weights $T_{in}$ at the time $t$. The responses $r(t)$ of the reservoir are thus distributed over the input time interval $T$. The $n$th node response to the $m$th input is given at $r(mT + n\theta)$, where $t = 0$ denotes the time of the system state $r(0)$ at the start of the reservoir training.
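The time-multiplexing described above can be sketched as follows; the values of $N_v$, $\theta$, and the input sequence are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

n_v = 10                           # number of virtual nodes N_v
theta = 0.5                        # virtual node separation
T = n_v * theta                    # input clock cycle T = N_v * theta
mask = rng.uniform(0.0, 1.0, n_v)  # T-periodic, piece-wise constant mask T_in
u = rng.standard_normal(5)         # a few input steps u(m)

def masked_input(t):
    """I(t) = T_in(t) * u(t): the m-th input is held over one clock cycle T
    and modulated by the mask value of the current theta-interval."""
    m = int(t // T)              # index of the current input step
    n = int((t % T) // theta)    # index of the current virtual node interval
    return mask[n] * u[m]

# The n-th virtual node response to the m-th input is read out at t = m*T + n*theta.
ts = np.arange(0.0, len(u) * T, theta / 10)
I = np.array([masked_input(t) for t in ts])
```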

Similar to the classical reservoir computer, training consists of linearly regressing the reservoir states against the target output as described in Sec. II A. Prediction is then performed in the closed-loop configuration. While deploying this closed-loop configuration, we add noise $\xi$ to the inputs of the reservoir, drawn i.i.d. from a normal distribution with zero mean and a standard deviation of 1, i.e., $\xi \sim \mathcal{N}(0,1)$, and scaled with a noise strength parameter $\epsilon$. Figure 2(a) depicts a flow chart of the whole process.
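A minimal sketch of this closed-loop operation with input noise is given below; the one-step reservoir update and the readout weights are illustrative stand-ins, not the delay system used in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 1e-3                                      # noise strength (illustrative)

def reservoir_step(r, u_in):
    """Stand-in one-step reservoir update (a tanh map instead of the
    sampled virtual-node states of the delay system)."""
    return np.tanh(0.5 * r + u_in)

def readout(r, W_out):
    return r @ W_out

def closed_loop(r0, W_out, n_steps):
    """Feed the readout's own (noise-perturbed) prediction back as input."""
    r = r0
    preds = []
    u_in = readout(r, W_out)
    for _ in range(n_steps):
        u_in = u_in + eps * rng.standard_normal()   # xi ~ N(0, 1), scaled by eps
        r = reservoir_step(r, u_in)
        u_in = readout(r, W_out)
        preds.append(u_in)
    return np.array(preds)

preds = closed_loop(np.zeros(8), 0.1 * rng.standard_normal(8), 100)
```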

### C. Sparse identification of nonlinear dynamics (SINDy)

SINDy, introduced by Kutz and co-workers,^{23} aims to find a dynamical model for a system via either a lasso regression or an iteratively thresholded ridge regression on given measurement data. Note that all implementations of SINDy in this paper used the open-source Python library pySINDy.^{59}
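To illustrate the idea behind SINDy, rather than the pySINDy implementation itself, a sequentially thresholded least-squares regression on a monomial library can be sketched as follows. For simplicity, the derivative targets are evaluated from the known right-hand side instead of being estimated numerically, and all parameter values (standard chaotic Lorenz parameters, step sizes, threshold) are illustrative:

```python
import numpy as np

# Lorenz-63 with standard chaotic parameters (illustrative example system).
sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0

def lorenz(s):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(s, dt):
    k1 = lorenz(s); k2 = lorenz(s + dt / 2 * k1)
    k3 = lorenz(s + dt / 2 * k2); k4 = lorenz(s + dt * k3)
    return s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Generate a short trajectory as training data.
dt, n = 0.01, 2000
X = np.empty((n, 3))
s = np.array([1.0, 1.0, 1.0])
for i in range(n):
    s = rk4_step(s, dt)
    X[i] = s
dX = np.array([lorenz(row) for row in X])   # derivative targets (exact, for simplicity)

# Feature library: all monomials up to order 2 in (x, y, z).
x, y, z = X.T
Theta = np.column_stack([np.ones(n), x, y, z, x * x, x * y, x * z, y * y, y * z, z * z])

def stlsq(Theta, dX, threshold=0.5, n_iter=10):
    """Sequentially thresholded least squares: zero out small coefficients,
    then refit the remaining active terms of each equation."""
    Xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(dX.shape[1]):
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dX[:, k], rcond=None)[0]
    return Xi

Xi = stlsq(Theta, dX)   # sparse coefficient matrix; recovers the Lorenz terms
```

pySINDy packages this kind of workflow (library construction, differentiation, and thresholded regression) behind a scikit-learn-like interface.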

### D. Data-informed reservoir computing (DI-RC)
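A minimal sketch of the DI-RC combination, in analogy to the knowledge-based hybrid setup of Refs. 17–22 with the knowledge-based model replaced by a data-driven one: the one-step prediction of the discovered model is appended to the reservoir state vector, and the readout is trained on the concatenated features. The toy reservoir states and the `sindy_step` stand-in are illustrative placeholders, not the systems used in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def sindy_step(u):
    """Placeholder for the one-step prediction of a trained SINDy model."""
    return 0.9 * u   # illustrative stand-in dynamics

# Toy reservoir responses and a scalar input series.
n_tr, n_v = 400, 30
R = rng.standard_normal((n_tr, n_v))   # reservoir states r(t)
u = rng.standard_normal(n_tr + 1)
Y = u[1:]                              # one-step-ahead targets u(t + dt)

# Hybrid feature matrix: reservoir states plus the SINDy one-step prediction.
S = np.hstack([R, sindy_step(u[:-1])[:, None]])

# Same ridge regression as for the plain reservoir computer.
lam = 1e-6
W_out = np.linalg.solve(S.T @ S + lam * np.eye(n_v + 1), S.T @ Y)
u_pred = S @ W_out
```

The readout can thus weigh the data-driven model's prediction against the reservoir features wherever either one is more reliable.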

## III. BENCHMARK TASKS AND METRICS

The predictive performance of the reservoir computer, SINDy, and our DI-RC approach is tested on the following dynamical systems with complex dynamics:

### A. Lorenz 63

The Lorenz 63 system^{60} is described by the three coordinates
$$\dot{x} = \sigma (y - x), \qquad \dot{y} = x(\rho - z) - y, \qquad \dot{z} = xy - \beta z,$$
with the system parameters $\sigma$, $\rho$, and $\beta$.

A coarse-grained sampling of the chaotic attractor can be seen in Fig. 4. The dynamics on the attractor is known to have a largest Lyapunov exponent of $\Lambda_L \approx 0.91$.^{61}
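A Lyapunov exponent of this size can be estimated with the standard two-trajectory (Benettin-type) renormalization method; the sketch below uses the standard chaotic Lorenz parameters and illustrative integration settings:

```python
import numpy as np

sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0

def lorenz(s):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(s, dt):
    k1 = lorenz(s); k2 = lorenz(s + dt / 2 * k1)
    k3 = lorenz(s + dt / 2 * k2); k4 = lorenz(s + dt * k3)
    return s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

dt, d0 = 0.01, 1e-8
s = np.array([1.0, 1.0, 1.0])
for _ in range(2000):                  # discard the transient onto the attractor
    s = rk4_step(s, dt)

sp = s + np.array([d0, 0.0, 0.0])      # perturbed companion trajectory
log_growth = 0.0
n_steps = 30000                        # 300 time units
for _ in range(n_steps):
    s, sp = rk4_step(s, dt), rk4_step(sp, dt)
    d = np.linalg.norm(sp - s)
    log_growth += np.log(d / d0)
    sp = s + (sp - s) * (d0 / d)       # renormalize the separation to d0
lam_max = log_growth / (n_steps * dt)  # largest Lyapunov exponent, close to 0.9
```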

### B. Kuramoto–Sivashinsky equation

The Kuramoto–Sivashinsky equation^{1,2} is a partial differential equation that can exhibit chaotic behavior. In its spatially homogeneous form, it is described by
$$\partial_t u = -u\,\partial_x u - \partial_x^2 u - \partial_x^4 u,$$
where a spatial inhomogeneity of strength $\mu$ can additionally be included (see Sec. IV B).

The system is integrated numerically.^{62} Our integration time step to produce the input data $u(t)$ is also our sampling time step ($\Delta t_{sim} = \Delta t_{samp}$). We calculate the largest Lyapunov exponent of the Kuramoto–Sivashinsky system to be $\Lambda_K \approx 0.079$ for the homogeneous system.

Throughout the paper, all SINDy-derived models for the Lorenz 63 and Kuramoto–Sivashinsky systems are numerically integrated via the same method and with the same integration time step. All the parameters used for the numerical simulations are summarized in Table I.

| Parameter | Description | Value |
|---|---|---|
| $N_V$ | Reservoir size of delay-based RC | Varied |
| $N_{tr}$ | Training length | Varied |
| $T$ | Clock cycle | $N_V\theta$ |
| $\theta$ | Time distance between two virtual nodes | 1/0.5 |
| $\lambda$ | Regularization parameter | $10^{-6}$ |
| $\alpha$ | Regularization parameter for SINDy | $10^{-6}$ |
| $p$ | Local linear response of delay-based RC | 0.05/0.1 |
| $\eta$ | Input strength of delay-based RC | 0.01/0.1 |
| $\omega$ | Rotation of delay-based RC | 0 |
| $\gamma$ | Nonlinearity of delay-based RC | −0.1 |
| $\kappa$ | Feedback rate of delay-based RC | 0.1 |
| $\phi$ | Feedback phase of delay-based RC | 0 |
| $\tau$ | Delay time of delay-based RC | $1.41 \cdot T$ |
| $\Delta t_{samp}$ | Sampling step | 0.1/0.25 |
| $\Lambda_L$ | Lyapunov exponent Lorenz 63 | 0.91 |
| $\Lambda_K$ | Lyapunov exponent Kuramoto–Sivashinsky | 0.079 |
| $t_V$ | Valid prediction time | Measured |
| $\delta_M$ | Closed-loop map error | Measured |


To quantify the quality of the time-series prediction, we use two different metrics, the closed-loop map error and the valid prediction time, which are explained in the following.

### C. The closed-loop map error
The closed-loop map error $\delta_M$ quantifies how closely the one-step map of the autonomously running (closed-loop) predictor reproduces the one-step evolution of the true system, evaluated along the true trajectory.

### D. Valid prediction time

The valid prediction time $t_V$ is the time at which the normalized distance between the predicted and the true trajectory first exceeds a fixed threshold.^{63}

In previous work, we used the normalized root mean square error (NRMSE) as a benchmark metric for evaluating one-step-ahead predictions in time-series forecasting studies (open-loop configuration), e.g., in Refs. 50, 57, and 64. The NRMSE is a good measure for quantifying the regression quality of one-step-ahead prediction (open-loop) approaches. In closed-loop setups, however, the NRMSE is not a good measure, as it strongly depends on the evaluation time interval and the specific dynamics of the trajectories. We, thus, focus on the valid prediction time $t_V$ and the map error for our closed-loop setup (see Fig. S1 in the supplementary material, where the open-loop NRMSE of the one-step-ahead prediction is compared with the valid prediction time in closed-loop operation using the same reservoir).
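A threshold-based valid prediction time, as commonly used for closed-loop forecasts, can be computed as sketched below; the threshold value of 0.4 is an illustrative assumption, not necessarily the one used in the paper:

```python
import numpy as np

def valid_prediction_time(u_true, u_pred, dt, threshold=0.4):
    """Time until the prediction error, normalized by the typical signal
    magnitude, first exceeds `threshold`; inputs have shape (n_steps, dim)."""
    err = np.linalg.norm(u_pred - u_true, axis=1)
    scale = np.sqrt(np.mean(np.sum(u_true ** 2, axis=1)))
    exceeded = np.nonzero(err / scale > threshold)[0]
    return exceeded[0] * dt if exceeded.size else len(u_true) * dt

# Example: the prediction matches the truth for 30 steps, then diverges.
u_true = np.ones((100, 2))
u_pred = u_true.copy()
u_pred[30:, 0] += 10.0
t_v = valid_prediction_time(u_true, u_pred, dt=0.1)   # 30 * 0.1 = 3.0
```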

## IV. RESULTS

We now present our results for the hybrid DI-RC approach and compare it to the performance of the two solitary components. For the reservoir computer, we use a Stuart–Landau oscillator with delayed feedback (see Sec. II), while results for an echo-state reservoir are presented in the supplementary material. For the Lorenz system, we set the SINDy model feature basis to all monomials up to order two, and for the Kuramoto–Sivashinsky equation, we use all pairwise combinations of functions with spatial derivatives up to fourth order and a constant bias. The hyperparameters of the reservoir were chosen from a region of the parameter space where adequate performance can be expected, i.e., such that the system without an input is in a steady state, the mask step length is of the order of the system response time, and the feedback delay time is not resonant with the input clock time $T$. The pump parameter $p$ was chosen such that the system was far from criticality, where stronger damping of the dominant eigenvalues means that the system has shorter memory.^{50} However, apart from fulfilling these prerequisites, the reservoir hyperparameters were not optimized.

### A. Lorenz 63

Figure 5 shows the results for the Lorenz 63 task computed for the RC (cyan), the SINDy (black), and the DI-RC (red) approach. All metrics were calculated for 1000 different initial conditions (ICs) of the true Lorenz system and randomly chosen mask functions $T_{in}$. The symbols in Fig. 5 depict the median of these 1000 simulations, while the error bars give the interval from the 25% quantile to the 75% quantile. Please note that the error bars are quite large due to the large number of ICs that we use, e.g., compared to the results in Ref. 21, where only 20 different ICs are averaged. The statistical variation with the IC can be seen in Fig. S1 of the supplementary material. A much smaller deviation is found when different masks are used for a fixed IC. Figure 5(a) shows a scan over the reservoir size $N_v$ for large training data sets, while Fig. 5(b) shows the performance vs the training data length for a small reservoir of fixed size $N_v = 100$. Looking at the valid prediction time [upper row in Fig. 5(a)], we can see that the SINDy model's short-term prediction performance is limited to less than one Lyapunov time. The RC, instead, shows valid prediction times of more than three Lyapunov times (if the training length is sufficiently long and the reservoir size is increased to around 1000 nodes). Combining both schemes in the hybrid DI-RC yields improved performance compared with the individual approaches, particularly when the reservoir is comparably small with around 100 nodes. Both the RC approach and DI-RC tend to perform poorly if the training data set is too small, probably because the training set does not contain enough examples from the Lorenz attractor. However, both approaches make better short-term predictions than SINDy.

Similar trends can be found for the long-term behavior, which is measured via the map error (the lower row in Fig. 5). The SINDy model uniformly exhibits a high map error of approximately $0.2$ for all training lengths tested. The RC approach achieves better performance depending on the training length and reservoir size (the lowest map error is on the order of $10^{-2}$ for reservoir sizes greater than approximately 300 nodes). On the other hand, DI-RC achieves a map error of $10^{-2}$ more uniformly across a range of reservoir sizes (even 100 nodes) and training lengths (at around 200 training time units).

A SINDy model obtained from coarsely sampled training data deviates considerably from the true system^{21,65} yet is still very efficient in improving the DI-RC performance. Table II exemplifies this by showing the median valid prediction time $t_V^M$ for DI-RC and for an imperfect model with $\epsilon = 0.05$ as used in Ref. 21, for two different sampling steps $\Delta_{samp}$. Only small reductions of $t_V^M$ are found for our DI-RC at larger sampling steps, even though the deviation of the SINDy model representation [Eqs. (17)–(19)] from the true Lorenz system, with $\epsilon > 0.5$, is much larger than the $\epsilon = 0.05$ deviation in the imperfect model approach.

| | DI-RC ($\Delta_{samp} = 0.1$) | DI-RC ($\Delta_{samp} = 0.01$) | Imp. model^{21} ($\Delta_{samp} = 0.1$) | Imp. model^{21} ($\Delta_{samp} = 0.01$) |
|---|---|---|---|---|
| Model deviation | $\epsilon > 0.5$ | $\epsilon < 0.1$ | $\epsilon = 0.05$ | $\epsilon = 0.05$ |
| Median valid prediction time $t_V^M$ | 3.1 | 4.05 | 5.1 | 5.35 |
| 25%–75% quantile | 0.6–7.9 | 1.1–9.3 | 1.4–8.7 | 1.86–10.45 |


To emphasize the benefits of using DI-RC, a normalized power spectral density (PSD) and stability analysis was performed for all three approaches for one example parameter setup. The results for the $z$-variable of the Lorenz 63 system are plotted in Fig. 6. The true system is shown in blue, the SINDy approach in dotted black, the RC approach in dashed cyan, and DI-RC in dashed red. The results indicate that SINDy finds a shifted PSD, underestimating the main frequency of the Lorenz 63 attractor. It also fails to reproduce the higher frequencies between $1.5$ and $4$. The RC prediction matches the correct frequencies but overestimates the valleys around the peaks. DI-RC, on the other hand, fits the main frequency while also reproducing the higher-frequency contributions. From the PSD, we can, therefore, conclude that DI-RC captures the characteristics of the data set attractor better than the other two approaches.
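The normalized PSD comparison can be reproduced in its simplest form with a periodogram estimate (a stand-in for the spectral estimator actually used; the test signal and its frequencies are illustrative):

```python
import numpy as np

def normalized_psd(x, dt):
    """One-sided periodogram of a scalar time series, normalized to its maximum."""
    x = x - np.mean(x)
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=dt)
    return freqs, spec / spec.max()

# Example: a signal with a dominant frequency of 1.3 and a weaker one at 3.0.
dt = 0.01
t = np.arange(0.0, 100.0, dt)
x = np.sin(2 * np.pi * 1.3 * t) + 0.2 * np.sin(2 * np.pi * 3.0 * t)
freqs, psd = normalized_psd(x, dt)
f_peak = freqs[np.argmax(psd)]   # close to 1.3
```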

The long-term stability of the predictions is visualized in the inset of Fig. 6. On the x axis, we have plotted the training length in system time units, while the y axis shows the percentage of unstable predictions over a testing time period of 5000 time units, averaged over 1000 different initial conditions. By "unstable," we mean any prediction trajectory that diverged to large unphysical values. The cyan line again depicts the RC, for which we can see highly unstable behavior, particularly for short training lengths. As the training length is increased, the fraction of unstable trajectories decreases. The SINDy model and DI-RC, on the other hand, produce stable predictions for all training lengths. We attribute this to SINDy's small feature space (in this case made up of all state-variable monomials of up to order 2). These results indicate that adding a data-driven approach like SINDy can help to improve predictions when the training data sets are small.

The results of testing the models on the Lorenz 63 system suggest that DI-RC outperforms both the SINDy model and the RC. There is an especially marked improvement in the case where only a short length of training data is available and/or a small RC is used. This could yield reduced computation costs from smaller reservoirs. We observe a longer valid prediction time as well as a higher fraction of stable predictions when using DI-RC.

### B. Kuramoto–Sivashinsky results

We start by analyzing the spatially homogeneous Kuramoto–Sivashinsky equation ($\mu = 0$). An example of the spatiotemporal dynamics and the respective predictions of our data-driven models is shown in the four panels of Fig. 7. The three lower panels show the difference from this true time series for the RC, the SINDy, and the DI-RC approach, in order of decreasing error. As can be seen from the growing green (low-error) regions in these panels, DI-RC improves on the prediction performance, reaching about three Lyapunov times of valid prediction time, i.e., $t_V \approx 3$.

As we are running the three prediction schemes in an autonomous setup, it is insightful to look at the PSD to answer the question of how well they emulate the chaotic dynamics of the Kuramoto–Sivashinsky system. The results are shown in Fig. 9. It is obvious that the SINDy model (dotted black curve) behaves differently from the true system (PSD in a solid blue curve). The SINDy model does not capture the low-frequency behavior and instead emphasizes a frequency range that is not prominent in the real system. The solitary RC (cyan) captures the qualitative behavior well but overestimates the higher frequencies. DI-RC (red), on the other hand, matches the PSD very closely. The inset in Fig. 9 depicts the fraction of trajectories that are unstable. The solitary RC (cyan) tends to be unstable for most of the training lengths tested, while the solitary SINDy approach (black) has some unstable predictions for short training lengths, which become stable as the size of the training data set is increased. DI-RC (red) produces stable predictions for all training lengths tested.

The authors of Ref. 36 have shown that such basis imperfections can give rise to a catch-22 problem in the nonlinear vector regression approach.

This raises the question of whether DI-RC remains beneficial when the model basis is imperfect.^{66} To answer this question, we consider the spatially inhomogeneous Kuramoto–Sivashinsky equation and restrict the feature space of the SINDy model such that it cannot fully capture the spatial inhomogeneity. An example model representation for $\mu = -0.35$, obtained from a SINDy model trained on a training interval of 1000 time units for the inhomogeneous Kuramoto–Sivashinsky case, is given by

Our simulations generally suggest that adding inhomogeneity decreases the performance of SINDy, as the SINDy feature basis is not able to span the dynamical system. With decreasing SINDy performance, the benefit of using DI-RC is also lost. We visualize this in Fig. 10, where the valid prediction times are plotted as a function of the reservoir size for increasing strength of the inhomogeneity $\mu$ (from top to bottom). The greater $\mu$, the smaller the benefit from SINDy and, with it, from DI-RC. Our results indicate that DI-RC should only be used if the solitary approaches can contribute to the performance on their own.

## V. CONCLUSION

In this paper, we have presented a completely data-driven hybrid approach (DI-RC), which combines reservoir computing, a class of recurrent neural networks, with sparse identification of nonlinear dynamical systems (SINDy). Our results indicate that the addition of SINDy can stabilize the reservoir computing predictions, while also enabling higher accuracy in short- and long-term predictions. The advantages of this setup are threefold. First, with this method, a data-driven extension of the hybrid approach is achieved that automates the time-consuming derivation of imperfect models. Second, a hardware implementation of RC can be improved via the simple addition of a SINDy model, stabilizing its prediction and reducing the computational cost. Third, the costly hyperparameter optimization of the RC, which is computationally expensive and, in the case of hardware implementations, sometimes impossible, can be mitigated.

It is also worth noting that for our DI-RC, the specific methods SINDy and RC were used as examples; they could be replaced with other data-driven approaches, such as LANDO^{67} for model discovery and some form of recurrent neural network substituting for the RC.

## SUPPLEMENTARY MATERIAL

We present data about the statistical variations of the valid prediction times of our DI-RC simulations when different masks or different initial conditions are used. Furthermore, our DI-RC results for the echo-state reservoir computer are presented.

## ACKNOWLEDGMENTS

The authors would like to thank Michelle Girvan, Andrew Pomerance, Brian Hunt, and Edward Ott for fruitful discussions. This study was funded by the “Deutsche Forschungsgemeinschaft” (DFG) in the framework of SFB910 (Project B9) and Project No. 411803875.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Felix Köster:** Conceptualization (equal); Data curation (lead); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (lead); Writing – review & editing (equal). **Dhruvit Patel:** Methodology (supporting); Software (supporting); Visualization (supporting). **Alexander Wikner:** Methodology (supporting); Software (supporting); Visualization (supporting). **Lina Jaurigue:** Investigation (equal); Methodology (equal); Validation (equal); Writing – review & editing (equal). **Kathy Lüdge:** Conceptualization (lead); Formal analysis (equal); Funding acquisition (lead); Methodology (lead); Project administration (lead); Resources (lead); Software (supporting); Supervision (lead); Validation (equal); Visualization (equal); Writing – original draft (supporting); Writing – review & editing (lead).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## REFERENCES

*Time Series Analysis: Forecasting and Control*

*Photonic Reservoir Computing, Optical Recurrent Neural Networks*, edited by D. Brunner, M. C. Soriano, and G. Van der Sande (De Gruyter, Berlin, 2019).

*Solutions of Ill-Posed Problems*, Scripta Series in Mathematics (Winston; distributed solely by Halsted Press, Washington, DC, New York, 1977).