Hybrid reservoir computing combines purely data-driven machine learning predictions with a physical model to improve the forecasting of complex systems. In this study, we investigate in detail the predictive capabilities of three different architectures for hybrid reservoir computing: the input hybrid (IH), output hybrid (OH), and full hybrid (FH), which combines IH and OH. By using nine different three-dimensional chaotic model systems and the high-dimensional spatiotemporal chaotic Kuramoto–Sivashinsky system, we demonstrate that all hybrid reservoir computing approaches significantly improve the prediction results, provided that the model is sufficiently accurate. For accurate models, we find that the OH and FH results are equivalent and significantly outperform the IH results, especially for smaller reservoir sizes. For totally inaccurate models, the predictive capabilities of IH and FH may decrease drastically, while the OH architecture remains as accurate as the purely data-driven results. Furthermore, OH allows for the separation of the reservoir and the model contributions to the output predictions. This enables an interpretation of the roles played by the data-driven and model-based elements in output hybrid reservoir computing, resulting in higher explainability of the prediction results. Overall, our findings suggest that the OH approach is the most favorable architecture for hybrid reservoir computing, when taking accuracy, interpretability, robustness to model error, and simplicity into account.

The forecasting of physical systems has traditionally relied on a system of equations derived to model the process to be predicted. This approach reaches its limitations when the underlying processes are nonlinear, particularly in situations where only imperfect models are at hand. On the other hand, the application of purely data-driven artificial intelligence (AI) based forecasting methods has led to great progress in forecasting complex dynamical systems. In recent studies using reservoir computing as the data-driven machine learning element, it was shown that a hybrid ansatz combining both approaches improves the prediction results. Here, we compare different ways of combining the model-based and data-driven elements within the hybrid method. We find that the prediction quality significantly depends on how the data-driven and model-based elements are combined with each other. Our experimental findings show that the most accurate and interpretable results, which are also the most robust to model inaccuracies, are obtained when the hybrid architecture combines the data-driven and model-based elements solely at the output layer.

## I. INTRODUCTION

The application of AI-based forecasting methods has led to great progress in the prediction of complex systems.^{1,2} Among the AI methods used so far, reservoir computing (RC) has turned out to be a highly promising approach, as it combines superior prediction results with low computational cost for training,^{3–5} and it can be realized in physical devices, enabling novel, energy-efficient unconventional computing methods.^{6} The software-based realization of RC, called an echo state network,^{7} has proven to be a valuable machine learning technique, overcoming the difficulties connected with training conventional recurrent neural networks.

The reservoir computing approach in its conventional form is purely data-driven, which is a useful approach if there is no knowledge about the governing equations producing the data. In many cases though, when machine learning is applied to the sciences, some knowledge about the underlying system is available, and its inclusion into the data-driven machine learning methods can be beneficial in improving the prediction accuracy as well as leading to greater interpretability. These combinations of machine learning and physical knowledge belong to the emerging field of *theory-guided data science*, which “seeks to exploit the promise of data science without ignoring the treasure of knowledge accumulated in scientific principles.”^{8} There are several ways to combine physical knowledge-based models with machine learning techniques, with the general aim of creating highly accurate, physically consistent models while being resource efficient (see Willard *et al.*^{9} for a review of different possibilities and applications). For example, a common way to include physical knowledge in machine learning is to add a physical loss to the overall loss function, in order to introduce soft constraints for physical consistency. In contrast to this physics-informed loss function, where the physical knowledge is only introduced during the training stage, another approach is to employ hybrid models, where the knowledge-based model becomes an integral part of the machine learning architecture that operates during training and inference.

One example area of science that benefits greatly from hybrid models is the field of extreme events in complex dynamical systems, which studies phenomena such as epidemics, tsunamis, droughts, and more.^{10} Due to the high complexity of these systems, which consist of a multitude of different interacting parts, a detailed model-based description of the full dynamics is often not feasible. Overcoming these issues is usually achieved by employing approximate reduced-order models or using novel data-driven machine learning methods. However, both approaches have significant limitations, as extreme events are often inherently nonlinear phenomena that occur only rarely within the available data. Combining reduced-order models with data-driven machine learning approaches often leads to improved predictions and generalizability.^{10}

In the field of reservoir computing, different ways to include physical knowledge into the RC framework have been explored. Doan *et al.*^{11,12} introduced the physics-informed echo state network (PI-ESN), which was later extended by Racca and Magri.^{13} By incorporating a physics-based loss term, the PI-ESN utilizes the time series' true physical equations in order to fine-tune the RC output matrix $W_{\mathrm{out}}$ in a second training phase, following the usual RC training phase using only data. Consequently, the PI-ESN approach falls into the category of loss-based physics-ML methods. Their exemplary predictions on the Lorenz-63 system and chaotic flows related to the Navier–Stokes equation demonstrated improved robustness to noise and overall more accurate predictions compared to data-driven RC.

The object of the studies in this work is the hybrid RC approach introduced by Pathak *et al.*,^{14} where, in contrast to the PI-ESN, physical knowledge becomes an integral part of the RC architecture. The authors demonstrated significant improvements in forecasting the three-dimensional chaotic Lorenz-63 system and the high-dimensional spatiotemporal chaotic Kuramoto–Sivashinsky system compared to the usual data-driven RC approach. The physical knowledge of the system is given as an imperfect knowledge-based model (KBM), $K(\cdot)$, which is a function of the current state of the time series $u(t)$ that outputs an approximation of the next state: $K(u(t)) \approx u(t+\Delta t)$. In the original implementation, the KBM is inserted at two independent points within the RC architecture: first, as an additional input (which we refer to as input hybrid), and second, directly at the RC output (which we refer to as output hybrid). The simultaneous use of input and output hybrid, as in the original implementation, is referred to in this work as full hybrid. In the subsequent works that employed the hybrid RC approach, all three combinations (input, output, and full hybrid) were utilized, yet a comparative study between the three options is still missing.

Wikner *et al.*^{15} implemented a parallelized hybrid reservoir computing approach, combining the output hybrid method with local states, in order to forecast large spatiotemporal chaotic systems in a scalable manner. This method is then applied in Arcomano *et al.*^{16} to atmospheric modeling, using a low-resolution atmospheric global circulation model as the imperfect knowledge-based model. In Wikner *et al.*,^{17} data assimilation is used to train the output hybrid model for the task of predicting chaotic dynamical systems. Racca and Magri^{18} also used the output hybrid approach alongside a normal data-driven RC approach to investigate robust optimization and validation techniques for learning chaotic dynamics. In Köster *et al.*,^{19} a delay-based RC was combined with a model element discovered from data, employing the output hybrid architecture. Shahi *et al.*^{20} compared different recurrent neural network approaches, including the input hybrid RC approach, for the prediction of arrhythmic cardiac action potentials. Huhn and Magri^{21,22} used the full hybrid approach to learn and optimize ergodic averages of chaotic acoustics, employing a reduced-order model as the knowledge-based model. In most cases, the hybrid approach was shown to be superior in terms of prediction quality, compared to the usual data-driven RC approach.

In this paper, the three different hybrid RC approaches are systematically compared for the task of forecasting chaotic dynamical systems. Here, we use nine different three-dimensional chaotic systems and the high-dimensional spatiotemporal chaotic Kuramoto–Sivashinsky system as exemplary systems. As the effectiveness of the hybrid approach is strongly affected by the accuracy of the knowledge-based model, different knowledge-based models are considered, whose correspondence with the true model varies strongly. Our overall results suggest that the output hybrid architecture achieves the optimal combination of data-driven and model-based elements in hybrid reservoir computing in terms of accuracy, robustness to model error, interpretability, and simplicity.

The remainder of this paper is organized as follows. Section II introduces the forecasting methods, namely, data-driven and hybrid reservoir computing. Section III describes the implementation details, including the chaotic dynamical systems used as time series data, the different knowledge-based models, and the methods used for model evaluation. The results are presented and discussed in Sec. IV, and Sec. V concludes the study.

## II. FORECASTING METHODS

### A. Reservoir computing

This section introduces a slightly generalized RC architecture, which allows for a joint description of the usual data-driven RC approach and of the hybrid RC approach developed by Pathak *et al.*^{14} Depending on whether the hybrid methods are employed or not, the following functions $f_{\mathrm{inp}}$ and $f_{\mathrm{out}}$ have different forms, but the training and prediction scheme remains the same.

The state of the true time series, used as training data, will be denoted as a $u_{\mathrm{dim}}$-dimensional vector $u(t)$. From a bird's eye view, during the training phase, the reservoir, a high-dimensional and nonlinear dynamical system, is fed with the true time series $u(t)$, one discrete time step $\Delta t$ at a time. By encoding the true time series $u(t)$ nonlinearly in the high-dimensional reservoir state $r(t)$, the reservoir's output $y_r(t)$ is fitted using ridge regression (RR) to approximate the time series' next state: $y_r(t) \approx u(t+\Delta t)$. During the prediction phase, the reservoir's output $y_r(t)$ is used as its next input, allowing the reservoir to run autonomously in a loop for an arbitrary number of prediction steps.

#### 1. Architecture

In this section, the RC architecture defining the flow of quantities from the input to the output is described. The architecture relies on the usual echo state network implementation.^{7} The time series input is denoted as $\tilde{u}(t)$, referring to the true time series $u(t)$ during synchronization and training, as well as to the previous prediction $y_r(t-\Delta t)$ during the prediction phase. The training and prediction phases will be described in detail in Secs. II A 3 and II A 4.

In the usual data-driven RC approach, the reservoir input is simply given directly by the time series input: $x(t) = \tilde{u}(t)$. In the input hybrid approach, as will be described in Sec. II B 2, $f_{\mathrm{inp}}(\cdot)$ will additionally contain the knowledge-based model.

$\hat{x}(t)$ is connected to the reservoir through the $r_{\mathrm{dim}} \times x_{\mathrm{dim}}$ input matrix $W_{\mathrm{in}}$, and the recurrent connections between consecutive reservoir states $r(t)$ are introduced through the $r_{\mathrm{dim}} \times r_{\mathrm{dim}}$ adjacency matrix $A$, also known as the reservoir network. The $r_{\mathrm{dim}}$-dimensional node bias vector $b$ is introduced to break the symmetry within the reservoir in order to suppress the appearance of mirror attractors.^{23} The hyperbolic tangent $\tanh(\cdot)$ is used as the node activation function, introducing the important nonlinearities by acting element-wise on each dimension of its argument. Following the fundamental idea of RC, the internal parameters $A$, $W_{\mathrm{in}}$, and $b$ are static quantities, initialized once before the training and then kept fixed. Their implementation details are discussed in Sec. II A 2. The purpose of Eq. (3) is to encode the time series inputs $\tilde{u}(t), \tilde{u}(t-\Delta t), \ldots$ nonlinearly into the high-dimensional reservoir state $r(t)$ so that the desired output can be learned by simple ridge regression.
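As a concrete illustration, the reservoir update described here, commonly written as $r(t+\Delta t) = \tanh(A\,r(t) + W_{\mathrm{in}}\,\hat{x}(t) + b)$, can be sketched in a few lines (a minimal NumPy sketch with arbitrary toy dimensions; the function and variable names are ours, not the paper's):

```python
import numpy as np

def reservoir_step(r, x_hat, A, W_in, b):
    """One reservoir update: encode the standardized input x_hat into the
    next reservoir state via the fixed quantities A, W_in, and b."""
    return np.tanh(A @ r + W_in @ x_hat + b)

# Tiny example with random fixed parameters (r_dim = 5, x_dim = 2).
rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, (5, 5))
W_in = rng.uniform(-1, 1, (5, 2))
b = rng.uniform(-1, 1, 5)
r = np.zeros(5)                                   # initial reservoir state
r = reservoir_step(r, np.array([0.3, -0.7]), A, W_in, b)
```

Because of the element-wise $\tanh$, every component of the resulting state is bounded in $(-1, 1)$.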

In the simplest data-driven RC approach, the RR input is given directly by the reservoir states: $h(t) = r(t)$. The more general Eq. (4) allows for the inclusion of the knowledge-based model within the output hybrid approach, as will be described in Sec. II B 3.

The output matrix $W_{\mathrm{out}}$ and intercept $w_{\mathrm{out}}$ are simply obtained by ridge regression, as will be described in Sec. II A 3. The reservoir's output $y_r(t)$ aims to predict the next step of the time series.

#### 2. Random initialization of $W_{\mathrm{in}}$, $A$, and $b$

Before beginning the training phase, the static quantities $W_{\mathrm{in}}$, $A$, and $b$ are randomly initialized and then kept fixed.

The $r_{\mathrm{dim}} \times x_{\mathrm{dim}}$ input matrix $W_{\mathrm{in}}$ connects the standardized reservoir input $\hat{x}(t)$ to the reservoir states $r(t)$. The elements of $W_{\mathrm{in}}$ are populated in a sparse way,^{24} where each reservoir node $r_i$ is connected to only one randomly chosen reservoir input variable $x_j$. Consequently, each $x_{\mathrm{dim}}$-dimensional row of $W_{\mathrm{in}}$ contains exactly one non-zero element, whose value is randomly drawn from a uniform distribution within the range $[-\sigma, \sigma]$. The hyperparameter $\sigma$ represents the input strength.
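The sparse generation of the input matrix might look as follows (a NumPy sketch; the function name is ours):

```python
import numpy as np

def generate_W_in(r_dim, x_dim, sigma, rng):
    """Sparse input matrix: each reservoir node connects to exactly one
    randomly chosen input variable, with weight drawn from [-sigma, sigma]."""
    W_in = np.zeros((r_dim, x_dim))
    cols = rng.integers(0, x_dim, size=r_dim)          # one input per node
    W_in[np.arange(r_dim), cols] = rng.uniform(-sigma, sigma, size=r_dim)
    return W_in

W_in = generate_W_in(100, 3, 0.5, np.random.default_rng(1))
```

Each row then has exactly one non-zero entry, and all weights lie within $[-\sigma, \sigma]$.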

The reservoir network $A$ introduces the recurrent connections between consecutive reservoir states $r(t)$ and $r(t-\Delta t)$. Following the usual implementation,^{24} it is realized as a weighted Erdős–Rényi network, which is generated as follows. Each possible undirected edge between reservoir nodes $i$ and $j$, with $i, j \in \{1, \ldots, r_{\mathrm{dim}}\}$ and $i \neq j$, is created with probability $p$, resulting in an average node degree of $d = p(r_{\mathrm{dim}} - 1)$. For each of the created undirected edges, the two directed edges constituting the undirected edge are weighted independently with random numbers drawn from a uniform distribution within the range $[-1, 1]$. Finally, in order to control the overall scaling applied by the reservoir matrix, all of its elements are uniformly rescaled such that the matrix attains a specified spectral radius $\rho > 0$, which is usually chosen to be smaller than one to ensure that the influence of past inputs fades away with time. This property is linked to the *echo state property*, which has encountered some controversy within the reservoir computing literature.^{25}
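A minimal sketch of this network generation, assuming the uniform rescaling targets the spectral radius directly (NumPy; the function name is ours):

```python
import numpy as np

def generate_A(r_dim, p, rho, rng):
    """Weighted Erdos-Renyi reservoir network, rescaled so that its
    spectral radius equals rho."""
    A = np.zeros((r_dim, r_dim))
    for i in range(r_dim):
        for j in range(i + 1, r_dim):
            if rng.random() < p:               # undirected edge with prob. p
                A[i, j] = rng.uniform(-1, 1)   # the two directed edges are
                A[j, i] = rng.uniform(-1, 1)   # weighted independently
    spectral_radius = np.max(np.abs(np.linalg.eigvals(A)))
    A *= rho / spectral_radius                 # uniform rescaling
    return A

A = generate_A(50, 0.1, 0.4, np.random.default_rng(2))
```

After rescaling, the largest absolute eigenvalue of `A` equals the target $\rho$ up to floating-point error.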

Each element of the $r_{\mathrm{dim}}$-dimensional node bias vector $b$ is randomly drawn from a uniform distribution within the range $[-\sigma_b, \sigma_b]$, where $\sigma_b$ denotes the node bias scale.

#### 3. Training

For the training phase, knowledge of an $N_{\mathrm{TS}} + N_{\mathrm{T}}$ long training sequence of the true time series $u(t)$ is assumed. During the training phase, the time series input of the RC architecture, as defined in Sec. II A 1, is taken to be the actual time series data: $\tilde{u}(t) = u(t)$. The training sequence is split into an $N_{\mathrm{TS}}$ long synchronization sequence $U_{\mathrm{sync}} = [u(t_1), \ldots, u(t_{N_{\mathrm{TS}}})]$, a subsequent $N_{\mathrm{T}} - 1$ long input training sequence $U_{\mathrm{T}} = [u(t_{N_{\mathrm{TS}}+1}), \ldots, u(t_{N_{\mathrm{TS}}+N_{\mathrm{T}}-1})]$, and another $N_{\mathrm{T}} - 1$ long output training sequence, shifted one time step into the future, $Y = [u(t_{N_{\mathrm{TS}}+2}), \ldots, u(t_{N_{\mathrm{TS}}+N_{\mathrm{T}}})]$.

Once the standard-scaler function is fitted, the synchronization and input training sequences $U_{\mathrm{sync}}$ and $U_{\mathrm{T}}$ are transformed to create the corresponding standardized reservoir input sequences $\hat{x}(t)$, which are then used to drive the reservoir via Eq. (3). To do so, the reservoir state is initialized to zero, $r(t_0) = [0, \ldots, 0]^T$, and then synchronized to the time series data by driving it with $U_{\mathrm{sync}}$, arriving at the synchronized reservoir state $r(t_{N_{\mathrm{TS}}})$. The synchronization aims to remove any dependency on the arbitrarily chosen initial condition $r(t_0)$. The subsequent reservoir states, generated by driving with the training input data $U_{\mathrm{T}}$, are transformed into the RR input variables $h(t)$ via Eq. (4) and collected into an $h_{\mathrm{dim}} \times (N_{\mathrm{T}}-1)$ matrix $H = [h(t_{N_{\mathrm{TS}}+1}), \ldots, h(t_{N_{\mathrm{TS}}+N_{\mathrm{T}}-1})]$.

The output matrix $W_{\mathrm{out}}$ and intercept $w_{\mathrm{out}}$ are obtained via ridge regression,^{24}

$$W_{\mathrm{out}} = Y_c H_c^T \left( H_c H_c^T + \beta I \right)^{-1}, \qquad w_{\mathrm{out}} = \bar{y} - W_{\mathrm{out}} \bar{h}.$$

Here, $\bar{y}$ and $\bar{h}$ represent the dimension-wise mean across the datasets $Y$ and $H$, respectively, and $Y_c = Y - \bar{y}$ and $H_c = H - \bar{h}$ represent the centered datasets. The regularization parameter $\beta > 0$ is a crucial hyperparameter in RC, which aims to avoid overfitting.
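The ridge regression fit with intercept on the centered datasets can be sketched as follows (a NumPy illustration consistent with the definitions of $\bar{y}$, $\bar{h}$, and $H_c$ above; the function name and the synthetic linear test data are ours):

```python
import numpy as np

def fit_output(H, Y, beta):
    """Ridge regression with intercept: returns W_out, w_out such that
    W_out @ h + w_out approximates the next time-series state."""
    h_bar = H.mean(axis=1, keepdims=True)
    y_bar = Y.mean(axis=1, keepdims=True)
    H_c, Y_c = H - h_bar, Y - y_bar                  # centered datasets
    W_out = Y_c @ H_c.T @ np.linalg.inv(H_c @ H_c.T + beta * np.eye(H.shape[0]))
    w_out = y_bar - W_out @ h_bar                    # intercept
    return W_out, w_out.ravel()

# Sanity check on a noiseless linear relation y = M h + c.
rng = np.random.default_rng(3)
M, c = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))
H = rng.normal(size=(4, 200))
W_out, w_out = fit_output(H, M @ H + c, beta=1e-12)
```

With a negligible $\beta$, the fit recovers the generating matrix `M` and offset `c` of the synthetic linear relation.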

To summarize the training phase: the reservoir's output, Eq. (5), was trained to approximate the next step of the time series, $y_r(t) \approx u(t+\Delta t)$, when driven with the true time series $\ldots, u(t-\Delta t), u(t)$ up to time $t$.

#### 4. Prediction

Future predictions are then simply obtained by using the reservoir's previous prediction as the new time series input: $\tilde{u}(t+\Delta t) = y_r(t)$. Consequently, Eqs. (1)–(5) can be run in a loop for an arbitrary number of steps, and the generated outputs $y_r(t)$ represent the time series forecast.
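A minimal sketch of this closed prediction loop (NumPy; for brevity, the standard-scaler is taken to be the identity, all fixed quantities are random toy values, and the function name is ours):

```python
import numpy as np

def predict(r, u_last, n_steps, A, W_in, b, W_out, w_out):
    """Autonomous prediction loop: each output y_r(t) is fed back as the
    next reservoir input."""
    outputs = []
    u = u_last
    for _ in range(n_steps):
        r = np.tanh(A @ r + W_in @ u + b)   # reservoir update
        u = W_out @ r + w_out               # output becomes next input
        outputs.append(u)
    return np.array(outputs)

rng = np.random.default_rng(4)
A = rng.uniform(-1, 1, (10, 10)) / 25       # small random network (toy values)
W_in = rng.uniform(-1, 1, (10, 3))
b = rng.uniform(-1, 1, 10)
W_out, w_out = rng.normal(size=(3, 10)), rng.normal(size=3)
traj = predict(np.zeros(10), np.zeros(3), 20, A, W_in, b, W_out, w_out)
```

The returned array holds one predicted state per step; since the untrained toy output layer is random, the trajectory is meaningless here and only illustrates the data flow.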

### B. Hybrid reservoir computing

#### 1. Knowledge-based model

Following Pathak *et al.*,^{14} the knowledge-based model (KBM) takes the form of a function acting on the state of the time series and producing an imperfect next-step prediction,

$$K(u(t)) \approx u(t+\Delta t). \qquad (10)$$

In that implementation, the KBM output dimension $K_{\mathrm{dim}}$ equals the time series dimension $u_{\mathrm{dim}}$. In principle, Eq. (10) can be relaxed so that $K(\cdot)$ represents any function that contains physical knowledge about the system, which does not necessarily have to be an approximate next-step predictor. In these cases, $K_{\mathrm{dim}} \neq u_{\mathrm{dim}}$ is also possible. For instance, in Racca and Magri,^{18} a reduced-order KBM is obtained by using a Proper Orthogonal Decomposition, leading to $K_{\mathrm{dim}} < u_{\mathrm{dim}}$.

#### 2. Input hybrid

This extends the former input dimension of normal data-driven RC, $x_{\mathrm{dim}} = u_{\mathrm{dim}}$, to $x_{\mathrm{dim}} = u_{\mathrm{dim}} + K_{\mathrm{dim}}$. Consequently, the input matrix $W_{\mathrm{in}}$ becomes an $r_{\mathrm{dim}} \times (u_{\mathrm{dim}} + K_{\mathrm{dim}})$ matrix, where the first $u_{\mathrm{dim}}$ columns connect the pure input $\tilde{u}(t)$ to the reservoir nodes $r(t)$ and the last $K_{\mathrm{dim}}$ columns connect the knowledge-based predictor $K(\tilde{u}(t))$ to them. The original implementation by Pathak *et al.*^{14} additionally introduces a value $\gamma \in [0, 1]$ defining the fraction of reservoir nodes that are exclusively connected to the pure input $u(t)$, by controlling the generation of the input matrix $W_{\mathrm{in}}$. Here, the additional KBM inputs and pure inputs are treated equally.
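In code, the input hybrid simply concatenates the time series input with the KBM output before it enters the reservoir (a sketch; the KBM used below is a made-up stand-in, not one of the paper's models):

```python
import numpy as np

def input_hybrid_x(u, K):
    """Input-hybrid reservoir input: concatenate the time-series input
    with the knowledge-based model's output, x = [u, K(u)]."""
    return np.concatenate([u, K(u)])

# Hypothetical KBM standing in for an imperfect next-step predictor.
K = lambda u: u + 0.01 * np.sin(u)
x = input_hybrid_x(np.array([1.0, 2.0, 3.0]), K)
```

For a three-dimensional input and a three-dimensional KBM output, the extended reservoir input is six-dimensional, matching $x_{\mathrm{dim}} = u_{\mathrm{dim}} + K_{\mathrm{dim}}$.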

#### 3. Output hybrid

The partial output matrices $W_{\mathrm{res}}$ and $W_{\mathrm{kbm}}$ are of shape $u_{\mathrm{dim}} \times r_{\mathrm{dim}}$ and $u_{\mathrm{dim}} \times K_{\mathrm{dim}}$, respectively. The partial outputs $y_{\mathrm{res}}(t)$ and $y_{\mathrm{kbm}}(t)$ represent the reservoir and KBM contributions to the total output $y_r(t)$, respectively. As the data-driven and model-informed information are only combined in the linear output layer, investigating $y_{\mathrm{res}}(t)$ and $y_{\mathrm{kbm}}(t)$ allows for a straightforward interpretation of their respective influence on the prediction. In Sec. IV, the dimension-wise standard deviations of $y_{\mathrm{res}}(t)$ and $y_{\mathrm{kbm}}(t)$, $\mathrm{std}(y_{\mathrm{res/kbm},j})$, which are generated when driving the reservoir with the training input, will be presented for selected cases.
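The additive split of the output-hybrid prediction into its two contributions can be sketched as follows (NumPy; the function name and toy dimensions are ours):

```python
import numpy as np

def output_hybrid_split(r, k_out, W_res, W_kbm, w_out):
    """Output hybrid: the total output is the sum of a reservoir
    contribution and a KBM contribution, combined only in the linear layer."""
    y_res = W_res @ r        # data-driven contribution
    y_kbm = W_kbm @ k_out    # model-based contribution
    return y_res + y_kbm + w_out, y_res, y_kbm

rng = np.random.default_rng(5)
W_res, W_kbm = rng.normal(size=(3, 8)), rng.normal(size=(3, 3))
y_r, y_res, y_kbm = output_hybrid_split(
    rng.normal(size=8), rng.normal(size=3), W_res, W_kbm, np.zeros(3))
```

Because the combination is purely additive, the two contributions can be inspected separately after training, which is exactly what enables the interpretability analysis described above.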

#### 4. Full hybrid

#### 5. Remarks

The three possible hybrid RC approaches, IH, OH, and FH, alongside the normal data-driven approach, are visualized in Fig. 1. The main idea of these hybrid approaches is to make the physical knowledge, $K(\tilde{u}(t))$, available within the reservoir's prediction. The output hybrid framework works in a straightforward manner, with ridge regression finding the optimal weights in $W_{\mathrm{out}}$ that combine the reservoir states $r(t)$ with the KBM output $K(\tilde{u}(t))$ to learn the next step of the time series. In the input hybrid approach, the KBM output is fed into the reservoir as an additional input, making it less straightforward to interpret.

In contrast to the original implementation,^{14} the hybrid approach defined in this work employs the standard-scaler function $S(\cdot)$. In the usual data-driven RC approach, the additional standard-scaler layer is not strictly needed, as the whole training time series dataset can simply be rescaled to zero mean and unit variance. However, this leads to issues in the hybrid approaches, as the KBMs expect the unscaled original time series $u(t)$ as their input. This shortcoming is overcome by employing the additional standard-scaler layer.

## III. IMPLEMENTATION

### A. Time series data from dynamical systems

Due to the chaotic nature of the considered time series, any forecast is destined to fail after some finite time, as errors grow exponentially in time. This property is captured by a positive largest Lyapunov exponent, $\lambda_{\max} > 0$, and the corresponding Lyapunov time $1/\lambda_{\max}$, after which the error has increased on average by a factor of $e$. The largest Lyapunov exponents of the various systems are calculated from their respective maps [Eq. (15)] using the algorithm described in Sprott,^{26} as outlined in Appendix B.

#### 1. 3D chaotic dynamical systems

The discrete time series $u(t_0), u(t_0+\Delta t), \ldots$ is generated by numerically solving the flow Eq. (16) using the fourth-order Runge–Kutta method (RK4): $k_{\mathrm{true}} = k_{\mathrm{RK4}}$. The flow equations, parameter values, time steps $\Delta t$, initial conditions, and largest Lyapunov exponents of the nine chaotic dynamical systems tested are listed in Appendix A.
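A generic RK4 step of this kind can be sketched as follows, using the Lorenz-63 flow with its textbook parameters as an example (the time step below is illustrative and not necessarily the value listed in Appendix A):

```python
import numpy as np

def rk4_step(f, u, dt):
    """One fourth-order Runge-Kutta step for du/dt = f(u)."""
    k1 = f(u)
    k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2)
    k4 = f(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def lorenz63(u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Standard Lorenz-63 flow with the textbook parameter values."""
    x, y, z = u
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

# Generate a short discrete trajectory u(t0), u(t0 + dt), ...
u = np.array([1.0, 1.0, 1.0])
for _ in range(100):
    u = rk4_step(lorenz63, u, 0.02)
```

Iterating `rk4_step` produces the discrete time series used for training and evaluation; the state remains bounded on the Lorenz attractor.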

#### 2. Spatiotemporal chaotic Kuramoto–Sivashinsky system

As the high-dimensional spatiotemporal example, the Kuramoto–Sivashinsky (KS) system^{27,28} is used. The KS system is described by a partial differential equation (PDE) for the 1D state function $u(x,t)$,

$$\partial_t u = -u\,\partial_x u - a\,\partial_x^2 u - \partial_x^4 u, \qquad (17)$$

with the true parameter value $a = 1$. The PDE is numerically integrated using the fourth-order exponential time differencing Runge–Kutta method (ETDRK4):^{29} $k_{\mathrm{true}} = k_{\mathrm{ETDRK4}}$. The simulation is conducted on a grid of $u_{\mathrm{dim}} = 64$ uniformly spaced grid points, employing periodic boundary conditions with a domain size of $L = 35$ and a time step of $\Delta t = 0.25$. The choice of $u_{\mathrm{dim}}$, $L$, and $\Delta t$ corresponds to the choice in the original hybrid RC implementation.^{14} The initial condition for the $u_{\mathrm{dim}}$-dimensional state vector is chosen as in Kassam and Trefethen.^{29}

The largest Lyapunov exponent is calculated as $\lambda_{\max} = 0.07489$ using the method outlined in Appendix B, which corresponds well to the value noted in Pathak *et al.*^{14} ($\approx 0.07$).

### B. Tested knowledge-based models

This section introduces the various KBMs $K(\u22c5)$ that will be utilized to test the different hybrid approaches. While the flow- and sine-models are only applied to the nine three-dimensional systems, the $\epsilon $-model is also applied to the spatiotemporal KS system.

#### 1. $\epsilon$-model $K_\epsilon$

The $\epsilon$-model, as originally used by Pathak *et al.*,^{14} is created by modifying one parameter value of the true dynamical system's equation (the flow for the three-dimensional systems and the PDE for the KS system) and then using the same numerical integration method as for the true data (RK4 or ETDRK4), in order to artificially create an imperfect predictor $K_\epsilon(\cdot)$. The modified parameter is obtained by multiplying one of the true system's parameters by a factor of $(1+\epsilon)$. Consequently, the model error $\epsilon$ controls the level of imperfectness, where $\epsilon = 0$ corresponds to the true knowledge-based model: $K_{\epsilon=0} = K_{\mathrm{true}}$.
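The construction of the $\epsilon$-model can be sketched generically: wrap the true next-step predictor and perturb one parameter by the factor $(1+\epsilon)$. The one-parameter map below is a made-up stand-in for the actual RK4/ETDRK4 integrators:

```python
def make_epsilon_model(step_fn, param, eps):
    """epsilon-model: same integrator as the truth, but with one parameter
    multiplied by (1 + eps). eps = 0 recovers the true model."""
    return lambda u: step_fn(u, param * (1.0 + eps))

# Hypothetical one-parameter next-step map standing in for the integrator.
true_step = lambda u, a: a * u * (1.0 - u)
K_eps = make_epsilon_model(true_step, param=3.7, eps=0.1)
K_true = make_epsilon_model(true_step, param=3.7, eps=0.0)
```

By construction, `K_true` reproduces the true map exactly, while `K_eps` deviates from it in a controlled way.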

The chosen parameters for each of the nine chaotic dynamical systems, as well as for the KS system, which are modified within the $\epsilon $-model, are displayed in Table I.

| System | Eq. | $\epsilon$-parameter | System | Eq. | $\epsilon$-parameter |
|---|---|---|---|---|---|
| Lorenz-63 | (A1) | $\rho$ | Roessler | (A6) | $c$ |
| Chen | (A2) | $a$ | Rucklidge | (A7) | $\kappa$ |
| ChuaCircuit | (A3) | $\alpha$ | Thomas | (A8) | $b$ |
| DoubleScroll | (A4) | $a$ | Windmi | (A9) | $a$ |
| Halvorsen | (A5) | $a$ | KS | (17) | $a$ |

#### 2. Flow-model $K_{\mathrm{flow}}$

The flow-model is simply given as the flow of the true three-dimensional dynamical system: $K_{\mathrm{flow}} = F$. The flow-model is incorporated into this study to investigate whether a KBM that does not function as a next-step predictor, $K_{\mathrm{flow}}(u(t)) \not\approx u(t+\Delta t)$, yet still contains information about the system, can be leveraged by the hybrid approaches to improve the prediction quality compared to usual data-driven RC.

#### 3. Sine-model $K_{\sin}$

The sine function is applied to every dimension of its argument.

### C. Experimental setup

In this section, the experimental setup used to evaluate the prediction qualities of the data-driven and hybrid RC approaches is described.

#### 1. Forecast horizon measure

Following Pathak *et al.*,^{14} the prediction quality is quantified by the normalized error

$$e(t) = \frac{\Vert y_r(t) - u(t) \Vert}{\sqrt{\langle \Vert u(t) \Vert^2 \rangle}}.$$

Here, $\langle\cdot\rangle$ represents the average over all $N_{\mathrm{P}}$ prediction steps and $\Vert\cdot\Vert$ denotes the L2-norm over all state space dimensions. The forecast horizon is defined as the time $t_v$ at which $e(t) > e_{\max}$ for the first time. It is measured in units of the Lyapunov time $1/\lambda_{\max}$ of the dynamical system to be predicted, in order to ensure greater comparability across different systems. The threshold is chosen to be $e_{\max} = 0.4$.
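A sketch of this forecast horizon computation (NumPy; the function name and the synthetic linear-error example are ours):

```python
import numpy as np

def forecast_horizon(y_pred, u_true, dt, lyap_max, e_max=0.4):
    """Normalized error e(t) and forecast horizon t_v (first time e > e_max),
    returned in units of the Lyapunov time 1 / lyap_max."""
    norm = np.sqrt(np.mean(np.sum(u_true**2, axis=1)))   # sqrt(<||u||^2>)
    e = np.linalg.norm(y_pred - u_true, axis=1) / norm
    exceeded = np.nonzero(e > e_max)[0]
    n_v = exceeded[0] if exceeded.size else len(e)       # never exceeded
    return n_v * dt * lyap_max

# Synthetic example: the prediction error grows linearly in time.
t = np.arange(100)[:, None]
u_true = np.ones((100, 3))
y_pred = u_true + 0.01 * t
tv = forecast_horizon(y_pred, u_true, dt=0.1, lyap_max=0.9)
```

In the synthetic example, the normalized error crosses the threshold $e_{\max} = 0.4$ after roughly 40 steps, so the returned horizon is about $40 \cdot 0.1 \cdot 0.9 \approx 3.6$ Lyapunov times.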

#### 2. Ensemble experiment method

As the RC framework contains three sources of randomness, namely, the input matrix $W_{\mathrm{in}}$, the reservoir network $A$, and the node bias vector $b$, different random realizations of these quantities result in predictions of varying quality. Therefore, to properly assess the prediction quality of a particular reservoir architecture defined by its hyperparameters, the prediction must be repeated for an ensemble of reservoir realizations. Furthermore, as the prediction quality also depends on the specific time series sections used for training and prediction, the statistical significance of the results is further improved by training and predicting each reservoir realization on multiple sections of the time series. The following describes the implementation details of the employed ensemble experiment.

To begin, an ensemble of $n_{\mathrm{res}}$ random realizations of the quantities $(W_{\mathrm{in}}, A, b)$ is generated as described in Sec. II A 2, each representing one RC realization $i_{\mathrm{res}}$. Each RC realization is trained on the same $n_{\mathrm{T}}$ consecutive training sections, resulting in $n_{\mathrm{res}} \times n_{\mathrm{T}}$ trained reservoirs. Each training section $i_{\mathrm{T}}$ contains $N_{\mathrm{TD}} + N_{\mathrm{TS}} + N_{\mathrm{T}}$ time steps, which include $N_{\mathrm{TS}}$ train-synchronization steps and $N_{\mathrm{T}}$ train-fitting steps, as described in Sec. II A 3. To ensure that the different training sections are independent of each other, the first $N_{\mathrm{TD}}$ steps of each section are discarded. Next, each reservoir realization $i_{\mathrm{res}}$ that was trained on the training section $i_{\mathrm{T}}$ predicts on $n_{\mathrm{P}}$ consecutive prediction sections following the training section, resulting in $n_{\mathrm{res}} \times n_{\mathrm{T}} \times n_{\mathrm{P}}$ predictions. Each prediction section consists of $N_{\mathrm{PD}} + N_{\mathrm{PS}} + N_{\mathrm{P}}$ time steps, where the first $N_{\mathrm{PD}}$ time steps are again discarded for greater independence and to traverse larger portions of the attractor. The following $N_{\mathrm{PS}}$ time steps are used for synchronization, after which the prediction is performed for $N_{\mathrm{P}}$ time steps, as described in Sec. II A 4. Finally, for each $i_{\mathrm{res}}$, $i_{\mathrm{T}}$, and $i_{\mathrm{P}}$, the reservoir prediction is compared to the true time series of that prediction section, and the forecast horizon $t_v(i_{\mathrm{res}}, i_{\mathrm{T}}, i_{\mathrm{P}})$ is recorded. As the final measure of prediction quality, the median of the set of $n_{\mathrm{res}} \times n_{\mathrm{T}} \times n_{\mathrm{P}}$ forecast horizons is calculated, with the lower and upper quartiles serving as the range of uncertainty.
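The final aggregation over the ensemble can be sketched as follows (NumPy, with synthetic forecast horizons standing in for real results):

```python
import numpy as np

# Median forecast horizon with lower/upper quartiles as the uncertainty
# range, over the n_res x n_T x n_P ensemble (synthetic values here).
rng = np.random.default_rng(6)
t_v = rng.uniform(2.0, 12.0, size=(15, 15, 10))   # (n_res, n_T, n_P)
median = np.median(t_v)
q_low, q_high = np.quantile(t_v, [0.25, 0.75])
```

The median is reported as the prediction quality, with `(q_low, q_high)` as the uncertainty range.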

The data points for the standard deviations of the reservoir and KBM output contributions in the OH model during the training phase, $\mathrm{std}(y_{\mathrm{res/kbm},j})$, are also obtained as the median across all $n_{\mathrm{res}} \times n_{\mathrm{T}}$ trained reservoirs, with the error bars given by the lower and upper quartiles.

The selection of training and prediction sections is visualized in Fig. 2. The numbers of realizations, training sections, and prediction sections are chosen to be $n_{\mathrm{res}} = n_{\mathrm{T}} = 15$ and $n_{\mathrm{P}} = 10$, resulting in an effective ensemble of $15 \times 15 \times 10 = 2250$ forecast horizon calculations. The number of time steps used per training and prediction section is displayed in Table II.

The first three count columns refer to the training sections, the last three to the prediction sections.

| System | $N_{\mathrm{TD}}$ | $N_{\mathrm{TS}}$ | $N_{\mathrm{T}}$ | $N_{\mathrm{PD}}$ | $N_{\mathrm{PS}}$ | $N_{\mathrm{P}}$ |
|---|---|---|---|---|---|---|
| 3D chaotic systems | 1000 | 100 | 2000 | 1000 | 100 | 2000 |
| Kuramoto–Sivashinsky | 7000 | 200 | 10 000 | 1000 | 200 | 1500 |

#### 3. RC hyperparameter values

The hyperparameter values defining the RC setup (data-driven and hybrid) are chosen in a heuristic manner, corresponding to the usual parameter ranges used in RC, and are listed in Table III. Preliminary experiments demonstrated that a spectral radius of $\rho \approx 0.4$ yields favorable results across most of the tested systems.

## IV. RESULTS

While all KBMs ( $\epsilon $-model, flow-model, sine-model) are tested for the predictions of the nine three-dimensional chaotic systems with a particular focus on the Lorenz-63 system, only the $\epsilon $-model is applied for the prediction of the high-dimensional spatiotemporal Kuramoto–Sivashinsky system.

### A. $\epsilon $-model results for 3D systems and high-dimensional Kuramoto–Sivashinsky system

#### 1. 3D systems

Figure 3 shows the influence of the reservoir dimension $r_{\mathrm{dim}}$ and the model error $\epsilon$ on the forecast horizons of the Lorenz-63 system.

With the model error fixed at $\epsilon = 0.1$, Fig. 3(a) clearly shows that all three hybrid approaches significantly increase the forecast horizons compared to the reservoir-only, the KBM-only, and the KBM-fitted predictions for all reservoir sizes tested, $r_{\mathrm{dim}} = 25$ to $2000$. The pure KBM-only forecast provides the overall worst prediction results. Similar to what is typically observed in reservoir-only approaches, the hybrid approaches also extend their forecast horizons as the reservoir size $r_{\mathrm{dim}}$ increases, until they eventually saturate. Within the hybrid approaches, the predictive results of OH and FH are identical and superior to those of the IH approach. While the predictive power of the reservoir-only and IH approaches diminishes as the reservoir size decreases, that of the OH and FH approaches remains quite high, even for the smallest tested reservoir size of $r_{\mathrm{dim}} = 25$: $t_v \lambda_{\max} \approx 6$. On the other hand, the limited predictive power of the KBM-fitted approach, corresponding to the OH/FH approach with $r_{\mathrm{dim}} = 0$, demonstrates that the reasonably accurate predictions of the OH/FH approaches with a small reservoir are not solely attributable to the KBM, but rather require at least a small reservoir. Since the IH approach employs the KBM only as an additional reservoir input, it requires a sufficiently large reservoir to effectively leverage the KBM.

Figure 3(b) shows the effect of a varying model error $\epsilon$ for a fixed reservoir dimension of $r_\mathrm{dim}=500$. As expected, the forecast horizons of all hybrid approaches, as well as of the KBM-only and KBM-fitted approaches, increase as the model error decreases from an extremely inaccurate model, $\epsilon=100$, to an extremely accurate model, $\epsilon=10^{-4}$. This effect is particularly prominent for the KBM approaches, both of which approximately reach the forecast horizon of the reservoir-only prediction for $\epsilon=10^{-4}$ ($t_v\lambda_{\max}\approx 7.5$). While this increased accuracy of the KBM is clearly mirrored in the OH and FH approaches, enabling them to reach a high forecast horizon of $t_v\lambda_{\max}\approx 15$, the IH approach interestingly saturates for $\epsilon<1$, achieving a maximal forecast horizon of $t_v\lambda_{\max}\approx 10$. This demonstrates that the OH and FH approaches, both of which implement the KBM in the linear output layer, benefit more strongly from an extremely accurate model than the IH approach, where the KBM output has to pass through the nonlinear reservoir. At a high model error of $\epsilon=100$, the OH approach is equivalent to the reservoir-only approach, whereas the FH and IH approaches exhibit even slightly lower forecast horizons.

Figure 4 visualizes the reservoir and KBM contributions of the OH approach for predicting the Lorenz-63 system with a smaller model error of $\epsilon=0.1$ and a larger error of $\epsilon=1$. As described in Sec. II B 3, in the OH approach, the reservoir and KBM contributions [$y_\mathrm{res}(t)$ and $y_\mathrm{kbm}(t)$] to the overall output $y_r(t)$ can easily be separated, since they are simply combined additively according to Eq. (14). With a smaller model error of $\epsilon=0.1$, panel (a), most of the total output $y_r(t)$ (“Both”) consists of the accurate KBM contribution, while the reservoir contribution is very small (green dot below the “KBM” and “Both” trajectories). Nevertheless, the spatially small reservoir contribution plays an important role for the predictive power, as observed in Fig. 3, where the OH prediction with $\epsilon=0.1$ and $r_\mathrm{dim}=500$ achieves a high forecast horizon of $t_v\lambda_{\max}\approx 13$, whereas the forecast horizon of the KBM-fitted approach is as low as $t_v\lambda_{\max}\approx 2.5$. This behavior is consistent across the ensemble of reservoir realizations $n_\mathrm{ens}$ and training sections $n_T$, as evident from the small error bars in panel (c), which indicate the lower and upper quartiles around the median of the respective contributions’ standard deviations. As expected, for a higher model error of $\epsilon=1$, panels (b) and (d), the spatial extent of the reservoir contribution increases, with both the KBM and reservoir contributions leading to a distorted Lorenz-63-like attractor. These results demonstrate how the output matrix $W_\mathrm{out}$ in the OH approach is capable of selecting the optimal combination of the KBM and reservoir nodes based on the model’s accuracy.
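Because the OH readout, Eq. (14), acts linearly on the concatenation of the reservoir state and the KBM prediction, this separation amounts to partitioning the trained output matrix into its reservoir and KBM columns. A minimal numpy sketch with illustrative dimensions (the matrix `W_out` here is random, standing in for a trained readout):

```python
import numpy as np

rng = np.random.default_rng(42)
r_dim, u_dim = 50, 3  # illustrative reservoir and system dimensions

# Stand-in for a trained OH output matrix acting on [r(t); K(u(t))].
W_out = rng.normal(size=(u_dim, r_dim + u_dim))
r_t = rng.normal(size=r_dim)  # reservoir state r(t)
k_t = rng.normal(size=u_dim)  # KBM one-step prediction K(u(t))

# Full OH output: linear readout of the concatenated vector.
y_total = W_out @ np.concatenate([r_t, k_t])

# Additive separation into reservoir and KBM contributions.
y_res = W_out[:, :r_dim] @ r_t   # y_res(t)
y_kbm = W_out[:, r_dim:] @ k_t   # y_kbm(t)
```

By construction, `y_res + y_kbm` reproduces `y_total` exactly, which is what enables the contribution analysis visualized in Fig. 4.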

Figure 5 illustrates the forecast horizons with a rather small model error of $\epsilon=0.1$ for the nine different three-dimensional dynamical systems, using a smaller reservoir size of $r_\mathrm{dim}=50$, panel (a), and a larger reservoir size of $r_\mathrm{dim}=500$, panel (b). The forecast horizons obtained from the KBM-fitted and KBM-only approaches (independent of $r_\mathrm{dim}$) provide insights into the accuracies of the respective $\epsilon$-models for the systems when modifying the system parameters listed in Table I. The almost equal predictive abilities of the OH and FH approaches, previously observed for the Lorenz-63 system, are also confirmed for the other eight three-dimensional chaotic systems and both reservoir sizes tested. Their superior forecast horizons, compared to the IH, KBM, and reservoir-only approaches, once again become most apparent for the small reservoir size ($r_\mathrm{dim}=50$), for which the IH and reservoir-only approaches struggle to achieve good prediction results. Similarly, for the larger reservoir size of $r_\mathrm{dim}=500$, the OH and FH approaches are superior to the IH approach (except for the Chua-Circuit, where all three are equal). However, in this scenario, the IH approach has acquired a significant amount of predictive power, further demonstrating that the IH approach requires a sufficiently large reservoir size to function properly. In any case, the IH approach still achieves better prediction results than the reservoir-only approach. However, for the smaller reservoir, the input hybrid performs even worse than the simple KBM-fitted forecasts for certain systems. Interestingly, the reservoir-only predictions for the Double-Scroll, Rössler, and WINDMI systems do not achieve any accurate predictions, even for the larger reservoir size $r_\mathrm{dim}=500$. On the other hand, the hybrid approaches, particularly the OH and FH approaches, are able to generate very accurate predictions for these three systems.

#### 2. Kuramoto–Sivashinsky

The predicted trajectories corresponding to the first reservoir realization, the first training section, and the first prediction section ($i_\mathrm{res}=i_T=i_P=1$) for the spatiotemporal chaotic Kuramoto–Sivashinsky system, alongside the true simulated trajectory, are visualized in Fig. 6. The respective forecast horizons are indicated as vertical red lines for each predictor. From a qualitative standpoint, all predictors except the KBM-only approach are capable of generating KS-like trajectories, while the KBM-only approach seems to converge toward a stationary state. While all hybrid approaches outperform the reservoir-only and KBM approaches, the OH and FH approaches once again demonstrate superior prediction results.

The prediction results of this single trajectory are supported by the ensemble experiment displayed in Fig. 7, where the reservoir dimension (panel a) and the model error (panel b) are varied. These results are qualitatively very similar to the previously discussed Lorenz-63 findings (see Fig. 3): the OH and FH forecast horizons are roughly equivalent and superior to the IH forecast horizons, while all hybrid approaches still perform better than the reservoir-only approach (except for a large model error $\epsilon=10$ in panel b). Contrary to the Lorenz-63 findings, the KS $\epsilon$-model with $\epsilon=1$ appears to already be quite accurate, as evident from the large forecast horizon of the KBM-fitted approach (panel a), which matches the maximal forecast horizon of the reservoir-only approach with $r_\mathrm{dim}=6000$: $t_v\lambda_{\max}\approx 1$. The sharp increase in the forecast horizons of the OH and FH approaches, transitioning from an extremely small reservoir of $r_\mathrm{dim}=50$ to a reservoir size of $r_\mathrm{dim}=1000$ that is still relatively small in light of the large KS time series dimension $u_\mathrm{dim}=64$, once again illustrates how a relatively small reservoir can significantly benefit from a fairly accurate KBM within the OH and FH approaches. On the other hand, the forecast horizons of the IH and reservoir-only approaches improve more slowly as the reservoir size increases. Similar to the Lorenz-63 findings, in the case of a large model error of $\epsilon=10$ (panel b), the OH approach seems to converge toward the reservoir-only approach, whereas the IH and FH approaches exhibit even slightly lower forecast horizons. As the model error is decreased, the predictive capabilities of the IH approach saturate already for $\epsilon<1$, whereas those of the OH and FH approaches only reach saturation for $\epsilon<10^{-1}$, indicating once again, for the high-dimensional KS system, that the latter approaches can benefit more strongly from highly accurate KBMs.

### B. Flow-model results for 3D systems

In this section, the flow $F(\cdot)$ of the respective three-dimensional dynamical systems (see Appendix A for the flows of all nine 3D systems) is used as the KBM within the three hybrid approaches, as well as for the KBM-fitted predictor. This experiment represents a case in which the KBM does not function as a next-step predictor but still contains knowledge about the underlying system. Therefore, a KBM-only predictor is not applicable in this case.

Figure 8 shows the influence of the reservoir size $r_\mathrm{dim}$ on the forecast horizons of the Lorenz-63 system. All hybrid approaches show higher forecast horizons than the reservoir-only approach. This demonstrates that the physical knowledge within the flow of the Lorenz-63 system can indeed be leveraged by the hybrid approaches. Unlike for the comparatively accurate $\epsilon$-models, where the forecast horizons of the OH and FH approaches were, in most cases, significantly higher than those of the IH approach, the usage of the flow-model leads to very similar forecast horizons across all hybrid architectures, especially for reservoir dimensions $r_\mathrm{dim}>300$. For smaller reservoirs, $r_\mathrm{dim}<300$, the OH approach demonstrates slightly better prediction results, while both the IH and FH approaches align closely with the predictions of the reservoir-only model. Overall, the improvements achieved by the hybrid approaches over the reservoir-only approach are only marginal when compared to the $\epsilon$-model [compare with Fig. 3(a)]. This marginal improvement is also reflected in the unsuccessful KBM-fitted prediction.

Figure 9 shows the forecast horizons for all nine chaotic dynamical systems, with a fixed reservoir size of $r_\mathrm{dim}=500$. The inaccuracy of the flow-models, compared to the $\epsilon$-models, becomes evident through the unsuccessful KBM-fitted predictions for all nine systems. Nevertheless, for most systems, the hybrid methods can benefit from the flow-models and achieve longer forecast horizons compared to the reservoir-only approach. This is particularly evident in the cases of the Rössler and WINDMI systems, where the reservoir-only predictions are unsuccessful, while the hybrid approaches lead to significant prediction results. Interestingly, the Chua-Circuit and Thomas systems represent two cases in which the OH approach performs slightly worse than the reservoir-only approach, while the IH and FH approaches show small improvements.

### C. Sine-model results for 3D systems

Finally, the sine-model, Eq. (19), is considered, which represents a highly inaccurate model containing no information about the dynamical system. As for the other KBMs, Fig. 10 shows the influence of the reservoir dimension $r_\mathrm{dim}$ on the prediction of the Lorenz-63 system, while Fig. 11 depicts the forecast horizons for all nine 3D dynamical systems, tested using a reservoir dimension of $r_\mathrm{dim}=500$. As expected, the KBM-fitted approach is not able to generate any successful predictions. For the Lorenz-63 system shown in Fig. 10, the OH and reservoir-only approaches perform equally well and significantly outperform the IH and FH approaches, which also show equal forecast horizons. This holds true regardless of the reservoir dimension. The equivalence in performance between the OH and reservoir-only approaches, along with their superiority over the IH and FH predictions, is consistently observed across most of the other eight dynamical systems, as depicted in Fig. 11. This demonstrates that when the model is highly inaccurate, the OH approach reduces to the often still satisfactory reservoir-only prediction. On the other hand, the IH and FH approaches, which incorporate the inaccurate model into the reservoir states $r(t)$, fail to ignore the inaccurate model, and as a result, the prediction quality decreases. The Thomas system behaves differently from the other systems, as the OH approach here is actually slightly worse than the reservoir-only prediction, while the IH and FH approaches perform even slightly better. This exceptional behavior can be attributed to the fact that the Thomas flow, Eq. (A8), includes the terms $\sin(x)$, $\sin(y)$, and $\sin(z)$. Therefore, in this case, incorporating the sine-model $K_\mathrm{sin}$ actually introduces some relevant knowledge about the system. In the OH approach, however, these sine contributions appear to generate unstable predictions, while the IH and FH approaches seem to be able to leverage them successfully.

Figure 12 depicts the standard deviations of the partial KBM and reservoir outputs of the OH approach ($r_\mathrm{dim}=500$) for the Lorenz-63 system (panel a) and the Thomas system (panel b). The ability of the OH approach to entirely disregard a highly inaccurate KBM is demonstrated for the Lorenz-63 system, revealing that all output contributions are attributed to the reservoir nodes, without any contributions from the sine-model. The fact that the sine-model actually provides some information about the Thomas system can be observed in panel (b), where the KBM output contributions do not vanish. Nevertheless, as previously mentioned, the incorporation of the sine-model results in less accurate OH predictions for the Thomas system.

## V. CONCLUSIONS

In this paper, we present a first comparative study of the three hybrid reservoir computing approaches: input hybrid (IH), output hybrid (OH), and full hybrid (FH), which combine data-driven reservoir computing with an imperfect knowledge-based model (KBM) for the task of forecasting the time evolution of chaotic dynamical systems.

For KBMs that incorporate knowledge about the underlying system, as provided by the $\epsilon$-models and flow-models, it is demonstrated that all three hybrid approaches outperform both the reservoir-only and KBM baselines. Regarding rather accurate KBMs, as provided by the $\epsilon$-models, distinctions among the three hybrid approaches become evident: The OH and FH approaches show equal performance and significantly outperform the IH approach. This is observed for the task of forecasting three-dimensional chaotic systems, as well as the spatiotemporal chaotic Kuramoto–Sivashinsky system. Furthermore, contrary to the IH approach, the OH and FH approaches are shown to require only very small reservoir sizes to generate fairly accurate predictions, while the IH approach requires larger reservoir sizes to effectively leverage the KBM. The FH and OH approaches are additionally shown to benefit most strongly from extremely accurate KBMs, whereas the IH approach ceases to improve once a certain KBM accuracy is reached.

When the hybrid approaches are confronted with a highly inaccurate KBM, as provided by the sine-model that incorporates no knowledge of the underlying system, the advantages of the OH approach over the FH and IH approaches become apparent. In those cases, the OH approach, which combines the KBM with the reservoir nodes only in the linear output layer, simply reduces to the data-driven reservoir-only approach, while the FH and IH approaches, which include the KBM as additional reservoir input, fail to ignore the contributions of the flawed model, leading to drastically reduced performances in the latter.

One downside of the OH and FH approaches, as noted by Huhn and Magri^{22} for FH, is the potential for outputs to diverge to infinity during the closed-loop prediction phase when the predictions are unsuccessful. This is because the KBM-to-output connection is, in principle, unbounded (depending on the specific KBM used), in contrast to the $\tanh$-bounded reservoir nodes. As this issue may cause problems in hyperparameter optimization algorithms, Huhn and Magri^{22} proposed several ways to overcome it, such as saturating the prediction error.

However, the OH approach is the only hybrid architecture that offers the possibility to separate the KBM and reservoir contributions to the overall output, as both of them are only combined additively in the linear output layer. This allows for a higher interpretability of the knowledge-based and data-driven elements within the hybrid prediction.

In essence, the results indicate the following: For accurate KBMs, the FH and OH results are equivalent and significantly outperform the IH results, especially for smaller reservoir sizes. For highly flawed KBMs, the OH approach reduces to the reservoir-only approach, while the performance of the IH and FH approaches may deteriorate even further. This, along with the greater simplicity and interpretability of the OH approach, suggests that the OH approach is the most favorable architecture for hybrid reservoir computing.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Dennis Duncan:** Conceptualization (supporting); Formal analysis (lead); Methodology (equal); Software (lead); Validation (lead); Writing – original draft (equal); Writing – review & editing (equal). **Christoph Räth:** Conceptualization (lead); Funding acquisition (lead); Methodology (equal); Supervision (lead); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

### APPENDIX A: 3D CHAOTIC DYNAMICAL SYSTEMS

This appendix provides the references and flow equations for the nine three-dimensional chaotic dynamical systems considered in this work. They all represent autonomous dissipative flows and are collected from the appendix of Sprott.^{26} The original works introducing the systems are also referenced below. The default parameters, initial states, simulation time steps $\Delta t$, and largest Lyapunov exponents (both calculated and reference values) are listed in Table IV. The default parameters and initial states, along with the reference largest Lyapunov exponents, are taken from Sprott.^{26} The forecast horizon results are obtained using the calculated largest Lyapunov exponents. The resulting attractors of the nine systems are shown in Fig. 13.

TABLE IV. Default parameter values, initial states, simulation time steps $\Delta t$, and largest Lyapunov exponents (calculated and reference values) of the nine 3D systems.

| System | Parameter values | Initial state | Δt | λ_{max} (calculated) | λ_{max} (Sprott^{26}) |
|---|---|---|---|---|---|
| Lorenz-63 | ρ = 28, σ = 10, β = 8/3 | [0, −0.01, 9] | 0.05 | 0.9041 | 0.9056 |
| Chen | a = 35, b = 3, c = 28 | [−10, 0, 37] | 0.02 | 2.0138 | 2.0272 |
| Chua-Circuit | α = 9, β = 100/7, a = 8/7, b = 5/7 | [0, 0, 0.6] | 0.1 | 0.3380 | 0.3271 |
| Double-Scroll | a = 0.8 | [0.01, 0.01, 0] | 0.3 | 0.04969 | 0.0497 |
| Halvorsen | a = 1.27 | [−5, 0, 0] | 0.05 | 0.7747 | 0.7899 |
| Rössler | a = b = 0.2, c = 5.7 | [−9, 0, 0] | 0.1 | 0.06915 | 0.0714 |
| Rucklidge | κ = 2.0, λ = 6.7 | [1, 0, 4.5] | 0.1 | 0.1912 | 0.0643^{a} |
| Thomas | b = 0.18 | [0.1, 0, 0] | 0.3 | 0.03801 | 0.0349 |
| WINDMI | a = 0.7, b = 2.5 | [0, 0.8, 0] | 0.2 | 0.07986 | 0.0755 |


^{a} For the Rucklidge system, the largest Lyapunov exponent noted in Sprott^{26} does not correspond to the calculated one. In other studies, the largest Lyapunov exponent is given as λ_{max} = 0.1877 (Rusyn^{30}) and λ_{max} = 0.193 (Pehlivan *et al.*^{31}), which correspond more closely to the calculated value.

#### 1. Lorenz-63 system

Introduced by Lorenz^{32} as a simplified model for atmospheric convection.
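For reference, the standard form of the Lorenz-63 flow, with the parameters $\rho$, $\sigma$, and $\beta$ listed in Table IV, is

$$\dot{x} = \sigma (y - x), \qquad \dot{y} = x(\rho - z) - y, \qquad \dot{z} = xy - \beta z.$$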

#### 2. Chen’s system

#### 3. Chua’s circuit

First introduced in Ref. 35 and then rigorously studied by Chua *et al.*^{36}

#### 4. Double scroll

Introduced in Ref. 37 as a “simple model which can capture the essential dynamics of double-scroll-like chaotic attractors.” It can also be realized by a physical electric circuit.

#### 5. Halvorsen’s cyclically symmetric attractor

The system is taken from Sprott.^{26}

#### 6. Rössler attractor

Introduced by Rössler,^{38} it is, like the Lorenz-63 system, a prototypical chaotic dynamical system.

#### 7. Rucklidge attractor

Introduced by Rucklidge^{39} as a simple model related to fluid convection in a two-dimensional plane.

#### 8. Thomas’ cyclically symmetric attractor

Introduced by Thomas.^{40}
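For reference, the standard form of Thomas’ cyclically symmetric flow, whose sine terms are referred to in Sec. IV C, is

$$\dot{x} = \sin(y) - bx, \qquad \dot{y} = \sin(z) - by, \qquad \dot{z} = \sin(x) - bz.$$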

#### 9. WINDMI attractor

The WINDMI system was introduced in Ref. 41.

### APPENDIX B: CALCULATION OF THE LARGEST LYAPUNOV EXPONENT

The algorithm for numerically calculating the largest Lyapunov exponent is taken from Sprott,^{26} which originated from Benettin *et al.*^{42}

This procedure is repeated $m_\mathrm{disc}+m$ times, using $u(t_n) \rightarrow u(t_0)$ as the new initial state and
$$u(t_n) + \delta\,\frac{u_\mathrm{pert}(t_n) - u(t_n)}{\Vert u_\mathrm{pert}(t_n) - u(t_n)\Vert} \rightarrow u_\mathrm{pert}(t_0)$$
as the new initially perturbed state, which is thereby renormalized to be again a distance $\delta$ apart and oriented in the direction of largest expansion. The numerical estimate of the largest Lyapunov exponent is then calculated as the average of $\lambda$ over the last $m$ iterations: $\lambda_{\max} \approx \langle\lambda\rangle_m$. The first $m_\mathrm{disc}$ iterations are discarded in order to only consider points on the system’s attractor. The parameters are chosen as $\delta=10^{-10}$, $n=15$, $m_\mathrm{disc}=500$, and $m=3000$.
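As an illustration, a minimal Python sketch of this Benettin-type procedure for the Lorenz-63 system (the RK4 integrator, the step size, and the random initial perturbation direction are illustrative choices, not specifications from the text):

```python
import numpy as np

def lorenz(u, rho=28.0, sigma=10.0, beta=8.0 / 3.0):
    """Lorenz-63 flow with the default parameters of Table IV."""
    x, y, z = u
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, u, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(u)
    k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2)
    k4 = f(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def largest_lyapunov(f, u0, dt=0.01, n=15, m_disc=500, m=3000, delta=1e-10):
    """Benettin estimate of lambda_max: evolve a reference and a perturbed
    trajectory for n steps, measure the separation growth, then renormalize."""
    rng = np.random.default_rng(0)
    u = np.asarray(u0, dtype=float)
    pert = rng.normal(size=u.shape)
    u_pert = u + delta * pert / np.linalg.norm(pert)
    lams = []
    for _ in range(m_disc + m):
        for _ in range(n):
            u = rk4_step(f, u, dt)
            u_pert = rk4_step(f, u_pert, dt)
        d = np.linalg.norm(u_pert - u)
        lams.append(np.log(d / delta) / (n * dt))
        # Renormalize to distance delta along the direction of largest expansion.
        u_pert = u + delta * (u_pert - u) / d
    return np.mean(lams[m_disc:])  # discard the first m_disc iterations
```

With the default parameters, the estimate approaches the calculated value λ_{max} ≈ 0.90 for the Lorenz-63 system reported in Table IV.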

## REFERENCES

*International Conference on Computational Science* (Springer, 2019), pp. 192–198.

*International Conference on Computational Science* (Springer, 2021), pp. 323–329.

*International Conference on Computational Science* (Springer, 2020), pp. 124–132.

*Chaos and Time-Series Analysis*