The prediction of complex nonlinear dynamical systems with the help of machine learning has become increasingly popular in different areas of science. In particular, reservoir computers, also known as echo-state networks, have turned out to be a very powerful approach, especially for the reproduction of nonlinear systems. The reservoir, the key component of this method, is usually constructed as a sparse, random network that serves as a memory for the system. In this work, we introduce block-diagonal reservoirs, meaning that a reservoir can be composed of multiple smaller reservoirs, each with its own dynamics. Furthermore, we take out the randomness of the reservoir by using matrices of ones for the individual blocks. This breaks with the widespread interpretation of the reservoir as a single network. Using the examples of the Lorenz and Halvorsen systems, we analyze the performance of block-diagonal reservoirs and their sensitivity to hyperparameters. We find that the performance is comparable to sparse random networks and discuss the implications with regard to scalability, explainability, and hardware realizations of reservoir computers.

The application of reservoir computers to various fields in science and technology yields promising and fast-advancing results due to their capabilities in forecasting chaotic attractors, inferring unmeasured values in systems, and recognizing speech. While the construction of a reservoir computer is rather simple in comparison to other machine learning techniques, its architecture and functionality are in many regards still a black box since, at its core, the reservoir is usually chosen as a random network. Thus, we replace the network with block-diagonal matrices, dividing it into multiple smaller reservoirs, take out the randomness, and show that these alterations still deliver an equal quality of short- and long-term predictions. This architecture breaks with the common interpretation of the reservoir as a single network and may prove to be more scalable and easier to implement in hardware than its more complex variants while performing just as well.

## I. INTRODUCTION

The analysis and modeling of complex dynamical systems is a key challenge across various disciplines in science, engineering, and economics.^{1} While machine learning approaches, like generative adversarial networks, can provide excellent predictions for dynamical systems,^{2} vast data requirements, large numbers of hyperparameters, and a lack of interpretability limit their usefulness in some scientific applications.^{3} It is therefore essential to fundamentally understand how, when, and why these models work in order to prevent the risk of misinterpreting their results when deeper methodological knowledge is missing.^{4}

In the context of complex system research, reservoir computers (RCs)^{5} have emerged for quantifying and predicting the spatiotemporal dynamics of chaotic nonlinear systems. They represent a special kind of recurrent neural network (RNN) and are often referred to as echo-state networks (ESNs).^{6} The core of the model is a fixed reservoir, which is a complex network with connections according to a predefined network topology. The input data are fed into the nodes of the reservoir, and solely the weights of the readout layer, which transform the reservoir response into output variables, are subject to optimization via linear regression. This makes the learning extremely fast and comparatively transparent, and it prevents the vanishing gradient problem of other RNN methods.^{7}

The topology of a reservoir, or the arrangement of the nodes and connections within it, can have a significant impact on the performance of a reservoir computing system.^{8} In current state-of-the-art models, the topology of the reservoir is often chosen randomly,^{9} with the hope that the resulting dynamics will be sufficiently complex to allow for good performance on a given task. However, this approach can be hit-or-miss,^{10,11} and it is impossible to know a priori how the topology of the reservoir will affect the performance of the system.

While Maass *et al*.^{12} and Jaeger^{13} introduced ESNs with reservoirs being modeled as a random Erdös–Rényi network, Watts and Strogatz,^{14} Albert and Barabási,^{15} and others have shown that random networks are far from being common in physics, biology, or sociology. Instead, more complex networks like scale-free, small-world, or intermediate forms of networks^{16,17} are most often found in real-world applications. Further approaches to make reservoir computing more explainable have been made in recent years, with, e.g., Haluszczynski and Räth^{11} comparing different network construction algorithms, Griffith *et al*.^{18} introducing very low connectivity networks, and Carroll and Pecora^{10} analyzing the effect of network symmetries on prediction performance. However, there still remain open questions about the functionality of reservoir computers, which need to be answered for developing new algorithms, for fine-tuning the system for specific applications, or for building efficient hardware realizations of RCs.

In this work, we break with the interpretation of the reservoir as a single network by deliberately using block-diagonal matrices as reservoirs. This implies that we decompose the reservoir into multiple smaller reservoirs as outlined in Sec. II B. Furthermore, we use matrices of ones as the blocks, which take out the randomness of the network completely. We assess the ability of block-diagonal reservoirs for short- and long-term predictions by comparing the measures discussed in Sec. II C to the standard RC setup.

## II. METHODS

We structure our methods section into three different parts: benchmark models, reservoir computing, and prediction performance measures.

### A. Benchmark models

We perform our analyses on two synthetic example models, which exhibit chaotic behavior and are three-dimensional autonomous, dissipative flows.

#### 1. Lorenz system

In line with earlier reservoir computing studies^{19} and Lu *et al*.,^{20} we use the Lorenz system,^{21} which was initially used for modeling atmospheric convection, as an example for replicating chaotic attractors using reservoir computing,

$$\dot{x} = \sigma(y - x), \qquad \dot{y} = x(\rho - z) - y, \qquad \dot{z} = xy - \beta z,$$

where we use the standard parameters $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$, for which the system exhibits chaotic behavior.

#### 2. Halvorsen system

As in Ref. 22, we use the Halvorsen system for our analyses, which has a cyclic symmetry. While the nonlinear terms of the Lorenz system are mixed products of two different variables, the Halvorsen system^{23} entails only non-mixed quadratic nonlinearities,

$$\dot{x} = -ax - 4y - 4z - y^2, \qquad \dot{y} = -ay - 4z - 4x - z^2, \qquad \dot{z} = -az - 4x - 4y - x^2,$$

where we use the standard parameter $a = 1.3$, for which the system behaves chaotically.

#### 3. Simulating and splitting data

If not stated otherwise, we solve the differential equations of the synthetic systems using the Runge–Kutta method^{24} for $T = 70000$ steps with a discretization of $dt = 0.02$ in order to ensure a sufficient manifestation of the attractor. We discard the initial transient of $T = 50000$ steps and use the remaining steps for training ($T_{train} = 10000$) and testing ($T_{test} = 10000$) of the RCs. In order to get robust results, we vary the starting points on the attractor by using the rounded last point of one data sample as the starting point for the next. The initial starting points for the Lorenz and Halvorsen systems are $(-14, -20, 25)$ and $(-6.4, 0, 0)$, respectively. This setting is comparable to the ones used by Griffith *et al*.^{18} and Haluszczynski and Räth.^{11} Figure 1 illustrates the attractors and trajectories of the simulated data.
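The simulation and splitting procedure can be sketched as follows. The fourth-order Runge–Kutta stepper and all function names are illustrative and not taken from the paper's code:

```python
import numpy as np

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system with standard parameters."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_trajectory(rhs, x0, dt=0.02, steps=70000):
    """Integrate with a classical fourth-order Runge-Kutta scheme."""
    traj = np.empty((steps, len(x0)))
    x = np.asarray(x0, dtype=float)
    for i in range(steps):
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * dt * k1)
        k3 = rhs(x + 0.5 * dt * k2)
        k4 = rhs(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[i] = x
    return traj

traj = rk4_trajectory(lorenz_rhs, (-14.0, -20.0, 25.0))
train = traj[50000:60000]  # T_train = 10000 after discarding the transient
test = traj[60000:70000]   # T_test = 10000
```

The transient of 50000 steps is simply dropped before the train/test split, so both sets lie on the attractor.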

### B. Reservoir computing

A reservoir computer (RC)^{12,13,25,26} is an artificial recurrent neural network (RNN) that relies on a static internal network called *reservoir*. The term static means that, unlike other RNN approaches, the reservoir remains fixed once the network is constructed. The same is true for the input weights. Therefore, the RC system is computationally very efficient since the training process only involves optimizing the output layer. As a result, fast training and high model dimensionality are computationally feasible, making RC well suited for complex real-world applications.

#### 1. Algorithm

The reservoir $A$ is usually constructed as a sparse Erdös–Rényi random network^{27} with dimensionality or number of nodes $d$. However, in this paper, we replace the network structure of the reservoir with a block-diagonal matrix of ones. This breaks with the widespread interpretation of the reservoir as a single network.

To feed the $n$-dimensional input data $u(t)$ into reservoir $A$, a $d \times n$ input matrix $W_{in}$ is constructed, which defines how strongly each input dimension influences every single node. The elements of $W_{in}$ are chosen to be uniformly distributed random numbers within the interval $[-1, 1]$. The reservoir states $r(t)$ are then updated according to

$$r(t + \Delta t) = \tanh\big(A r(t) + W_{in} u(t)\big).$$

Before fitting the output layer, we apply the readout generalization of Ref. 22 in order to break unwanted symmetries in the reservoir equations. This means that the squared elements of the reservoir states are appended, $r \mapsto \{r, r^2\}$. The output weights $W_{out}$, which map the generalized reservoir states to the output variables $v(t) = W_{out}\,\{r(t), r^2(t)\}$, are the only trainable part of the system and are obtained via ridge regression.^{28}
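A minimal sketch of this training procedure, assuming a standard sparse random reservoir rescaled to a target spectral radius of 0.1; all variable names, the placeholder input data, and the regularization strength are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 200, 3                                    # reservoir and input dimension
A = (rng.random((d, d)) < 0.02) * rng.uniform(-1, 1, (d, d))
A *= 0.1 / np.max(np.abs(np.linalg.eigvals(A)))  # rescale to spectral radius 0.1
W_in = rng.uniform(-1, 1, (d, n))                # random input weights in [-1, 1]

def run_reservoir(u_seq):
    """Drive the reservoir with an input sequence; return generalized states."""
    r = np.zeros(d)
    states = []
    for u in u_seq:
        r = np.tanh(A @ r + W_in @ u)
        states.append(np.concatenate([r, r ** 2]))  # readout r -> {r, r^2}
    return np.array(states)

# Fit W_out by ridge regression to one-step-ahead targets (placeholder data).
u = np.sin(np.linspace(0, 20, 501))[:, None] * np.ones(3)
R, Y = run_reservoir(u[:-1]), u[1:]
beta = 1e-6                                       # ridge regularization strength
W_out = Y.T @ R @ np.linalg.inv(R.T @ R + beta * np.eye(2 * d))
v = R @ W_out.T                                   # one-step predictions
```

Only `W_out` is trained; `A` and `W_in` stay fixed, which is what makes the optimization a single linear solve.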

#### 2. Block-diagonal reservoir

Instead of a single sparse random network, we construct the reservoir $A$ as a block-diagonal matrix consisting of $\lfloor d/b \rfloor$ quadratic blocks $J_i$ of size $b \times b$ along the diagonal, while all other elements are zero,

$$A = \begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_{\lfloor d/b \rfloor} \end{pmatrix}.$$

Since the spectrum of a block-diagonal matrix is the union of the spectra of its blocks, the spectral radius can be determined from the much smaller blocks.^{29} We can speed up the calculation considerably this way, as the cost of the eigenvalue decomposition grows rapidly with the matrix dimension.
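The block-diagonal construction and the block-wise spectral radius computation can be sketched as follows (function and variable names are illustrative). The final comparison confirms that the block-wise result equals the spectral radius of the full matrix:

```python
import numpy as np

def block_diagonal_reservoir(d, b, make_block):
    """Place floor(d/b) blocks of size b x b on the diagonal; rest is zero."""
    A = np.zeros((d, d))
    for i in range(d // b):
        A[i * b:(i + 1) * b, i * b:(i + 1) * b] = make_block(i)
    return A

rng = np.random.default_rng(42)
d, b = 600, 50
A = block_diagonal_reservoir(d, b, lambda i: rng.uniform(-1, 1, (b, b)))

# The spectrum of a block-diagonal matrix is the union of the block spectra,
# so the spectral radius only requires eigendecompositions of small blocks.
rho_blockwise = max(
    np.max(np.abs(np.linalg.eigvals(A[i * b:(i + 1) * b, i * b:(i + 1) * b])))
    for i in range(d // b)
)
rho_full = np.max(np.abs(np.linalg.eigvals(A)))
```

Because each eigendecomposition now acts on a $b \times b$ instead of a $d \times d$ matrix, the spectral-radius computation becomes much cheaper for small blocks.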

#### 3. Blocks of Erdös–Rényi networks

First, we choose the individual blocks $J_i$ to be Erdös–Rényi networks with a connection probability of $p = 0.02$.^{14} In our analyses, we distinguish between two cases:

- *Individual blocks*: Each block $J_i$ is constructed separately with a different random seed.
- *Equal blocks*: All blocks are equal to each other, $J_1 = J_2 = \cdots = J_{\lfloor d/b \rfloor}$, and thus the Erdös–Rényi network only needs to be constructed once. Furthermore, this case delivers another increase in eigenvalue decomposition speed by a factor of $\lfloor d/b \rfloor$.

#### 4. Blocks of matrices of ones

As a second variant, we choose each block $J_i$ to be a $b \times b$ matrix containing only ones, which takes out the randomness of the reservoir completely.^{31} We denote the mean between the $i$th and $j$th row of the reservoir state $r(t)$ as

$$\bar{r}_{i:j}(t) = \frac{1}{j - i + 1} \sum_{k=i}^{j} r_k(t).$$

Multiplying the reservoir with its states then yields the value $b\,\bar{r}_{(m-1)b+1:mb}(t)$ for every node of the $m$th block. Therefore, the reservoir multiplication contribution is identical for every node within a block, which means that at each training step, the reservoir memory is the same for all nodes of a block. This directly implies lower computational costs.
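This property can be verified numerically with a small toy example (dimensions and the toy state are illustrative): multiplying a block of ones with the state vector gives every node of a block the same value, namely $b$ times the mean over that block's slice of the state:

```python
import numpy as np

d, b = 12, 4
A = np.zeros((d, d))
for m in range(d // b):
    A[m * b:(m + 1) * b, m * b:(m + 1) * b] = 1.0  # each block is ones

r = np.arange(d, dtype=float)  # toy reservoir state
Ar = A @ r

# every node of block m receives b times the mean over that block's slice
expected = np.concatenate(
    [np.full(b, b * r[m * b:(m + 1) * b].mean()) for m in range(d // b)]
)
```

Within a block, all nodes therefore carry identical memory contributions, while different blocks still differ from each other.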

#### 5. Implications for hardware reservoir computers

By separating the reservoir into multiple smaller reservoirs and by taking out the randomness, we significantly reduce the complexity of the architecture, especially with regard to hardware implementations. Examples of the reservoir topologies and their respective spring layouts according to the *Fruchterman–Reingold* force-directed algorithm^{30} are illustrated in Fig. 2. We observe that the networks constructed fully or partly with Erdös–Rényi networks have more complex, interconnected network structures, while the networks with blocks of ones are ordered and have clearly separated, unified reservoirs.

This leads us to make the following assumptions on potential implications for hardware RCs:

- *Improved generalization*: Similar to ensemble methods^{32} known from other machine learning methods, the use of multiple smaller reservoirs can lead to a more diverse representation of the inputs. This potentially reduces the risk of overfitting and improves generalization to unseen data.
- *Enhanced robustness*: The failure of one reservoir unit can potentially be compensated by the other units, which can still contribute to the processing of the inputs. This makes the system more robust to noise and errors, especially in hardware implementations.
- *Better scalability*: The computational and memory requirements can be reduced by parallelization, making the system more scalable and suitable for deployment on high-dimensional data.

Each of these assumptions requires different experimental setups—hence validating these potential implications is beyond the scope of this paper and is the subject of further research.

### C. Measuring prediction performance

When forecasting nonlinear dynamical systems such as chaotic attractors, the goal of the predictions is not only to exactly replicate the actual short-time trajectory but also to reproduce the long-term statistical properties of the system, called the *climate*. This is important because, by definition, chaotic systems exhibit sensitive dependence on initial conditions, and small disturbances therefore grow exponentially fast. Consequently, even if the short-term prediction is initially perfect, at some point numerical inaccuracies alone lead to the separation of the predicted and actual trajectories. However, for many applications, this is not a problem as long as the predicted trajectory still leads to the same attractor. In order to quantify this behavior, quantitative measures are needed that capture the complex dynamics of the system. Therefore, we use the measures applied in the papers by Haluszczynski and Räth^{11} and Haluszczynski.^{26}

#### 1. Forecast horizon

In order to quantify the quality of the short-term prediction, we use the forecast horizon as defined by Haluszczynski and Räth.^{11} For that, we track the number of time steps during which the predicted trajectory $v(t)$ and the actual trajectory $v_R(t)$ are matching. As soon as one of the three coordinates exceeds certain deviation thresholds, we consider the trajectories as not matching anymore. Throughout our study, we use the same fixed deviation thresholds for all predictions.

The thresholds are chosen such that small fluctuations around the actual trajectory, as well as minor deviations, are not counted as a mismatch. A higher value means that the prediction is close to the “true” trajectory over a longer period of time and has not yet deviated, although the underlying system is chaotic.
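A sketch of how such a forecast horizon can be computed; the thresholds and the synthetic trajectories below are placeholders, not the paper's values:

```python
import numpy as np

def forecast_horizon(v_pred, v_true, thresholds):
    """First time step at which any coordinate's deviation exceeds its
    threshold; returns the full length if the trajectories never separate."""
    exceeded = np.any(np.abs(v_pred - v_true) > thresholds, axis=1)
    return int(np.argmax(exceeded)) if exceeded.any() else len(v_true)

t = np.linspace(0, 10, 1000)
v_true = np.stack([np.sin(t), np.cos(t), t], axis=1)
v_pred = v_true + 0.01 * np.linspace(0, 2, 1000)[:, None]  # growing error
tau = forecast_horizon(v_pred, v_true, thresholds=np.array([0.01, 0.01, 0.01]))
```

In this toy setup, the linearly growing error crosses the threshold exactly halfway through the trajectory.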

#### 2. Correlation dimension

In order to characterize the structural complexity of the attractor, we calculate the correlation dimension $\nu$.^{33} It belongs to the measures for fractal dimensionality, which have been proposed by Mandelbrot^{34} in 1967. The correlation dimension is based on the correlation integral

$$C(r) = \lim_{N \to \infty} \frac{1}{N^2} \sum_{\substack{i,j=1 \\ i \neq j}}^{N} \theta\big(r - \lVert \mathbf{x}_i - \mathbf{x}_j \rVert\big) = \int_0^{r} \mathrm{d}^3 r' \, c(r'),$$

where $\theta$ is the *Heaviside* function and $c(r')$ is the standard correlation function. The integral represents the mean probability that two states in the phase space are close to each other at different time steps. This is the case if the distance between the two states is smaller than the threshold distance $r$. For small $r$, the correlation integral scales like a power law, $C(r) \propto r^{\nu}$, which defines the correlation dimension $\nu$.

For self-similar, strange attractors, this relationship holds for a certain range of $r$, which needs to be properly calibrated. The calculation of the correlation dimension is done using the *Grassberger–Procaccia* algorithm.^{35} It is purely data-based and does not require any knowledge of the underlying governing equations of the system. One advantage of the correlation dimension over other fractal measures is that it can be calculated with a comparably small number of data points. In the context of this work, mainly the relative comparison among various predictions and actual trajectories is of interest and, therefore, the accuracy of the absolute values is not the highest priority.
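A compact Grassberger–Procaccia sketch, illustrated on uniform random points in the unit square, whose dimension is 2; the point count, fitting range of $r$, and function names are illustrative choices:

```python
import numpy as np

def correlation_dimension(data, r_values):
    """Estimate nu as the slope of log C(r) vs log r over the given range."""
    n = len(data)
    dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
    pair_dists = dists[np.triu_indices(n, k=1)]      # all unique pairs
    C = np.array([(pair_dists < r).mean() for r in r_values])
    slope, _ = np.polyfit(np.log(r_values), np.log(C), 1)
    return slope

rng = np.random.default_rng(1)
data = rng.random((1000, 2))                         # uniform in the unit square
nu = correlation_dimension(data, r_values=np.logspace(-1.8, -0.8, 10))
```

For a two-dimensional uniform cloud, the estimated slope comes out close to 2, with a small downward bias from edge effects.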

#### 3. Lyapunov exponent

In order to quantify the chaoticity of the system, we use the *Lyapunov* exponents $\lambda_i$.^{36} They describe the average rate of divergence of nearby points in the phase space, and thus measure sensitivity with respect to initial conditions. There is one exponent for each dimension in the phase space. If the system has at least one positive Lyapunov exponent, it is classified as chaotic. The magnitudes of $\lambda_i$ quantify the time scale on which the system becomes unpredictable.

Since at least one positive exponent is the requirement for being classified as chaotic, it is sufficient for the purposes of this work to calculate only the largest Lyapunov exponent $\lambda_{max}$,^{37} which governs the exponential growth of the separation $d(t)$ of two initially nearby trajectories,

$$d(t) = C e^{\lambda_{max} t},$$

where $C$ is a constant normalizing the initial separation. We calculate $\lambda_{max}$ using the *Rosenstein* algorithm.^{38} As mentioned for the correlation dimension, mainly a relative comparison is of interest in order to characterize states of the system rather than determine the exact absolute values. Again, for this measure, no model or knowledge of the underlying governing equations is required.
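The paper estimates $\lambda_{max}$ from trajectories with the Rosenstein algorithm; as a self-contained illustration of the definition itself, the sketch below computes $\lambda_{max}$ for the logistic map at $r = 4$, where the exact value $\ln 2$ is known (parameters are illustrative):

```python
import numpy as np

def lyapunov_logistic(r=4.0, n=100000, x0=0.3, discard=100):
    """Average log|f'(x)| along the orbit of the logistic map f(x) = r x (1 - x),
    which converges to the largest (and only) Lyapunov exponent."""
    x = x0
    for _ in range(discard):                       # discard the transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n):
        total += np.log(abs(r * (1 - 2 * x)))      # local stretching rate
        x = r * x * (1 - x)
    return total / n

lam = lyapunov_logistic()
```

A positive result classifies the map as chaotic, consistent with the criterion stated above.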

### D. Benchmarks

In order to evaluate whether the predictions using the modified RC architecture are on par with the traditional setup, we run multiple predictions for different reservoir dimensions, target spectral radii, attractor starting points, and random seeds.

We do not observe significant changes for network dimensions $d \geq 400$ and find a target spectral radius of $\rho^{*} = 0.1$ to have the best overall prediction results. This is consistent with the findings of Haluszczynski and Räth.^{11}

Thus, using the traditional architecture with reservoir dimension $d = 600$ and target spectral radius $\rho^{*} = 0.1$, we vary the random seeds and attractor starting points, make predictions, and evaluate the quality of the predictions. This yields a total of $n = 10000$ different realizations of forecast horizons, correlation dimensions, and largest Lyapunov exponents. The average and standard deviation over these prediction measures for the respective systems are denoted in the columns “Lorenz traditional” and “Halvorsen traditional” in Table I. We use these values as benchmarks for our analysis so that we can compare the prediction performance of our modified architecture to the traditional RC setup.

TABLE I. Averages and standard deviations of the prediction measures for the traditional RC architecture and the test data.

| Measure/System | Lorenz traditional | Lorenz test | Halvorsen traditional | Halvorsen test |
|---|---|---|---|---|
| Forecast horizon $\tau$ | 219 ± 10 | ∞ | 382 ± 12 | ∞ |
| Correlation dimension $\nu$ | 2.01 ± 0.09 | 2.02 ± 0.02 | 1.98 ± 0.07 | 1.99 ± 0.02 |
| Lyapunov exponent $\lambda_{max}$ | 0.86 ± 0.05 | 0.87 ± 0.03 | 0.72 ± 0.06 | 0.74 ± 0.03 |


Furthermore, for each attractor starting point, we have a different test dataset. We calculate the correlation dimension and largest Lyapunov exponent of all test datasets and denote the average and standard deviation in the columns “Lorenz test” and “Halvorsen test” in Table I. These values can be seen as the “true” correlation dimension and largest Lyapunov exponent.

## III. RESULTS

In the following, we present the results of our analyses for variations of different parameters to demonstrate the robustness of the modified architecture. The varied parameters are:

- *Network dimension* $d$: We vary the network dimension between $d \in \{400, 450, \ldots, 600\}$ as these are sensible values in RC research. We specifically choose multiples of $50$ in order to have a decent number $\lfloor d/b \rfloor$ of block-sizes $b$.
- *Block-size* $b$: We vary the block-size $b$ and set it to all divisors of the network dimension $d$. We exclude the divisor $b = 1$ as it essentially takes out the reservoir. Also, we do not use $b = d$ since it represents again a single network as in the traditional architecture.
- *Target spectral radius* $\rho^{*}$: We vary the spectral radius over $\rho^{*} \in \{0.1, 0.2, \ldots, 2.0\}$ and find that, similarly to the traditional architecture, the target spectral radius of $\rho^{*} = 0.1$ yields the most robust prediction results. Henceforth, we set $\rho^{*} = 0.1$ as default.
- *Attractor starting points*: We choose $500$ different starting points on the attractor as explained in Sec. II A 3.
- *Random seeds*: We choose $100$ different random seeds across all components of the RC architecture which involve randomness: the input weights $W_{in}$ and the reservoir $A$ for block-diagonal Erdös–Rényi networks.
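The scan over network dimensions and block-sizes described above amounts to a parameter grid like the following sketch; the prediction runs themselves are omitted, and the helper name is illustrative:

```python
def block_sizes(d):
    """All divisors of d, excluding b = 1 (no reservoir) and b = d (one network)."""
    return [b for b in range(2, d) if d % b == 0]

# network dimensions 400, 450, ..., 600 paired with every admissible block-size
grid = [(d, b) for d in range(400, 601, 50) for b in block_sizes(d)]
```

Since only divisors of $d$ are admitted, the number of grid points per network dimension varies, which motivates plotting results against the fraction $b/d$ rather than $b$ itself.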

Instead of the block-size $b$ itself, we plot the fraction of block-size to network dimension, $b/d$, on the *x* axis. Thus, the higher the fraction, the bigger the blocks and the number of connections in the network. This is necessary because the number of divisors for each network size $d$ is different and the divisors are not equally spaced.

### A. Blocks of Erdös–Rényi networks

As mentioned before, we distinguish between two cases for the Erdös–Rényi blocks. First, where all the blocks are individual networks and second, where all blocks are equal.

- *Individual blocks*: We find that for both the Lorenz and the Halvorsen systems, all prediction measures are close to the respective benchmark values in Table I and even surpass them for a network size of $600$. Generally, we observe that small block-sizes have a worse long-term prediction quality with regard to the correlation dimension and the largest Lyapunov exponent. However, they appear to have a better short-term forecast horizon. The standard deviation over the variation of random seeds is comparable to the benchmarks. The results are illustrated in Fig. 3. Furthermore, we find the variation for different input weights and starting points to be similarly robust.
- *Equal blocks*: Similar results can be observed for the equal blocks (Fig. 4); however, the standard deviation is slightly lower. This can be explained by the reduced level of randomness since only one block is randomly constructed.

Generally, we find that the prediction performances stabilize for $b/d > 0.3$ for both individual and equal blocks. As discussed in Sec. II B, this speeds up the calculation of the reservoir spectral radius by a factor of ${\approx}123$ for individual blocks and ${\approx}412$ for equal blocks.

Furthermore, we observe that the modified architecture outperforms traditional RC with regard to long-term predictions, while it performs slightly worse for short-term predictions. This can be inferred by looking at the correlation dimensions and largest Lyapunov exponents, which are higher than the respective values of the traditional RC architecture and closer to the “true” values of the test data. A comparable short-term prediction performance can be achieved by increasing the network dimension to $d = 600$. There we see that the forecast horizon is able to match the value of the traditional architecture and can even slightly outperform it for small block-sizes of $b/d \approx 0.05$. In this specific case, the calculation speed of the spectral radius is increased by a factor of $160000$ and $3200000$ for individual and equal blocks, respectively.

This behavior holds true for both the Lorenz and Halvorsen systems.

### B. Blocks of matrices of ones

For the blocks of matrices of ones, the randomness of the reservoir is taken out. Thus, we focus on the remaining randomness, which is present in the input weights $W_{in}$, and on the variation of the attractor starting point.

- *Input weights* $W_{in}$: We find that for both the Lorenz and the Halvorsen systems, all prediction measures are close to the respective benchmark values and even surpass them for some network sizes. Generally, we observe that the performance is slightly worse than using blocks of Erdös–Rényi networks and the benchmarks. As expected, the standard deviation over the variation of input weights is comparable to the benchmarks and lower than for randomly constructed reservoirs. The results are illustrated in Fig. 5.
- *Attractor starting points*: A similar behavior can be observed for the variation in attractor starting points. However, we observe that network dimension $d = 600$ even outperforms the benchmarks for some block-sizes. The results are illustrated in Fig. 6.

In general, we find that the prediction quality for using blocks of ones as the reservoir is stable for a variation in input weights and attractor starting points. For certain block-sizes, the modified architecture is able to surpass the benchmarks for both short- and long-term predictions. Finding the best performing instance of these reservoirs can be done in a few iterations by varying the block-size for a large enough network dimension. Since the computationally expensive task of calculating the spectral radius of the reservoir is not necessary in this setup, fine-tuning the RC architecture with a parameter scan is fast and scalable.

## IV. CONCLUSION AND OUTLOOK

In this paper, we introduce an alternative approach to constructing reservoir computers by replacing the reservoir, which is traditionally a single random network, with a block-diagonal matrix. This implies that the reservoir can be composed of multiple smaller reservoirs, which breaks with the common understanding of the reservoir as a single network.

Furthermore, we remove the randomness of the reservoir by using matrices of ones for the individual blocks.

We evaluate the short- and long-term prediction performance of block-diagonal reservoirs for two nonlinear chaotic systems: the Lorenz and Halvorsen systems. For that, we use three measures: forecast horizon, correlation dimension, and the largest Lyapunov exponent. We find that, overall, the quality of the predictions is comparable to classical random networks. Although the block-diagonal reservoirs tend to perform slightly worse than the traditional architecture on average, some block-diagonal reservoirs with an appropriate block-size perform as well as, and sometimes even better than, the conventional network reservoirs.

We find the result to be robust over variations in network dimensions, block-sizes, target spectral radii, attractor starting points, input weights, and random seeds.

This modified reservoir architecture not only offers immediate computational benefits but also opens up great potential for simple and fast hardware implementations of reservoir computers. Following this line of inquiry is the subject of future work.

Promising directions for future research include understanding whether block-diagonal reservoirs indeed improve generalization, enhance robustness, or increase scalability.

Current and future work is dedicated to the investigation of these questions—not the least because the answers to them will shed new light on the complexity of the underlying dynamical system.

## ACKNOWLEDGMENTS

We would like to acknowledge the DLR for providing code and computational resources.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Haochun Ma:** Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). **Davide Prosperino:** Formal analysis (equal); Software (equal); Writing – original draft (equal); Writing – review & editing (equal). **Alexander Haluszczynski:** Formal analysis (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). **Christoph Räth:** Conceptualization (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## REFERENCES

- S. L. Brunton and J. N. Kutz, *Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control* (Cambridge University Press).
- *Proceedings. 2005 IEEE International Joint Conference on Neural Networks* (IEEE, 2005), Vol. 3, pp. 1463–1466.
- *International Conference on Digital Audio Effects (DAFx)* (Citeseer, 2009).
- *Advances in Chaos Theory and Intelligent Control* (Springer, 2016), pp. 225–247.
- E. Hairer, S. P. Nørsett, and G. Wanner, *Solving Ordinary Differential Equations I* (Springer).
- *Rough Sets and Knowledge Technology: 9th International Conference, RSKT 2014, Shanghai, China, October 24–26, 2014, Proceedings* (Springer, 2014), pp. 364–375.
- *Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, June 21–23, 2000, Proceedings* (Springer, 2000), pp. 1–15.
- *The Theory of Chaotic Attractors* (Springer, 2004), pp. 170–189.