Forecasting the behavior of high-dimensional dynamical systems using machine learning requires efficient methods to learn the underlying physical model. We demonstrate spatiotemporal chaos prediction using a machine learning architecture that, when combined with a next-generation reservoir computer, displays state-of-the-art performance with a training time $10^3$–$10^4$ times shorter and a training data set $10^2$ times smaller than those of other machine learning algorithms. We also take advantage of the translational symmetry of the model to further reduce the computational cost and training data, each by a factor of 10.

Modeling and predicting high-dimensional dynamical systems, such as spatiotemporal chaotic systems, continues to be a physics grand challenge and requires efficient methods to leverage computational resources and process large amounts of data. In this work, we implement a highly efficient machine learning (ML) parallel scheme for spatiotemporal forecasting in which each model unit predicts a single spatial location. This reduces the number of trainable parameters to the minimum possible, thus speeding up the algorithm and reducing the data set size needed for training. Moreover, when combined with next-generation reservoir computers (NG-RCs), our approach displays state-of-the-art performance with a computational cost and training data set dramatically reduced in comparison with other machine learning approaches. We also show that the computational cost and training data set size can be further reduced when the system displays translational symmetry, which is commonly present in spatiotemporal systems with cyclic boundary conditions. Although many real systems do not have such symmetry, our results highlight the importance of exploiting symmetry when it is present in the system.

Many nonlinear systems display temporal dynamics that depend on spatial location, such as the heart,1 optical devices,2 and fluid flow.3 These systems may display spatiotemporal chaos, which has finite spatiotemporal correlations, a loss of long-term predictability,4 and the appearance of coherent structures.3 Also, these systems often display multi-scale behavior, where information and energy flow across scales. Modeling spatiotemporal chaos is difficult for these reasons and continues to be a physics grand challenge.

One approach to this problem is to use machine learning, which may speed up prediction by learning only variables of interest, such as the macroscale behavior5–7 or improve the prediction accuracy by fusing model predictions and experimental observations.8–13 Some researchers use ML algorithms, such as deep learning,6,14 time embedding techniques,15–17 or sparse system identifiers,18 to learn the underlying ordinary or partial differential equations, but this subsequently requires precise numerical methods for model integration. Another approach is to learn the system flow, which allows for one-step-ahead prediction using a coarser spatiotemporal grid, likely leading to faster prediction. The next-generation reservoir computer (NG-RC), for example, excels at this task17 and is mathematically equivalent to a traditional reservoir computer (RC) but has an optimal form.19 

In typical ML approaches, the spatial variable is discretized at $L$ points with step size $\delta L$, assumed one-dimensional for simplicity of exposition, and time is discretized with step size $\delta t$. During supervised training, blocks of data with $N_\mathrm{in}$ spatial points and $k$ time steps are fed into the artificial neural network (ANN) used to predict the behavior at $N_\mathrm{out}$ locations at one or more temporal steps. Often, $N_\mathrm{in} = N_\mathrm{out} = L$ and $k\,\delta t$ is longer than the correlation time,6,20 which is problematic because it causes the model to focus on unrelated observations. Also, the ANN is large, which increases the number of trainable parameters and, hence, the required computer resources and the training data set size. Recently, a parallel reservoir computing scheme21 was introduced with $N_\mathrm{out} < N_\mathrm{in} < L$ to reduce the computational cost.

Here, our primary contribution is to demonstrate a new algorithm for learning spatiotemporal systems. It predicts the temporal dynamics of the system at a single spatial location based on data drawn from a small spatiotemporal neighborhood of that point. Prediction over all spatial points is realized using parallel machines. Quantitatively, we take $N_\mathrm{in} < L$ and $N_\mathrm{out} = 1$, where $N_\mathrm{in}\,\delta L$ is less than or comparable to the spatial correlation length, and take $k$ so that $k\,\delta t$ is less than the correlation time. Hence, we use $L$ parallel ANNs for one-step-ahead prediction. We apply our method to a heuristic atmospheric weather model using the NG-RC17 as the core learning machine, which reduces the computational complexity and the data set size required for training while displaying state-of-the-art accuracy. Accounting for the translational symmetry of this model further reduces the training computational time and data. Figure 1 illustrates our scheme.

FIG. 1.

Learning and predicting spatiotemporal chaos using our parallel scheme. (a) Learning mode: The NG-RC is trained to predict the next time step $t_{m+1}$ at the $l$th spatial location using training data from the current step $t_m$ and the previous steps $t_{m-1}$ and $t_{m-2}$ ($k=3$) and the $N_\mathrm{in}=5$ neighboring locations. (b) Prediction mode: Autonomous operation of $L$ parallel NG-RCs where the output feeds the input to predict the next step at all spatial locations $\mathbf{x}=[x_1, x_2, \ldots, x_L]$.


We highlight that addressing symmetries has proven to be important for improving other ML approaches.22–26 Furthermore, ML algorithms can reveal hidden symmetries, such as translational invariance present in a simple 1D uniform motion or black hole dynamics,27 even if their presence is not obvious.

The rest of this paper is organized as follows. In Sec. II, we formally describe the high-dimensional dynamical system used to train our ML approach for spatiotemporal chaos prediction. In Sec. III, we describe our parallel scheme of ML models, where each model unit predicts a single discretized spatial location represented by a system variable. In Sec. IV, we introduce the theoretical background of the NG-RC, followed by brief descriptions of the training procedure and the prediction mode. Section V is dedicated to the results. Here, we use our parallel scheme of NG-RCs to forecast high-dimensional chaos from the model equations introduced in Sec. II. We compare the case where the parallel NG-RCs are trained independently to the case where translational symmetry is taken into account to improve the performance. We then apply our approach to a lower-dimensional case of a Lorenz96 model system and to an even lower-dimensional case where fine-scale variables are not present. Last, we present a discussion comparing our results to other works and our conclusions in Sec. VI. We also include details on ridge regression parameter optimization and computational complexity in Appendixes A and B.

We demonstrate our approach using numerically generated data from a heuristic atmospheric weather model introduced by Lorenz4,28 and extended by Thornes et al.29 It has an unspecified macroscopic scalar variable $x_l$ on a discrete grid (position $l$) representing the observations around a latitude circle. To represent some convective-scale quantity across spatiotemporal scales, this variable is driven by a finer-scale variable $y_{j,l}$, which is coupled to the macroscopic variable as well as to the finest-scale variable $z_{i,j,l}$ representing, for example, individual clouds in the atmosphere.

The model is described by a set of coupled differential equations given by

$$
\begin{aligned}
\dot{x}_l &= x_{l-1}\,(x_{l+1} - x_{l-2}) - x_l + F - \frac{hc}{b}\, S_{y_l},\\
\dot{y}_{j,l} &= -cb\, y_{j+1,l}\,(y_{j+2,l} - y_{j-1,l}) - c\, y_{j,l} + \frac{hc}{b}\, x_l - \frac{he}{d}\, S_{z_{j,l}},\\
\dot{z}_{i,j,l} &= ed\, z_{i-1,j,l}\,(z_{i+1,j,l} - z_{i-2,j,l}) - ge\, z_{i,j,l} + \frac{he}{d}\, y_{j,l},
\end{aligned}
\tag{1}
$$

where the indices $l = 1, \ldots, L$, $j = 1, \ldots, J$, and $i = 1, \ldots, I$ are taken $\mathrm{mod}\,L$, $\mathrm{mod}\,J$, and $\mathrm{mod}\,I$, respectively, to represent cyclic boundary conditions. The terms $S_{y_l} = \sum_{j=1}^{J} y_{j,l}$ and $S_{z_{j,l}} = \sum_{i=1}^{I} z_{i,j,l}$ represent the couplings between the different spatiotemporal scales.

Here, $F = 20$ is a spatially homogeneous, large-scale forcing term, $h = 1$ is the coupling strength between the different spatial scales, and the parameters $b = c = d = e = g = 10$ set the magnitude and time scale of the fast variables. With these parameters, there is a factor of 100 difference in spatiotemporal scales from the finest ($z$) to the coarsest ($x$) scale. For future reference, we specify time in model time units (MTU), where 1 MTU corresponds approximately to 5 atmospheric days.4 We take $L = 36$ to set the number of coherent structures appropriate for the Earth's weather and $J = 10$.4 For the fastest variable, we take $I = J$ following previous studies.6,7 There are $LJ = 360$ fine-scale and $LJI = 3600$ finest-scale variables, and hence, there are $L[1 + J(1 + I)] = 3996$ total variables.
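
To make the model concrete, the sketch below evaluates the right-hand side of Eq. (1) with NumPy under the parameter values quoted above; the function name and array layout are our own illustrative choices, not taken from the code used for the paper.

```python
import numpy as np

# Parameters of Eq. (1) as quoted in the text.
F, h, b, c, d, e, g = 20.0, 1.0, 10.0, 10.0, 10.0, 10.0, 10.0

def tendencies(x, y, z):
    """Right-hand side of Eq. (1); x: (L,), y: (J, L), z: (I, J, L)."""
    # np.roll implements the cyclic boundary conditions (indices taken mod L, J, or I).
    Sy = y.sum(axis=0)  # S_{y_l}: coupling of the fine scale into x_l
    Sz = z.sum(axis=0)  # S_{z_{j,l}}: coupling of the finest scale into y_{j,l}
    dx = np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2)) - x + F - (h * c / b) * Sy
    dy = (-c * b * np.roll(y, -1, axis=0) * (np.roll(y, -2, axis=0) - np.roll(y, 1, axis=0))
          - c * y + (h * c / b) * x - (h * e / d) * Sz)
    dz = (e * d * np.roll(z, 1, axis=0) * (np.roll(z, -1, axis=0) - np.roll(z, 2, axis=0))
          - g * e * z + (h * e / d) * y)
    return dx, dy, dz
```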

We focus on learning and predicting only the slow macroscopic variables $x_l$ without observing the finer-scale dynamics.6,7,30 Because of the fast time scales of $y_{j,l}$ and $z_{i,j,l}$ in comparison with $x_l$, $S_{y_l}$ acts as a noise-like term driving $x_l$. It is known that many ML algorithms, including the NG-RC,17 can learn in the presence of large noise, and hence, we expect that we can make accurate predictions, as demonstrated below.

Our goal is to learn the one-step-ahead dynamics at a single location $x_l$ using $N_\mathrm{in} = 2N_\mathrm{nn} + 1$ spatial points and $k$ temporal points, illustrated by the dashed boundary in the middle panel of Fig. 1(a), shown for $N_\mathrm{nn} = 2$, $N_\mathrm{in} = 5$, and $k = 3$, the values we use in Sec. V. We seek an ML model that predicts $x_l$ at time step $t_{m+1}$ based on these input data. Thus, there are $L$ independent ML models to predict the dynamics at all spatiotemporal points. Our scheme can work with a variety of ML algorithms, but we use an NG-RC because of its proven ability to make accurate predictions with limited data and low computational resources.17

We operate the NG-RC in two modes shown in Fig. 1: training and forecasting. In both cases, we create a feature vector

$$\mathbb{O}_{l,\mathrm{total}}(t_m) = c \oplus \mathbb{O}_{l,\mathrm{lin}} \oplus \mathbb{O}_{l,\mathrm{nonlin}} \tag{2}$$

composed of linear and nonlinear parts, respectively, where $c$ is a constant and $\oplus$ is the concatenation operator. The linear part is formed by the current and previous $k-1$ values of the variable $x_l$ and its $N_\mathrm{nn}$ nearest neighbors on each side of this location. Hence, the dimension of $\mathbb{O}_{l,\mathrm{lin}}$ is $d_\mathrm{lin} = kN_\mathrm{in} = 15$ for our choice of parameters.

We take the nonlinear part to be the unique second-order monomials of $\mathbb{O}_{l,\mathrm{lin}}$, which is appropriate for this problem because Eq. (1) contains only quadratic nonlinear terms. For the unique quadratic monomials, the dimension of $\mathbb{O}_{l,\mathrm{nonlin}}$ is $d_\mathrm{nonlin} = d_\mathrm{lin}(d_\mathrm{lin}+1)/2 = 120$ so that the total feature vector $\mathbb{O}_{l,\mathrm{total}}$ has $d_\mathrm{total} = 1 + d_\mathrm{lin} + d_\mathrm{nonlin} = 136$ components, which is also equal to the number of trainable parameters for each NG-RC. Minimizing $d_\mathrm{total}$ is one important metric for reducing the computational resources during training. As an aside, we mention that other nonlinear functions can be used in the NG-RC, such as higher-order polynomials or radial basis functions, but are not needed here.
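
As an illustration, the following sketch builds $\mathbb{O}_{l,\mathrm{total}}$ for one location, assuming `window` holds the $k=3$ most recent time steps of the $N_\mathrm{in}=5$ local inputs; the names are ours, not from the code used for the paper.

```python
import numpy as np
from itertools import combinations_with_replacement

def feature_vector(window, c=1.0):
    """window: (k, N_in) array; returns O_total of Eq. (2) with d_total components."""
    lin = window.ravel()  # O_lin: k * N_in = 15 components for k = 3, N_in = 5
    # O_nonlin: the 15 * 16 / 2 = 120 unique second-order monomials of O_lin
    nonlin = np.array([lin[i] * lin[j]
                       for i, j in combinations_with_replacement(range(lin.size), 2)])
    return np.concatenate(([c], lin, nonlin))  # d_total = 1 + 15 + 120 = 136
```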

During training, data from the solution to Eq. (1) with $N_\mathrm{in}$ spatial points and $M$ temporal points ($t_\mathrm{train} = M\,\delta t$ training time) are fed into each NG-RC ($L$ in total) in an open-loop manner, as illustrated in Fig. 1(a). Here, the goal is to have the one-step-ahead prediction $\bar{x}_l(t_{m+1})$ of the NG-RC equal the model prediction $x_l(t_{m+1})$, where $t_{m+1} = t_m + \delta t$. That is, we seek a solution to

$$\bar{x}_l(t_{m+1}) = W_l\, \mathbb{O}_{l,\mathrm{total}}(t_m) \tag{3}$$

that minimizes $\|\bar{x}_l - x_l\|^2 + \alpha \|W_l\|^2$ over all $M$ temporal points. Here, $W_l$ is found using regularized regression with regularization parameter $\alpha$ (see Appendix A for the $\alpha$ optimization). For our parameters, $W_l$ is a $(1 \times d_\mathrm{total}) = (1 \times 136)$ matrix. For future reference, the computational complexity of training a single NG-RC scales as $M d_\mathrm{total}^2$ and, hence, as $L M d_\mathrm{total}^2$ for all $L$ NG-RCs.
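
A minimal sketch of this training step in normal-equation form, assuming `features` stacks the $M$ feature vectors $\mathbb{O}_{l,\mathrm{total}}(t_m)$ row-wise and `target` holds the corresponding $x_l(t_{m+1})$; this is one of several equivalent ways to solve the regularized least-squares problem.

```python
import numpy as np

def train_weights(features, target, alpha=1e-2):
    """Ridge regression: solve (O^T O + alpha I) w = O^T y for the weights W_l."""
    d = features.shape[1]
    return np.linalg.solve(features.T @ features + alpha * np.eye(d),
                           features.T @ target)
```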

After finding $W_l$, we switch to the prediction mode, where the training data are no longer input. The output of each NG-RC is sent to the input in a closed-loop manner and used to create $\mathbb{O}_{l,\mathrm{total}}(t)$ for each NG-RC, as shown in Fig. 1(b). The parallel NG-RCs now form an autonomous spatiotemporal dynamical system. Here, the "warm up" of the NG-RC requires the $k$ previous time steps as initial conditions, which are available if we switch to the forecasting mode immediately after the training mode.
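
A minimal sketch of this closed-loop prediction mode, reusing the feature_vector() sketch above with a single symmetry-respecting weight vector `W` (for independently trained NG-RCs, `W` would be replaced by a per-location `W[l]` inside the loop); `history` holds the $k$ most recent states used for the warm up.

```python
import numpy as np

def predict(history, W, steps, n_nn=2):
    """history: (k, L) array of the most recent states; returns a (steps, L) forecast."""
    k, L = history.shape
    outputs = []
    for _ in range(steps):
        new = np.empty(L)
        for l in range(L):
            idx = np.arange(l - n_nn, l + n_nn + 1) % L   # cyclic neighborhood of site l
            new[l] = W @ feature_vector(history[:, idx])  # Eq. (3): one-step-ahead prediction
        history = np.vstack([history[1:], new])           # feed the output back to the input
        outputs.append(new)
    return np.array(outputs)
```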

To test the accuracy and speed of our new algorithm, we generate spatiotemporal data by integrating Eq. (1) using a fourth-order Runge–Kutta method with a fixed step size of 0.001 MTU, where we save data at steps of $\delta t = 0.01$ MTU. We use initial conditions $x_1 = F + 0.01$, $x_{l \neq 1} = F$, and $y_{j,l} = z_{i,j,l} = 0$, integrate for 10 MTU to dissipate transients, and discard these data. We integrate for an additional 11,000 MTU to generate the data from which we select an interval $t_\mathrm{train}$ ($M = t_\mathrm{train}/\delta t$) to use as a training data set. After integration, we normalize the data to have zero mean and unit standard deviation. After training, we select $k$ consecutive time steps to warm up the NG-RCs for the prediction mode and select the following testing time interval to generate the ground-truth test data (data never seen by the NG-RCs during training), an example of which is shown in Fig. 2(a).
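
A minimal sketch of this pipeline, reusing tendencies() from the model sketch above; the step sizes follow the text, while the loop structure and the choice of applying the normalization globally are our own illustrative assumptions.

```python
import numpy as np

def rk4_step(state, dt=0.001):
    """One fourth-order Runge-Kutta step of Eq. (1); state = (x, y, z)."""
    def shift(s, ks, fac):
        return tuple(u + fac * kk for u, kk in zip(s, ks))
    k1 = tendencies(*state)
    k2 = tendencies(*shift(state, k1, 0.5 * dt))
    k3 = tendencies(*shift(state, k2, 0.5 * dt))
    k4 = tendencies(*shift(state, k3, dt))
    return tuple(u + dt / 6.0 * (a + 2.0 * p + 2.0 * q + r)
                 for u, a, p, q, r in zip(state, k1, k2, k3, k4))

def generate(state, n_saved, save_every=10):
    """Integrate at 0.001 MTU and save the slow variables x_l every 0.01 MTU."""
    xs = []
    for _ in range(n_saved):
        for _ in range(save_every):
            state = rk4_step(state)
        xs.append(state[0].copy())
    xs = np.array(xs)
    return (xs - xs.mean()) / xs.std()  # zero mean, unit standard deviation
```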

FIG. 2.

Spatiotemporal dynamics prediction with a non-parallel scheme with a single NG-RC. (a) Actual and (b) predicted dynamics for the extended Lorenz96 model. (c) Difference and (d) NRMSE between actual and predicted dynamics. The vertical dashed line indicates the prediction horizon. Parameters: $k = 3$, $N_\mathrm{in} = N_\mathrm{out} = L = 36$, and $\alpha = 10^{-2}$.


As a baseline, we first predict the extended Lorenz96 system using a non-parallel architecture formed by a single NG-RC. The model receives all $L$ variables as input and is trained to perform one-step-ahead prediction at all spatial locations; i.e., $N_\mathrm{in} = N_\mathrm{out} = L$. Here, the dimension of $\mathbb{O}_{l,\mathrm{lin}}$ is $d_\mathrm{lin} = kN_\mathrm{in} = 108$ for the $k = 3$ used here. Thus, the feature vector $\mathbb{O}_{l,\mathrm{total}}$ has $d_\mathrm{total} = 5995$ components. The large number of features causes the model to focus on unrelated observations outside of the spatial correlation length, and the single NG-RC fails to predict the spatiotemporal dynamics after a short period. Figure 2(b) shows the NG-RC-predicted spatiotemporal dynamics of the extended Lorenz96 model using a relatively small training data set size $t_\mathrm{train} = 10$ MTU ($M = 1000$). As we show in Secs. V D and V E, the performance of this approach can be improved using a much larger training data set, which helps the model learn the appropriate features and filter out the uncorrelated ones.

To quantify the prediction quality, we determine the normalized root-mean-square error over all spatial locations given by

$$\mathrm{NRMSE}(t) = \sqrt{\frac{1}{L} \sum_{l=1}^{L} \left( x_l(t) - \bar{x}_l(t) \right)^2}. \tag{4}$$

The right-hand side of Eq. (4) is already normalized because $x_l$ has unit standard deviation.

As shown in Fig. 2(d), the NRMSE for the single NG-RC prediction increases from nearly zero, eventually reaching a saturated value. We define a prediction horizon as the time where NRMSE = 0.3 (vertical dashed line), which is equal to 0.1 MTU for the single NG-RC prediction realization shown in Fig. 2. When averaged over 100 predictions for different initial conditions, the prediction horizon is equal to 0.01 ± 0.03 MTU or approximately 0.05 ± 0.14 atmospheric days for the extended Lorenz96 model.
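
For concreteness, a minimal sketch of Eq. (4) and of this prediction-horizon rule, assuming `truth` and `pred` are (T, L) arrays of the normalized data; the names are illustrative.

```python
import numpy as np

def nrmse(truth, pred):
    """Eq. (4): root-mean-square error over the L spatial locations at each time."""
    return np.sqrt(np.mean((truth - pred) ** 2, axis=1))

def prediction_horizon(truth, pred, dt=0.01, threshold=0.3):
    """Time (in MTU) at which the NRMSE first exceeds the threshold of 0.3."""
    err = nrmse(truth, pred)
    crossed = np.nonzero(err > threshold)[0]
    return crossed[0] * dt if crossed.size else err.size * dt
```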

In this section, we implement our scheme using smaller NG-RCs operating in parallel. We train $L = 36$ NG-RCs, each with $d_\mathrm{total} = 136$. In terms of computational complexity, training the $L$ parallel NG-RCs is around 50 times less expensive than training the single NG-RC used in Sec. V A because the NG-RC computational complexity scales as $O(M d_\mathrm{total}^2)$, as we discuss later. Furthermore, we choose $N_\mathrm{in} = 5$ so that each parallel NG-RC focuses on data from nearby spatial locations within the spatial correlation distance. Figure 3(b) shows the NG-RC-predicted spatiotemporal dynamics of the extended Lorenz96 model using $t_\mathrm{train} = 10$ MTU ($M = 1000$), the same training data set as in the previous example. The difference between ground truth and prediction is initially small [Fig. 3(c)] but eventually diverges because the chaotic nature of the system amplifies the small difference between the two. Even though long-term prediction is lost, the predicted behavior has coherent structures that are visually similar to those of the extended Lorenz96 model. Here, the prediction horizon is equal to 0.92 MTU for the single prediction realization shown in Fig. 3. When averaged over 100 predictions for different initial conditions, the prediction horizon is equal to 0.66 ± 0.15 MTU or approximately 3.2 ± 0.7 atmospheric days for the extended Lorenz96 model. We comment on the accuracy of our prediction in Sec. VI.

FIG. 3.

Spatiotemporal dynamics prediction with parallel NG-RCs using $L$ independent $W_l$'s. (a) Actual and (b) predicted dynamics for the extended Lorenz96 model. (c) Difference and (d) NRMSE between actual and predicted dynamics. The vertical dashed line indicates the prediction horizon. Parameters: $k = 3$, $N_\mathrm{nn} = 2$, and $\alpha = 10^{-2}$.


We further decrease the training data set size by taking into account the translational invariance of the extended Lorenz96 model given in Eq. (1), which arises from the cyclic boundary conditions and the spatially independent model parameters. Because of this symmetry, $W_l$ should be independent of $l$ in the asymptotic limit $t_\mathrm{train} \to \infty$. To quantify this independence for the finite $t_\mathrm{train}$ used here, we measure the similarity of the $W_l$'s by defining a correlation coefficient

$$C = \frac{1}{L(L-1)} \sum_{l}^{L} \sum_{l' \neq l}^{L} \frac{W_l^{T} \cdot W_{l'}^{T}}{\|W_l\|^2}, \tag{5}$$

where $T$ is the transpose operation, $\cdot$ is the dot product operation, and $C = 1$ ($C = 0$) indicates (un)correlated matrices. For the case presented in Fig. 3, we find that $C$ is equal to 0.96, indicating that the NG-RC does a reasonable job of discovering this symmetry even with a short $t_\mathrm{train}$.
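
A minimal sketch of Eq. (5), assuming `W` stacks the $L$ trained weight vectors row-wise (shape $(L, d_\mathrm{total})$):

```python
import numpy as np

def correlation(W):
    """Eq. (5): mean normalized dot product over all pairs of distinct W_l's."""
    L = W.shape[0]
    total = 0.0
    for l in range(L):
        for lp in range(L):
            if lp != l:
                total += (W[l] @ W[lp]) / (W[l] @ W[l])
    return total / (L * (L - 1))
```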

We enforce translational symmetry by training a single $W$ and using it for all spatial locations $l$. Operationally, we concatenate all $\mathbb{O}_{l,\mathrm{total}}$ to create a data structure of dimension $(LM \times d_\mathrm{total})$, and hence, the training computational complexity scales as $L M d_\mathrm{total}^2$, the same as in the previous scheme. This procedure relies on the fact that each spatial location has identical behavior—in a statistical sense—and hence, all the data are produced by the same underlying dynamical flow. Effectively, it increases the training time to $L\,t_\mathrm{train}$ for an observation time $t_\mathrm{train}$ of the spatiotemporal dynamics. A similar approach has been used recently with a traditional RC for forecasting spatiotemporal dynamics.31
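
A minimal sketch of this concatenation step, reusing the train_weights() ridge sketch above; `features_per_site` and `targets_per_site` are illustrative names for the per-location training arrays.

```python
import numpy as np

def train_shared_weights(features_per_site, targets_per_site, alpha=1e-2):
    """Fit one W to the data of all L sites, enforcing translational symmetry."""
    O = np.vstack(features_per_site)      # shape (L * M, d_total)
    y = np.concatenate(targets_per_site)  # shape (L * M,)
    return train_weights(O, y, alpha)     # the single W reused at every location l
```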

Accounting for the translational symmetry somewhat improves the prediction horizon for the same training data size, i.e., the same value of $M$, as seen in Fig. 4, which displays a prediction horizon of 1.04 MTU for the same initial condition used in Figs. 2 and 3. The mean prediction horizon for 100 different initial conditions increases by 29% to 0.85 ± 0.17 MTU.

FIG. 4.

Spatiotemporal dynamics prediction with parallel NG-RCs using a single $W$ that respects translational symmetry. (a) Actual and (b) predicted dynamics for the extended Lorenz96 model. (c) Difference and (d) NRMSE between actual and predicted dynamics. The vertical dashed line indicates the prediction horizon. Parameters: $k = 3$, $N_\mathrm{nn} = 2$, and $\alpha = 10^{-2}$.


As seen in Fig. 5, the single NG-RC approach shows a mean prediction horizon near zero (poor prediction ability) for $t_\mathrm{train}$ up to 60 MTU, where it starts to improve, reaching a value of 0.82 ± 0.15 MTU for $t_\mathrm{train} = 1000$ MTU (the maximum $t_\mathrm{train}$ we explore in this work). For the $L$ independent NG-RCs, the prediction horizon is near zero, begins to improve for $t_\mathrm{train} \gtrsim 2$ MTU, and saturates above $t_\mathrm{train} \approx 40$ MTU, obtaining a value similar to that of the single NG-RC approach with a training data set 25 times smaller. This gives the $L$ independent parallel NG-RCs a computational complexity $1.4 \times 10^3$ times smaller than the single NG-RC. On the other hand, we obtain reasonable performance when respecting the translational symmetry for the smallest training time shown in the plot, and we obtain nearly the same prediction performance as the other approaches for $t_\mathrm{train} \gtrsim 1$ MTU. Thus, we see that we can reduce the observation time and computational cost by a factor of $1/L$, which is expected because the effective training time is $L$ times longer.

FIG. 5.

Scaling of the mean prediction horizon with training time for the non-parallel model with a single NG-RC (green stars) and for the parallel NG-RCs using $L$ independent $W_l$'s (blue squares) and using a single $W$ that respects translational symmetry (orange circles). Symbols represent the mean prediction horizon for ten different training sets. For each training set, we make predictions for 10 different initial conditions, totaling 100 predictions per point in the plot. Error bars represent the standard deviation of the mean over the 100 predictions. Parameters: $k = 3$, $N_\mathrm{nn} = 2$ (for the parallel approaches), and $\alpha$ optimized for each $t_\mathrm{train}$ (see Appendix A).


Here, we apply our model to lower-dimensional cases of the extended Lorenz96 model and compare our results to previous research that uses these simplified models.

1. Parallel NG-RC model for $L = J = I = 8$

First, we apply our parallel NG-RC approach ($N_\mathrm{in} = 5$, $N_\mathrm{out} = 1$) to the extended Lorenz96 system with $L = J = I = 8$ and compare our results to our baseline method (single NG-RC) and to previous works. Figure 6(a) shows the mean prediction horizon as a function of the number of training steps $M$ for these cases. Here, we use training steps $M$ rather than training time $t_\mathrm{train}$ as the horizontal axis for a direct comparison to other works. For our previous results presented above, the conversion is $t_\mathrm{train} = M\,\delta t$.

FIG. 6.

Mean prediction horizon for a Lorenz96 system with $L = J = I = 8$ as a function of training steps $M$ for the non-parallel model with a single NG-RC (green stars) and for the parallel NG-RCs using $L$ independent $W_l$'s (blue squares) and using a single $W$ that respects translational symmetry (orange circles). (a) Circle, square, and star symbols represent the mean prediction horizon for ten different training sets. For each training set, we make predictions for 10 different initial conditions, totaling 100 predictions per point in the plot. The error bars represent the standard deviation of the mean over 100 predictions. (b) Circle, square, and star symbols represent the mean prediction horizon for the training set [out of the ten used in (a)] that returns the best prediction horizon. In both, the red down triangles represent the results of Chattopadhyay et al.6 using a single RC. Parameters: $k = 3$, $N_\mathrm{nn} = 2$, and $\alpha$ optimized for each $M$.


When using a single NG-RC (green stars), we obtain a mean prediction horizon of 1.03 ± 0.39 MTU for $M = 10^4$ ($t_\mathrm{train} = 100$ MTU), where it starts to saturate. Here, the feature vector has $d_\mathrm{total} = 325$ components. On the other hand, when using $L$ independent NG-RCs (blue squares), our model obtains a mean prediction horizon of 1.05 ± 0.38 MTU with $M = 4000$, corresponding to a training time of $t_\mathrm{train} = 40$ MTU. Here, each of the $L$ parallel NG-RCs is trained individually with $M$ training points from the respective region using $d_\mathrm{total} = 136$ features, which results in a computational cost 2 times smaller than that of the single NG-RC. Unsurprisingly, the single NG-RC performs very similarly to the $L$ parallel NG-RCs because the number of spatial variables is low ($L = 8$) and the single NG-RC input dimension $N_\mathrm{in} = L$ contains fewer unrelated variables than in the higher-dimensional case shown in Fig. 5. Finally, when using a single $W$ that respects the translational symmetry (orange circles), we obtain a similar result (with no statistical difference) for $M = 400$ ($t_\mathrm{train} = 4$ MTU), reducing the training time by a factor of 10 and similarly reducing the computational cost. This reduction is expected because data from all $L$ spatial locations are concatenated to form a single training data set with an effective size $L$ times longer than $t_\mathrm{train} = 4$ MTU. Figure 7 shows typical predictions for the three cases discussed above.

FIG. 7.

Typical prediction for the extended Lorenz96 system with $L = J = I = 8$ using (a) a non-parallel scheme with a single NG-RC with $t_\mathrm{train} = 100$ MTU and $\alpha = 10^{-1}$ and using our parallel NG-RC approach with $N_\mathrm{in} = 5$ and $N_\mathrm{out} = 1$ for (b) $L$ independent $W_l$'s with $t_\mathrm{train} = 40$ MTU and $\alpha = 10^{-3}$ and (c) a single $W$ that respects translational symmetry with $t_\mathrm{train} = 4$ MTU and $\alpha = 4 \times 10^{-2}$. For the three panels: (i) actual and (ii) predicted dynamics, (iii) the difference between actual and predicted dynamics, and (iv) NRMSE. The vertical dashed line indicates the prediction horizon and $k = 3$.


Chattopadhyay et al.6 also predict the dynamics of this simplified extended Lorenz96 system with $N_\mathrm{in} = N_\mathrm{out} = L = 8$ using a traditional RC with $d_\mathrm{total} = 5000$ nodes (equal to the size of their feature vector) and $M = 1 \times 10^4$–$2 \times 10^6$ training steps. Their results are shown in Fig. 6(a) (red triangles). Their approach shows an improvement in performance when increasing $M$ to $5 \times 10^5$, where the mean prediction horizon is similar to that of our NG-RC approaches. We highlight that our parallel NG-RCs that respect the translational symmetry (orange circles) present similar results with a computational cost $2.1 \times 10^5$ times smaller than their RC trained with $M = 5 \times 10^5$ data points (see Appendix B for more details on the computational complexity comparisons).

Intriguingly, the results of Chattopadhyay et al. show a step-like improvement when the training steps increase to $M \sim 10^6$. They do not know the reason for this step and leave it for future investigations. It is important to note that they use a single training data set in their work, whereas we average our results over multiple training data sets. One possible hypothesis is that the step improvement is related to the specific details of their training set.

To explore this possibility, we show the mean prediction horizon of our approaches using the best-performing single training data set in Fig. 6(b). We see that our best-performing training data set shows a similar mean prediction horizon using less training data than the RC trained with $M \sim 10^6$ data points. Our results could be coincidental but point to one issue that may explain their observation. The take-away message is that it is important to average over training data sets to remove possible spurious effects on the prediction error due to the specific details of the training data set, which becomes more important for the short set sizes used here.

As another comparison of our work to past reports, Pyle et al.7 also predict the same low-dimensional Lorenz96 system using a non-parallel ML scheme, except that they use an approach similar to the NG-RC with $k = 1$ and all monomials up to quartic order. They use a single training data set of $M = 5 \times 10^5$ training points, the same as Chattopadhyay et al., and obtain a mean prediction horizon of 1.60 ± 0.53 MTU. In terms of computational cost, our parallel approaches that do not (do) exploit the symmetry of the model are $2.1 \times 10^2$ ($2.1 \times 10^3$) times less expensive than their approach.

2. Parallel NG-RC model for the Lorenz96 model without fine-scale variables with $L = 40$ and $J = I = 0$

Here, we use our approach to predict the dynamics of a simpler Lorenz96 model with $L = 40$ that does not include the fine spatiotemporal variables $y_{j,l}$ and $z_{i,j,l}$ [$J = I = 0$; see Eq. (1)].

First, we use our baseline model: a single NG-RC with $d_\mathrm{total} = 7381$ features. Similar to the predictions for the other Lorenz96 models shown in Secs. V A–V D and V E 1, the single NG-RC only starts to improve its performance at higher $M$ in comparison with the parallel approaches, as shown in Fig. 8, which displays the mean prediction horizon as a function of the training steps $M$. The maximum mean prediction horizon for the single NG-RC is $7.3 \pm 1.3\,\Lambda$ for $M = 10^5$ ($t_\mathrm{train} = 1000$ MTU), where $\Lambda = 1/\lambda = (1/1.68)$ MTU is the Lyapunov time for $L = 40$ and $F = 8$ (the parameters used here).14

FIG. 8.

Mean prediction horizon for a Lorenz96 system with ($L = 40$, $J = I = 0$) as a function of training steps $M$ for the non-parallel model with a single NG-RC (green stars) and for the parallel NG-RCs using $L$ independent $W_l$'s (blue squares) and using a single $W$ that respects translational symmetry (orange circles). For these plots, the symbols represent the mean prediction horizon for ten different training sets. For each training set, we make predictions for 10 different initial conditions, totaling 100 predictions per point in the plot. Error bars represent the standard deviation of the mean over 100 predictions. The purple right triangle and the brown up triangle represent the results of Platt et al.32 for the same task using parallel RCs and a single RC, respectively. The red down triangle represents the results of Vlachas et al.14 for the same task using parallel RCs. Parameters: $k = 3$, $N_\mathrm{nn} = 2$, and $\alpha$ optimized for each $t_\mathrm{train}$.


When using $L$ independently trained NG-RCs (blue squares), the performance begins to improve for $M \gtrsim 200$ ($t_\mathrm{train} \gtrsim 2$ MTU) and saturates for $M = 6000$ ($t_\mathrm{train} = 60$ MTU), where the mean prediction horizon is $8.0 \pm 1.7\,\Lambda$. Each parallel NG-RC has $d_\mathrm{total} = 136$ features, which makes the computational complexity of this approach $1.2 \times 10^3$ times smaller than that of the single NG-RC method.

In comparison, using a single $W$ that respects the translational symmetry (orange circles) results in a mean prediction horizon of $7.7 \pm 1.7\,\Lambda$ with $M = 100$ (a training time of $t_\mathrm{train} = 1$ MTU). That is, we observe a similar prediction horizon using a training data set a factor of 60 smaller than that of the $L$ independent NG-RCs. Figure 9 shows typical predictions for the three cases.

FIG. 9.

Typical prediction for the extended Lorenz96 system with $L = 40$ and $J = I = 0$ using (a) a non-parallel scheme with a single NG-RC with $t_\mathrm{train} = 1000$ MTU and using our parallel NG-RC approach with $N_\mathrm{in} = 5$ and $N_\mathrm{out} = 1$ for (b) $L$ independent $W_l$'s with $t_\mathrm{train} = 60$ MTU and (c) a single $W$ that respects translational symmetry with $t_\mathrm{train} = 1$ MTU. For the three panels: (i) actual and (ii) predicted dynamics, (iii) the difference between actual and predicted dynamics, and (iv) NRMSE. The vertical dashed line indicates the prediction horizon. Parameters: $k = 3$ and $\alpha = 1 \times 10^{-5}$.


Vlachas et al.14 use a parallel RC scheme21 to predict the same Lorenz96 model. They use 20 parallel RCs, each with $d_\mathrm{total} = 3000$ nodes, $N_\mathrm{in} = 10 > N_\mathrm{out} = 2$, and $M = 100\,000$. Their method obtains a mean prediction horizon of approximately $3.3\,\Lambda$, represented by the red down triangle in Fig. 8. When comparing to our approaches, we find that our $L$ independent NG-RC model (blue squares) obtains a prediction horizon 2.4 times longer using a training data set 17 times smaller and with a computational complexity $4 \times 10^3$ times smaller than their result. When considering the translational symmetry (orange circles), our approach obtains the same factor of 2.4 improvement in the prediction horizon, but with a training data set $10^3$ times smaller and a computational complexity $2.4 \times 10^5$ times smaller than their approach.

Recently, Platt et al.32 optimized the parallel RC architecture to obtain a mean prediction horizon twice as long as that of Vlachas et al. using $N_\mathrm{in} = 6 > N_\mathrm{out} = 2$ with smaller RCs (each with $d_\mathrm{total} = 720$ nodes) and a smaller training data set ($M = 40\,000$). Their result is represented by the purple right triangle in Fig. 8. Our parallel model with independent NG-RCs obtains slightly better performance in the mean prediction horizon with a computational complexity $10^2$ times smaller using 7 times less training data than Platt et al. Considering the translational symmetry, our model presents a computational complexity $5.6 \times 10^3$ times smaller using $4 \times 10^2$ times less training data. See Appendix B for more details on the computational complexity comparison. Last, Platt et al. also use a single RC with $d_\mathrm{total} = 6000$ nodes to predict the same system. While this model presents a worse performance in comparison with the three NG-RC approaches presented here, it surprisingly outperforms the parallel RC model of Vlachas et al., indicating an important improvement due to the optimizations done by Platt et al.

We emphasize that parallel machine learning architectures provide highly efficient prediction of high-dimensional spatiotemporal dynamical systems. Partitioning the learning system into small subsystems, each of which can be predicted using a smaller ML model unit, provides better predictions and is less computationally expensive than using a single model to predict the entire system.

In our proposed method, we maximize the parallelism and let each parallel ML model unit predict a single variable of the system. Thus, there are as many parallel units as predicted variables in the system (here, the slow variables of the Lorenz96 model). Furthermore, this parallel approach lends itself to parallelization methods using graphics processing units or multi-processor computer clusters.

We show that our parallel scheme composed of independently trained NG-RCs outperforms a non-parallel model composed of a single larger NG-RC. For the higher-dimensional Lorenz96 system addressed in this work ($L = 36$, $I = J = 10$), the parallelization provides a $1.4 \times 10^3$ improvement factor in the computational cost while obtaining similar prediction accuracy using a training data set 25 times smaller than the non-parallel architecture. We further decrease both training data and computational cost by another factor of $L$ by addressing the system's translational symmetry, which is sometimes present in spatiotemporal systems with cyclic boundary conditions.

We also predict the dynamics of lower-dimensional Lorenz96-like models to compare our results to previous research. We demonstrate that our method is more accurate, can be trained with less data and computational cost, or both. For the extended Lorenz96 model with $L = I = J = 8$, we show that our parallel approach obtains results similar to those of the traditional RC implemented by Chattopadhyay et al.6 but with a computational cost up to $2.1 \times 10^4$ times smaller using up to $1.2 \times 10^2$ times less training data. When considering the translational symmetry, these numbers improve by another factor of 10. Chattopadhyay et al. also implement a deep learning network for this problem, but the deep learning method demonstrates no accuracy improvement and requires more computation time.

We also use our approach to predict the Lorenz96 system for $L = 40$, $I = J = 0$, where we obtain better results with a training data set up to $10^3$ times smaller and computational costs up to $2.4 \times 10^5$ times smaller than the parallel implementation of traditional RCs by Vlachas et al.14 We also obtain better results than the parallel implementation of traditional RCs by Platt et al.32 using up to $4 \times 10^2$ times less training data and a computational cost up to $5.6 \times 10^3$ times smaller.

The low computational cost, smaller training data requirement, and fewer optimizable parameters of the NG-RC in comparison with other ML approaches allow us to implement our parallel architecture on a standard desktop computer and obtain sub-second training and prediction times. To give an absolute scale for the computational cost of our approach, we produce the results for this paper using Python 3.7.6, NumPy 1.19.5, and scikit-learn 0.24.2 on an x86-64 CPU running Windows 10. For the results presented in Figs. 3 and 4, the computation time for training all $L = 36$ NG-RCs with $M = 1000$ data points is 55 ± 1 ms, while the runtime for predicting one time step is 394 ± 3 μs, or 10.9 ± 0.1 μs per spatial location per step.

Future directions of our research include hybrid approaches using NG-RCs that combine model-generated and experimentally observed data, such as those explored using a traditional RC.9–12 Also, our method should generalize to two- and three-dimensional fluid dynamics problems where an even greater reduction in the required data set size is anticipated. Combined with our approach of one-step-ahead prediction, a coarser spatiotemporal grid can be used, offering the possibility of greatly speeding up spatiotemporal simulations.

We gratefully acknowledge discussions of this work with E. Bollt and the financial support of the Air Force Office of Scientific Research (Contract No. FA9550-20-1-0177).

The authors have no conflicts to disclose.

Wendson A. S. Barbosa: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Resources (lead); Software (lead); Visualization (lead); Writing – original draft (equal); Writing – review & editing (equal). Daniel J. Gauthier: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Funding acquisition (lead); Investigation (equal); Methodology (equal); Project administration (lead); Supervision (lead); Writing – original draft (equal); Writing – review & editing (equal).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

We train our model using a supervised learning algorithm where the input data $x$ and the desired output $y$ are known in advance for the entire training period. Here, $y$ is evaluated at time $t_{m+1}$, whereas $x$ is evaluated at time $t_m$ because we are making a next-step-ahead prediction. We use ridge regression to find the matrix of weights $W$ that minimizes

$$\|\bar{y} - y\|^2 + \alpha \|W\|^2, \tag{A1}$$

where $\bar{y} = W \mathbb{O}_\mathrm{total}$ is the model output, $\mathbb{O}_\mathrm{total}$ is the feature vector obtained from the input data $x$, and $\alpha$ is the ridge parameter. Here, $\|\cdot\|$ represents the $L_2$-norm. When $\alpha$ is zero, ridge regression reduces to regular least-squares regression. The ridge parameter adds a penalty term to prevent overfitting.
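
For reference, a minimal sketch of this fit with scikit-learn (the library quoted in Sec. VI), assuming the constant term is already part of $\mathbb{O}_\mathrm{total}$ so that no intercept is fit; the placeholder data are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge

O_total = np.random.randn(1000, 136)  # placeholder (M, d_total) feature matrix at t_m
y = np.random.randn(1000)             # placeholder (M,) targets at t_{m+1}

model = Ridge(alpha=1e-2, fit_intercept=False)  # minimizes Eq. (A1)
model.fit(O_total, y)
W = model.coef_  # the trained weight vector of Eq. (3)
```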

We use a grid-search procedure to find the optimal $\alpha$ that maximizes the mean prediction horizon for each case shown in Sec. V. Figure 10 shows the optimization results for all cases shown in Figs. 5, 6, and 8. For each $\alpha$, we calculate the mean prediction horizon for ten different training sets. For each training set, we perform predictions for 10 different initial conditions, totaling 100 predictions per $\alpha$. We repeat this process for different training times $t_\mathrm{train}$. For $L = 36$, $J = I = 10$ [Figs. 10(a)–10(c)] and for $L = J = I = 8$ [Figs. 10(d)–10(f)], the three NG-RC algorithms are not sensitive to the value of $\alpha$ for longer training times. On the other hand, for $L = 40$, $J = I = 0$ [Figs. 10(g)–10(i)], the three algorithms trend toward worse performance for higher values of $\alpha$.

FIG. 10.

Ridge parameter optimization for the results shown in Figs. 5, 6, and 8. Mean prediction horizon as a function of the ridge parameter $\alpha$ for different training times (see the legend color code) for the Lorenz96 system with $L = 36$, $J = I = 10$ [(a)–(c)], $L = J = I = 8$ [(d)–(f)], and $L = 40$, $J = I = 0$ [(g)–(i)]. For each case, optimizations for a single NG-RC, $L$ independent NG-RCs, and $L$ NG-RCs using translational symmetry are presented. The colored area around each curve represents the standard deviation of the mean.


Here, we provide an estimate of the computational complexity of our parallel NG-RC approach in comparison with the other RC-based approaches mentioned above. The main contribution to the computational complexity for both the NG-RC and the regular RC is performing the ridge regression, which scales as $O(M d_\mathrm{total}^2)$ for $M$ training points and $d_\mathrm{total}$ features. The comparisons for the extended Lorenz96 system with ($L = 36$, $J = I = 10$), ($L = J = I = 8$), and ($L = 40$, $J = I = 0$) are shown in Tables I–III, respectively. Note that for the case where we train a single $W$ respecting the translational symmetry (first row of each table), the number of training points is multiplied by the number of spatial locations $L$ to reflect the training data concatenation, as discussed in Sec. V C.

TABLE I.

Training complexity comparison of different ML approaches for prediction of the extended Lorenz96 system with L = 36, J = I = 10.

                                     ML model   M          d_total   N_in   N_out   Parallel units trained   Speed up
Our approach—Respecting symmetry     NG-RC      100 × 36   136       5      1       …                        …
Our approach—L independent NG-RCs    NG-RC      4000       136       5      1       36                       40
Our approach—Single NG-RC            NG-RC      100 000    5995      36     36      …                        5.4 × 10^4
TABLE II.

Training complexity comparison of different ML approaches for prediction of the extended Lorenz96 system with L = J = I = 8.

                                     ML model   M          d_total   N_in   N_out   Parallel units trained   Speed up
Our approach—Respecting symmetry     NG-RC      400 × 8    136       5      1       …                        …
Our approach—L independent NG-RCs    NG-RC      4000       136       5      1       8                        10
Our approach—Single NG-RC            NG-RC      10 000     325       8      8       …                        18
Chattopadhyay et al.6                RC         500 000    5000      8      8       …                        2.1 × 10^5
Pyle et al.7                         NG-RC      500 000    495       8      8       …                        2.1 × 10^3
TABLE III.

Training complexity comparison of different ML approaches for prediction of the extended Lorenz96 system with L = 40 and J = I = 0.

                                     ML model   M          d_total   N_in   N_out   Parallel units trained   Speed up
Our approach—Respecting symmetry     NG-RC      100 × 40   136       5      1       …                        …
Our approach—L independent NG-RCs    NG-RC      6000       136       5      1       40                       60
Our approach—Single NG-RC            NG-RC      100 000    7381      40     40      …                        7 × 10^4
Vlachas et al.14                     RC         100 000    3000      10     2       20                       2.4 × 10^5
Platt et al.32                       RC         40 000     720       6      2       20                       5.6 × 10^3

In our analysis of the RC complexity, we do not account for the cost of multiplying the node states by the adjacency matrix that represents the network because these RC approaches use neural networks with sparse connectivity (we assume this when it is not stated). Also, we do not take into account special function evaluation costs, such as the hyperbolic tangent present in the traditional RC. For the NG-RC, we do not take into account the computational cost of the feature vector creation, which happens before training.

1. A. Winfree, When Time Breaks Down (Princeton University Press, 1987).
2. L. Illing, D. J. Gauthier, and R. Roy, "Controlling optical chaos, spatio-temporal dynamics, and patterns," in Advances in Atomic, Molecular, and Optical Physics, Vol. 54, edited by P. Berman, C. Lin, and E. Arimondo (Academic Press, 2007), pp. 615–697.
3. P. Holmes, J. L. Lumley, and G. Berkooz, Turbulence, Coherent Structures, Dynamical Systems and Symmetry, Cambridge Monographs on Mechanics (Cambridge University Press, 1996).
4. E. Lorenz, "Predictability: A problem partly solved," in Seminar on Predictability (ECMWF, Reading, UK, 1996), Vol. 1, pp. 1–18.
5. D. S. Wilks, "Effects of stochastic parametrizations in the Lorenz '96 system," Q. J. R. Meteorol. Soc. 131, 389–407 (2005).
6. A. Chattopadhyay, P. Hassanzadeh, and D. Subramanian, "Data-driven predictions of a multiscale Lorenz 96 chaotic system using machine-learning methods: Reservoir computing, artificial neural network, and long short-term memory network," Nonlinear Process. Geophys. 27, 373–389 (2020).
7. R. Pyle, N. Jovanovic, D. Subramanian, K. V. Palem, and A. B. Patel, "Domain-driven models yield better predictions at lower cost than reservoir computers in Lorenz systems," Philos. Trans. R. Soc. A: Math. Phys. Eng. Sci. 379, 20200246 (2021).
8. H. D. I. Abarbanel, P. J. Rozdeba, and S. Shirman, "Machine learning: Deepest learning as statistical data assimilation problems," Neural Comput. 30, 2025–2055 (2018).
9. A. Wikner, J. Pathak, B. Hunt, M. Girvan, T. Arcomano, I. Szunyogh, A. Pomerance, and E. Ott, "Combining machine learning with knowledge-based modeling for scalable forecasting and subgrid-scale closure of large, complex, spatiotemporal systems," Chaos 30, 053111 (2020).
10. H. Fan, J. Jiang, C. Zhang, X. Wang, and Y.-C. Lai, "Long-term prediction of chaotic systems with machine learning," Phys. Rev. Res. 2, 012080 (2020).
11. A. Wikner, J. Pathak, B. R. Hunt, I. Szunyogh, M. Girvan, and E. Ott, "Using data assimilation to train a hybrid forecast system that combines machine-learning and knowledge-based components," Chaos 31, 053114 (2021).
12. T. Arcomano, I. Szunyogh, A. Wikner, J. Pathak, B. R. Hunt, and E. Ott, "A hybrid approach to atmospheric modeling that combines machine learning with a physics-based numerical model," J. Adv. Model. Earth Syst. 14, e2021MS002712 (2022).
13. A. Chattopadhyay, M. Mustafa, P. Hassanzadeh, E. Bach, and K. Kashinath, "Towards physics-inspired data-driven weather forecasting: Integrating data assimilation with a deep spatial-transformer-based U-NET in a case study with ERA5," Geosci. Model. Dev. 15, 2221–2237 (2022).
14. P. Vlachas, J. Pathak, B. Hunt, T. Sapsis, M. Girvan, E. Ott, and P. Koumoutsakos, "Backpropagation algorithms and reservoir computing in recurrent neural networks for the forecasting of complex spatiotemporal dynamics," Neural Netw. 126, 191–217 (2020).
15. U. Parlitz and C. Merkwirth, "Prediction of spatiotemporal time series based on reconstructed local states," Phys. Rev. Lett. 84, 1890–1893 (2000).
16. S. Ørstavik and J. Stark, "Reconstruction and cross-prediction in coupled map lattices using spatio-temporal embedding techniques," Phys. Lett. A 247, 145–160 (1998).
17. D. J. Gauthier, E. Bollt, A. Griffith, and W. A. S. Barbosa, "Next generation reservoir computing," Nat. Commun. 12, 5564 (2021).
18. Y.-C. Lai, "Finding nonlinear system equations and complex network structures from data: A sparse optimization approach," Chaos 31, 082101 (2021).
19. E. Bollt, "On explaining the surprising success of reservoir computing forecaster of chaos? The universal machine learning dynamical system with contrast to VAR and DMD," Chaos 31, 013108 (2021).
20. J. Pathak, A. Wikner, R. Fussell, S. Chandra, B. R. Hunt, M. Girvan, and E. Ott, "Hybrid forecasting of chaotic processes: Using machine learning in conjunction with a knowledge-based model," Chaos 28, 041101 (2018).
21. J. Pathak, B. Hunt, M. Girvan, Z. Lu, and E. Ott, "Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach," Phys. Rev. Lett. 120, 024102 (2018).
22. Z. Lu, J. Pathak, B. Hunt, M. Girvan, R. Brockett, and E. Ott, "Reservoir observers: Model-free inference of unmeasured variables in chaotic systems," Chaos 27, 041102 (2017).
23. J. Herteux and C. Räth, "Breaking symmetries of the reservoir equations in echo state networks," Chaos 30, 123142 (2020).
24. W. A. S. Barbosa, A. Griffith, G. E. Rowlands, L. C. G. Govia, G. J. Ribeill, M.-H. Nguyen, T. A. Ohki, and D. J. Gauthier, "Symmetry-aware reservoir computing," Phys. Rev. E 104, 045307 (2021).
25. M. Favoni, A. Ipp, D. I. Müller, and D. Schuh, "Lattice gauge equivariant convolutional neural networks," Phys. Rev. Lett. 128, 032003 (2022).
26. M. Oberlack, S. Hoyas, S. V. Kraheberger, F. Alcántara-Ávila, and J. Laux, "Turbulence statistics of arbitrary moments of wall-bounded shear flows: A symmetry approach," Phys. Rev. Lett. 128, 024502 (2022).
27. Z. Liu and M. Tegmark, "Machine learning hidden symmetries," Phys. Rev. Lett. 128, 180201 (2022).
28. E. N. Lorenz, "Designing chaotic models," J. Atmos. Sci. 62, 1574–1587 (2005).
29. T. Thornes, P. Düben, and T. Palmer, "On the use of scale-dependent precision in Earth system modelling," Q. J. R. Meteorol. Soc. 143, 897–908 (2017).
30. A. Chattopadhyay, A. Subel, and P. Hassanzadeh, "Data-driven super-parameterization using deep learning: Experimentation with multiscale Lorenz 96 systems and transfer learning," J. Adv. Model. Earth Syst. 12, e2020MS002084 (2020).
31. M. Goldmann, C. R. Mirasso, I. Fischer, and M. C. Soriano, private communication (2022).
32. J. A. Platt, S. G. Penny, T. A. Smith, T.-C. Chen, and H. D. I. Abarbanel, "A systematic exploration of reservoir computing for forecasting complex spatiotemporal dynamics," arXiv:2201.08910 (2022).