We use recent advances in the machine learning area known as “reservoir computing” to formulate a method for model-free estimation from data of the Lyapunov exponents of a chaotic process. The technique uses a limited time series of measurements as input to a high-dimensional dynamical system called a “reservoir.” After the reservoir's response to the data is recorded, linear regression is used to learn a large set of parameters, called the “output weights.” The learned output weights are then used to form a modified autonomous reservoir designed to be capable of producing an *arbitrarily long* time series whose ergodic properties approximate those of the input signal. When successful, we say that the autonomous reservoir reproduces the attractor's “climate.” Since the reservoir equations and output weights are known, we can compute the derivatives needed to determine the Lyapunov exponents of the autonomous reservoir, which we then use as estimates of the Lyapunov exponents for the original input generating system. We illustrate the effectiveness of our technique with two examples, the Lorenz system and the Kuramoto-Sivashinsky (KS) equation. In the case of the KS equation, we note that the high dimensional nature of the system and the large number of Lyapunov exponents yield a challenging test of our method, which we find the method successfully passes.

There have been notable recent advances in machine learning that have proven useful for tasks ranging from speech recognition^{1,2} to playing of the game Go at a level surpassing the best humans.^{3} In this paper, we build a machine learning model of a chaotic dynamical system using the neural computing framework known as reservoir computing.^{4} We show that such a model can be used to deduce the most important quantifiers of the system's chaotic behavior, namely, its Lyapunov exponents, using only limited time series measurements of the system. We envision that such artificial intelligence based models could be used to accurately capture the complex dynamics of many geophysical, ecological, biological, or economic systems that are often difficult to model from first principles.

## I. INTRODUCTION

We consider the frequently occurring situation in which limited duration time series data from some dynamical process is available, but a first-principles-based model of how that data is produced is either unavailable or too inaccurate to be useful. Thus, if one is interested in diagnosing the ergodic properties of the underlying processes producing the data, one is restricted to do so based only on the data itself. We call such a method “model-free.” Model-free analysis of dynamical time series is a long-standing subject of study in nonlinear dynamics.^{5–7} Perhaps, the most wide-spread approach uses delay-coordinate embedding.^{5–13} In this article, we discuss a very promising, entirely different approach to model-free analysis of dynamical time series. Our approach is based upon recent significant advances in the area known as *machine learning*. In particular, we will apply a type of machine learning known as reservoir computing,^{4} and, for definiteness, we focus on the problem of determining the Lyapunov exponents of the data-generating system. For this application, the key ability we require from machine learning is to replicate the ergodic properties of the system generating the input, and we call this replicating the “climate.”

The rest of this article is organized as follows: Section II reviews reservoir computing and its use for short-term prediction of chaotic time series; Sec. III illustrates our method using the well-known Lorenz 1963 model,^{14} and discusses the ability of reservoir computers to replicate the (long-term) climate; Sec. IV uses our approach to evaluate the Lyapunov exponents of the Kuramoto-Sivashinsky (KS) equation^{15–17} with periodic boundary conditions. This system provides an example of extensive spatiotemporal chaos,^{18–21} for which the attractor dimension and number of positive Lyapunov exponents increases linearly with the periodicity length *L*. In particular, Sec. IV considers cases with many positive Lyapunov exponents. The paper concludes with further discussion in Sec. V.

The main conclusion of this paper is that our machine learning approach offers a very attractive model-free method for obtaining Lyapunov exponents from data. Particularly notable are our results from Sec. IV where we obtain excellent agreement for all of the positive Lyapunov exponents and many of the negative exponents for a moderately high-dimensional system. In comparison with delay coordinate embedding, we remark that our method appears to be simpler to implement, and does not appear to suffer from the problem of yielding spurious positive Lyapunov exponents^{22,23} (these papers and references therein discuss a mechanism responsible for spurious positive Lyapunov exponents in delay coordinate embedding; this mechanism is inherently absent in our method). More broadly, our paper suggests that machine learning is useful for analysis of data from chaotic systems (e.g., previous work has treated model-free machine learning for prediction of future evolution of the states of a dynamical system^{24} and for inference of unmeasured dynamical variables^{25}).

## II. RESERVOIR COMPUTERS, SHORT TERM PREDICTION, AND ATTRACTOR CLIMATE

Reservoir computers^{4} originate from an idea independently put forth about 16 years ago in two papers.^{26,27} The general approach is illustrated in Fig. 1(a), which shows an input vector $u(t)$ fed into a “reservoir” [labeled R in Fig. 1(a)] through an input-to-reservoir coupler (labeled I/R), with an output vector $v$ coupled from the reservoir through an output coupler (labeled R/O). We regard the couplers as acting instantaneously and without memory (i.e., their output depends solely on their current input). Importantly, the reservoir has memory (i.e., it has internal dynamics so its state depends on its history). We assume that it receives input at discrete times *t*, and that its input $Winu(t)$ is combined with the reservoir state $r(t)$ to produce its output $r(t+\Delta t)$. In general, the reservoir is an appropriate complex dynamical system; here, we follow Refs. 26 and 27, and consider the reservoir to be a large random network with *D _{r}* nodes and an $Dr\xd7Dr$ adjacency matrix $A$. Specifically, we will henceforth consider the particular implementation (similar to Ref. 24) of Fig. 1(a) given by

where $r(t)$ represents the scalar states $ri(t)$ of the *D _{r}* network reservoir nodes, $r=[r1,r2,\u2026,rDr]T$; in Eq. (1), $Win$ is a $Dr\xd7D$ matrix, where

*D*is the dimension of $u$; also, in Eq. (1), for a vector $q=(q1,q2,\u2026)T$ the quantity $tanh(q)$ is the vector $(tanh(q1),tanh(q2),\u2026)T$. In Eq. (2), $Wout$ maps the

*D*dimensional vector $r$ to the output $v$, which, for the situations considered in this article, has the same dimension

_{r}*D*as $u$. In addition, we assume that $Wout$ depends on a large number of adjustable parameters given by the elements of the matrix $P$, and that $Wout(r,P)$ depends linearly on $P$ (e.g., in the past work the choice $Wout(r,P)=Pr$ has often been used).

In general, the goal of the system in Fig. 1(a) is for the outputs $v(t)$ to approximate the desired outputs, $vd(t)$, appropriate to the inputs $u(t)$ (e.g., in a pattern recognition task $u(t)$ might represent a sequence of patterns, and $vd(t)$ would represent classifications the patterns). To this end, during a training period, $\u2212T\u2264t\u22640$, an input $u(t)$ is fed into the reservoir and the resulting reservoir state evolution $r(t)$, along with $u(t)$, are recorded and stored as “training data.” Then, the parameters $P$ are chosen (“trained”) so as to approximately minimize the mean squared difference between $v(t)$ and its desired value $vd(t)$. As is common in reservoir computing, we use the Tikhonov regularized regression procedure^{28} to find an output matrix **P**, that minimizes the following function:

where $||P||2$ denotes the sum of the squares of elements of **P**. The regularization constant $\beta >0$ discourages overfitting by penalizing large values of the fitting parameters (in Sec. IV, we used a value $\beta >0$, but for Sec. III we found that using *β* = 0 was sufficient). For a given task, one hopes that for large enough *D _{r}* and

*T*, the system in Fig. 1(a) will yield subsequent (

*t*> 0) outputs $v(t)$ that closely approximate the desired $vd(t)$. Because $Wout(r,P)$ is taken to be linear in $P$, the problem of determining the parameters $P$ that minimize Eq. (3) is one of linear regression for which there are well-established techniques.

^{29}This approach has been shown to work extremely well for a wide variety of tasks.

^{4}

We now consider the task of prediction for the case, where $u(t)$ depends on the state of some deterministic dynamical system. This problem was originally considered in the reservoir computer framework by Jaeger and Haas.^{24} The idea is to take the desired output to be the same as the input, $vd(t+\Delta t)=u(t+\Delta t)$. When one wishes to commence prediction at *t* = 0, the configuration is switched from that in Fig. 1(a) to that in Fig. 1(b), and the reservoir system is run autonomously according to the following equation:

The output of the autonomous reservoir, $v(t)=Wout(r(t),P)$, gives the predicted value $u(t)$ for *t* > 0. Jaeger and Haas,^{24} using the example of the Lorenz system,^{14} indeed verified that this prediction scheme works and gives good short term predictions. As expected, the chaotic amplification of small errors leads to eventual breakdown of the prediction, limiting the prediction time. However, as shown in Secs. III and IV, following this breakdown of short-term prediction, the evolution of $v(t)$ often provides an accurate approximation for the climate corresponding to $u(t)$, and can be used in particular, to compute Lyapunov exponents of the process that generated $u(t)$.

## III. CLIMATE REPLICATION IN THE LORENZ SYSTEM

In this section, we illustrate the capability of our technique to replicate the “climate” of the Lorenz 1963 system^{14}

We construct and train reservoir computers with input $u=(x,y,z)T\u2208\mathbb{R}3$ and output $v\u2208\mathbb{R}3$, following Sec. II. The reservoir network is built from a sparse random Erdős-Rényi network whose average degree is *d* = 6. Each non-zero element in the adjacency matrix is drawn independently and uniformly from $[\u2212a,a]$, and *a* > 0 is adjusted so that the spectral radius of $A$ (the largest magnitude of its eigenvalues) has a desired value *ρ*. During the training phase, $\u2212T\u2264t\u22640$ (where *T* = 100), the reservoir computer evolves following equation (1) with $\Delta t=0.02$. In this Lorenz example, the reservoir output $v(t)=Wout(r(t),P)$ is defined as

where $p1,\u2009p2$, and $p3$ are the rows of the $3\xd7Dr$ matrix **P**. The quantity $r\u0303$ in the third line of Eq. (6) is defined in a way such that the first half of its elements are the same as that of $r$, i.e., $r\u0303i=ri$ for half ($Dr/2$) of the reservoir nodes, while $r\u0303i=ri2$ for the remaining half of the reservoir node (our use here of $r\u0303(t)$, rather than $r(t)$, to predict *z*(*t*) is related to the $x\u2192\u2212x,\u2009y\u2192\u2212y$ symmetry of the Lorenz equations as discussed in Ref. 25).

After we compute $r(t)$ for the training period, $\u2212T\u2264t\u22640$, we calculate the output weight parameters $P$ that minimize the function in Eq. (3) with the desired output being the state variables from the Lorenz system, $vd(t)=[x(t),y(t),z(t)]T$ (in an actual physical experiment, we assume $u(t)=vd(t)$ to have been measured for $\u2212T\u2264t\u22640$). After we find the output weights, we evolve the reservoir with the reconfigured reservoir system [Fig. 1(b)].

Following the above described procedure, we now report and compare results for two simulations using reservoir configurations (Table I) with $\rho =1.2$ (denoted R1) and $\rho =1.45$ (denoted R2). The prediction for $0<t\u226425$ for both trained reservoirs is shown in Fig. 2(a) (R1 with $\rho =1.2$) and Fig. 2(b) (R2 with $\rho =1.45$). Both reservoirs R1 and R2 generate correct short-term predictions and then deviate from the actual Lorenz trajectories, which are expected since any small error grows exponentially due to the chaotic dynamics of the Lorenz system. However, after the failure of the short-term prediction, the two reservoirs show qualitatively different dynamical patterns. In Fig. 2(a), it seems that, after $t\u22487$, although the R1 prediction deviates from the actual trajectory, the long-term dynamics appears to resemble that of the original Lorenz system. In contrast, as shown in Fig. 2(b), this is clearly not the case for R2.

In Fig. 3, we present a more accurate test than visual inspection of Figs. 2(a) and 2(b) for correctness of the climate. To do this, we follow Lorenz's procedure of plotting the return map of successive maxima of *z*(*t*). We first obtain *z*(*t*) for a long period of time, $0<t<1000$, for both the actual and the predicted time series. We then locate all local maxima of the actual and predicted *z*(*t*) in time order and denote them $[z1,z2,\u2026,zm]$. Then, we plot consecutive pairs of those maxima $[zi,zi+1]$ for $i=1,\u2026,m\u22121$ as dots in Fig. 3. The blue dots in both panels of Fig. 3 are from the actual Lorenz system, while the red dots *printed over* the blue dots are from the reservoir output prediction (*v*_{3}) of *z*(*t*). As confirmed by Fig. 3(a), the red dots produced by the R1 reservoir continue to fall on top of the blue dots (from the actual Lorenz system) throughout the entire run time ($0<t<1000$). In contrast, Fig. 3(b) shows that the blue dots remain largely uncovered, because, as indicated in the third panel of Fig. 2(b), the maximum value of *z*(*t*) for *t* > 5 is at a fixed point $zmax\u224830$. Thus, the R1 reservoir very accurately succeeds in reproducing the long-time climate of the attractor, while the R2 reservoir does not, and this is so even though both setups are apparently capable of producing useful short term predictions. (We have also obtained similar results for many other simulations.) Thus some parameter adjustment may be necessary to avoid unsuccessful reproduction of the climate. Fortunately, we usually find that when the climate is not reproduced it is fairly evident [as in Fig. 2(b), as well as Fig. 5 of Sec. IV]. More quantitatively, a promising general means of assessing whether the reservoir system has succeeded in mimicking the climate is to first use the training data to obtain finite-time estimates of the system's ergodic properties (e.g., frequency-power spectra, time correlations, moments, etc.). Once this is done, one can test whether those estimates are consistent with determinations of the same quantities obtained from the long-term reservoir dynamics. Section IV provides such an assessment for the Kuramoto-Sivashinsky system.

Parameter . | Value . | Parameter . | Value . |
---|---|---|---|

D _{r} | 300 | d | 6 |

T | 100 | $\Delta t$ | 0.02 |

$T/\Delta t$ | 5000 | β | 0 |

ρ | 1.2 | σ | 0.1 |

Parameter . | Value . | Parameter . | Value . |
---|---|---|---|

D _{r} | 300 | d | 6 |

T | 100 | $\Delta t$ | 0.02 |

$T/\Delta t$ | 5000 | β | 0 |

ρ | 1.2 | σ | 0.1 |

The reservoir in the autonomous configuration of Fig. 1(b) represents a known discrete-time, *D _{r}*-dimensional dynamical system (since we know $Win,\u2009A$, and the output parameters

**P**determined by the training). We compute the equations for the evolution of the tangent map corresponding to Eq. (4) and evolve a set of

*m*mutually orthogonal tangent vectors $R(t)={\delta rj}j=1m$ along with Eq. (4). We then compute the largest

*m*Lyapunov exponents of the reservoir dynamical system in the configuration shown in Fig. 1(b) using a standard algorithm based on

*QR*decomposition (e.g., see Ref. 7) of the matrix $R(t)$. The two right-most columns of Table II show the three largest Lyapunov exponents, $\Lambda 1\u2265\Lambda 2\u2265\Lambda 3$, of the reservoir system in the autonomous configuration [Fig. 1(b)] for the R1 reservoir (for which climate reproduction succeeds), and for the R2 reservoir (for which climate reproduction fails).

. | Actual Lorenz system . | R1 system . | R2 system . |
---|---|---|---|

$\Lambda 1$ | 0.91 | 0.90 | 0.01 |

$\Lambda 2$ | 0.00 | 0.00 | –0.1 |

$\Lambda 3$ | −14.6 | –10.5 | –9.9 |

. | Actual Lorenz system . | R1 system . | R2 system . |
---|---|---|---|

$\Lambda 1$ | 0.91 | 0.90 | 0.01 |

$\Lambda 2$ | 0.00 | 0.00 | –0.1 |

$\Lambda 3$ | −14.6 | –10.5 | –9.9 |

Comparing the Lyapunov exponents of the Lorenz system (first column of Table II) with those of the R1 reservoir, we see that the largest Lyapunov exponent of the R1 reservoir is a good approximation to the largest Lyapunov exponent of the Lorenz system. Also, consistent with the small value of $\Delta t$, the reservoir dynamics approximates that of a flow for which $\Lambda 2$ should be (and is) approximately zero. On the other hand, we see that the third Lyapunov exponent of the R1 system is less negative than the negative Lyapunov exponent of the true Lorenz system. In contrast with the good agreement of the $\Lambda 1$ values for the Lorenz system and the R1 reservoir, the positive Lyapunov exponent of the Lorenz system fails to be reproduced by the R2 system whose largest Lyapunov exponent is approximately zero; this is consistent with the observation from Fig. 2(b) that the long term reservoir attractor for R2 appears to be a periodic orbit.

The significant conclusion from the above is that the R1 system, as a result of successfully reproducing the climate, can be utilized to obtain an approximation to the positive and zero Lyapunov exponents of the process generating its input. We note, however, that the R1 system does not accurately reproduce the true negative Lyapunov exponent of the Lorenz attractor.

The inaccurate R1 reservoir estimation of $\Lambda 3$, noted above, can be understood by noting that, although the return map in Fig. 3 appears to be a curve, this apparent “curve” must, as noted by Lorenz,^{14} actually have some small width. The R1 reservoir succeeds in *approximating* the attractor of the Lorenz system as reflected by its apparent good reproduction of the return map shown in Fig. 3(a). In order to do this, however, the reservoir need not reproduce the very thin transverse structure within the apparent curve. Since, this very thin structure, as we next discuss, is the primary orbital evidence of the value of $\Lambda 3$, one might not expect the reservoir to accurately reproduce this very negative Lyapunov exponent. Specifically, using the Kaplan-Yorke formula for the information dimension^{30} of the fractal Lorenz attractor, we obtain a dimension of $[2+(\Lambda 1/|\Lambda 3|)]=2.06$, corresponding to 1.06 for the dimension of the structure in the return map [Fig. 3(a)]. This dimension is very close to one, in agreement with the approximate curve-like character of the return map. However, close examination of the return map “curve” of the Lorenz attractor has previously shown that, within its thickness, there is a fractal set of small transverse dimension (presumably $\Lambda 1/|\Lambda 3|=0.06$). On the other hand, the Kaplan-Yorke dimension for the return map for the climate of the R1 reservoir attractor is about 1.09. Since the primary orbital difference reflected by differing values of $\Lambda 3$ is the difference in very thin structure features of the return map that have only a small effect on the climate dynamics, it is not surprising that the R1 reservoir, while giving a good approximation to the true climate of the Lorenz system, gives only a rough approximation of $\Lambda 3$.

## IV. DETERMINING A LARGE NUMBER OF LYAPUNOV EXPONENTS OF A HIGH DIMENSIONAL SPATIOTEMPORAL CHAOTIC SYSTEM FROM DATA

We now consider a modified version of the Kuramoto-Sivashinsky (KS) system defined by the partial differential equation for the function *y*(*x*, *t*)

in the region $0\u2264x<L$ with periodic boundary conditions, $y(x,t)=y(x+L,t)$, and *λ* chosen so that *L* is an integer multiple of *λ*. This equation reduces to the standard KS equation when *μ* = 0. The cosine term makes the equation spatially inhomogeneous. We will subsequently consider the cases *μ* = 0 and $\mu \u22600$ in order to discuss the effect of the symmetries of the KS equation on the learning dynamics of the reservoir computer.

By numerically integrating Eq. (7) on an evenly spaced one-dimensional grid of size *Q*, we obtain a discretized multivariate data set of *Q* time series

As in the case of the Lorenz equations discussed in Sec. III, we consider the situation where we have access to the time series data but do not have information about the dynamical equation that generated the time series. In the absence of a model, we will use the data to train a reservoir computer to emulate the behavior of the true dynamical system, in this case Eq. (7).

The reservoir network is as described in Sec. II with the parameters listed in Table III. In the training phase, Fig. 1(a), we evolve the reservoir according to Eq. (1) from $t=\u2212T$ to *t* = 0. Next, we use Tikhonov regularized regression [see Eq. (3)] to compute the output parameters, **P** such that $Wout(r,P)=Pr\u0303(t)\u2243u(t)$ for $\u2212T\u2264t<0$. Here, $r\u0303$ is a *D _{r}*-dimensional vector such that the $ith$ component of $r\u0303$ is $r\u0303i=ri$ for half the reservoir nodes and $r\u0303i=ri2$ for the remaining half. With the output parameters determined, we let the reservoir evolve autonomously for

*t*> 0 as shown in Fig. 1(b) according to Eq. (4).

Parameter . | Value . | Parameter . | Value . |
---|---|---|---|

D _{r} | 9000 | d | 3 |

T | 20 000 | $\Delta t$ | 0.25 |

$T/\Delta t$ | 80 000 | β | 0.0001 |

ρ | 0.4 | σ | 0.5 |

Parameter . | Value . | Parameter . | Value . |
---|---|---|---|

D _{r} | 9000 | d | 3 |

T | 20 000 | $\Delta t$ | 0.25 |

$T/\Delta t$ | 80 000 | β | 0.0001 |

ρ | 0.4 | σ | 0.5 |

The predictions made by the reservoir system for *t* > 0 are given by, $Wout(r(t),P)$. Figure 4 shows the time evolution of one such reservoir prediction for *t* > 0 (middle panel), along with the true state (top panel) of the KS equation and the deviation (bottom panel) of the reservoir prediction from the true state (i.e., the difference between the top panel and the middle panel) Note that in Fig. 4 time (the horizontal axis) is in units of the Lyapunov time ($\Lambda 1\u22121$, where $\Lambda 1$ is the largest Lyapunov exponent of the KS attractor). We see that the reservoir gives good short term prediction for about 5 multiples of the Lyapunov time. A visual inspection of Fig. 4 suggests that the reservoir prediction may have also learned the correct “climate” of the KS system even after the state of the reservoir dynamical system has diverged from the true state of the KS system.

Figure 5 shows an example of an alternate scenario for another set of the reservoir parameters ($\rho =3.1$, *D _{r}* = 5000 with the rest of the parameters as shown in Table III). In this case, the reservoir still predicts accurately for a short period of time. However, the long term climate of the signal generated by the reservoir is no longer similar to that of the true KS climate.

A more quantitative assessment of the climate reproduction can be obtained by calculating the power spectrum of the reservoir prediction and comparing it with the power spectrum of the training data. Figure 6 shows the power spectrum of the training data, along with the power spectrum of the dynamics of the autonomous reservoir system in Figs. 4 and 5. We see that the reservoir system corresponding to Fig. 4 succeeds in reproducing the training data power spectrum, thus indicating that the long term system orbit reproduces the climate of the training data. On the other hand, the power spectrum of the reservoir system corresponding to Fig. 5 confirms our visual assessment that this reservoir system fails to reproduce the climate of the training data.

Similar to what was done in Sec. III, we use our complete knowledge of the dynamics of the reservoir computer to evaluate its Lyapunov exponents. By independently evaluating the Lyapunov exponents directly from the KS equation, Eq. (7), we obtain the true Lyapunov exponents and compare them with the corresponding Lyapunov exponents of the reservoir dynamical system.

### A. Homogeneous KS system (*μ* = 0)

Figure 7(a) shows the Lyapunov spectrum of the standard (*μ* = 0) KS system with *L* = 60 (red “×” markers), where, by definition, the subscript *k* is such that $\Lambda k\u2265\Lambda k+1$. The Lyapunov exponents of the reservoir trained to emulate this system are shown on the same axes (blue “+” markers). We observe that the positive Lyapunov exponents of the reservoir system match the corresponding exponents of the KS system very well. However, the negative exponents of the two systems do not seem to agree with each other at the first glance. We argue below that the standard KS system has three zero Lyapunov exponents, and we posit that the reservoir is unable to reproduce two of them. Indeed, Fig. 7(b) shows that if we remove the two of the computed exponents closest to zero ($\Lambda 7$ and $\Lambda 8$) for the KS system, the negative Lyapunov exponents of the reservoir system match those of the KS system very well.

We show now that when *μ* = 0 (as for Fig. 7), the standard KS equation (7) has three zero Lyapunov exponents associated with three continuous symmetries, namely time-translation invariance, space-translation invariance, and the so-called Gallilean invariance. Time and space translation invariance imply that if *y*(*x*, *t*) is a solution, then so are $y(x,t+t0)$ and $y(x+x0,t)$. By Gallilean invariance, we mean that for every solution *y*(*x*, *t*) of the KS equation and an arbitrary constant *v*, $y(x\u2212vt,t)+v$ is also a solution. This can be verified by direct substitution in Eq. (7) with *μ* = 0. Replacing *t*_{0}, *x*_{0}, and *v* by differentials ($t0\u2192\delta t0,\u2009x0\u2192\delta x0,\u2009v\u2192\delta v$), we have that, $\delta y(x,t)=\u2202y(x,t)\u2202t\delta t0,\u2009\delta y(x,t)=\u2202y(x,t)\u2202x\delta x0$, and $\delta y(x,t)=[1\u2212t\u2202y(x,t)\u2202x]\delta v$ all represent perturbations, $y(x,t)+\delta y(x,t)$, of Eq. (7) that are, to linear order in the differentials, solutions of Eq. (7). That is, all three of these $\delta y(x,t)$ are solutions of the variational equation, $\delta yt+\delta yyx+y\delta yx+\delta yxx+\delta yxxxx=0$. Furthermore, since the original solution *y*(*x*, *t*) does not decay exponentially to zero, nor increases exponentially to infinity, we conclude that these three expressions for *δy* represent Lyapunov vectors with zero Lyapunov exponents.

To see why the reservoir does not reproduce the Gallilean symmetry-associated zero Lyapunov exponent in the *μ* = 0 case, notice that there is a corresponding conserved quantity $c=\u222by(x,t)dx$. A particular KS system trajectory in phase space is thus restricted to a hypersurface with a constant value of *c* (say, *c *=* c*_{0}). Since the reservoir is trained with data from a single trajectory, it does not learn the dynamics of perturbations that take the trajectory off the *c*_{0} hypersurface. We are not certain why the reservoir does not reproduce both of the other two zero exponents.

### B. Inhomogeneous KS system ($\mu =0.1$)

As a further example that does not have additional symmetries beyond time-translation, we consider (Fig. 8) a KS equation with a nonzero value of *μ* ($L=60,\lambda =15,\mu =0.1$). As before, we train the reservoir using the time series data from the symmetry broken KS equation. After training, we run the reservoir in the autonomous prediction mode [Fig. 1(b)] and calculate its Lyapunov spectrum. Figure 8 shows that the reservoir reproduces the Lyapunov spectrum of the true KS system accurately in this case. Notably, in contrast with the case *μ* = 0, this good agreement is obtained without the need for discarding two zero Lyapunov exponents. We continue to use this modified KS system in the experiments described below.

For the cases shown in Figs. 7(b) and 8, the information dimension of the attractor, as computed from the Kaplan-Yorke conjecture,^{30} is about $DKY\u224815$ (roughly, the value of *k* at which $\u2211j=1k\Lambda j$ first becomes negative). We see from Figs. 7(b) and 8 that the reservoir continues to give reasonable estimates of Λ_{k} even for $k>DKY$. This was somewhat surprising to us, especially in view of the inaccurate reservoir estimate of $\Lambda 3$ in Sec. III.

### C. Effect of measurement noise

We now consider the effect of additive measurement noise on our Lyapunov exponent calculation scheme. We simulate measurement noise by adding a random vector $n(t)$ to the training data set $u(t)$ for all values of *t*. That is, at every time step $\Delta t$, we replace **u** in Eq. (1) by $u+n$, and we replace $vd=u$ used in Eq. (3) by $vd=u+n$. The scalar elements $nj(t)$ of the vector $n(t)$, for each value of *j* and *t*, are independent, identically distributed uniform random variables in the interval $[\u2212\alpha ,\alpha ]$. The constant *α* is chosen so that the RMS value of the noise is *f* times the RMS value of the noise-free signal $u(t)$. Figure 9(a) shows the noise-free time series at a single grid point, while Figs. 9(b) and 9(c) show the same time series with added noise of strength *f* = 0.05 and *f* = 0.2, respectively. We calculate the Lyapunov exponents of the reservoir as described above. Figure 10 shows the Lyapunov spectrum when the noise level *f* is varied from 0.05 to 0.20 along with the true Lyapunov spectrum of the KS equation. We see that the reservoir results for the positive Lyapunov exponents are quite robust to noise for $f\u22640.2$, but that the negative exponents are increasingly depressed to more negative values as *f* increases.

### D. Effect of training data length

We find that the amount of data used to train the reservoir computer can significantly affect the accuracy of the Lyapunov spectrum. The negative Lyapunov exponents are more sensitive to errors than the positive exponents due to insufficient training data. Figure 11 demonstrates this result through a plot of the Lyapunov spectrum of the reservoir trained on varying lengths of data from Eq. (7) with parameters *L* = 60, *λ* = 15, and $\mu =0.1$. In this example, we find that we need a training time series of greater than 20 000 time steps in order to obtain a reasonably accurate estimate of the negative Lyapunov exponents (20 000 time steps equals about 400 multiples of the Lyapunov time ($\Lambda 1\u22121$) which can be considered to be a natural time scale of the KS system).

## V. DISCUSSION AND CONCLUSION

We conclude that a suitably trained reservoir computing system is capable of approximating the ergodic properties of the true system that it was trained on. In the case of the Lorenz equations (Sec. III), our method is successful in calculating the positive and zero Lyapunov exponents with good accuracy. The negative Lyapunov exponent of the true Lorenz system has a high magnitude, and our method is not as successful in accurately calculating the numerical value of this exponent, although it does successfully capture that its magnitude is substantially larger than that of the positive exponent. Remarkably, as shown in Sec. IV for the Kuramoto-Sivashinsky system, it is possible to use the trained reservoir to calculate a large number of positive and negative Lyapunov exponents of a high dimensional spatio-temporal chaotic system with good accuracy.

In Fig. 11, we demonstrated that we can reproduce the Lyapunov exponents of an approximately 15-dimensional attractor from a “training” time series of 40 000 points ($T/\Delta t=40\u2009000$). By contrast, delay coordinate embedding methods that approximate the system Jacobian from the nearest neighbors have been argued to require a time series of length $10D$ or longer^{31,32} (where $D$ is the attractor dimension).

From a more general point of view, our paper suggests that the development of machine learning techniques for model-free analysis of measured data from chaotic systems may be a fruitful subject for further research.

## ACKNOWLEDGMENTS

This work was supported by Grant No. W911NF1210101 from the Army Research Office. We wish to acknowledge useful conversations with Daniel Gauthier, Alexander Hartemink, and Roger Brockett.