Deducing the state of a dynamical system as a function of time from a limited number of concurrent system state measurements is an important problem of great practical utility. A scheme that accomplishes this is called an “observer.” We consider the case in which a model of the system is unavailable or insufficiently accurate, but “training” time series data of the desired state variables are available for a short period of time, and a limited number of other system variables are continually measured. We propose a solution to this problem using networks of neuron-like units known as “reservoir computers.” The measurements that are continually available are input to the network, which is trained with the limited-time data to output estimates of the desired state variables. We demonstrate our method, which we call a “reservoir observer,” using the Rössler system, the Lorenz system, and the spatiotemporally chaotic Kuramoto–Sivashinsky equation. Subject to the condition of observability (i.e., whether it is in principle possible, by any means, to infer the desired unmeasured variables from the measured variables), we show that the reservoir observer can be a very effective and versatile tool for robustly reconstructing unmeasured dynamical system variables.

Knowing the state of a dynamical system as it evolves in time is important for a variety of applications. This paper proposes a general-purpose method for inferring unmeasured state variables from a limited set of ongoing measurements. Our method is intended for situations in which mathematical models of system dynamics are unavailable or are insufficiently accurate to perform the desired inference. We use the machine-learning technique called “reservoir computing,” with which we construct a system-independent means of processing the measurements. A key point is the extent to which this approach is “universal.” That is, our examples show that the same reservoir can be trained to infer the state of different systems. It is the training that relates to a specific system, not the “hardware.” The reservoir hardware plays a similar role to an animal's brain, which retrains itself as the system represented by its body and environment changes.

## I. INTRODUCTION

Frequently, when studying the dynamics of a physical system, one only has access to a limited set of measurements of the state variables and desires to deduce concurrent values of unmeasured state variables. In principle, it might be possible to accomplish this goal if, in addition to the measurements, one also has knowledge of the system dynamics. In control theory, a successful deduction method of this type is called an *observer*. Observers are of great utility for control and prediction of dynamics. The observer problem for the case in which the dynamical system is linear was fully solved in the classic work of Kalman, who also formulated conditions for “observability” under which it is possible to achieve the goal of deducing the full state of a linear system from a given partial set of state measurements (see textbooks on control theory, e.g., Ref. 1). Observers and observability have also been extensively investigated for nonlinear dynamical systems (e.g., Ref. 2). For example, in situations of chaotic dynamics, an approach using synchronization of chaos has been exploited.^{3,4} A concept related to observability, called “estimability” has been discussed in the context of delay coordinate embedding.^{5}

In this paper we consider the observer problem for situations in which one does not have a sufficiently accurate mathematical model of the nonlinear system of interest. In place of such a model, we assume that there exists an initial period of time for which measurements of all the desired system variables are available, and we seek to use these measurements in the initial period of time to deduce the full set of desired variables for the subsequent time, for which we assume that measurements of only a limited subset of the desired variables are possible. Our method utilizes a machine learning technique, called *reservoir computing* (see Ref. 6). This technique employs an input/output neural network with randomly generated parameters, and uses linear regression to choose “output weights” that fit the raw network output to a set of “training data.” We use the data from the initial period of full measurement as the training data. Then we continue to input the subsequent partial set of continually measured variables, and regard the weighted network output as the estimated current values of the variables that are no longer measured. (We emphasize that the goal we address is the inference of unmeasured state variables rather than their prediction.)

The main result of this paper is that this kind of “reservoir observer,” subject to certain limitations, can be extremely effective. In what follows we first describe a specific illustrative implementation and review relevant reservoir computing concepts. We then discuss applications to three examples of chaotic systems that highlight the strength and limitations of our method: (1) the Rössler system,^{7} for which we have done an intensive study of how results depend on design parameters of the reservoir observer; (2) the Lorenz system,^{8} which we use to illustrate an instance of the issue of observability for our method; and (3) the Kuramoto–Sivashinsky partial differential equation,^{9–11} which we use to illustrate the possible effectiveness of our method in cases of spatiotemporal chaos.

## II. SETUP

We consider a dynamical system $d\varphi /dt=f(\varphi )$ together with a pair of $\varphi $-dependent, vector valued variables, $u=h1(\varphi )\u2208\mathbb{R}M$ and $s=h2(\varphi )\u2208\mathbb{R}P$. We are interested in the situation in which **u** and **s** can both be measured over a specific period, [0, *T*], but that only **u** can be measured from that time forward; we seek a method for using the continued knowledge of **u** to determine an estimate of **s** as a function of time when direct measurement of **s** is not available, *t* > *T*. In contrast with most of the engineering literature devoted to problems of this kind, we do not assume knowledge of **f** but rather seek to infer the necessary information from the trajectories recorded on the interval [0, *T*].

For this purpose we use “reservoir computing,”^{6} which has previously been advocated for application to many tasks (e.g., prediction of time series, pattern recognition, etc.). There are many variations in implementation; in this paper we adopt the reservoir technique proposed by Jaeger and Haas.^{12} The reservoir computer has three components (Fig. 1), a linear input layer with *M* input nodes (one for each component of **u**), a recurrent, nonlinear reservoir network with *N* dynamical reservoir nodes whose state vector is $r\u2208\mathbb{R}N$, and a linear output layer with *P* output nodes, as shown in Fig. 1. For the specific reservoir computing implementation we use in this paper, the reservoir dynamics is defined as

where we assume the time step to be small Δ*t* ≪ 1, **A** is the (typically sparse) weighted adjacency matrix of the reservoir layer, and the *M*-dimensional input **u**(*t*) is fed in to the *N* reservoir nodes via a linear input weight matrix denoted by $Win\u2208\mathbb{R}N\xd7M$. (In Secs. III A and III B, *M* = 1 and the input **u** is a scalar, while, in Sec. III C, *M* ≥ 1.) The parameter 0 < *α* ≤ 1 is a “leakage” rate^{13} that makes the reservoir evolve more slowly as *α* → 0. We also use a bias term *ξ***1**, where **1** denotes a vector of ones and *ξ* is a scalar constant. The notation $tanh(\xb7)$ with a vector argument is defined as the vector whose components are the hyperbolic tangents of the corresponding components of the argument vector. The output, which is a *P*-dimensional vector, is taken to be a linear function of the reservoir state

As compared with other artificial neural network approaches, the advantage of reservoir computing is that training is made computationally feasible for relatively large *N*, since only the output weights **W**_{out} and the vector **c** are adjusted by the training process. (The input weight matrix **W**_{in} and the reservoir adjacency matrix **A** are initially randomly drawn and then fixed.) A key point is that the reservoir layer serves as an active medium driven by inputs **u**(*t*) where each reservoir node has a different nonlinear response to its inputs, so that for *N* ≫ 1 we can hope that almost a wide variety of desired outputs can be approximated by a linear combination of the *N*-dimensional reservoir nodal response states.

In addition to the parameters Δ*t*, *α*, and *ξ* in Eq. (1), and the reservoir size *N*, the reservoir dynamics depend on the parameters *D*, *ρ*, and *σ*, which govern the random generation of **A** and **W**_{in} as follows. The adjacency matrix **A** is built from a sparse random Erdős–Rényi matrix in which the fraction of nonzero matrix elements is *D*/*N*, so that the average degree of a reservoir node is *D*. The values of non-zero elements are randomly drawn independently from a uniform distribution between −1 and 1. We then uniformly rescale all the elements of **A** (i.e., multiply **A** by a positive scalar) so that the largest value of the magnitudes of its eigenvalues becomes *ρ*, which we refer to as the “spectral radius” of **A**. For the input layer, the *i-*th of the *M* input signals is connected to *N*/*M* reservoir nodes with connection weights in the *i-*th column of **W**_{in}. Each reservoir node receives input from exactly one input signal. The non-zero elements of **W**_{in} are randomly chosen from a uniform distribution in [–*σ*, *σ*]. (We emphasize that this choice of structure for **W**_{in} is fairly arbitrary and that other choices could reasonably be employed, e.g., every node could receive a mixture of several inputs.)

For the convenience of comparing the reservoir performances, we preprocess all the components of **u**(*t*) and **s**(*t*) so that they have zero mean and unit variance. Starting from a random initial state **r**(–*τ*), the reservoir evolves following Eq. (1) with input **u**(*t*). Here *τ* is a transient time, chosen large enough to make the reservoir state essentially independent of its initial state by time *t* = 0. We then record the *K* = *T*/Δ*t* reservoir states for 0 < *t* ≤ *T*

and the concurrent measurements of the state variables that are unmeasured for *t* > *T*

We then train the network by choosing the output layer quantities **W**_{out} and **c** by choosing them so that the reservoir output approximates the measurement for 0 < *t* ≤ *T*. We do this by minimizing the following quadratic form with respect to **W**_{out} and **c**

where $\Vert q\Vert 2=qTq$ for **q** a vector. The second term of Eq. (5), $\beta [Tr(WoutWoutT)]$, is a regularization term included to avoid overfitting **W**_{out}, where *β* > 0 (typically a small number) is the “ridge regression parameter.”

If the training is successful, the readout of the reservoir output should yield a good approximation (denoted $s\u0302(t)$) to the desired unmeasured quantity **s**(*t*) for *t* > *T*. Referring to Eq. (2)

where $Wout\u2217$ and $c*$ denote the solutions for the minimizers of Eq. (5)

where **I** is the *N* × *N* identity matrix, *δ***R** (respectively, *δ***S**) is the matrix whose *k*th column is $r(k\Delta t)\u2212r\xaf$ (respectively, $s(k\Delta t)\u2212r\xaf$).

We remark that Eq. (1) represents a special choice for the form of the reservoir that is convenient for our purposes. More generally, Eq. (1) can be expressed as

and other forms of **g**, different from the choice Eq. (1), have been employed for reservoir methods designed to implement goals different from the observer goal that we address here. For example, experimental reservoir implementations have been reported where the dynamics Eq. (10) were from an optical network of semiconductor lasers,^{14} a delay system with a single nonlinear node,^{15} a field-programmable gate array (FPGA),^{16} phase-delay electro-optic devices,^{17} and even a bucket of water,^{18} among others. Such choices might also work for a reservoir-based observer and may offer advantages such as the potential for huge increase in speed.^{17} The main requirement on the observer dynamics Eq. (10) seems to be that it is sufficiently complex and that the dimension of the reservoir state vector **r** is sufficiently large that the output Eq. (2) can be made to approximate the desired time series (for our goal, **s**(*t*)) by adjustment of **W**_{out} and **c**.

## III. EXAMPLES

In this section, we illustrate the reservoir-based observer method of Sec. II using three chaotic dynamical systems: the Rössler system, the Lorenz system, and the Kuramoto–Sivashinsky equations.

### A. Rössler system

We now test the observing method given in Sec. II on a chaotic Rössler system

where *a* = 0.5, *b* = 2.0, and *c* = 4.0. We choose one of the coordinates of the system as the “continually available” measurement **u**(*t*) defined in Sec. II (i.e., *M* = 1). The other two coordinates are taken to be the components of **s**(*t*), knowledge of which is only available for the training phase 0 ≤ *t* ≤ *T*.

We find that a trained reservoir computing network with input from any one of these three coordinates, *x*, *y*, and *z*, can generate output that accurately agrees with the two remaining coordinates. As shown in Fig. 2, a reservoir computer whose preprocessed input is

can accurately output the similarly defined versions ($y\u0303(t),\u2009z\u0303(t)$) of (*y*(*t*), *z*(*t*)) for *t* > *T*, i.e., $u(t)=x\u0303(t),\u2009s(t)=[y\u0303(t),z\u0303(t)]T$. (In Eq. (14) the angle brackets denote time average.)

For the results in Fig. 2 we utilize the following set of reservoir parameters (defined in Sec. II):

For this set of parameters the difference between the inferred (thin line in Figs. 2(b) and 2(c)) and actual (dashed line for Figs. 2(b) and 2(c)) values of $y\u0303(t)$ and $z\u0303(t)$ is very small.

We examined how the root mean square (RMS) error of the inferred values varies with the reservoir parameters. Results of this type are given in Figs. 3(a)–3(f) where we plot the RMS error over 500 time units in the inference of $y\u0303(t)$ from the measurement of $x\u0303(t)$ versus one of the parameters (*ρ*, *D*, *σ*, *ξ*, *α*, or *N*), with the other parameters held fixed as given in Eq. (15). Each plotted point in these figures is the median of 100 trials using the same signals $x\u0303(t)$ and $y\u0303(t)$, with each trial using an independent random realization of the reservoir system, and the error bars indicate the range in the performance of the observer from the second to the third quartile. Except for plots (b) and (f), the same 100 random choices are used for each value of the parameter. For example, Fig. 3(a) is a plot of the RMS error versus *ρ*. There is a wide range for which the error is small ($\u227210\u22122$), thus demonstrating that good observer performance is robust to significant changes in *ρ*. The plots in Figs. 3(a)–3(f) show that good performance is robust to changes in the other parameters, as well.

Motivated by the frequent consideration^{19–21} of time delayed measurements for the analysis of experimental data from chaotic systems, we have also tested incorporation of time delayed versions of the input into the output by adding the term

into the reservoir output Eq. (2) and correspondingly also into the expression given by Eq. (5). We then minimize the expression in Eq. (5) over **W**_{out}, **c**, and **d** = (*d*_{1}, *d*_{2},…, *d _{K}*)

^{T}to obtain our approximation $s\u0302$ for

**s**. An illustrative result is shown in Fig. 4(a), where we plot the median RMS error in $y\u0303$ inferred from $x\u0303(t)$ as a function of

*N*for the two cases where Eq. (16) is included (

*K*= 20 for Fig. 4(a)) and not included. We find that including delays does not significantly improve performance for

*N*> 100, but does make some improvement when the number of reservoir nodes

*N*is small and the error is relatively larger. For example, if one requires the error to be less than about 3 × 10

^{−2}, then one can use as few as

*N*∼ 20 reservoir nodes when the delays are included, while, without delays,

*N*∼ 50 is required. We note that for

*N*= 0 (no reservoir), the procedure reduces to a standard linear autoregressive approximation of

**s**. We also tested other values of

*K*and found that the quantitative results were very much the same as for

*K*= 20 (plotted as open circles in Fig. 4(a)) as long as

*K*> 5.

Figure 4(b) shows the effect of observational noise in the input $x\u0303(t)$ and the training data $y\u0303(t)$, where, at every time step Δ*t* = 0.1, observational noise was simulated by adding an independent random number with uniform distribution between –*η* and +*η* to each “measurement.” We see that the noise has little influence on the error for *η* ≤ 10^{−4}.

### B. Lorenz system

In this section we test our method on the Lorenz system. This example provides insight into how the issue of “observability” manifests itself for our method.

The Lorenz system is described by the equations

where *a* = 10, *b* = 28, and *c* = 8/3. As in Sec. III A, the time series *x*(*t*), *y*(*t*), and *z*(*t*) are available during the training phase, 0 < *t* < *T*. Also, as in Sec. III A, time series are pre-processed as in Eq. (14) to give $x\u0303(t),y\u0303(t)$, and $z\u0303(t)$.

First we consider the case where $u(t)=x\u0303(t)$. After the training phase is over, the reservoir can only access the series $x\u0303(t)$ and is able to infer $y\u0303(t)$ and $z\u0303(t)$ accurately as shown in Fig. 5. Clearly, there is no observability problem for $u(t)=x\u0303(t)$.

However, if $u(t)=z\u0303(t)$, then observability does not apply, i.e., it is not possible, even in principle, to infer $x\u0303(t)$ from $z\u0303(t)$. To see this, we note that the Lorenz system is invariant under the transformation $(x,y,z)\u2192(\u2212x,\u2212y,z)$. Because of this symmetry, any given time series measurement *z*_{0}(*t*) can correspond to either one of the two possible state time series on the attractor $(x0(t),y0(t),z0(t))$ and $(\u2212x0(t),\u2212y0(t),z0(t))$. This results in the system's “non-observability” when only the *z* coordinate is measured. Figure 6(a) shows that with $z\u0303(t)$ as input the reservoir output for $x\u0303(t)$ (solid line) completely fails to follow the actual $x\u0303(t)$ trajectory (dashed line). A similar result applies for $y\u0303(t)$.

However, since the ambiguity in the state lies purely in the sign of the *x* and *y* variables, it should be possible, in principle, to infer the state variables (*x*^{2}, *y*^{2}) unambiguously given measurements of the *z* variable. We show in Fig. 6(b) that this is indeed the case.

Another way to demonstrate the interplay, in this example, between the symmetry of the system and observability is to modify the Lorenz system slightly in a way that breaks the symmetry. The modified set of Lorenz equations is given by Eq. (17), but with an extra term *x*(*t*) added to the right hand side of the third equation (i.e., the equation for *dz*/*dt*). With this change, a visualization of the attractor (not shown) in (*x*, *y*, *z*) state space shows that the attractor is slightly perturbed from the picture obtained for the original Lorenz system, now with the size and orientations of the two lobes (wings of Lorenz's “butterfly attractor”) being slightly different from each other. As shown in Fig. 7, the reservoir is now able to infer the *x* and *y* state variables from measurement of the *z* variable.

It is interesting to note that, while, as shown in Fig. 5, inference of *z* when either *x* or *y* is given can work well, and inference of *z* completely fails when *ξ* = 0. The reason for this is interesting. Since the hyperbolic tangent in Eq. (1) is an odd function, when *ξ* = 0, the composite system consisting of reservoir plus the Lorenz equations has the overall symmetry $(x,y,z,r)\u2192(\u2212x,\u2212y,z,\u2212r)$. Thus, we see that, although (*x*, *y*) and (–*x*, –*y*) may correspond to the same value of *z*, this is incompatible with the linear output dependence hypothesized in Eq. (2), since **r** → –**r** would necessarily change the output estimate of *z*. However, if *ξ* ≠ 0 the symmetry of the composite Lorenz-reservoir system is broken, and inference of *z* becomes potentially possible. (Even with *ξ* = 0, another alternative, which we have found to work, is to introduce terms quadratic in **r** into Eq. (2).)

We remark that the symmetry mechanism we focused on in this subsection is meant only as one particular example of the very many ways that nonobservability can arise.

### C. Kuramoto–Sivashinsky equations

We now investigate the possibility of using the reservoir-based observation method of Sec. II to infer estimates of the state of a spatiotemporally chaotic system from spatially sparse measurements without the use of a model.

For this purpose, we test our model-free observation method on simulated data from the Kuramoto–Sivashinsky equation^{22} for the scalar variable *y*(*x*, *t*)

We impose spatially periodic boundary conditions, i.e., $y(x+\u2009L,t)=y(x,t)$ on a domain ${x\u2208(0,L)}$ of size *L* = 22 and integrate Eq. (18) from a randomly chosen initial condition. The integration was performed on an evenly spaced grid of size *Q* = 65. The simulated data consist of *Q* time series with a time step Δ*t* = 0.25 units. We denote this set of time series at the *Q* grid points by the vector $y(t)=(y1(t),y2(t),\u2026,yQ(t))T$ with $yi(t)=y(i\Delta x),\Delta x=L/(Q\u22121)$. Let $u(t)=(u1(t),u2(t),\u2026uM(t))T$ be a vector containing *M* out of the *Q* time series in **y**(*t*). In terms of the variables **s**(*t*) and **u**(*t*) introduced in Sec. II, the vector **u**(*t*) represents spatially sparse measurements performed at evenly spaced points on the grid. We denote the rest of the time series by **s**(*t*). We will vary the number of measurements *M* in the interval 1 ≤ *M* ≤ 8. We assume that we have access to the full set of state measurements **y**(*t*) for the “training period,” 0 ≤ *t* ≤ *T*. We further assume that after the training period, i.e., for *t* > *T*, the observer can only access the partial set of system state variables **u**(*t*). The goal is to infer the set of variables **s**(*t*) from the partial measurements **u**(*t*) for *t* > *T*.

The reservoir observer setup is described in Sec. II with the identification *Q* = *M* + *P*. For the results in this subsection, we use the following set of reservoir parameters:

To test the quality of the inference for *t* ≥ *T*, we compare the inferred data from the reservoir $s\u0302(t)$ with the data series **s**(*t*) obtained from integrating Eq. (18). We calculate the correlation between the real and inferred data, defined by

and the spatially averaged RMS error, defined by

Since the performance of the reservoir may depend on the particular random instance of the reservoir network that is used, we report the mean RMS error and the correlation between the inferred and actual data obtained from 20 observer trials each performing the inference task on the same data using an independent random realization of the reservoir observer setup. As a baseline comparison, we also report the RMS error and correlation coefficient values for an inference of the unmeasured variables obtained by cubic spline interpolation from the measured variables. Figure 8(a) shows the correlation between the reservoir inference and the actual data as we vary the number of measured variables. The correlation coefficient for the reservoir observer inference is compared with the corresponding value for the cubic spline interpolation scheme. Figure 8(b) shows the RMS error obtained by the reservoir setup as we vary the number of measured variables. These figures demonstrate that, as expected, spline interpolation yields good results at high measurement densities, but that, at lower measurement densities, where spline interpolation yields poor results, the reservoir observer can continue to function well. Figure 9 shows results for *M* = 2 inputs (Figs. 9(a)–9(c)) and *M* = 4 inputs (Figs. 9(d)–9(f)) for the actual data (Figs. 9(a) and 9(d)) and the reservoir inference (Figs. 9(b) and 9(e)), together with the difference between the inferred and the actual data (Figs. 9(c) and 9(f)). Even for *M* = 2 inputs the error is quite low in most of the plot (Figs. 9(c)) with bursts of error occurring intermittently in time and space.

## IV. CONCLUSIONS

In this paper we investigate the application of reservoir computing to infer unmeasured state variables of a chaotic dynamical system from a limited set of continually measured state variables for situations in which a mathematical model of the dynamical system is unavailable or is insufficiently accurate. Our main results are as follows.

Extremely accurate results can be robustly obtained over a wide range of reservoir parameters as shown in Sec. III A for the Rössler system.

The issue of “observability” and the limitation it implies for achieving our goal was investigated in an example (Sec. III B).

Use of reservoir computers for inference of spatiotemporally chaotic states from spatially sparse data was proposed and validated by application to an example (Sec. III C).

## ACKNOWLEDGMENTS

We thank D. Gauthier and A. Hartemink for their useful discussions. This work was supported by the Army Research Office under Grant No. W911NF-12-1-0101.