Machine learning has become a widely popular and successful paradigm, especially in data-driven science and engineering. A major application problem is data-driven forecasting of future states of a complex dynamical system. Artificial neural networks have emerged as a clear leader among machine learning approaches, and recurrent neural networks are considered particularly well suited for forecasting dynamical systems. In this setting, echo-state networks, or reservoir computers (RCs), have emerged for their simplicity and low computational cost. Instead of fully training the network, an RC trains only the readout weights, by a simple, efficient least squares method. It is perhaps surprising that, nonetheless, an RC produces high quality forecasts, competitive with more intensively trained methods, even if not the leader. The question of why and how an RC works at all, despite randomly selected internal weights, has remained open. To this end, this work analyzes a further simplified RC, where the internal activation function is the identity. This simplification is offered not to tune or improve an RC, but to enable analysis; in our view, the surprise is not that such a random method fails to work better, but that it works at all. We explicitly connect the RC with linear activation and linear readout to the well developed time-series literature on vector autoregressive (VAR) models, which includes representability theorems via the Wold theorem, and which already performs reasonably well for short-term forecasts. For the RC with linear activation and the now popular quadratic readout, we explicitly connect to a nonlinear VAR, which performs quite well. Furthermore, we relate this paradigm to the now widely popular dynamic mode decomposition (DMD); thus, these three are in a sense different faces of the same concept. We illustrate our observations with popular benchmark examples, including the Mackey–Glass differential delay equation and the Lorenz63 system.
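To make the linear-RC/VAR connection concrete, the following minimal sketch (ours, not code from the paper; the reservoir size N, spectral radius 0.9, ridge parameter lam, and truncation depth K are all illustrative assumptions) trains only the readout of an identity-activation reservoir by ridge-regularized least squares, and then checks numerically that the trained forecaster coincides with a truncated autoregressive model whose coefficients are a_k = w·A^k·W_in:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar signal standing in for a chaotic time series.
T = 2000
u = np.sin(0.1 * np.arange(T + 1)) + 0.01 * rng.standard_normal(T + 1)

# Random, untrained linear reservoir: r_{t+1} = A r_t + W_in u_t
# (identity activation; only the readout below is ever trained).
N = 50
A = rng.standard_normal((N, N))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius < 1
W_in = rng.standard_normal(N)

r = np.zeros((T + 1, N))
for t in range(T):
    r[t + 1] = A @ r[t] + W_in * u[t]

# Train ONLY the readout w by ridge-regularized least squares so that
# w . r_t ~ u_t, where r_t summarizes the inputs u_0, ..., u_{t-1}.
lam = 1e-8
R, y = r[1:T], u[1:T]
w = np.linalg.solve(R.T @ R + lam * np.eye(N), R.T @ y)

# Identity activation lets the state unroll as
#   r_t = sum_k A^k W_in u_{t-1-k},
# so the trained forecaster is a (truncated) autoregressive model with
# coefficients a_k = w . A^k . W_in -- a VAR in the vector-input case.
K = 50
a = np.array([w @ np.linalg.matrix_power(A, k) @ W_in for k in range(K)])

rc_forecast = w @ r[T]                  # RC one-step forecast of u[T]
var_forecast = a @ u[T - 1::-1][:K]     # same forecast in AR/VAR form
print(f"RC: {rc_forecast:.5f}  VAR: {var_forecast:.5f}  truth: {u[T]:.5f}")
```

Because the activation is the identity, the readout is a linear combination of past inputs, which is exactly an autoregressive forecaster; a spectral radius below one (the echo-state property) makes the coefficients a_k decay geometrically, which is what justifies truncating the sum at K terms.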

1. G. Amisano and C. Giannini, Topics in Structural VAR Econometrics (Springer Science & Business Media, 2012).
2. P. Antonik, M. Gulina, J. Pauwels, and S. Massar, "Using a reservoir computer to learn chaotic attractors, with applications to chaos synchronization and cryptography," Phys. Rev. E 98(1), 012215 (2018).
3. H. Arbabi and I. Mezić, "Ergodic theory, dynamic mode decomposition, and computation of spectral properties of the Koopman operator," SIAM J. Appl. Dyn. Syst. 16(4), 2096–2126 (2017).
4. W. E. Arnoldi, "The principle of minimized iterations in the solution of the matrix eigenvalue problem," Q. Appl. Math. 9(1), 17–29 (1951).
5. C. A. L. Bailer-Jones, D. J. C. MacKay, and P. J. Withers, "A recurrent neural network for modelling dynamical systems," Netw. Comput. Neural Syst. 9(4), 531–547 (1998).
6. T. G. Barbounis, J. B. Theocharis, M. C. Alexiadis, and P. S. Dokopoulos, "Long-term wind speed and power forecasting using local recurrent neural network models," IEEE Trans. Energy Convers. 21(1), 273–284 (2006).
7. E. Bollt, "Regularized kernel machine learning for data driven forecasting of chaos," Annual Review of Chaos Theory, Bifurcations and Dynamical Systems 9, 1–26 (2020).
8. E. Bollt, "Geometric considerations of a good dictionary for Koopman analysis of dynamical systems," arXiv:1912.09570 (2019).
9. E. M. Bollt, "Model selection, confidence and scaling in predicting chaotic time-series," Int. J. Bifurcation Chaos 10(6), 1407–1422 (2000).
10. E. M. Bollt, L. Billings, and I. B. Schwartz, "A manifold independent approach to understanding transport in stochastic dynamical systems," Physica D 173(3–4), 153–177 (2002).
11. E. M. Bollt, Q. Li, F. Dietrich, and I. Kevrekidis, "On matching, and even rectifying, dynamical systems through Koopman operator eigenfunctions," SIAM J. Appl. Dyn. Syst. 17(2), 1925–1960 (2018).
12. E. M. Bollt and N. Santitissadeekorn, Applied and Computational Measurable Dynamics (SIAM, 2013).
13. G. E. P. Box, G. M. Jenkins, and G. C. Reinsel, Time Series Analysis: Forecasting and Control, Holden-Day Series in Time Series Analysis (Holden-Day, San Francisco, CA, 1994), pp. 199–201.
14. S. Boyd and L. Chua, "Fading memory and the problem of approximating nonlinear operators with Volterra series," IEEE Trans. Circuits Syst. 32(11), 1150–1161 (1985).
15. S. L. Brunton and J. N. Kutz, Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control (Cambridge University Press, 2019).
16. M. Budišić, R. Mohr, and I. Mezić, "Applied Koopmanism," Chaos 22(4), 047510 (2012).
17. M. Buehner and P. Young, "A tighter bound for the echo state property," IEEE Trans. Neural Netw. 17(3), 820–824 (2006).
18. D. Canaday, A. Griffith, and D. J. Gauthier, "Rapid time series prediction with a hardware-based reservoir computer," Chaos 28(12), 123119 (2018).
19. T. L. Carroll and L. M. Pecora, "Network structure effects in reservoir computers," Chaos 29(8), 083130 (2019).
20. A. Chattopadhyay, P. Hassanzadeh, D. Subramanian, and K. Palem, "Data-driven prediction of a multi-scale Lorenz 96 chaotic system using a hierarchy of deep learning methods: Reservoir computing, ANN, and RNN-LSTM," arXiv:1906.08829 (2019).
21. J.-F. Chen, W.-M. Wang, and C.-M. Huang, "Analysis of an adaptive time-series autoregressive moving-average (ARMA) model for short-term load forecasting," Electr. Power Syst. Res. 34(3), 187–196 (1995).
22. E. Choi, A. Schuetz, W. F. Stewart, and J. Sun, "Using recurrent neural network models for early detection of heart failure onset," J. Am. Med. Inform. Assoc. 24(2), 361–370 (2017).
23. J. Connor, L. E. Atlas, and D. R. Martin, "Recurrent networks and NARMA modeling," in Proceedings of the 4th International Conference on Neural Information Processing Systems (Advances in Neural Information Processing Systems, 1992), pp. 301–308.
24. D. Darmon, C. J. Cellucci, and P. E. Rapp, "Information dynamics with confidence: Using reservoir computing to construct confidence intervals for information-dynamic measures," Chaos 29(8), 083113 (2019).
25. P. De Wilde, Neural Network Models: An Analysis (Springer, 1996).
26. J. D. Farmer, "Chaotic attractors of an infinite-dimensional dynamical system," Physica D 4(3), 366–393 (1982).
27. M. Farzad, H. Tahersima, and H. Khaloozadeh, "Predicting the Mackey–Glass chaotic time series using genetic algorithm," in 2006 SICE-ICASE International Joint Conference (IEEE, 2006), pp. 5460–5463.
28. K.-I. Funahashi and Y. Nakamura, "Approximation of dynamical systems by continuous time recurrent neural networks," Neural Netw. 6(6), 801–806 (1993).
29. C. Gallicchio, "Chasing the echo state property," arXiv:1811.10892 (2018).
30. D. J. Gauthier, "Reservoir computing: Harnessing a universal dynamical system," Phys. Rev. Lett. 120, 024102 (2018).
31. G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed. (Johns Hopkins, 2013).
32. L. Gonon and J.-P. Ortega, "Reservoir computing universality with stochastic inputs," IEEE Trans. Neural Netw. Learn. Syst. 31(1), 100–112 (2019).
33. L. Gonon and J.-P. Ortega, "Fading memory echo state networks are universal," arXiv:2010.12047 (2020).
34. A. Griffith, A. Pomerance, and D. J. Gauthier, "Forecasting chaotic systems with very low connectivity reservoir computers," Chaos 29(12), 123108 (2019).
35. B. J. Grzyb, E. Chinellato, G. M. Wojcik, and W. A. Kaminski, "Which model to use for the liquid state machine?," in 2009 International Joint Conference on Neural Networks (IEEE, 2009), pp. 1018–1024.
36. M. Han, Z.-W. Shi, and W. Guo, "Reservoir neural state reconstruction and chaotic time series prediction," Acta Phys. Sin. 56(1), 43–50 (2007).
37. L. Harrison, W. D. Penny, and K. Friston, "Multivariate autoregressive modeling of FMRI time series," Neuroimage 19(4), 1477–1491 (2003).
38. A. Hart, J. Hook, and J. Dawes, "Embedding and approximation theorems for echo state networks," Neural Netw. 128, 234–247 (2020).
39. A. G. Hart, J. L. Hook, and J. H. P. Dawes, "Echo state networks trained by Tikhonov least squares are L2(μ) approximators of ergodic dynamical systems," arXiv:2005.06967 (2020).
40. D. Hartman and L. K. Mestha, "A deep learning framework for model reduction of dynamical systems," in 2017 IEEE Conference on Control Technology and Applications (CCTA) (IEEE, 2017), pp. 1917–1922.
41. S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput. 9(8), 1735–1780 (1997).
42. J.-Q. Huang and F. L. Lewis, "Neural-network predictive control for nonlinear dynamic systems with time-delay," IEEE Trans. Neural Netw. 14(2), 377–389 (2003).
43. H. Jaeger, "The 'echo state' approach to analysing and training recurrent neural networks-with an erratum note," GMD Technical Report Vol. 148 (German National Research Centre for Information Technology, Bonn, Germany, 2001), p. 13.
44. H. Jaeger and H. Haas, "Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication," Science 304(5667), 78–80 (2004).
45. J. Jiang and Y.-C. Lai, "Model-free prediction of spatiotemporal dynamical systems with recurrent neural networks: Role of network spectral radius," Phys. Rev. Res. 1(3), 033056 (2019).
46. M. B. Kennel and S. Isabelle, "Method to distinguish possible chaos from colored noise and to determine embedding parameters," Phys. Rev. A 46(6), 3111 (1992).
47. M. Kimura and R. Nakano, "Learning dynamical systems by recurrent neural networks from orbits," Neural Netw. 11(9), 1589–1599 (1998).
48. K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Trans. Neural Netw. 1(1), 4–27 (1990).
49. J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor, Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems (SIAM, 2016).
50. M. Längkvist, L. Karlsson, and A. Loutfi, "A review of unsupervised feature learning and deep learning for time-series modeling," Pattern Recognit. Lett. 42, 11–24 (2014).
51. A. Lasota and M. C. Mackey, Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics (Springer Science & Business Media, 2013), Vol. 97.
52. D. S. Levine, Introduction to Neural and Cognitive Modeling (Routledge, 2018).
53. Q. Li, F. Dietrich, E. M. Bollt, and I. G. Kevrekidis, "Extended dynamic mode decomposition with dictionary learning: A data-driven adaptive spectral decomposition of the Koopman operator," Chaos 27(10), 103111 (2017).
54. A. J. Lichtenberg and M. A. Lieberman, Regular and Stochastic Motion, Applied Mathematical Sciences Vol. 38 (Springer, 1983).
55. E. N. Lorenz, "Deterministic nonperiodic flow," J. Atmos. Sci. 20(2), 130–141 (1963).
56. Z. Lu, B. R. Hunt, and E. Ott, "Attractor reconstruction by machine learning," Chaos 28(6), 061104 (2018).
57. M. Lukoševičius, "A practical guide to applying echo state networks," in Neural Networks: Tricks of the Trade (Springer, 2012), pp. 659–686.
58. M. Lukoševičius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training," Comput. Sci. Rev. 3(3), 127–149 (2009).
59. H. Lütkepohl, New Introduction to Multiple Time Series Analysis (Springer Science & Business Media, 2005).
60. W. Maass, T. Natschläger, and H. Markram, "Real-time computing without stable states: A new framework for neural computation based on perturbations," Neural Comput. 14(11), 2531–2560 (2002).
61. M. C. Mackey and L. Glass, "Oscillation and chaos in physiological control systems," Science 197(4300), 287–289 (1977).
62. S. Marsland, Machine Learning: An Algorithmic Perspective (CRC Press, 2015).
63. D. Michie, D. J. Spiegelhalter, C. C. Taylor et al., "Machine learning," Neural Stat. Classification 13, 1–298 (1994).
64. M. R. Muldoon, D. S. Broomhead, J. P. Huke, and R. Hegger, "Delay embedding in the presence of dynamical noise," Dyn. Stab. Syst. 13(2), 175–186 (1998).
65. M. M. Nelson and W. T. Illingworth, "A practical guide to neural nets," 1991; available at https://www.osti.gov/biblio/5633084.
66. S. Ortín González, M. C. Soriano, L. Pesquera González, D. Brunner, D. San Martín Segura, I. Fischer, C. Mirasso, J. M. Gutiérrez Llorente et al., "A unified framework for reservoir computing and extreme learning machines based on a single time-delayed neuron," Sci. Rep. 5, 14945 (2015).
67. N. H. Packard, J. P. Crutchfield, J. D. Farmer, and R. S. Shaw, "Geometry from a time series," Phys. Rev. Lett. 45(9), 712 (1980).
68. S. M. Pandit and S.-M. Wu, Time Series and System Analysis with Applications (Wiley, New York, 1983), Vol. 3.
69. R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks," in International Conference on Machine Learning (PMLR, 2013), pp. 1310–1318.
70. J. Pathak, B. Hunt, M. Girvan, Z. Lu, and E. Ott, "Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach," Phys. Rev. Lett. 120(2), 024102 (2018).
71. J. Paulsen and D. Tjøstheim, "On the estimation of residual variance and order in autoregressive time series," J. R. Stat. Soc. Ser. B 47(2), 216–228 (1985).
72. D. Pena, G. C. Tiao, and R. S. Tsay, A Course in Time Series Analysis (John Wiley & Sons, 2011), Vol. 322.
73. D. Qin, "Rise of VAR modelling approach," J. Econ. Surv. 25(1), 156–174 (2011).
74. A. M. Rather, A. Agarwal, and V. N. Sastry, "Recurrent neural network and a hybrid model for prediction of stock returns," Expert Syst. Appl. 42(6), 3234–3241 (2015).
75. O. A. Rosso, H. A. Larrondo, M. T. Martin, A. Plastino, and M. A. Fuentes, "Distinguishing noise from chaos," Phys. Rev. Lett. 99(15), 154102 (2007).
76. C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, D. Henningson et al., "Spectral analysis of nonlinear flows," J. Fluid Mech. 641(1), 115–127 (2009).
77. T. Sauer, J. A. Yorke, and M. Casdagli, "Embedology," J. Stat. Phys. 65(3–4), 579–616 (1991).
78. P. J. Schmid, "Dynamic mode decomposition of numerical and experimental data," J. Fluid Mech. 656, 5–28 (2010).
79. A. Scott, Encyclopedia of Nonlinear Science (Routledge, 2006).
80. C. Serio, "Autoregressive representation of time series as a tool to diagnose the presence of chaos," Europhys. Lett. 27(2), 103 (1994).
81. Q. Song and Z. Feng, "Effects of connectivity structure of complex echo state network on its prediction performance for nonlinear time series," Neurocomputing 73(10–12), 2177–2185 (2010).
82. J. Sun, D. Taylor, and E. M. Bollt, "Causal network inference by optimal causation entropy," SIAM J. Appl. Dyn. Syst. 14(1), 73–106 (2015).
83. F. Takens, "Detecting strange attractors in turbulence," in Dynamical Systems and Turbulence, Warwick 1980 (Springer, 1981), pp. 366–381.
84. G. C. Tiao and R. S. Tsay, "Consistency properties of least squares estimates of autoregressive parameters in ARMA models," Ann. Stat. 11, 856–871 (1983).
85. H. A. Van der Vorst, Iterative Krylov Methods for Large Linear Systems (Cambridge University Press, 2003), Vol. 13.
86. K. Vandoorne, P. Mechet, T. Van Vaerenbergh, M. Fiers, G. Morthier, D. Verstraeten, B. Schrauwen, J. Dambre, and P. Bienstman, "Experimental demonstration of reservoir computing on a silicon photonics chip," Nat. Commun. 5(1), 1–6 (2014).
87. D. Verstraeten, B. Schrauwen, M. d'Haene, and D. Stroobandt, "An experimental unification of reservoir computing methods," Neural Netw. 20(3), 391–403 (2007).
88. P. R. Vlachas, W. Byeon, Z. Y. Wan, T. P. Sapsis, and P. Koumoutsakos, "Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks," Proc. R. Soc. A 474(2213), 20170844 (2018).
89. P. R. Vlachas, J. Pathak, B. R. Hunt, T. P. Sapsis, M. Girvan, E. Ott, and P. Koumoutsakos, "Forecasting of spatio-temporal chaotic dynamics with recurrent neural networks: A comparative study of reservoir computing and backpropagation algorithms," arXiv:1910.05266 (2019).
90. H. Wernecke, B. Sándor, and C. Gros, "Chaos in time delay systems, an educational review," Phys. Rep. 824, 1–40 (2019).
91. M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, "A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition," J. Nonlinear Sci. 25(6), 1307–1346 (2015).
92. H. O. A. Wold, A Study in the Analysis of Stationary Time Series: With an Appendix (Almqvist & Wiksell, 1954).
93. K. Yeo, "Model-free prediction of noisy chaotic time series by deep learning," arXiv:1710.01693 (2017).
94. K. Yeo and I. Melnyk, "Deep learning algorithm for data-driven simulation of noisy dynamical system," J. Comput. Phys. 376, 1212–1231 (2019).
95. K. Yonemoto and T. Yanagawa, "Estimating the embedding dimension and delay time of chaotic time series by an autoregressive model," Bull. Inf. Cybern. 33(1–2), 53–62 (2001).
96. R. S. Zimmermann and U. Parlitz, "Observing spatio-temporal dynamics of excitable media using reservoir computing," Chaos 28(4), 043118 (2018).