Neural networks, and machine learning techniques in general, have been widely employed to forecast time series and, more recently, to predict spatial–temporal signals. All of these approaches involve some form of feature selection: deciding which past data and which neighboring data to use for forecasting. In this article, we present extensive empirical evidence on how to independently construct the optimal feature selection, or input representation, used by the input layer of a feedforward neural network for forecasting spatial–temporal signals. The approach is based on results from dynamical systems theory, namely, nonlinear embedding theorems. We demonstrate it for a variety of spatial–temporal signals and show that the optimal input layer representation consists of a grid whose spatial–temporal lags are determined by the minimum of the mutual information of the spatial–temporal signals and whose number of points in space–time is set by the embedding dimension of the signal. We support this proposal with a Monte Carlo simulation over several combinations of input layer feature designs, which shows that the design predicted by the nonlinear embedding theorems appears to be optimal or close to optimal. In total, we show evidence for four unrelated systems: a series of coupled Hénon maps; a series of coupled ordinary differential equations (Lorenz-96) phenomenologically modeling atmospheric dynamics; the Kuramoto–Sivashinsky equation, a partial differential equation used in studies of instabilities in laminar flame fronts; and real physical data on sunspot areas on the Sun (in latitude and time) from 1874 to 2015. These four examples range from simple toy models to complex nonlinear dynamical simulations and real data. Finally, we compare our proposal against alternative feature selection methods and show that it also works for other machine learning forecasting models.
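
As a rough illustration of the input representation described above, the following Python sketch estimates a lag from the first local minimum of a histogram-based delayed mutual information and then assembles a lagged space–time grid of inputs for one-step-ahead forecasting. It is a minimal sketch under stated assumptions, not the implementation used in the article: the function names, the histogram estimator with 16 bins, the periodic spatial boundary, the symmetric neighbor layout, and the one-step target are all illustrative choices. The grid sizes `n_t` and `n_s` are taken as given here; in the approach described above they would follow from an estimate of the embedding dimension, which this sketch does not compute.

```python
import numpy as np


def delayed_mutual_information(x, lag, bins=16):
    """Histogram estimate (in nats) of I(x_t ; x_{t+lag}) for a 1D series."""
    a, b = x[:-lag], x[lag:]
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)  # marginal of x_t, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)  # marginal of x_{t+lag}, shape (1, bins)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))


def first_minimum_lag(x, max_lag=50, bins=16):
    """Smallest lag at which the delayed mutual information has a local minimum."""
    mi = [delayed_mutual_information(x, lag, bins) for lag in range(1, max_lag + 1)]
    for i in range(1, len(mi) - 1):
        if mi[i] < mi[i - 1] and mi[i] <= mi[i + 1]:
            return i + 1  # mi[i] corresponds to lag i + 1
    # Assumed fallback: no interior local minimum found, use the global minimum.
    return int(np.argmin(mi)) + 1


def build_input_grid(u, t_lag, s_lag, n_t, n_s):
    """Build (features, targets) from a field u[time, space].

    Assumptions: space is periodic; each input grid holds n_t time slices
    separated by t_lag, each with the site itself and n_s neighbors on either
    side separated by s_lag; the target is the site value one step ahead.
    """
    n_time, _ = u.shape
    start = (n_t - 1) * t_lag  # earliest time with a full history available
    columns = []
    for a in range(n_t):
        for b in range(-n_s, n_s + 1):
            # shifted[t, j] == u[t, (j + b * s_lag) % n_space]
            shifted = np.roll(u, -b * s_lag, axis=1)
            columns.append(shifted[start - a * t_lag : n_time - 1 - a * t_lag].ravel())
    features = np.stack(columns, axis=1)
    targets = u[start + 1 :].ravel()
    return features, targets
```

The resulting `features` array has one row per space–time point and `n_t * (2 * n_s + 1)` columns, so it can be passed directly to the input layer of a feedforward network or to any other regressor. In practice one would call `first_minimum_lag` on a time series at a fixed site to choose `t_lag`, and on a spatial profile at a fixed time to choose `s_lag`.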
