Neural networks, and machine learning techniques in general, have been widely employed in forecasting time series and, more recently, in predicting spatial–temporal signals. All of these approaches involve some kind of feature selection regarding which past data and which neighbor data to use for forecasting. In this article, we show extensive empirical evidence on how to independently construct the optimal feature selection or input representation used by the input layer of a feedforward neural network for the purpose of forecasting spatial–temporal signals. The approach is based on results from dynamical systems theory, namely, nonlinear embedding theorems. We demonstrate it for a variety of spatial–temporal signals and show that the optimal input layer representation consists of a grid, with spatial–temporal lags determined by the first minimum of the mutual information of the spatial–temporal signals and the number of points taken in space–time determined by the embedding dimension of the signal. We support this proposal by running a Monte Carlo simulation over several combinations of input layer feature designs and show that the one predicted by the nonlinear embedding theorems is optimal or close to optimal. In total, we show evidence in four unrelated systems: a series of coupled Hénon maps; a series of coupled ordinary differential equations (Lorenz-96) phenomenologically modeling atmospheric dynamics; the Kuramoto–Sivashinsky equation, a partial differential equation used in studies of instabilities in laminar flame fronts; and finally real physical data from sunspot areas in the Sun (in latitude and time) from 1874 to 2015. These four examples cover the range from simple toy models to complex nonlinear dynamical simulations and real data. Finally, we also compare our proposal against alternative feature selection methods and show that it also works for other machine learning forecasting models.
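The pipeline the abstract describes can be sketched as follows: choose the time delay as the first minimum of the mutual information of the signal, then assemble, for each site and time, a spatial–temporal grid of lagged values as the network's input features. The sketch below is a minimal illustration of that idea; the function names, the histogram-based mutual-information estimator, and all parameter choices are our own assumptions, not the paper's implementation.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    # Histogram-based estimate of the mutual information between two 1-D series.
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])).sum())

def first_minimum_lag(series, max_lag=50, bins=16):
    # Delay tau chosen as the first local minimum of I(x_t; x_{t+tau}).
    mi = [mutual_information(series[:-tau], series[tau:], bins)
          for tau in range(1, max_lag + 1)]
    for i in range(1, len(mi) - 1):
        if mi[i] < mi[i - 1] and mi[i] < mi[i + 1]:
            return i + 1
    return int(np.argmin(mi)) + 1  # fall back to the global minimum

def spatiotemporal_features(field, site, tau_t, tau_s, m_t, m_s):
    # Build the grid input for one site: m_t temporal lags (step tau_t)
    # at 2*m_s + 1 neighboring sites (step tau_s, periodic in space),
    # paired with the next value at `site` as the forecast target.
    T, S = field.shape
    rows, targets = [], []
    for t in range(tau_t * (m_t - 1), T - 1):
        grid = [field[t - k * tau_t, (site + j * tau_s) % S]
                for k in range(m_t)
                for j in range(-m_s, m_s + 1)]
        rows.append(grid)
        targets.append(field[t + 1, site])
    return np.array(rows), np.array(targets)
```

The feature matrix returned by `spatiotemporal_features` can be fed directly to any regression model (a feedforward network or otherwise); the grid sizes `m_t` and `m_s` would, in the paper's proposal, be set from the embedding dimension of the signal.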
