Prediction models that capture and use the structure of state-space dynamics can be very effective. In practice, however, one rarely has access to full information about that structure, and accurate reconstruction of the dynamics from scalar time-series data—e.g., via delay-coordinate embedding—can be a real challenge. In this paper, we show that forecast models that employ *incomplete* reconstructions of the dynamics—i.e., models that are not necessarily true embeddings—can produce surprisingly accurate predictions of the state of a dynamical system. In particular, we demonstrate the effectiveness of a simple near-neighbor forecast technique that works with a two-dimensional time-delay reconstruction of both low- and high-dimensional dynamical systems. Even though correctness of the topology may not be guaranteed for incomplete reconstructions like this, the dynamical structure that they *do* capture allows for accurate predictions—in many cases, even more accurate than predictions generated using a traditional embedding. This could be very useful in the context of real-time forecasting, where the human effort required to produce a correct delay-coordinate embedding is prohibitive.

Prediction models constructed from state-space dynamics have a long and rich history, dating back to roulette and beyond. A major stumbling block in the application of these models in real-world situations is the need to reconstruct the dynamics from scalar time-series data—e.g., via delay-coordinate embedding. This procedure, which is the topic of a large and active body of literature, involves estimation of two free parameters: the dimension *m* of the reconstruction space and the delay *τ* between the observations that make up the coordinates in that space. Estimating good values for these parameters is not trivial; it requires the proper mathematics, attention to the data requirements, computational effort, and expert interpretation of the results of the computations. This is a major challenge if one is interested in real-time forecasting, especially when the systems involved operate on fast time scales.

In this paper, we show that the full effort of delay-coordinate embedding is not always necessary when one is building forecast models—and that it can indeed be overkill. Using synthetic time-series data generated from the Lorenz-96 atmospheric model and real data from a computer performance experiment, we demonstrate that even a two-dimensional time-delay reconstruction of scalar data from a dynamical system gives simple forecast methods enough traction to generate accurate predictions of the future course of that time series—sometimes even more accurate than predictions created using a traditional embedding. Since 2*D* delay reconstructions are not guaranteed to preserve the topology of the full dynamics of systems like this, where the number of state-space dimensions ranges from 3 to 47, this result is interesting from a mathematical standpoint. It is also potentially useful in practice. When one is using a 2*D* delay reconstruction of the dynamics to generate a forecast, only one free parameter (*τ*) is involved, and it is possible to estimate good values for this parameter “on the fly” using a recently introduced information-theoretic algorithm. This reduced-order forecasting approach thus sidesteps much of the complexity of the embedding process—perhaps most importantly, the need for expert human interpretation—and thus could enable automated, real-time dynamics-based forecasting in practical applications.

## I. INTRODUCTION

Complicated nonlinear dynamics are ubiquitous in natural and engineered systems. Methods that capture and use the state-space structure of a dynamical system are a proven strategy for forecasting the behavior of systems like this, but their use is not always straightforward. The governing equations and the state variables are rarely known; rather, one has a single (or perhaps a few) series of scalar measurements that can be observed from the system. It can be a challenge to model the full dynamics from data like this, especially in the case of *forecast* models, which are only really useful if they can be constructed and applied on faster time scales than those of the target system. While the traditional state-space reconstruction machinery^{1} is a good way to accomplish the task of modeling the dynamics, it is problematic in real-time forecasting because it generally requires input from and interpretation by a human expert in order to work properly. The results in this paper suggest that that roadblock can be sidestepped by using a reduced-order variant of delay-coordinate embedding to build forecast models for nonlinear dynamical systems: the resulting forecasts can be as good as, or better than, those obtained using full embeddings, and with far less computational and human effort.

Modern approaches to modeling a time series for the purposes of forecasting arguably began with Yule's work on predicting the annual number of sunspots^{2} through what is now known as autoregression. Before this, time-series forecasting was done mostly through simple global extrapolation.^{3} Global linear methods, of course, are rarely adequate when one is working with nonlinear dynamical systems; rather, one needs to model the details of the state-space dynamics in order to make accurate predictions. The usual first step in this process is to reconstruct that dynamical structure from the observed data. The state-space reconstruction techniques proposed by Packard *et al.*^{4} in 1980 were a critical breakthrough in this regard. In the most common variant of this now-classic approach, one constructs a set of vectors $\vec{x}_j \in \mathbb{R}^m$ in which each coordinate is a time-delayed element of the scalar time-series data $x_j$, i.e., $\vec{x}_j = [x_j\ x_{j-\tau}\ \dots\ x_{j-(m-1)\tau}]$ for $\tau > 0$. In 1981, Takens showed that this *delay-coordinate embedding* method provides a topologically correct representation of a nonlinear dynamical system if a specific set of theoretical assumptions is satisfied;^{5} in 1991, Sauer *et al.* extended this discussion and relaxed some of the theoretical restrictions.^{6} This remains a highly active field of research; see, for example, Abarbanel^{7} or Kantz and Schreiber^{8} for good surveys.
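The delay-vector construction described above is straightforward to implement. The sketch below, in Python with NumPy, builds the matrix of vectors $[x_j\ x_{j-\tau}\ \dots\ x_{j-(m-1)\tau}]$ from a scalar series; the function name `delay_embed` and the row ordering are our own choices, not part of any standard library:

```python
import numpy as np

def delay_embed(x, m, tau):
    """Build delay vectors [x_j, x_{j-tau}, ..., x_{j-(m-1)tau}].

    Row j of the result is the vector anchored at time j + (m-1)*tau
    of the original series; the first column is the most recent value.
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau          # number of complete delay vectors
    return np.column_stack([x[(m - 1 - i) * tau : (m - 1 - i) * tau + n]
                            for i in range(m)])

# e.g., a 2D reconstruction with delay 3
vecs = delay_embed(np.arange(10.0), m=2, tau=3)
```

For a ten-point series with $m = 2$ and $\tau = 3$, this yields seven vectors, the first being $[x_3, x_0]$.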

A large number of creative strategies have been developed for using the state-space structure of a dynamical system to generate predictions (e.g., Refs. 3 and 9–13). Perhaps the simplest of these state-space-based forecast methods is the “Lorenz Method of Analogues” (LMA), which is essentially nearest-neighbor prediction.^{14} Lorenz's original formulation of this idea used the full system state space; the method was extended to embedded dynamics by Pikovsky,^{13} and it is also related to the prediction work of Sugihara and May,^{12} among others. Even this simple strategy—which, as described in more detail in Sec. II B, builds predictions by finding the nearest neighbor of a given point and taking that neighbor's observed path as the forecast—works quite well for forecasting nonlinear dynamical systems. LMA and similar methods have been used successfully to forecast measles and chickenpox outbreaks,^{12} marine phytoplankton populations,^{12} the performance dynamics of a running computer,^{15–17} the fluctuations of a far-infrared laser,^{3,11} and many more.

The reconstruction step that is necessary before any of these methods can be applied to scalar time-series data, however, can be problematic. Delay-coordinate embedding is a powerful piece of machinery, but estimating good values for its two free parameters, the time delay *τ* and the dimension *m*, is not trivial. A large number of heuristics have been proposed for this task (e.g., Refs. 8 and 18–29), but these methods are computationally intensive and they require input from—and interpretation by—a human expert. This can be a real problem in a prediction context: a millisecond-scale forecast is not useful if it takes seconds or minutes to produce. And it is even more of a problem in nonstationary systems, since the reconstruction machinery is only guaranteed to work for an infinitely long noise-free observation of a *single* dynamical system. If effective forecast models are to be constructed and applied in a manner that outpaces the dynamics of the target system, then, one may not be able to use the full, traditional delay-coordinate embedding machinery to reconstruct the dynamics.

The conjecture that forms the basis for this paper is that a full formal embedding, although mandatory for detailed dynamical analysis, *is not necessary for the purposes of prediction*—i.e., that reduced-order variants of delay-coordinate reconstructions are adequate for the purposes of forecasting, even though they are not true embeddings. As a first step towards validating that conjecture, we constructed two-dimensional time-delay reconstructions from a number of different time-series data sets, both simulated and experimental, and then built forecast models in those spaces. We found that forecasts produced using the Lorenz method of analogues on these reduced-order models of the dynamics are roughly as accurate as—and often even *more* accurate than—forecasts produced by the same method working in the full embedding space of the corresponding system. Figure 1 shows a quick example: a pair of forecasts of the so-called “Dataset A,” a time series from a far-infrared laser from the Santa Fe Institute prediction competition.^{3} Even though the low-dimensional reconstruction used to generate the forecast in the left panel of the figure is not completely faithful to the underlying dynamics for this system, it appears to be good enough to support accurate short-term forecast models of nonlinear dynamics.

While this proof-of-concept example is encouraging, Dataset A is only one time series and it was drawn from a comparatively simple system—one that is well-described by a first-return map (or, equivalently, a one-dimensional surface of section). Hence, a two-dimensional delay-coordinate reconstruction should naturally perform well for modeling this data. In practice, however, one rarely knows the governing equations, so one must fall back on traditional dimension-estimation techniques like false-near(est) neighbor (FNN), which suggest that at least five or six dimensions are required to successfully embed Dataset A.^{3} Indeed, as many as 16 dimensions have been used in LMA-based predictions of this canonical data set.^{11}

Sections III and IV offer a broader and deeper validation of this paper's central claim using forecasts constructed using two-dimensional delay reconstructions of ensembles of data sets from a number of different systems whose dynamics are far more complex than the time series in Figure 1. Uniformly, the results indicate that the full complexity (and effort) of the delay-coordinate “unfolding” process may not be strictly necessary to the success of forecast models of real-world nonlinear dynamical systems.

These results have significant practical implications. Fixing *m* = 2 effectively avoids all of the in-depth, post-facto, data-dependent analysis that is required to properly estimate a value for this parameter. It also avoids the high computational complexity of near-neighbor searches in high-dimensional state spaces—an essential step in almost any forecast strategy. Of course, there is still a free parameter in a forecast strategy that uses 2*D* reconstructions: the delay *τ*. As described in Sec. IV, though, we believe that good values for this parameter can be estimated “on the fly,” with little to no human guidance, which would make this method both agile and resilient.

Of course, no forecast model will be ideal for every task. In fact, as a corollary of the undecidability of the halting problem,^{30} no single forecasting schema is ideal for all noise-free deterministic signals,^{3} let alone all real-world time-series data sets. We do not want to give the impression that the strategy proposed here will be effective for *every* time series, but we do intend to show that it is effective for a broad spectrum of signals. Additionally, we want to emphasize that it is intended as a *short*-term forecasting scheme. Dimensional reduction is a double-edged sword; it enables on-the-fly forecasting by eliminating a difficult estimation step, but it effects a guaranteed memory loss in the model. This well-known effect^{3} all but guarantees problems with accuracy as prediction horizons are increased. We explore this limitation in Sec. IV B.

## II. BACKGROUND AND METHODS

### A. Delay-coordinate embedding

The process of collecting a time series $\{x_j\}_{j=1}^{N}$ from a dynamical system (aka a “trace”) is formally the evaluation of an *observation function*^{6} $h: X \to \mathbb{R}$ at the true system state $\vec{y}(t_j)$ at time *t _{j}*, i.e., $x_j = h(\vec{y}(t_j))$ for $j = 1, \dots, N$. Provided that the underlying dynamics $\Phi$ and the observation function are both smooth and generic, Takens^{5} proves that the delay coordinate map

$$F(\vec{y}) = [\,h(\vec{y})\ \ h(\Phi_{-\tau}(\vec{y}))\ \ \dots\ \ h(\Phi_{-2d\tau}(\vec{y}))\,]$$

from a *d*-dimensional smooth compact manifold *M* to $\mathbb{R}^{2d+1}$ is a diffeomorphism on *M*. To assure topological conjugacy, the proof requires that the embedding dimension *m* be greater than twice the dimension *d* of the ambient space; a tighter bound of $m > 2d_{cap}$, where $d_{cap}$ is the capacity dimension of the original dynamics, was later established by Sauer *et al*.^{6} Operationalizing either of these theoretical constraints can be a real challenge: *d* is generally not known, accurate $d_{cap}$ calculations are not easy with experimental data, and, in any case, one must first embed the data before performing those calculations.

As mentioned in Sec. I, the forecast strategy proposed in this paper sidesteps the challenge of estimating the embedding dimension by fixing *m* = 2, even if that value does not satisfy the $m>2dcap$ requirement. For those familiar with these proofs, the fact that forecasting works well in “incomplete reconstructions” like this may not be surprising. The worst-case bounds of $m>2dcap$ are intended to eliminate *all* projection-induced trajectory crossings in the reconstructed dynamics. For most systems, and most projections, the dimensions of the subspaces occupied by these false crossings are far smaller than those of the original systems;^{6} often, they are sets of measure zero. For the delay-coordinate map to be a diffeomorphism, all of these crossings must be unfolded by the embedding process. This is necessary if one is interested in calculating dynamical invariants like Lyapunov exponents. However, the near-neighbor relationships that most state-space forecast methods use in making their predictions are *not* invariant under diffeomorphism, so it does not make sense to require that strict condition on a model that one is using for those purposes. False crossings will, of course, cause incorrect predictions, but that is not a real problem in practice if the measure of that set is near zero, particularly when one is working with noisy, real-world data.

Fixing *m* = 2 takes care of one of the two free parameters in the delay-coordinate reconstruction process, but selection of a value for the delay, *τ*, is still an issue. The theoretical constraints in this regard are less stringent: *τ* must be greater than zero and not a multiple of any orbit's period.^{5,6} In practice, however, the noisy and finite-precision nature of digital data and floating-point arithmetic combine to make the choice of *τ* much more delicate.^{8} There are dozens of methods for this, for example, see Refs. 8 and 18–24. In this paper, we use the method of *time-delayed mutual information*,^{23,24} in which *τ* is chosen at the first minimum of the time-delayed mutual information, calculated using the TISEAN package.^{29} Fraser and Swinney argued that this minimizes the redundancy of the embedding coordinates, thereby maximizing the information content of the overall delay vector.^{24} This choice is discussed and empirically verified by Liebert and Schuster,^{23} although agreement on this topic is by no means universal.^{31,32} And it is well known that choice of *τ* is application- and system-specific: a *τ* that works well for a Lyapunov exponent calculation may not work well for other purposes.^{8,21,33} Indeed, as we show in Sec. IV A, the *τ* value suggested by mutual information calculations is rarely the optimal choice for forecasting. Even so, it is a reasonable starting point, as it is the standard technique used in the dynamical systems community.
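To make the τ-selection heuristic concrete, the sketch below estimates the time-delayed mutual information with a simple histogram estimator and returns its first local minimum. This is a minimal stand-in for, not a reimplementation of, the TISEAN `mutual` routine; the function names and the 16-bin default are our assumptions:

```python
import numpy as np

def mutual_information(x, tau, bins=16):
    """Histogram estimate of I(x_t ; x_{t+tau}) in bits."""
    a, b = x[:-tau], x[tau:]
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy /= pxy.sum()                       # joint distribution
    px = pxy.sum(axis=1)                   # marginals
    py = pxy.sum(axis=0)
    nz = pxy > 0                           # avoid log(0)
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px[:, None] * py[None, :])[nz]))

def first_minimum_tau(x, max_tau=100):
    """Return the smallest tau at which the mutual information has a local minimum."""
    mi = [mutual_information(x, t) for t in range(1, max_tau + 1)]
    for t in range(1, len(mi) - 1):
        if mi[t] < mi[t - 1] and mi[t] < mi[t + 1]:
            return t + 1                   # taus are 1-indexed
    return int(np.argmin(mi)) + 1          # fall back to the global minimum
```

A more careful estimator (e.g., with adaptive binning) would be needed for short or noisy traces; this version only illustrates the "first minimum" criterion itself.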

Since full embeddings are the point of comparison for our results, we briefly review the traditional methods for estimating the dimension that is required to construct them. As in the case of *τ*, a number of creative strategies for doing so have been developed over the past few decades.^{8,20,22,26–29} Most of these are based in some way on minimizing the number of false crossings that are caused by projection. In this paper, we use the FNN approach of Kennel *et al*.,^{26} calculated using TISEAN^{29} with a threshold of $\approx 20\%$ on the percentage of false neighbors. Again, no heuristic method is perfect, but FNN is arguably the most widely used method for estimating *m*, and thus is useful for the purposes of comparison.
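For concreteness, here is a rough sketch of the false-near(est)-neighbor idea: embed in *m* dimensions, find each point's nearest neighbor, and flag the pair as "false" if adding an (*m* + 1)st delay coordinate separates the two points sharply. This is a simplified version of the Kennel *et al.* criterion; the particular ratio test and the `rtol` value are our assumptions, and TISEAN's implementation differs in detail:

```python
import numpy as np

def false_neighbor_fraction(x, m, tau, rtol=15.0):
    """Fraction of m-dimensional nearest neighbors that are 'false':
    pairs whose separation grows by more than rtol (relative to their
    m-dimensional distance) when the (m+1)st delay coordinate is added."""
    x = np.asarray(x, dtype=float)
    n = len(x) - m * tau                        # vectors that extend to m+1 dims
    emb = np.column_stack([x[i * tau : i * tau + n] for i in range(m)])
    extra = x[m * tau : m * tau + n]            # the (m+1)st coordinate
    false = 0
    for j in range(n):
        d = np.linalg.norm(emb - emb[j], axis=1)
        d[j] = np.inf                           # exclude the point itself
        k = int(np.argmin(d))                   # nearest neighbor in m dims
        if abs(extra[j] - extra[k]) / max(d[k], 1e-12) > rtol:
            false += 1
    return false / n
```

One would then take *m* to be the smallest dimension for which this fraction drops below the chosen threshold.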

### B. Lorenz method of analogues

As mentioned in Sec. I, the dynamical systems community has developed a number of methods that capture and use state-space structure to create forecasts. Since the goal of this paper is to offer a proof of concept of the notion that incomplete reconstructions could give these kinds of methods enough traction to generate useful predictions, we chose one of the oldest and simplest members of that family to use in our experiments: Lorenz's method of analogues (LMA), which is essentially nearest-neighbor forecasting in reconstruction space.

To apply LMA to a scalar time-series data set $\{x_j\}_{j=1}^{n}$, one begins by performing a delay-coordinate reconstruction to produce a trajectory of vectors of the form

$$\vec{x}_j = [x_j\ x_{j-\tau}\ \dots\ x_{j-(m-1)\tau}]$$

Forecasting the next point in the time series, $x_{n+1}$, amounts to reconstructing the next delay vector $\vec{x}_{n+1}$ in the trajectory. Note that, by the form of delay-coordinate vectors, all but the first coordinate of $\vec{x}_{n+1}$ are known. To choose that first coordinate, one finds the nearest neighbor of $\vec{x}_n$ in the reconstructed space^{34}—namely, $\vec{x}_{j(1,m)}$—and maps that vector forward using the delay map, obtaining its image $\vec{x}_{j(1,m)+1}$. Using the first coordinate of the neighbor's image, one defines the forecast: the LMA prediction of $x_{n+1}$ is $p_{n+1} = x_{j(1,m)+1}$. If performing multi-step forecasts, one appends the new delay vector

$$\vec{x}_{n+1} = [p_{n+1}\ x_{n+1-\tau}\ \dots\ x_{n+1-(m-1)\tau}]$$

to the end of the trajectory and repeats this process as needed.
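The steps above can be condensed into a short sketch. The function below (a hypothetical name; Euclidean nearest neighbors assumed) implements the basic method on an *m*-dimensional delay reconstruction, with *m* = 2 as the ro-LMA default:

```python
import numpy as np

def ro_lma_forecast(x, tau, steps=1, m=2):
    """Lorenz method of analogues on an m-dimensional delay reconstruction.

    Finds the nearest neighbor of the most recent delay vector, takes that
    neighbor's observed successor as the forecast, then appends the forecast
    and repeats for multi-step predictions.
    """
    x = list(x)
    preds = []
    for _ in range(steps):
        n = len(x)
        # delay vectors [x_j, x_{j-tau}, ..., x_{j-(m-1)tau}]; row j is
        # anchored at time j + (m-1)*tau of the series
        traj = np.column_stack([np.array(x)[(m - 1 - i) * tau : n - i * tau]
                                for i in range(m)])
        query = traj[-1]
        # nearest neighbor of the last delay vector, excluding itself
        # (the last row, whose successor is what we want to predict)
        d = np.linalg.norm(traj[:-1] - query, axis=1)
        j = int(np.argmin(d))
        p = x[j + (m - 1) * tau + 1]   # the neighbor's observed successor
        preds.append(p)
        x.append(p)                    # append and repeat for multi-step
    return preds
```

On a perfectly periodic series, the nearest analogue is an earlier point on the same cycle, so the method simply replays the cycle; chaotic data, of course, only admit approximate analogues.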

Here, we use the LMA algorithm in two ways: first—as a baseline for comparison purposes—on an embedding of each time series, with *m* chosen using the false-near(est) neighbor method;^{26} second, with *m* fixed at 2. In the rest of this paper, we will refer to these as fnn-LMA and ro-LMA, respectively. The experiments reported in Sec. III use the same *τ* value for both fnn-LMA and ro-LMA, choosing it at the first minimum of the time-delayed mutual information of the time series.^{24} In Sec. IV, we explore the effects of varying *τ* on the accuracy of both methods.

Dozens—indeed, hundreds—of more-complicated variants of the LMA algorithm have appeared in the literature (e.g., Refs. 3, 9, 10, and 12), most of which involve building some flavor of local-linear model around each delay vector and then using it to make the prediction of the next point. For the purposes of this paper, we chose to use the basic original LMA because it is dynamically the most straightforward and thus provides a good baseline assessment. While we believe that the claims stated here extend to other state space-based forecast methods, the pre-processing steps involved in some of those methods make a careful analysis of the results somewhat problematic. One can use GHKSS-based techniques, for instance, to project the full dynamics onto linear submanifolds^{35} and then use those manifolds to build predictions. While it might be useful to apply a method like that to an incomplete reconstruction, the results would be some nonlinear conflation of the effects of the two different projections and it would be difficult to untangle and understand the individual effects. (Note that the careful study of the effects of projection in forecasting that are undertaken in this paper may suggest why GHKSS-based techniques work so well.) We do plan to see if our claim holds in the context of other methods, but for the purposes of this study, we feel that LMA is the best choice.

### C. Assessing forecast accuracy

To evaluate ro-LMA and compare it to fnn-LMA, we calculate a figure of merit in the following way. We split each *N*-point time series into two pieces: the first 90%, referred to as the “initial training” signal and denoted $\{x_j\}_{j=1}^{n}$, and the last 10%, known as the “test” signal $\{c_j\}_{j=n+1}^{n+k}$, where $n + k = N$. Following the procedures described in Sec. II B, we build a time-delay reconstruction from the initial training signal, use that model to generate a prediction $p_{n+1}$ of the value of $x_{n+1}$, and compare $p_{n+1}$ to the true continuation, $c_{n+1}$. The experiments reported in Sec. III involve “one-step” models, which are rebuilt after each step, out to the end of the test signal, using $\{c_{n+1}\} \cup \{x_j\}_{j=1}^{n}$. In Sec. IV, we extend that conversation to longer prediction horizons.

As a numerical measure of prediction accuracy, we calculate the mean absolute scaled error (*MASE*)^{36} between the true and predicted signals:

$$MASE = \frac{\frac{1}{k}\sum_{j=n+1}^{n+k} |p_j - c_j|}{\frac{1}{n-1}\sum_{j=2}^{n} |x_j - x_{j-1}|}$$

*MASE* is a normalized measure: the scaling term in the denominator is the average in-sample forecast error of a random-walk prediction—which uses the previous value in the observed signal as the forecast—calculated over the initial training signal $\{x_j\}_{j=1}^{n}$. That is, *MASE* < 1 means that the prediction error in question was, on average, smaller than the in-sample error of a random-walk forecast on the same data. Analogously, *MASE* > 1 means that the corresponding prediction method did *worse*, on average, than the random-walk method. The *MASE* values for Figure 1, for instance, were 0.117 and 0.148—i.e., the fnn-LMA and ro-LMA forecasts of the SFI Dataset A were, respectively, $1/0.117 \approx 8.5$ and $1/0.148 \approx 6.8$ times better than a random-walk forecast of the initial training portion of the same signal.

While its comparative nature may seem odd, this error metric allows for fair comparison across varying methods, prediction horizons, and signal scales, making it a standard error measure in the forecasting literature—and a good choice for the study described in Secs. III and IV, which involve a number of very different signals.
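The MASE computation described above reduces to a few lines: the forecast error averaged over the test signal, scaled by the mean one-step (random-walk) error over the training signal. A minimal sketch, with a function name of our own choosing:

```python
import numpy as np

def mase(pred, true, train):
    """Mean absolute scaled error: mean |p_j - c_j| over the test signal,
    scaled by the in-sample one-step random-walk error on the training signal."""
    pred, true, train = (np.asarray(a, dtype=float) for a in (pred, true, train))
    scale = np.mean(np.abs(np.diff(train)))   # random-walk error on {x_j}
    return np.mean(np.abs(pred - true)) / scale
```

For example, with a training signal [0, 1, 2, 3] (random-walk error 1.0), a forecast of 4.0 against a true continuation of 4.5 gives a MASE of 0.5.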

## III. PREDICTION IN PROJECTION

In this section, we demonstrate that the accuracies of forecasts produced by ro-LMA—Lorenz's method of analogues, operating on a two-dimensional time-delay reconstruction of a trajectory from a dynamical system—are similar to, and often better than, forecasts produced by fnn-LMA, which operates on an embedding of the same dynamics. While the brief example in Sec. I is a useful first validation of that statement, it does not support the kind of exploration that is necessary to properly evaluate a new forecast method, especially one that violates the basic tenets of delay-coordinate embedding. The SFI data set A is a single trace from a single system—and a low dimensional system at that. We want to show that ro-LMA is comparable to or better than fnn-LMA for a *range* of systems and parameter values—and to repeat each experiment for a number of different trajectories from each system. To this end, we studied two dynamical systems, one simulated and one real: the Lorenz-96 model and sensor data from a laboratory experiment on computer performance dynamics.

### A. A synthetic example: Lorenz-96

The Lorenz-96 model^{37} is a set of *K* first-order differential equations relating the *K* state variables $\xi_1 \dots \xi_K$:

$$\dot{\xi}_k = (\xi_{k+1} - \xi_{k-2})\,\xi_{k-1} - \xi_k + F$$

for $k = 1, \dots, K$, where $F \in \mathbb{R}$ is a constant forcing term that is independent of *k* and the indices are interpreted modulo *K*. In this model, each *ξ _{k}* is some atmospheric quantity (such as temperature or vorticity) at a discrete location on a periodic lattice representing a latitude circle of the earth.^{38} This model exhibits a wide range of dynamical behavior—everything from fixed points and periodic attractors to low- and high-dimensional chaos^{38}—making it an ideal test case for our purposes.

We performed two sets of forecasting experiments with ensembles of traces from the Lorenz-96 model: one with *K* = 22 and the other with *K* = 47. Both experiments used a constant forcing value of *F* = 5. These choices yield chaotic trajectories with low and high^{38} Kaplan-Yorke (Lyapunov) dimension:^{39} $d_{KY} \lessapprox 3$ for the *K* = 22 dynamics and $d_{KY} \approx 19$ for *K* = 47. Following standard practice,^{38} we enforced periodic boundary conditions and solved the model equations from several randomly chosen initial conditions using a standard fourth-order Runge-Kutta integrator for 60 000 steps with a step size of $\frac{1}{64}$ normalized time units. We then discarded the first 10 000 points of each trajectory in order to eliminate transient behavior. Finally, we created scalar time-series traces by individually “observing” each of the *K* state variables of the trajectory, i.e., $h_i(\xi_i(t_j)) = x_{j,i}$ for $j \in \{10\,000, \dots, 60\,000\}$ and $i \in \{1, \dots, K\}$. We repeated all of this from a number of different initial conditions—seven for the *K* = 47 Lorenz-96 system and 15 for the *K* = 22 case—producing a total of 659 traces for our forecasting study. For each of these, we used the procedures outlined in Sec. II A to estimate values for the free parameters of the embedding process, obtaining *m* = 8 and *τ* = 26 for all traces in the *K* = 22 case, and *m* = 10 and *τ* = 31 for the *K* = 47 traces.^{40}
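A minimal version of this data-generation pipeline (the Lorenz-96 vector field with periodic boundary conditions, integrated with a fixed-step fourth-order Runge-Kutta scheme) might look like the following sketch; the function names and the random initial condition are our own choices:

```python
import numpy as np

def lorenz96_deriv(xi, F):
    """d(xi_k)/dt = (xi_{k+1} - xi_{k-2}) * xi_{k-1} - xi_k + F, indices mod K."""
    return (np.roll(xi, -1) - np.roll(xi, 2)) * np.roll(xi, 1) - xi + F

def rk4_trajectory(xi0, F, steps, h=1.0 / 64.0):
    """Fixed-step fourth-order Runge-Kutta integration of Lorenz-96."""
    xi = np.array(xi0, dtype=float)
    out = np.empty((steps, len(xi)))
    for n in range(steps):
        k1 = lorenz96_deriv(xi, F)
        k2 = lorenz96_deriv(xi + 0.5 * h * k1, F)
        k3 = lorenz96_deriv(xi + 0.5 * h * k2, F)
        k4 = lorenz96_deriv(xi + h * k3, F)
        xi = xi + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        out[n] = xi
    return out

# e.g., K = 22, F = 5: integrate, discard a transient, then "observe"
# a single coordinate to obtain one scalar trace
traj = rk4_trajectory(np.random.default_rng(0).standard_normal(22), F=5, steps=2000)
x = traj[1000:, 0]
```

Note that `np.roll` implements the periodic boundary conditions directly: shifting the state vector by −1, +1, and +2 yields $\xi_{k+1}$, $\xi_{k-1}$, and $\xi_{k-2}$ for every *k* at once.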

For the *K* = 22 dynamics, both ro-LMA and fnn-LMA worked quite well. See Figure 2(a) for a time-domain plot of an ro-LMA forecast of a representative trace from this system and Figures 2(b) and 2(c) for graphical representations of the forecast accuracy on that trace for both methods. The diagonal structure on the *p _{j}* vs. *c _{j}* plots in the figure indicates that both of these LMA-based methods performed very well on this trace. More importantly—from the standpoint of evaluation of our primary claim—the *MASE* scores of ro-LMA and fnn-LMA forecasts, computed following the procedures described in Sec. II C, were 0.391 ± 0.016 and 0.441 ± 0.033, respectively, across the 330 traces at this parameter value. That is, the LMA forecasting strategy worked *better* on a two-dimensional reconstruction of these dynamics than on a full embedding, and by a statistically significant margin. This is somewhat startling, given that the two-dimensional delay reconstruction used by ro-LMA falls far short of the requirement for topological conjugacy^{6} in this system. Clearly, though, it captures *enough* structure to allow LMA to generate good predictions.

The *K* = 47 case is a slightly different story: here, ro-LMA still outperformed fnn-LMA, but not by a statistically significant margin. The *MASE* scores across all 329 traces were 0.985 ± 0.047 and 1.007 ± 0.043 for ro-LMA and fnn-LMA, respectively. In view of the higher complexity of the state-space structure of the *K* = 47 version of the Lorenz-96 system, the overall increase in *MASE* scores over the *K* = 22 case makes sense. Recall that *d _{KY}* is far higher for the *K* = 47 case: this attractor fills more of the state space and has many more dimensions that are associated with positive Lyapunov exponents. This has obvious implications for predictability.

Since we used the same traces for both methods, one might be tempted to think that ro-LMA's better performance is a predictable consequence of data length—simply because filling out a higher-dimensional object like the fnn-LMA model requires more data. When we re-ran the experiments with longer traces, the *MASE* scores for ro-LMA and fnn-LMA did converge, but not until the trace was over 10^{6} points long, and at the (significant) cost of near-neighbor searches in a space with many more dimensions. Note, too, that the longer delay vectors used by fnn-LMA span far more of the training set, which is a serious advantage from an information-theoretic standpoint. In view of this, the comparable performance of ro-LMA is quite impressive. These issues are explored at more length in Sec. IV.

### B. Experimental data: Computer performance dynamics

Validation with synthetic data is an important first step in evaluating any new forecast strategy, but experimental time-series data are the acid test if one is interested in real-world applications. Our second set of tests of ro-LMA, and comparisons of its accuracy to that of fnn-LMA, involved data from a laboratory experiment on computer performance dynamics. Like Lorenz-96, this system has been shown to exhibit a range of interesting deterministic dynamical behavior, from periodic orbits to low- and high-dimensional chaos,^{41,42} making it a good test case for this paper. It also has important practical implications; these dynamics, which arise from the deterministic, nonlinear interactions between the hardware and the software, have profound effects on execution time and memory use.

Collecting observations of the performance of a running computer is not trivial. We used the libpfm4 library, via PAPI (Performance Application Programming Interface^{43}) 5.2, to stop program execution at 100 000-instruction intervals—the unit of time in these experiments—and read the contents of the microprocessor's onboard hardware performance monitors, which we had programmed to observe important attributes of the system's dynamics. See Ref. 44 for an in-depth description of this experimental setup. The signals that are produced by this apparatus are scalar time-series measurements of system metrics like processor efficiency (e.g*.,* IPC, which measures how many instructions are being executed, on the average, in each clock cycle) or memory usage (e.g*.,* how often the processor had to access the main memory during the measurement interval).

We have tested ro-LMA on traces of many different processor and memory performance metrics gathered during the execution of a variety of programs on several different computers. Here, for conciseness, we focus on *processor* performance traces from two different programs, one simple and one complex, running on the same Intel i7-based computer. Figure 3(a) shows a small segment of an IPC time series gathered from that computer as it executed a four-line C program (col_major) that repeatedly initializes a 256× 256 matrix in column-major order. On the scale of this figure, these dynamics appear to be largely periodic, but they are actually chaotic.^{42} The bottom panel in Figure 3 shows a processor efficiency trace from a much more complex program: the 403.gcc compiler from the SPEC 2006CPU benchmark suite.^{45}

Since computer performance dynamics result from a composition of hardware and software, these two programs represent two different dynamical systems, even though they are running on the same computer. The dynamical differences are visually apparent from the traces in Figure 3; they are mathematically apparent from nonlinear time-series analysis of embeddings of those data,^{42} as well as in calculations of the information content of the two signals. Among other things, 403.gcc has much less predictive structure than col_major and is thus much harder to forecast.^{17} These attributes make this a useful pair of experiments for an exploration of the utility of reduced-order forecasting.

For statistical validation, we collected 15 performance traces from the computer as it ran each program, calculated embedding parameters as described in Sec. II A, and generated forecasts of each trace using ro-LMA and fnn-LMA. Figure 4 shows *p _{j}* vs. *c _{j}* plots for these forecasts. The diagonal structure of the top two plots indicates that both ro-LMA and fnn-LMA performed well on the col_major traces. The *MASE* scores across all 15 trials in this set of experiments were 0.063 ± 0.003 for ro-LMA and 0.050 ± 0.002 for fnn-LMA; i.e., ro-LMA's accuracy was somewhat worse than that of fnn-LMA. For 403.gcc, however, ro-LMA appears to be somewhat more accurate: 1.488 ± 0.016 versus fnn-LMA's 1.530 ± 0.021. Note that the 403.gcc *MASE* scores were higher for both forecast methods than on col_major, simply because the 403.gcc signal contains less predictive structure.^{17} This actually makes the comparison somewhat problematic, as discussed at more length on page 9.

Table I summarizes all of the experiments presented so far. Overall, the results on the computer-performance data are consistent with those that we obtained with the Lorenz-96 example in Sec. III B: prediction accuracies of ro-LMA and fnn-LMA were quite similar on all traces, despite the former's use of a theoretically incomplete reconstruction. This amounts to a validation of the conjecture on which this paper is based. And in both numerical and experimental examples, ro-LMA actually *outperformed* fnn-LMA on the more-complex traces (403.gcc, *K* = 47). We believe that this is due to the noise mitigation that is naturally effected by a lower-dimensional reconstruction. Finally, we found that it was possible to improve the performance of both ro-LMA and fnn-LMA on all four of these dynamical systems by adjusting the free parameter, *τ*. It is to this issue that we turn next; following that, we address the issue of prediction horizon.

Signal | fnn-LMA | ro-LMA | Trials
---|---|---|---
Lorenz-96 K = 22 | 0.441 ± 0.033 | 0.391 ± 0.016 | 330
Lorenz-96 K = 47 | 1.007 ± 0.043 | 0.985 ± 0.047 | 329
col_major | 0.050 ± 0.002 | 0.063 ± 0.003 | 15
403.gcc | 1.530 ± 0.021 | 1.488 ± 0.016 | 15


## IV. TIME SCALES, DATA LENGTH, AND PREDICTION HORIZONS

### A. The *τ* parameter

The embedding theorems require only that *τ* be greater than zero and not a multiple of any period of the dynamics. In practice, however, *τ* can play a critical role in the success of delay-coordinate reconstruction—and any nonlinear time-series analysis that follows.^{8,21,24} It should not be surprising, then, that *τ* might affect the accuracy of an LMA-based method that uses the structure of a time-delay reconstruction to make forecasts.
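To make the role of *τ* concrete, the following is a minimal sketch of a near-neighbor forecast on a low-dimensional delay reconstruction. This is our own illustration, not the authors' implementation: the function names are invented, and the single-nearest-neighbor prediction rule is a simplification of LMA. With *m* = 2 it mimics the reduced-order setup discussed in the text, with *τ* as the single free parameter.

```python
import numpy as np

def delay_reconstruct(x, m, tau):
    """Stack m-dimensional delay vectors [x(t), x(t+tau), ..., x(t+(m-1)tau)]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(m)])

def nn_forecast(x, m=2, tau=1, steps=1):
    """Lorenz-style method of analogues: find the nearest past delay
    vector to the current one and predict its observed successor."""
    x = list(x)
    for _ in range(steps):
        pts = delay_reconstruct(x, m, tau)
        # exclude the final vector (the query itself) from the neighbor search
        dists = np.linalg.norm(pts[:-1] - pts[-1], axis=1)
        nn = int(np.argmin(dists))
        # the observation that followed the analogue is the forecast
        x.append(x[nn + (m - 1) * tau + 1])
    return x[-steps:]
```

On a periodic signal, the nearest analogue is an exact earlier copy of the current state, so the forecast reproduces the cycle; on chaotic data, the quality of the analogue, and hence of the forecast, depends strongly on *τ*.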

Figure 5 explores this effect in more detail. Across all *τ* values, the *MASE* scores for col_major were generally lower than those for the other three experiments—again, simply because this time series has more predictive structure. The *K* = 22 curve is generally lower than the *K* = 47 one for the same reason, as discussed at the end of Sec. III. For both Lorenz-96 traces, prediction accuracy decreases monotonically with *τ*. It is known that increasing *τ* can be beneficial for longer prediction horizons.^{8} Our situation involves short prediction horizons, so it makes sense that our observation is consistent with the contrapositive of that result.

For the experimental traces, the relationship between *τ* and *MASE* score is less simple. There is only a slight upward overall trend (not visible at the scale of the figure) and the curves are nonmonotonic. This latter effect is likely due to periodicities in the dynamics, which are very strong in the col_major signal (viz., a dominant unstable period-three orbit in the dynamics, which traces out the top, bottom, and middle bands in Figure 3). Periodicities can cause obvious problems for delay reconstructions—and forecast methods that employ them—if the delay is a harmonic or subharmonic of their frequencies, simply because the coordinates of the delay vector are not independent samples of the dynamics. It is for this reason that Takens mentions this condition in his original paper. Here, the effect of this is an oscillation in the forecast accuracy vs. *τ* curve: low when it is an integer multiple of the period of the dominant unstable periodic orbit in the col_major dynamics, for instance, then increasing with *τ* as more independence is introduced into the coordinates, then falling again as *τ* reaches the next integer multiple of the period, and so on.

This naturally leads to the issue of choosing a good value for the delay parameter. Recall that all of the experiments reported in Sec. III used a *τ* value chosen at the first minimum of the mutual information curve for the corresponding time series. These values are indicated by the black vertical dashed lines in Figure 5. This estimation strategy was simply a starting point, chosen here because it is arguably the most common heuristic used in the nonlinear time-series analysis community. As is clear from Figure 5, though, it is *not* the best way to choose *τ* for reduced-order forecast strategies. Only in the case of col_major is the *τ* value suggested by the mutual-information calculation optimal for ro-LMA—that is, does it fall at the lowest point on the *MASE* vs. *τ* curve.
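The first-minimum heuristic itself is straightforward to compute. Below is a minimal histogram-based sketch of it; this is our own illustration rather than the authors' code, and the bin count and local-minimum test are arbitrary choices.

```python
import numpy as np

def mutual_information(x, tau, bins=16):
    """Histogram estimate of the mutual information between x(t) and x(t+tau)."""
    x = np.asarray(x, dtype=float)
    joint, _, _ = np.histogram2d(x[:-tau], x[tau:], bins=bins)
    p = joint / joint.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0  # avoid log(0); p > 0 implies the marginals are positive too
    return float((p[nz] * np.log(p[nz] / np.outer(px, py)[nz])).sum())

def default_tau(x, max_tau=50, bins=16):
    """First local minimum of the average mutual information curve --
    the common 'default' tau heuristic discussed in the text."""
    mi = [mutual_information(x, t, bins) for t in range(1, max_tau + 1)]
    for i in range(1, len(mi) - 1):
        if mi[i] < mi[i - 1] and mi[i] <= mi[i + 1]:
            return i + 1          # mi[i] corresponds to tau = i + 1
    return int(np.argmin(mi)) + 1  # fall back to the global minimum
```

As the text notes, this heuristic is only a starting point: the *τ* it selects need not sit at the minimum of the *MASE* vs. *τ* curve for a reduced-order forecaster.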

This is the point alluded to at the end of Sec. III: one can improve the performance of ro-LMA simply by choosing a different *τ*—i.e., by adjusting the one free parameter of that reduced-order forecast method. In all cases (aside from col_major where the default *τ* was the optimal value) adjusting *τ* brought ro-LMA's error down below fnn-LMA's. The improvement can be quite striking: for visual comparison, Figure 6 shows ro-LMA forecasts of a *K* = 47 Lorenz-96 trace using default and best-case values of *τ*. Again, this supports the main point of this paper: forecast methods based on incomplete reconstructions of time-series data can be very effective—and much less work than those that require a full embedding.

However, that comparison is not really fair. Recall that the embedding that is used by fnn-LMA, as defined so far, fixes *τ* at the first minimum of the average mutual information for the corresponding trace. It may well be the case that that *τ* value is suboptimal for *that* method as well—as it was for ro-LMA. To test this, we performed an additional set of experiments to find the optimal *τ* for fnn-LMA. Table II shows the numerical values of the *MASE* scores, for forecasts made with default and best-case *τ* values, for both methods and all traces. In the two simulated examples, best-case ro-LMA significantly outperformed best-case fnn-LMA; in the two experimental examples, the best-case fnn-LMA was better, but not by a huge margin. That is, even if one optimizes *τ* individually for these two methods, ro-LMA keeps up with, and sometimes outperforms, fnn-LMA.

Signal | ro-LMA (default τ) | ro-LMA (best-case τ) | fnn-LMA (default τ) | fnn-LMA (best-case τ)
---|---|---|---|---
Lorenz-96 K = 22 | 0.391 ± 0.016 | 0.073 ± 0.002 | 0.441 ± 0.033 | 0.137 ± 0.006
Lorenz-96 K = 47 | 0.985 ± 0.047 | 0.115 ± 0.006 | 1.007 ± 0.043 | 0.325 ± 0.020
col_major | 0.063 ± 0.003 | 0.063 ± 0.003 | 0.050 ± 0.002 | 0.049 ± 0.002
403.gcc | 1.488 ± 0.016 | 1.471 ± 0.014 | 1.530 ± 0.021 | 1.239 ± 0.020


In view of our claim that part of ro-LMA's advantage stems from the natural noise mitigation effects of a low-dimensional reconstruction, it may appear somewhat odd that fnn-LMA works better on the experimental time-series data, which certainly contain noise. Comparisons of large *MASE* scores are somewhat problematic, however. Recall that *MASE* > 1 means that the forecast is worse than an in-sample random-walk forecast of the same trace. The bottom row of numbers in Table II, then, indicates that both LMA-based methods—no matter the *τ* values—generate poor predictions for 403.gcc: 24%–53% worse, on the average, than simply predicting that the next value will be equal to the previous value. There could be a number of reasons for this poor performance. This signal has almost no predictive structure^{17} and fnn-LMA's extra axes may add to its ability to capture that structure—in a manner that outweighs the potential noise effects of those extra axes. The dynamics of col_major, on the other hand, are fairly low dimensional and dominated by a single unstable periodic orbit; it could be that the embedding of these dynamics used in fnn-LMA captures its structure so well that fnn-LMA is basically perfect and ro-LMA cannot do any better.

While the plots and *MASE* scores in this section suggest that ro-LMA forecasts are quite good, it is important to note that both “default” and “best-case” *τ* values were chosen after the fact in all of those experiments. This is not useful in practice. A significant advantage of a reduced-order forecast strategy like ro-LMA is its ability to work “on the fly” in situations where one may not have the leisure to run an average mutual information calculation on a long segment of the trace and find a clear minimum—let alone construct a plot like Figure 5 and choose an optimal *τ* from it. (Producing that plot required 3000 runs involving a total of 22 010 700 forecasted points, which took approximately 44.5 h on an Intel Core i7.)

Recent preliminary results,^{46} however, suggest that it is possible to estimate optimal *τ* values for delay reconstruction-based forecasting by calculating the value that maximizes the shared information between each delay vector and the future state of the system. For all of the examples in this paper, that strategy produces the same *τ* value as found with the exhaustive search mentioned above. This is a fairly efficient calculation: $O(n\u2009log\u2009n)$ time where *n* is the length of the time series. Even so, it can be onerous if *n* is very large. However, this measure can be calculated on very small subsets of the time series and still produce accurate results, which could allow *τ* to be selected *adaptively* for the purposes of forecasting nonstationary systems with ro-LMA.

### B. Prediction horizon

There are fundamental limits on the prediction of chaotic systems. Positive Lyapunov exponents make long-term forecasts a difficult prospect beyond a certain point for even the most sophisticated methods.^{3,8,17} Note that the coordinates of points in higher-dimensional delay-reconstruction spaces sample wider temporal spans of the time series. In theory, this means that one should be able to forecast further into the future with a higher-dimensional reconstruction without losing memory of the initial condition. This raises an important concern about ro-LMA: whether its accuracy will degrade with increasing prediction horizon more rapidly than that of fnn-LMA.
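The fundamental limit mentioned above can be made concrete with a standard textbook estimate (not a result of this paper): if $\lambda_1$ is the largest Lyapunov exponent, an initial measurement error $\delta_0$ grows roughly as $\delta_0 e^{\lambda_1 t}$, so the horizon at which the error reaches a tolerance $\Delta$ is

```latex
t_{\mathrm{horizon}} \approx \frac{1}{\lambda_1} \ln \frac{\Delta}{\delta_0}
```

This horizon grows only logarithmically as measurement precision improves, so no forecast method, regardless of reconstruction dimension, can extend it by much.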

Recall that the formulations of both methods, as described and deployed in Secs. II and III of this paper, assume that measurements of the target system are available in real time: they “rebuild” the LMA models after each step, adding new time-series points to the embeddings or reconstructions as they arrive. Both ro-LMA and fnn-LMA can easily be modified to produce longer forecasts, however—say, *h* steps at a time, only updating the model with new observations at *h*-step intervals. Naturally, one would expect forecast accuracy to suffer as *h* increases for any non-constant signal. The question at issue in this section is whether the greater temporal span of the data points used by fnn-LMA mitigates that degradation, and to what extent.

Assessing the accuracy of *h*-step forecasts requires a minor modification to the *MASE* calculation, since its denominator is normalized by *one-step* random-walk forecast errors. In an *h*-step situation, one should instead normalize by *h*-step forecasts, which involves modifying the scaling term to be the average root-mean-squared error accumulated using random-walk forecasting on the initial training signal, *h* steps at a time.^{36} This gives a new figure of merit that we will call *h-MASE*.
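One plausible reading of that modification in code (our own sketch; the paper's exact error-accumulation scheme may differ, and the use of RMS error in the scaling term follows the description above):

```python
import numpy as np

def h_mase(actual, forecast, train, h):
    """h-step MASE: mean absolute forecast error, scaled by the RMS error
    of an h-step random-walk forecast over the training signal."""
    actual, forecast, train = map(np.asarray, (actual, forecast, train))
    # random walk: predict that the value h steps ahead equals the current one
    rw_rmse = np.sqrt(np.mean((train[h:] - train[:-h]) ** 2))
    return float(np.mean(np.abs(actual - forecast)) / rw_rmse)
```

With *h* = 1 this reduces to the one-step normalization used earlier; because the random-walk denominator itself degrades with *h*, scores computed for different *h* are not directly comparable.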

Note that the *h*-step forecast accuracy of the random-walk method will also degrade with *h*, so *h-MASE* will always be lower than *MASE*. This also means that *h-MASE* scores should not be compared for different *h*.

In Table III, we provide *h-MASE* scores for *h*-step forecasts of the different sets of experiments from Sec. III. The important comparisons here are, as mentioned above, across the rows of the table. The different methods “reach” different distances back into the time series to build the models that produce those forecasts, of course, depending on their delay and dimension. At first glance, this might appear to make it hard to sensibly compare, say, default-*τ* ro-LMA and best-case-*τ* fnn-LMA, since they use different *τ*s and different values of the reconstruction dimension and thus are spanning different ranges of the time series. Because *h* is measured in units of the sample interval of the time series here, however, comparing one *h*-step forecast to another (for the same *h*) does make sense.

Signal | h | ro-LMA (default τ) | ro-LMA (best-case τ) | fnn-LMA (default τ) | fnn-LMA (best-case τ)
---|---|---|---|---|---
Lorenz-96 K = 22 | 1 | 0.391 ± 0.016 | 0.073 ± 0.002 | 0.441 ± 0.003 | 0.137 ± 0.006
Lorenz-96 K = 22 | 10 | 0.101 ± 0.008 | 0.066 ± 0.003 | 0.062 ± 0.011 | 0.033 ± 0.002
Lorenz-96 K = 22 | 50 | 0.084 ± 0.007 | 0.074 ± 0.008 | 0.005 ± 0.002 | 0.004 ± 0.001
Lorenz-96 K = 22 | 100 | 0.057 ± 0.005 | 0.050 ± 0.004 | 0.003 ± 0.001 | 0.003 ± 0.001
Lorenz-96 K = 47 | 1 | 0.985 ± 0.047 | 0.115 ± 0.006 | 0.995 ± 0.053 | 0.325 ± 0.020
Lorenz-96 K = 47 | 10 | 0.223 ± 0.011 | 0.116 ± 0.005 | 0.488 ± 0.042 | 0.218 ± 0.012
Lorenz-96 K = 47 | 50 | 0.117 ± 0.011 | 0.112 ± 0.010 | 0.127 ± 0.011 | 0.119 ± 0.010
Lorenz-96 K = 47 | 100 | 0.075 ± 0.006 | 0.068 ± 0.005 | 0.079 ± 0.005 | 0.075 ± 0.004
col_major | 1 | 0.063 ± 0.003 | 0.063 ± 0.003 | 0.050 ± 0.002 | 0.049 ± 0.002
col_major | 10 | 0.054 ± 0.006 | 0.046 ± 0.003 | 0.021 ± 0.001 | 0.018 ± 0.001
col_major | 50 | 0.059 ± 0.009 | 0.037 ± 0.003 | 0.012 ± 0.003 | 0.009 ± 0.001
col_major | 100 | 0.044 ± 0.004 | 0.028 ± 0.006 | 0.010 ± 0.003 | 0.007 ± 0.001
403.gcc | 1 | 1.488 ± 0.016 | 1.471 ± 0.014 | 1.530 ± 0.021 | 1.239 ± 0.020
403.gcc | 10 | 0.403 ± 0.009 | 0.396 ± 0.009 | 0.384 ± 0.007 | 0.369 ± 0.010
403.gcc | 50 | 0.154 ± 0.003 | 0.151 ± 0.005 | 0.143 ± 0.003 | 0.141 ± 0.003
403.gcc | 100 | 0.101 ± 0.002 | 0.101 ± 0.003 | 0.095 ± 0.002 | 0.093 ± 0.002


There are a number of interesting questions to ask about the patterns in this table, beginning with the one that set off these experiments: how do fnn-LMA and ro-LMA compare if one individually optimizes *τ* for each method? The numbers indicate that ro-LMA beats fnn-LMA for *h* = 1 on the *K* = 22 traces, but then loses by progressively larger margins (i.e., by more *σ*s) as *h* grows. col_major follows the same pattern except that ro-LMA is worse even at *h* = 1. For 403.gcc, fnn-LMA performs better at both *τ*s and all values of *h*, but the disparity between the accuracy of the two methods does not systematically worsen with increasing *h*. For *K* = 47, ro-LMA consistently beats fnn-LMA for both *τ*s for $h\u226410$ but the accuracy of the two methods is comparable for longer prediction horizons.

Another interesting question is whether the assertions in Sec. III stand up to increasing prediction horizon. Those assertions were based on the results that appear in the *h* = 1 rows of Table III: ro-LMA was better than fnn-LMA on the *K* = 22 Lorenz-96 experiments, for instance, for both *τ* values. This pattern does not persist for longer prediction horizons: rather, fnn-LMA generally outperforms ro-LMA on the *K* = 22 traces for $h=10,50,$ and 100. The *h* = 1 comparisons for *K* = 47 and col_major *do* generally persist for higher *h*, however. As mentioned before, 403.gcc is problematic because its *MASE* scores are so high, but the accuracies of the two methods are similar for all *h* > 1.

The fact that fnn-LMA generally outperforms ro-LMA for longer prediction horizons makes sense simply because ro-LMA samples less of the time series and therefore has less “memory” about the dynamics. This is a well-known effect.^{3} In view of the fundamental limits on prediction of chaotic dynamics, however, it is worth considering whether *either* method is really making correct long-term forecasts. Indeed, time-domain plots of long-term forecasts (e.g., Figure 7) reveal that both fnn-LMA and ro-LMA forecasts have fallen off the true trajectory and onto shadow trajectories—another well-known phenomenon when forecasting chaotic dynamics.^{11}

In other words, it appears that even a 50-step forecast of these chaotic trajectories is a tall order: i.e., that we are running up against the fundamental bounds imposed by the Lyapunov exponents. In view of this, it is promising that ro-LMA keeps up with fnn-LMA in many cases—even when both methods are struggling with the prediction horizon, and even though the model that ro-LMA uses has much less memory about the past history of the trajectory. An important aspect of our future research on this topic will be developing efficient methods for deriving bounds on reasonable prediction horizons purely from the time series, i.e., without using traditional methods such as Lyapunov exponents, which are difficult to estimate from experimental data. See Ref. 46 for some preliminary results on this line of research.

### C. Data length

Most real-world data sets are fixed in length and some are quite short. Moreover, many of the dynamical systems that one might like to predict are nonstationary. For these reasons, it is important to understand the effects of data length upon forecast methods that are based on delay reconstructions. For the reconstruction to be an actual embedding that supports accurate calculations of dynamical invariants, the data requirements are fairly dire. Traditional estimates (e.g., by Smith^{47} and by Tsonis *et al.*^{48}) suggest that $\approx 10^{17}$ data points would be required to embed the Lorenz-96 *K* = 47 data in Sec. III A, where the known *d _{KY}* values^{38} indicate that one might need at least *m* = 38 dimensions to properly unfold the dynamics. As described in Sec. II, however, that is only truly necessary if one is interested in preserving the diffeomorphism between true and reconstructed dynamics, down to the last detail. For the purposes of prediction, thankfully, one can make progress with far less data. For example, Sauer^{11} successfully forecasted the continuation of a 16 000-point time series embedded in 16 dimensions; Sugihara and May^{12} used delay-coordinate embedding with *m* as large as seven to successfully forecast biological and epidemiological time-series data as short as 266 points.

While the results in Secs. III and IV are based on far longer traces than the examples mentioned at the end of the previous paragraph, it is still worth exploring whether data-length issues are affecting those results—and evaluating whether those effects differentially impact fnn-LMA because of its higher-dimensional model. This kind of test can be problematic in practice, of course, since it requires varying the length of the data set. In a synthetic example like Lorenz-96, that is not a problem, since one can just run the ODE solver for more steps.

Figure 8 shows *MASE* scores for ro-LMA and fnn-LMA forecasts of the Lorenz-96 system as a function of data length. The dashed lines show the mean forecast error for each method; the dotted lines indicate the range of the standard deviation. For both *K* = 22 and *K* = 47, the fnn-LMA error is higher than the ro-LMA error, corroborating the results in Sec. III A. Both generally improve with data length, as one would expect—but only up to a point. The *MASE* scores of the ro-LMA forecasts, in particular, reach a plateau at about 350 000 points in the *K* = 22 case and 150 000 in the *K* = 47 case. fnn-LMA, on the other hand, keeps improving out to the end of the figure. Eventually, there is a crossover for *K* = 22, but not until 1.6 × 10^{6} points. For *K* = 47, the curves were still converging out to 4 × 10^{6} points. The difference in crossover points is not surprising, given that the dimension of the *K* = 47 dynamics is so much higher: $d_{KY} \approx 19$, versus $d_{KY} \lessapprox 3$ for *K* = 22. What *is* surprising, and useful, is that for traces with 1 × 10^{6} points—far more data than a practitioner can generally hope for—ro-LMA is still outperforming fnn-LMA. Moreover, it is doing so using fewer dimensions, which makes it computationally efficient.

Plateaus in curves like the ones in Figure 8 suggest that the corresponding forecast method has captured all of the information that it can use, so that adding data does not improve the forecast. This effect, which is described at more length in Ref. 46, depends on dimension for the obvious reason that filling out a higher-dimensional object requires more data. This suggests another piece of forecasting strategy: when one is data-rich, it may be wise to choose fnn-LMA over ro-LMA. In making this choice, though, one should also consider the added computational complexity, which will be magnified by the longer data length. When data are scarce, or the system is nonstationary, our results suggest that it is advantageous to ignore the theoretical bounds of Ref. 6 and use low-dimensional reconstructions in forecast models.

## V. CONCLUSION

In summary, it appears that incomplete reconstructions—time-delay reconstructions that do not satisfy the formal conditions of the associated theorems—are indeed adequate for the purposes of forecasting dynamical systems. Indeed, they appear to offer simple state-space based methods *even more* traction on this prediction task than complete embeddings, with greatly reduced computational effort.

The study in this paper specifically focuses on the 2*D* instantiation of the claim above, for two reasons. First, that is the extreme that, in a sense, most seriously violates the basic tenets of the delay-coordinate embedding machinery; second, it enables the largest reduction in the cost of the near-neighbor searches that are the bulk of the computational effort in most state space-based forecast methods. A number of other issues arise when one considers increasing the reconstruction dimension beyond two, besides raw computational complexity. It would introduce another free parameter into the method, thereby requiring some sort of principled strategy for choosing its value (a choice that ro-LMA completely avoids by fixing *m* = 2).

In general, one would expect forecast accuracy to improve with the number of dimensions, but not without limit. Among other things, noise effects grow with reconstruction dimension (simply because every noisy data point affects *m* points in the reconstructed trajectory). And from an information-theoretic standpoint, one would expect diminishing returns when the span of the delay vector ($m\xd7\tau $) exceeds the “memory” of the system. For all of those reasons, it would seem that there should be a plateau beyond which increasing the reconstruction dimension does not improve the accuracy of forecast methods that use the resulting models. We explore that issue further in Ref. 46, using the measure mentioned in the last paragraph of Sec. IV A to derive optimal reconstruction dimensions for near-neighbor forecasting across a broad range of systems, noise levels, and forecast horizons; in every case, the optimal value turned out to be *m* = 2. In the bulk of the dynamical systems literature on forecasting, however, the optimal reconstruction dimension for the purposes of forecasting was thought to be near the value that provides an embedding of the data. The results in this paper suggest that this is not the case.

We chose the classic Lorenz method of analogues as a good exemplar of the class of state space-based forecast methods, but we believe that the results have broader implications for other members of that class (e.g., Refs. 3 and 9–13). Working with a low-dimensional reconstruction could potentially reduce the computational search and storage costs of *any* such method, while also avoiding the so-called “curse of dimensionality” and mitigating noise multipliers caused by extra embedding dimensions.^{18} It also reduces data requirements, since fewer points are required to fill out a lower-dimensional object. And when one fixes *m* = 2, there is only a single free parameter in the method—one that can be estimated effectively from a short sample of the data set, allowing the model to adapt to nonstationary dynamics. There may be some limitations on the class of methods for which these claims hold, of course; matters may get more complicated, and the results less clear, for forecast methods that perform other kinds of projections. On the flip side, however, our results can be viewed as explaining why those methods work so well.

Again, no forecast model will be ideal for all noise-free deterministic signals, let alone all real-world time-series data sets. However, the proof of concept offered in this paper is encouraging: prediction in projection appears to work remarkably well, even though the models that it uses are not necessarily topologically faithful to the true dynamics. Our ultimate goal is to show that this reduced-order modeling strategy—a simple yet powerful reduction of a time-tested method—has real practical utility for a wide spectrum of forecasting tasks: a simple, agile, adaptive, noise-resilient strategy for nonlinear systems.

## ACKNOWLEDGMENTS

This work was supported by NSF Grant No. CMMI-1162440. We would also like to acknowledge Mark Muldoon, James D. Meiss, James P. Crutchfield, Holger Kantz, and Robert Easton for valuable discussions throughout the development of this work, as well as the anonymous reviewers for their insightful feedback.

## References

Embedding the *K* = 22 and *K* = 47 time series would require $m\u2a866$ and $m\u2a8638$, respectively. The values suggested by the false-near neighbor method for the *K* = 22 traces are in line with this, but the *K* = 47 FNN values are far smaller than $2dKY$.