Non-stationary systems are found throughout the world, from climate patterns under the influence of variation in carbon dioxide concentration to brain dynamics driven by ascending neuromodulation. Accordingly, there is a need for methods to analyze non-stationary processes, and yet, most time-series analysis methods that are used in practice on important problems across science and industry make the simplifying assumption of stationarity. One important problem in the analysis of non-stationary systems is the problem class that we refer to as parameter inference from a non-stationary unknown process (PINUP). Given an observed time series, this involves inferring the parameters that drive non-stationarity of the time series, without requiring knowledge or inference of a mathematical model of the underlying system. Here, we review and unify a diverse literature of algorithms for PINUP. We formulate the problem and categorize the various algorithmic contributions into those based on (1) dimension reduction, (2) statistical time-series features, (3) prediction error, (4) phase-space partitioning, (5) recurrence plots, and (6) Bayesian inference. This synthesis will allow researchers to identify gaps in the literature and will enable systematic comparisons of different methods. We also demonstrate that the most common systems that existing methods are tested on—notably, the non-stationary Lorenz process and logistic map—are surprisingly easy to perform well on using simple statistical features like windowed mean and variance, undermining the practice of using good performance on these systems as evidence of algorithmic performance. We then identify more challenging problems that many existing methods perform poorly on and which can be used to drive methodological advances in the field. 
Our results unify disjoint scientific contributions to the analysis of non-stationary systems and suggest new directions for progress on the PINUP problem and the broader study of non-stationary phenomena.

The analysis of time-series data is important for a range of fields, including physics, neuroscience, ecology, and finance. Time-series analysis methods commonly assume that the measured process is stationary, i.e., certain properties of the process, such as mean and variance, are constant across time, and yet, many important processes are non-stationary on relevant dynamical timescales. For example, time series measured from electrical activity of the human brain during sleep exhibit considerable variation in statistical properties across wakefulness and different stages of sleep. Thus, there is a need for methods that can quantify non-stationary dynamical variation. Here, we consider the specific problem of inferring the parameters that drive non-stationarity of a time series, without requiring knowledge or inference of a mathematical model of the underlying system. We call this problem parameter inference from a non-stationary unknown process (PINUP). We make two main contributions. First, we review and categorize the algorithmic approaches to PINUP across a fragmented literature, consolidating a diverse body of methodological research for the first time. Unifying the terminology of the problem and providing an overview of the scientific methods that have been developed to date will allow researchers to take a unified approach to targeting gaps in the interdisciplinary literature, thus facilitating progress on PINUP. Second, we show that the systems most commonly used for benchmarking—the Lorenz process and logistic map—admit trivial algorithmic solutions, undermining the practice of presenting good performance on these problems as evidence of algorithmic progress. We present more challenging test problems, which many existing methods perform poorly on, and that can be used to drive improvements in PINUP algorithms. In turn, this will foster the development of new approaches to the broader study of non-stationary phenomena.

Time-series data are ubiquitous across science and industry, making the tools of time-series analysis essential for contemporary analytic practice. To make analyses tractable, time series are commonly assumed to be stationary. The assumption of weak stationarity entails that the first and second moments of the distribution of a time series are constant, whereas strong stationarity means that all conditional probabilities are constant in time.1 However, many studied systems exhibit non-stationary behavior that violates one or both of these assumptions. Representative examples of non-stationary systems can be found in physics,2 engineering,3 ecology,4 climate science,5 neuroscience,6 and finance.7 Moreover, for many studied systems, the pattern of non-stationary variation in dynamical behavior is of interest. For example, in the neuroscientific context, it is useful to ask whether instances of non-stationary brain dynamics are driven by patterns of activity in certain brain regions.8 The non-stationarity of such systems can be conceptualized in terms of one or more processes that vary the conditional probabilities of the system by modulating parameters of a hypothetical model or set of generative equations. Furthermore, if we can reconstruct a time series of the parameter variation underlying non-stationarity, we can then seek a correspondence between this parameter time series and some part of the system or its environment. In some cases, assuming stationarity may not only be incorrect but can blind us to important sources of variation that are crucial to understanding the system. 
For example, a recent study suggests that the time-series features that best discriminate whether a subject is an experienced meditator using electroencephalography (EEG) are those that index non-stationarity, with traditional spectral properties showing no difference.9 In summary, there is a need to develop methods that can be used to analyze non-stationary systems—e.g., by identifying and quantifying non-stationarity and characterizing the dynamics of processes that drive the statistical variation.

A pragmatic definition of stationarity is needed for data-driven analyses because, as theoretically defined for stochastic processes, stationarity is a property of processes rather than of finite time series and can only be evaluated in the large data limit—in the univariate case, $x_t$ ($t = 1, \ldots, T$), as the number of samples $T \to \infty$. When stationarity is discussed in relation to deterministic dynamical systems, an appeal is often made to the concept of ergodicity, which is also problematic for finite data.10 Additionally, there is no consensus theoretical definition of stationarity for deterministic dynamical systems: some authors require the existence of an invariant ergodic measure,11–13 some require that the system be autonomous (i.e., with equations and parameters that are fixed across time),14–18 while others require that time-series statistics do not vary over the duration of measurement.19,20 Similar to Schreiber19 and Aguirre and Letellier,20 here we adopt a definition of non-stationarity as variation over time in the joint probability distribution of a time series, for which variation in dynamical properties may be used as a surrogate measure. In practice, measuring variation in joint probabilities or certain statistics necessitates specifying some timescale, i.e., some temporal window length over which this variation is evaluated, and in general, stationarity is timescale dependent. For example, a process with a constant mean over long time windows can have a wandering mean over short windows. What timescale to consider is best decided by a practitioner on the basis of domain knowledge. For example, given an EEG time series measured during sleep, the relevant timescale will be much shorter if 200 Hz hippocampal ripples21 are the object of study, as compared to sleep stages, which by convention are rated using 30 s epochs.22
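The timescale dependence of windowed statistics can be illustrated with a brief sketch (a hypothetical example, not drawn from the reviewed literature): a slow sinusoid plus noise has a wandering mean over short windows but a near-constant mean over windows spanning a full period.

```python
import numpy as np

def windowed_means(x, win_len):
    """Mean of x over consecutive non-overlapping windows of length win_len."""
    n_windows = len(x) // win_len
    return x[:n_windows * win_len].reshape(n_windows, win_len).mean(axis=1)

rng = np.random.default_rng(0)
t = np.arange(10_000)
x = np.sin(2 * np.pi * t / 1000) + 0.1 * rng.standard_normal(len(t))

short = windowed_means(x, 100)    # windows much shorter than the period: mean wanders
long_ = windowed_means(x, 1000)   # windows spanning a full period: mean is near-constant
```

On this signal, the spread of short-window means is orders of magnitude larger than that of the full-period windows, so a verdict of "stationary in the mean" depends entirely on the window length chosen.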

In this work, we consider the problem of parameter inference from a non-stationary unknown process (PINUP), for which we adopt the following definition (which we develop more formally in Sec. II). Starting with time-series observations of a non-stationary process and in the absence of a known mathematical model of the process, PINUP involves inferring one or more time-varying parameters (TVPs) that drive the non-stationarity. A PINUP method is any algorithmic procedure for solving this problem. The specification that the generative process be unknown a priori is important. First, there are many real-world settings where there is considerable uncertainty about the functional form of a process. For example, for complex systems such as the brain, where we often have large amounts of noisy, time-series data, finding relevant time-varying patterns within the data is often more feasible than developing an accurate model of the entire system. Second, it is important to distinguish PINUP from the problem of inferring static or time-varying parameters for a known model, an example of which is the inverse problem for differential equations.23 The Kalman filter, particle filter, and other Bayesian and statistical inferential methods have been developed for this purpose.24–28 In some cases, it may be possible to reduce PINUP to this latter problem class if a generative model can first be learned via data-driven discovery.29–32 

PINUP is an important problem that has been studied widely across different disciplinary contexts, resulting in a correspondingly fragmented literature on approaches to tackling it. The theory and algorithms that have been developed for it are broad, ranging from dimension reduction to Bayesian inference. These myriad theoretical and algorithmic contributions are often disjoint, with overall minimal referencing of approaches developed in different fields, suggesting a lack of familiarity with the interdisciplinary literature. There is no agreed problem formulation, and variation in terminology—e.g., inferring a “drifting parameter”33 or a “driving force”34 vs a “causal driver”35—further compounds this. Furthermore, many papers only demonstrate the usefulness of their individual algorithm on a single problem, most commonly, the logistic map or the Lorenz process.33,34,36–45 In addition to the challenge of determining the current state of the interdisciplinary scientific literature on this topic, this lack of benchmarking standards or systematic comparison of algorithm performance across a diverse range of challenging systems (as is common practice for other problem classes) makes it difficult for a reader to discern whether progress is being made. Here, we unify the disjoint literature on PINUP by clearly defining the problem class (Sec. II) and reviewing and categorizing the diverse algorithmic literature (Sec. III). We then demonstrate, using new numerical experiments, the trivial nature of problems commonly used for benchmarking PINUP algorithms—including the logistic map—and introduce more challenging problems on which PINUP performance may more appropriately be tested and on which many conventional methods fail (in Sec. IV).

The basic procedure of PINUP is illustrated in Fig. 1. PINUP aims to infer the time series of one or more sources of non-stationary variation directly from a measured (univariate or multivariate) time series. Taking an example from Fig. 1, suppose that the dynamics of a brain were to be deliberately manipulated by modulating brain dopamine activity over time. Given a time series of brain activity, such as EEG data, one could use PINUP to try to infer a time series of dopamine activity. Similarly, if a chaotic flow such as the Lorenz process has a time-varying parameter (TVP), then we can analyze the Lorenz process time series to try to infer the time series of the TVP.

FIG. 1.

Parameter inference from a non-stationary unknown process (PINUP). (a) A non-stationary process is generated by an unknown model $M$ under the influence of some time-varying parameter(s) (TVPs) $\theta(t)$. A selection of example processes is depicted with TVPs denoted in red: (i) brain activity under the time-varying influence of ascending neuromodulation by dopamine; (ii) climate (e.g., mean temperature) under the influence of changing CO$_2$; and (iii) the Lorenz process with a time-varying parameter, $\rho(t)$ [see Eq. (A3)]. (b) A time-series realization $X$ of the non-stationary process is observed. (c) A PINUP algorithm is used to infer the time-varying parameter(s) $\hat{\theta}(t)$ directly from the time series $X$.

We consider a time series $x(t)$ of a non-stationary process to be generated by a model $M$, the joint probability distribution of which depends on one or more time-varying parameters, $\theta(t)$, across some time interval $t \in [0, T_{\max}]$. By framing PINUP in terms of a general model $M$, we allow for a range of generative processes. For example, $M$ could represent a system of ODEs, $\frac{d}{dt}\mathbf{x} = f(\mathbf{x}, \theta(t))$, or an iterative map, $x_{t+1} = f(x_t, \theta_t)$, among other possibilities including noise processes, stochastic differential equations (SDEs), or even a process governed by rules that could not be written down in closed form. Stationarity then corresponds to the fixed-parameter case, $\theta(t) = \theta(0)$, $\forall t$. In practice, regardless of whether $M$ is defined in continuous or discrete time, to obtain a time series of length $T$, the process must be sampled discretely, for example, at a uniform sampling rate as $t = 0, \Delta t, 2\Delta t, \ldots, (T-1)\Delta t$. We denote a time-series realization from $M(\theta(t))$ as $X \in \mathbb{R}^{T \times D}$, where $D$ is the number of time-series variables and the value at time $t$ of variable $i$ is $x_{t,i}$. Given an ascending sequence of sampling times $t_i \in [0, T_{\max}]$, we let $x_i \equiv x(t_i)$ and can write $X$ in terms of row vectors,
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_T \end{bmatrix}. \qquad (1)$$
The corresponding discretization in time of $\theta(t)$ is $\theta_t$. More generally, if there is some observation function $g$ and noise process $\eta$, then our observed time series is $Y \in \mathbb{R}^{T \times D}$, where $y_t = g(x_t) + \eta_t$. We obtain $Y = X$ if observations are noise-free and $g$ is the identity function.
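As a concrete instance of this setup (a minimal sketch; the parameter values and noise level are illustrative choices), one can simulate an iterative map $x_{t+1} = f(x_t, \theta_t)$ with a slowly drifting parameter and a noisy identity observation function:

```python
import numpy as np

def nonstationary_logistic_map(theta, x0=0.5, obs_noise=0.0, seed=0):
    """Simulate x_{t+1} = theta_t * x_t * (1 - x_t) with a time-varying parameter,
    then apply the observation model y_t = x_t + eta_t."""
    rng = np.random.default_rng(seed)
    x = np.empty(len(theta))
    x[0] = x0
    for t in range(len(theta) - 1):
        x[t + 1] = theta[t] * x[t] * (1.0 - x[t])
    y = x + obs_noise * rng.standard_normal(len(x))   # observation noise eta_t
    return x, y

# A TVP drifting linearly through the chaotic regime of the logistic map
theta = np.linspace(3.6, 4.0, 5000)
x, y = nonstationary_logistic_map(theta, obs_noise=0.01)
```

The PINUP task is then to recover an approximation $\hat{\theta}_t$ of `theta` given only the observed series `y`.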

PINUP can be summarized as follows: given an observed time-series dataset $X$ (or $Y$), resulting from some unknown process $M$ under the influence of TVP(s) $\theta(t)$, our goal is to infer an approximation of the TVP(s), denoted $\hat{\theta}_t$.

Although many of the papers in the PINUP literature refer to the inference of a time-varying parameter, other terminology includes inference of a “driving force,”38,39,41,46 a “driving signal,”37 a “drifting parameter,”33,43,47 a “dynamical parameter,”48 or a “causal driver.”35 In what follows, we treat these terms as synonyms and use the term TVP for consistency.

How might one approach PINUP? Starting from the concept of non-stationarity, one could proceed by developing a method that quantifies variation in the joint probability distribution and then use this information to infer one or more TVPs. For example, some statistics (or “features”) of the dynamics will be sensitive to changes in the joint probability distribution induced by $\theta_t$ (and in the optimal case will be sufficient statistics with respect to $\theta_t$). Therefore, one could measure a set of time-series statistics across sliding time-series windows and then examine the resulting time series of statistics to obtain an estimate of the TVP(s).49 Similarly, one could compute windowed empirical probability distributions, say, through naive histogram binning, in order to estimate variation in the underlying probability density function (for a stochastic process) or invariant measure (for a deterministic process).50 Moreover, if a pattern of non-stationary variation manifests jointly across several time-series variables (or statistical features), then dimension reduction could be used to project the data onto a lower-dimensional space that captures shared variation driven by the underlying parameter(s).39 Indeed, many such PINUP algorithms have been proposed, but they have never been systematically studied or compared.
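The sliding-window strategy can be illustrated with a minimal sketch (a hypothetical example, not a specific published method): compute simple windowed statistics of a process whose mean is translated by a slow TVP, and check that the windowed mean tracks the TVP.

```python
import numpy as np

def sliding_window_features(x, win_len, step):
    """Windowed mean and variance: one feature vector per sliding window."""
    centers, feats = [], []
    for start in range(0, len(x) - win_len + 1, step):
        seg = x[start:start + win_len]
        centers.append(start + win_len // 2)       # window midpoint (sample index)
        feats.append([seg.mean(), seg.var()])
    return np.array(centers), np.array(feats)

rng = np.random.default_rng(0)
t = np.arange(5000)
theta = 2 * np.sin(2 * np.pi * t / 2500)           # slow TVP translating the mean
x = theta + 0.1 * rng.standard_normal(len(t))      # fast observed process
centers, feats = sliding_window_features(x, win_len=200, step=100)
```

Here the windowed mean correlates almost perfectly with $\theta(t)$ at the window centers; in general, one would compute many features and then reduce them, e.g., with PCA.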

In this work, we organize the literature of theory and algorithms for PINUP into six categories of conceptually distinct approaches, illustrated in Fig. 2, which we explain, in turn, through this section. In particular, we organize the existing methods into those based on

  1. Dimension reduction [Sec. III A, Fig. 2(a)]. These methods project time-series data onto lower-dimensional spaces that best capture some properties (such as variance, autocorrelation, or predictability) that are sensitive to underlying TVPs.

  2. Statistical time-series features [Sec. III B, Fig. 2(b)]. These methods track parameter variation using statistics that may be sensitive to variation in the joint probability distribution of the underlying process.

  3. Prediction error [Sec. III C, Fig. 2(c)]. These methods construct a time-series prediction/forecasting model and then infer TVPs from the time series of the prediction error.

  4. Phase-space partitioning [Sec. III D, Fig. 2(d)]. These methods infer TVPs from multivariate time series of metrics, such as the prediction error, that are measured across different regions of the reconstructed phase space.

  5. Recurrence plots (RP) [Sec. III E, Fig. 2(e)]. These methods infer TVPs from recurrence plots that are computed based on the patterns of recurrence in the time-series data.

  6. Bayesian inference [Sec. III F, Fig. 2(f)]. These methods start from some prior distribution and then infer TVPs from the data using a probability model.

FIG. 2.

Schematic depiction of six categories of PINUP methods, each of which tracks non-stationarity through some aspects of parameter-driven variation in the joint probability distribution of a process. (a) Dimension reduction methods project time-series data onto lower-dimensional spaces that optimize with respect to some statistical properties (such as variance, slowness, autocorrelation, or predictability). The image shows data being mapped to a two-dimensional subspace via a linear transformation. (b) Statistical time-series feature methods infer parameter variation from statistical variation. The figure depicts mean values of a single time-series variable from the Lorenz process computed using sliding windows. (c) Prediction error methods construct a time-series prediction/forecasting model and then infer parameter variation from the resulting prediction error time series. The figure depicts a one-step prediction error between the true trajectory and a prediction using some function. (d) Phase space partitioning methods infer parameter variation from multivariate time series of metrics, such as the prediction error, that are measured across different regions. The figure depicts a partition of the phase space of a Lorenz process. (e) Recurrence plot methods infer parameter variation from recurrence plots that are computed based on the patterns of recurrence in the time-series data. The recurrence plot of a Lorenz process driven by a sinusoidal parameter $\rho$ is shown. (f) Bayesian inference methods start from some prior distribution over models and TVP values and then infer parameter variation from the data on the basis of a likelihood model. In the figure, parameter values $\theta(t)$ and a model $M$ are drawn from prior distributions, and then, data realizations generated from a process $M(\theta(t))$ are compared to the observed time series $X$.


Dimension reduction involves mapping data to a lower-dimensional space in a way that optimizes or preserves some properties of the data.51 For example, principal component analysis (PCA) learns linear projections that maximize variance or, equivalently, minimize squared reconstruction error.52,53 The structure of time-series data, namely, that the data are ordered in time and will, in general, display autocorrelation, poses both challenges and opportunities for dimension reduction methods. For example, time-series autocorrelation violates the assumption of independent and identically distributed (i.i.d.) data used by many methods but can also form the basis of alternative methods that exploit this structure (as seen below). For PINUP, if non-stationary variation manifests across a multivariate time series $X$—say, by modulating variance or autocorrelation—then a dimension-reduction method that is sensitive to such changes could be used to project onto one or more dimensions that track the TVPs. This idea is depicted schematically in Fig. 2(a), which shows a linear dimension reduction from three dimensions to a two-dimensional subspace.

Dimension reduction was used by Wiskott39 to infer parameter variation from non-stationary time series. To do this, Wiskott used slow feature analysis (SFA), which extracts slowly varying features for which the first temporal derivative is minimized.54 In contrast to PCA, the first dimension of which identifies the direction of the greatest time-series variance, SFA tracks the direction of the slowest variation in the time series. The inductive bias of SFA toward slowness makes it sensitive to non-stationary dynamical variation when fast observed dynamics are modulated by a slower TVP. When SFA is used for PINUP, time-delay embedding55 (discussed further in Sec. III C) and polynomial basis expansion are first applied54 (noting that the method is termed “SFA2” when a second-order polynomial basis expansion is used). The noise-robustness of SFA for parameter inference was recently examined by Wang et al.,45 and numerous SFA variants have been applied to a range of problems over the past two decades.56 A closely related dimension reduction method is time-lagged independent component analysis (TICA), which finds components that have the largest autocorrelation at a given time-lag.57,58 TICA is equivalent to SFA, without TDE or basis expansion, in the case where one-step autocorrelation is considered.59 Several papers have explored the mathematical relationships between techniques such as PCA, SFA, TICA, smooth orthogonal decomposition (SOD, detailed in Sec.
III D), and other methods.59–61 Nonlinear neural-network extensions of these and similar dimension-reduction techniques have also been proposed, such as deep-TICA, time-lagged auto-encoders (TAEs), and variational approach for Markov process networks (VAMPnets), many of which have been applied to discover slow collective variables underlying molecular dynamics.62–64 In contrast to SFA2, which applies a polynomial basis expansion by default,54 deep-TICA learns and applies a nonlinear basis expansion followed by TICA decomposition using a differentiable network layer.64 The linear methods, such as PCA, SFA, and TICA, trade expressivity for interpretability, noting that linear transformation weights can be interpreted as indicating the relative importance of different variables or features for each dimensional projection. In contrast, the nonlinear methods can flexibly learn more complex mappings but pose a risk of overfitting, especially for small datasets or when the signal-to-noise ratio is low.65 
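A linear version of SFA can be sketched in a few lines (an illustrative implementation, not the reference SFA2 algorithm, which additionally uses time-delay embedding and a polynomial basis expansion): whiten the data, then find the unit-variance direction whose discrete temporal derivative has minimal variance.

```python
import numpy as np

def linear_sfa(X, n_components=1):
    """Linear slow feature analysis: slowest unit-variance projections of X (T x D)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    evals, evecs = np.linalg.eigh(cov)
    Z = Xc @ (evecs / np.sqrt(evals))   # whitened data (identity covariance)
    dZ = np.diff(Z, axis=0)             # discrete temporal derivative
    dcov = dZ.T @ dZ / len(dZ)
    _, devecs = np.linalg.eigh(dcov)    # ascending: smallest derivative variance first
    return Z @ devecs[:, :n_components]
```

Applied to a random linear mixture of a slow sinusoid, a fast sinusoid, and noise, the first slow feature recovers the slow source up to sign and scale, which is the sense in which SFA tracks a slower TVP modulating fast dynamics.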

Several dimension reduction PINUP methods have been applied to non-stationary time series to infer TVPs as a part of scientific discovery. For example, SFA has been used in climate science to infer TVPs that drive climate change and other variations related to greenhouse and aerosol emissions,66 the North Atlantic oscillation,67 the El Niño–Southern oscillation and Hale sunspot cycles,68 the Atlantic Multidecadal oscillation,69,70 and the East Asian trough.71 Additionally, there is an active literature using methods such as TICA, TAEs, and VAMPnets to discover reduced-order models of the patterns of variation underlying molecular activity.62,64,72–74

A statistical time-series feature is the scalar output of a function that has been applied to a time series, i.e., a map $x \in \mathbb{R}^T \mapsto \mathbb{R}$.75 Thousands of such time-series features have been developed across a wide range of scientific disciplines, with simple examples including mean and variance, and more complex examples including entropy measures and nonlinear methods such as correlation dimension.76 A feature vector can be computed by applying multiple algorithms to a time series, thereby providing a multifaceted statistical description of the time series. Applying such algorithms across sliding time-series windows results in a time series of statistical features, which describes the variation in statistical properties over time. Provided the chosen statistics are sensitive to TVP-driven statistical variation, time-series features can be used individually or in aggregate to infer TVPs. When multiple time-series features are used, a natural step is to apply some form of dimension reduction in order to obtain the TVP estimate $\hat{\theta}_t$. As an example, consider a hypothetical process in which a single TVP $\theta(t)$ varies the mean offset of the process. Simply computing a moving average will track $\theta(t)$ accurately. Alternatively, if $\theta(t)$ were to modulate the frequency of an oscillatory component of a time series, then $\theta(t)$ could be tracked by computing variation in the peak location of the power spectrum over time. Crucially, such windowed, feature-based inference of TVPs assumes that the dynamics within each window are pseudostationary so that within-window parameter variation can be neglected. This approach is illustrated in Fig. 2(b) as the computation of mean time-series values across windows.
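The frequency-modulation example above can be sketched directly (a hypothetical illustration; the signal and window parameters are arbitrary choices): track the location of the power spectral peak across sliding windows of a frequency-modulated oscillation.

```python
import numpy as np

def windowed_peak_frequency(x, fs, win_len, step):
    """Dominant frequency of x in each sliding window, via the windowed FFT peak."""
    centers, peaks = [], []
    for start in range(0, len(x) - win_len + 1, step):
        seg = x[start:start + win_len]
        seg = (seg - seg.mean()) * np.hanning(win_len)   # remove DC, taper edges
        spectrum = np.abs(np.fft.rfft(seg))
        freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
        centers.append(start + win_len // 2)
        peaks.append(freqs[np.argmax(spectrum)])
    return np.array(centers), np.array(peaks)

fs = 200.0
t = np.arange(int(10 * fs))
f_inst = 5 + 10 * t / len(t)                      # TVP: frequency drifts from 5 to ~15 Hz
x = np.sin(2 * np.pi * np.cumsum(f_inst) / fs)    # frequency-modulated oscillation
centers, peaks = windowed_peak_frequency(x, fs, win_len=200, step=100)
```

The peak-frequency feature tracks the drifting instantaneous frequency, up to the spectral resolution set by the window length (here 1 Hz), illustrating the pseudostationarity trade-off: longer windows sharpen the feature estimate but blur faster parameter variation.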

Statistical features of a time series—such as mean, variance, and autocorrelation—computed over sliding time-series windows were used by Güttler, Kantz, and Olbrich49 to successfully reconstruct the space of TVPs of dynamical systems. However, their method requires manual selection of features that are sensitive to the dynamical changes induced by the TVPs relevant to any given problem; it does not provide a feature set that is sensitive to parameter variation in general. In a similar vein, other authors have performed PINUP using one or more hand-selected features. For example, Yanson et al.77 showed that a slow TVP driving a Rössler attractor can be reconstructed using variance, and Bollt et al.17 proposed a complexity measure termed “control entropy” with the aim of tracking TVPs. Addressing the related problem of parameter variation across a set of time series (each generated with some fixed set of parameter values) rather than over time, Niknazar, Nasrabadi, and Shamsollahi44 assessed the performance of individual nonlinear time-series features and volumetric features, which measure occupied space and geometric trajectory properties, for inferring parameter settings of the Lorenz and Mackey–Glass systems. Inferring parameter variation across time series was similarly undertaken by Alao, Lu, and Soljacic,48 who applied PCA dimension reduction to feature vectors of regression coefficients from the output layer of an echo state network, and by Fulcher et al.,78 who addressed the issue of manual feature selection by applying PCA to a comprehensive set of over 7000 time-series features from the hctsa package.76 Harris and Fulcher79 demonstrated that feature-based inference of a TVP can be biased by feature–feature redundancy, but that this issue can be ameliorated by applying PCA to appropriately chosen baseline data to find suitable orthogonal coordinates.

Consider the simple case of a non-stationary process $M(\theta(t))$ where the effect of the scalar-valued TVP $\theta(t)$ is to translate the mean of the process. For example, picture the Lorenz attractor [see Fig. 1(a)] translating up and down the z-axis with $\theta(t)$. Suppose that a mean-sensitive predictive model is then trained using a portion of the associated time series, e.g., a single-step nearest-neighbor predictor [see Fig. 2(c)]. If the predictive model is then applied across time-series windows, one would expect the prediction error to vary according to changes in the mean of the process under the influence of $\theta(t)$. Moreover, since the prediction error varies with $\Delta\theta$, one could use the time series of prediction errors to infer $\theta(t)$. More generally, the methods we group here under the category “prediction error” involve training a predictive model using part or all of a measured time series and then inferring a TVP based on prediction error computed locally across time-series windows, typically by taking $\hat{\theta}_t$ to be the time series of prediction errors.
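The mean-shift example above can be made concrete with a short sketch (a hypothetical illustration of the general idea, not a specific published algorithm): train a one-step nearest-neighbor predictor on an early segment of a process whose mean is translated by a ramping TVP, then average the prediction error over sliding windows.

```python
import numpy as np

def windowed_nn_prediction_error(x, train_frac=0.2, win_len=200):
    """Mean one-step nearest-neighbor prediction error per window, with the
    predictor trained on the initial train_frac portion of the series."""
    n_train = int(len(x) * train_frac)
    train_states, train_next = x[:n_train - 1], x[1:n_train]
    # For each state, find its nearest training state and predict its successor
    idx = np.abs(x[:-1, None] - train_states[None, :]).argmin(axis=1)
    errors = np.abs(x[1:] - train_next[idx])
    centers, win_err = [], []
    for start in range(0, len(errors) - win_len + 1, win_len):
        centers.append(start + win_len // 2)
        win_err.append(errors[start:start + win_len].mean())
    return np.array(centers), np.array(win_err)

rng = np.random.default_rng(0)
t = np.arange(5000)
theta = 3 * t / len(t)                                 # TVP: mean drifts upward
x = theta + np.sin(0.3 * t) + 0.05 * rng.standard_normal(len(t))
centers, win_err = windowed_nn_prediction_error(x)
```

As the process drifts away from the training regime, the windowed prediction error grows with the parameter offset, so the error time series itself serves as a (monotonically related) estimate $\hat{\theta}_t$.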

Many of these methods start with phase-space reconstruction (where the phase space is the space of all possible states of a dynamical system55). The most common approach is to use a scalar-valued time series to construct a time-delay embedding (TDE), i.e., a vector time series in which each vector contains $d$ entries from the original time series, each separated by a time delay of $\tau$.55 Note that TDE is unnecessary when one already has access to time series for a complete set of state space variables. The appeal of TDE stems from Takens’ theorem,80,81 which provides conditions under which there exists a diffeomorphism (a differentiable, invertible bijection) between the manifolds of the reconstructed and true dynamics, noting that a diffeomorphism preserves topologically invariant properties. Of importance for the PINUP setting, Takens’ theorem assumes the underlying dynamical system to be autonomous; however, Stark82 extended the theorem to include systems subject to unknown external forcing. Additionally, in the setting of a $D$-dimensional process subject to $P$ slow TVPs, Hegger et al.13 advise using an embedding dimension $m > 2(D + P)$ for TDE. For the purpose of PINUP, with a phase-space reconstruction in hand, one can then fit prediction models and compute prediction errors with the goal of tracking TVPs.
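Constructing a time-delay embedding is straightforward (a generic sketch; the embedding dimension and delay shown are illustrative):

```python
import numpy as np

def time_delay_embedding(x, dim, tau):
    """Embed a scalar series as vectors (x_t, x_{t+tau}, ..., x_{t+(dim-1)*tau})."""
    n = len(x) - (dim - 1) * tau   # number of complete embedding vectors
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

X = time_delay_embedding(np.arange(10.0), dim=3, tau=2)
```

For a series of length 10 with `dim=3` and `tau=2`, this yields six embedding vectors, the first being $(x_0, x_2, x_4)$; in practice, `dim` and `tau` are chosen by heuristics such as false nearest neighbors and the first minimum of the automutual information.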

Schreiber19 proposed testing for the presence of non-stationarity using “cross-prediction error,” which quantifies how well a nearest neighbor predictor for one time-series segment performs on another time-series segment, similar to our example above. Schreiber and Schmitz1 then used the cross-prediction error to perform PINUP. They partition a time series into windows and construct a dissimilarity matrix of cross-prediction errors between windows. Clustering is then performed and a TVP is inferred based on computed distances from clusters over time. Verdes et al.34 provided a more general formulation of the prediction error approach. Starting with a TDE X, they consider a predictive model f such that x_{t+1} ≈ f(x_t, θ_t) and show that θ_t can be expressed analytically in terms of prediction error, given (i) a local linear approximation and (ii) a smooth dependence of f on θ. Szeliga et al.37 later proposed that f and θ̂_t be learned jointly using a multilayer perceptron neural network. The network is trained to perform one-step time-series prediction using a mean squared error (MSE) loss function, and θ̂_t is treated as a trainable vector whose smoothness is enforced via an additional MSE loss between adjacent values θ̂_t and θ̂_{t+1}. In follow-up papers, Széliga et al.38 and Verdes, Granitto, and Ceccatto40 showed how to use over-embedding and validation data to cope with noisy time series and to select hyperparameters during model training. An advantage of the formulation of Verdes et al.34 is that the form of f is not specified, allowing f to be constructed using a wide range of predictive modeling techniques.
Subsequent authors have used prediction models based on echo state networks41 and support vector machines.43 However, an important limitation is that most methods in this category yield only a scalar-valued prediction error, so they can resolve only a single TVP θ̂_t; to resolve multiple TVPs, some additional demixing technique is required, such as single-channel blind source separation.83

Instead of examining a process globally, phase-space partitioning methods analyze local regions of phase space, most commonly by partitioning phase space into hypercubes, as illustrated in Fig. 2(e). Phase space is either reconstructed, e.g., using the method of TDE (see Sec. III C), or else a multivariate time series is taken to define a phase space, with each variable treated as a separate dimension. The key intuition is that local analyses, e.g., in contiguous regions of phase space, may be more sensitive to subtle fluctuations in statistical properties that only occur in certain regions of phase space. These methods output a multivariate time series to which some form of dimension reduction is applied, making it possible to resolve multiple TVPs underlying a single time series. Note that some of the methods listed here compute the prediction error locally, and so, they are conceptually closely aligned with the prediction error methods (Sec. III C). A key distinction is that, as noted, the methods in this section analyze multivariate time series of locally computed metrics, allowing for the inference of multiple TVPs, in contrast to the scalar-valued prediction errors used by methods from Sec. III C. Additionally, the methods in this section represent a historical thread of PINUP research, mostly from the field of mechanical engineering, and form a coherent body of work.

Chelidze et al. developed a PINUP method called phase space warping (PSW)33,47,84–89 that partitions phase space and computes prediction error locally in each region, yielding an error vector that evolves over time windows, followed by dimension reduction via smooth orthogonal decomposition (SOD). Hyperparameter optimization is a challenge for this method and is discussed at length in Li and Chelidze.89 Note that SOD and SFA are equivalent, sharing the generalized eigenvalue problem formulation ĊU = CUΛ, where C and Ċ are the autocovariance matrices of a time series and its first temporal derivative, respectively, and U and Λ are the corresponding eigenvectors and eigenvalues.85,90 However, there can be subtle differences in the algorithmic implementation of each method; for example, SFA incorporates a PCA whitening step during which near-zero eigenvalues can be optionally discarded according to some threshold.91
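The shared eigenproblem can be sketched as follows (a generic illustration of the SFA/SOD computation on toy data, not the full PSW pipeline; the finite-difference temporal derivative is one common choice):

```python
import numpy as np
from scipy.linalg import eigh

def slow_modes(Y):
    """Solve the SFA/SOD generalized eigenproblem C_dot U = C U Lambda
    for a (time, channels) array Y; eigenvectors with the smallest
    eigenvalues give the slowest linear combinations of the channels."""
    Yc = Y - Y.mean(axis=0)            # center each channel
    Yd = np.diff(Yc, axis=0)           # first temporal derivative (finite difference)
    C = np.cov(Yc, rowvar=False)       # autocovariance of the signal
    Cd = np.cov(Yd, rowvar=False)      # autocovariance of the derivative
    evals, U = eigh(Cd, C)             # generalized symmetric eigenproblem
    return evals, U                    # eigenvalues ascending: slowest mode first

# Toy example: one channel carries a slow drift, the other fast noise.
rng = np.random.default_rng(1)
slow = np.sin(2 * np.pi * np.arange(5000) / 5000)
fast = rng.standard_normal(5000)
Y = np.column_stack([slow + 0.01 * fast, fast])
evals, U = slow_modes(Y)
theta_hat = (Y - Y.mean(axis=0)) @ U[:, 0]   # slowest projection tracks the drift
```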

Epureanu et al. proposed the method of sensitivity vector fields (SVF)42,92–96 that first partitions phase space and then computes a vector of local divergence between trajectories in each time window compared to a reference window, followed by dimension reduction via PCA. Although the PSW and SVF methods are similar, key differences are that the former constructs local predictive models and applies SOD dimension reduction, whereas the latter compares trajectories directly and uses PCA dimension reduction.

Nguyen and Chelidze97 noted the computationally intensive nature of PSW and SVF and so proposed a fast method called characteristic distance (CD) that is based on the Birkhoff ergodic theorem.98 CD selects a set of random points in phase space and then computes vectors of mean distance from each point, yielding a time series of mean distance vectors to which SOD dimension reduction is then applied. Sloboda et al.99–102 observed that PSW and SVF require various modeling and hyperparameter choices and, in response, proposed the boundary transformation vector (BTV) method as a way to directly measure local attractor deformation. BTV infers parameter variation by computing 2D Poincaré sections for a number of planes through phase space and then computing vectors of boundary deformation metrics across time windows. Sloboda et al. were also influenced by Carroll50 who proposed a density-based method that partitions phase space into bins in order to compute an empirical probability density, after which feature vectors are produced through projection onto a polynomial basis, followed by TVP reconstruction using the Euclidean distance between the feature vector for each time window compared to a reference window.

Many of the phase-space partitioning PINUP methods have been used in applied research on fault detection and prognostication of the condition of industrial assets.47,103 The idea is that for some non-stationary systems, there exists a TVP that tracks a condition such as the gradual mechanical or electrical failure of a component. In such cases, PINUP methods can be used to anticipate deteriorations in performance and eventual failure of the system. Applications that have been examined include tracking the discharge of a battery,47,84 crack growth in an oscillating system,85–87,89,104 voltage input to a magneto-elastic oscillator,97 behavior of an aeroelastic system,95 the position of additive mass on a vibrating beam,94 and variation in resistance in a chaotic Chua circuit,42,99 in addition to the biological case of tracking muscle fatigue in high-performance settings.88,105,106

Introduced by Eckmann, Kamphorst, and Ruelle,107 recurrence plots (RPs) represent the times at which states in a phase space recur over the course of a time series. Given a time series X, a threshold distance ϵ, a norm ‖·‖, and the Heaviside function Θ(·), an RP is constructed as a two-dimensional matrix R according to R_{i,j} = Θ(ϵ − ‖x_i − x_j‖).108 Accordingly, a value of 1 at R_{i,j} means that the time-series states at times i and j are within distance ϵ of each other. An example RP matrix for a non-stationary Lorenz process driven by a sinusoidal ρ parameter is visualized in Fig. 2(c). Note that some form of phase-space reconstruction, typically time-delay embedding, is performed prior to constructing the RP, requiring the selection of additional hyperparameters.108 In this section, we focus on methods that analyze the RP directly as a matrix or graph, whereas analyzing time series of recurrence quantification analysis (RQA) statistics would form a subset of the time-series feature-based methods grouped together in Sec. III B. Indeed, certain RQA statistics like determinism (DET) and laminarity (LAM) are plausible candidates for tracking time-varying parameters, but this remains an area in need of further research. When applied to PINUP, the intuition is that the RP structure is an informative signature of the (possibly nonlinear) statistical properties of a process, and so, non-stationarity should be reflected in the RP. For example, if similar values of θ(t) result in similar dynamical behavior, e.g., so that the process evolves on the same attractor, then the probability of recurrences will be higher, and so the similarity between rows or columns of the RP can be used as the basis for TVP reconstruction.
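A direct implementation of this definition (a minimal sketch using the Euclidean norm and a toy periodic orbit; the value of ϵ is illustrative only):

```python
import numpy as np

def recurrence_plot(X, eps):
    """R[i, j] = 1 if ||x_i - x_j|| <= eps, else 0 (Euclidean norm)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    return (D <= eps).astype(int)

# Toy example: points on a circle recur once per period.
t = np.linspace(0, 4 * np.pi, 200)                  # two periods over 200 samples
X = np.column_stack([np.cos(t), np.sin(t)])
R = recurrence_plot(X, eps=0.1)
# R has an all-ones main diagonal, is symmetric, and shows lines parallel
# to the diagonal at a lag of ~100 samples (one full period).
```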

Casdagli36 observed that for appropriate TDE parameters, the RP of a TVP-driven non-stationary process approximates the RP of the TVP itself. Tanio, Hirata, and Suzuki46 later showed how to more reliably reconstruct a TVP from an RP by using combinatorial optimization to permute the RP so as to order the time-series points according to the similarity of their dynamics, thereby obtaining an approximate, monotonic transformation of the original TVP. Hirata, Horai, and Aihara109 developed another method for reconstructing the time series from RPs by first creating a weighted graph based on neighborhood overlap, then computing all-pairs shortest paths using Dijkstra’s algorithm,110 and then applying multidimensional scaling111 to reconstitute the original time series. This method also works when infinite-dimensional TDE is used for RP construction and can be used for PINUP if the RP is coarse-grained prior to applying the reconstruction procedure.112,113 Recently, Gilpin35 proposed inferring a causal, driving process by first constructing a distance-based adjacency matrix representation, i.e., an RP, of a multivariate time series and then computing the subleading eigenvector of the graph Laplacian of this matrix.
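The idea behind this last approach can be sketched in a few lines (a toy illustration of extracting a subleading Laplacian eigenvector from a recurrence-style adjacency matrix; the drifting process, ϵ, and sizes are hypothetical, and this is not the full published method):

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_driver_estimate(X, eps):
    """Estimate a slow driver as the subleading (Fiedler) eigenvector of
    the graph Laplacian of a distance-based adjacency (recurrence) matrix."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = (D <= eps).astype(float)       # adjacency matrix (an RP)
    L = np.diag(A.sum(axis=1)) - A     # unnormalized graph Laplacian
    evals, evecs = eigh(L)             # ascending eigenvalues; evals[0] ~ 0
    return evecs[:, 1]                 # subleading eigenvector

# Toy example: a noisy 2D process whose mean drifts slowly and monotonically.
rng = np.random.default_rng(2)
n = 400
drift = np.linspace(0, 5, n)                        # slow driver
X = drift[:, None] + 0.1 * rng.standard_normal((n, 2))
theta_hat = laplacian_driver_estimate(X, eps=0.5)
# theta_hat orders the samples along the drift (up to sign), so its
# correlation with the driver should be high for this toy example.
```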

When used to infer parameters, Bayesian inference starts with a prior probability distribution over possible parameter values (and/or models) and then computes a posterior distribution based on a probabilistic model of the conditional probability of the observed data given specific parameter values.114 The posterior distribution can then be used to calculate the expected parameter value(s) and to quantify uncertainty in this estimate. Bayesian methods have been used extensively to fit parameters to known models, as in the case of the ODE inverse problem, and have been extended to handle TVPs.24–28,115 Bayesian methods have also been used for data-driven discovery, i.e., system identification, where the functional form of an ODE, PDE, or similar, is learned along with the corresponding static coefficients for each equation term.31,116–118 A Bayesian approach to PINUP is illustrated in Fig. 2(f).

Bayesian inference was used by Smelyanskiy et al.119 to perform system identification for dynamical processes with an underlying SDE (or ODE) model. Importantly, they developed an analytical, variational inference approach that converges quickly compared to Markov Chain Monte Carlo (MCMC) sampling. The same research group subsequently showed that their technique can be applied to separate time-series windows in order to simultaneously resolve TVPs.29,120,121 An important limitation of this method is that it assumes dense sampling and minimal measurement noise, which may not apply in practice. Moreover, model selection is a problem, since the number of possible combinations of basis functions for the SDE model scales exponentially; for example, if a polynomial basis is used, there are 2^(d+1) possible combinations of polynomial terms up to order d for a single variable. Model selection was addressed through heuristic application of the Bayesian information criterion (BIC) and beam search.122 The restriction that the underlying model takes the form of an SDE corresponds to only a subset of our PINUP formulation but, nonetheless, includes a wide range of important processes. Additionally, achieving simultaneous system identification and PINUP is notable and is an important direction for further research.

The relative performance of different PINUP algorithms is largely unknown, owing to a lack of comparative analyses across the diverse methodological literature. In most studies, the performance of a given method is evaluated by comparing the TVP(s) θ̂_t inferred using a PINUP method against the ground truth θ_t from a numerical simulation of a process. The advantage of simulated data is that the ground truth is known, allowing the performance of the TVP inference to be quantified using a metric such as the Pearson correlation123 or normalized mean squared error (NMSE).34 Many authors have focused the testing of their algorithms on simulations using well-characterized chaotic maps and flows, such as the Baker map,1 the Tent map,39,46,49 the Hénon map,40,49 the Mackey–Glass system,44,49 and the Rössler system.97,109 Notably, the logistic map and the Lorenz process are the most commonly used processes for evaluating PINUP methods across the publications that we surveyed.33,34,36–45 The common assumption is that these canonical, nonlinear, chaotic systems offer challenging cases that are useful for assessing the performance of PINUP algorithms. However, the fact that so many PINUP methods perform well on the same test problems raises suspicion about whether these are truly challenging problems on which strong performance is meaningful or whether similar performance could be achieved by simpler baseline methods (which have not yet been adopted as a standard practice within the literature).

To examine the suitability of the logistic map and Lorenz processes as PINUP benchmarking problems, we performed numerical experiments comparing the performance of several representative PINUP methods to a simple baseline method that estimates parameter variation as a time series of statistics, either mean or standard deviation, computed over sliding windows. Undermining their use as meaningful test problems, we find that good PINUP performance in the logistic map and Lorenz process cases can be trivially achieved using our baseline method (Sec. IV A). Accordingly, we then introduce new, more challenging problems on which the same PINUP methods exhibit weaker performance, but for which we can still find statistical time-series features that perform well (Sec. IV B). Note that our aim in this section is not to draw conclusions about the differential performance of the exemplar methods (a complex task requiring comprehensive benchmarking that is more suitable for future work) but rather to provide guidance to PINUP researchers on the selection of appropriate problems.

We aimed to test whether strong PINUP performance on the non-stationary logistic map and Lorenz processes could be achieved by a simple, baseline method, relative to that of several existing PINUP methods. To do this, we first generated time series from each of the non-stationary logistic map and Lorenz processes. To demonstrate the comparative performance of several existing methods, PINUP was conducted using methods chosen from the dimension reduction, prediction error, and phase-space partitioning categories: quadratic SFA (SFA2, a dimension reduction method),39 smoothed mean prediction errors from a bank of echo state networks (ESN error, a prediction error method),41 and characteristic distance (CD, a phase-space partitioning method).97 We chose these methods on the basis of ease of implementation, as well as to achieve a spread of methods across the categories described in Sec. III. A more detailed description of how we implemented each method is in Appendix A 2. In order to evaluate the performance of existing algorithms relative to a simple baseline method for tracking non-stationary variation, we used an approach belonging to the “time-series features” category, comprising single time-series features—mean and standard deviation—computed across sliding windows. The resulting time series of statistical features is taken to be the estimate θ̂_t. We were primarily interested in demonstrating the existence of statistics capable of tracking TVPs; so in each case, we selected the best-performing example of mean or standard deviation applied to a single variable, for comparison with the other methods. Since three of the methods—ESN error, CD, and our baseline—utilize time-series windows for the purpose of either smoothing or computing statistics, we adopted the same window length of 10^3 for each method, judging that this was sufficiently small relative to the slow TVPs to resolve parameter variation while being sufficiently large to confer robustness to noise.
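Our baseline can be sketched as follows (using a toy stand-in process rather than the actual Lorenz or logistic map simulations; the window length and noise level here are illustrative):

```python
import numpy as np

def windowed_stat(x, w, stat=np.mean):
    """Baseline TVP estimate: a statistic over non-overlapping windows of length w."""
    n = (len(x) // w) * w                      # truncate to a whole number of windows
    return stat(x[:n].reshape(-1, w), axis=1)

rng = np.random.default_rng(3)
n, w = 100_000, 1000
theta = np.sin(2 * np.pi * np.arange(n) / n)   # slow sinusoidal TVP
x = theta + rng.standard_normal(n)             # toy process: TVP shifts the mean
theta_hat = windowed_stat(x, w, np.mean)       # baseline estimate, one value per window
theta_win = windowed_stat(theta, w, np.mean)   # windowed ground truth for scoring
r2 = np.corrcoef(theta_hat, theta_win)[0, 1] ** 2   # Pearson R^2
```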

We simulated each non-stationary process using a slow, sinusoidal TVP with period T/2, where T was the total Lorenz process integration time or the number of logistic map iterations, respectively. The amplitude of sinusoidal parameter variation was ±10% relative to the default value of the parameter that was varied for each system. We note that qualitatively similar results are obtained for other functional forms of the slow parameter variation and across a range of low amplitudes of parameter variation. We then evaluated the performance of each PINUP method by computing the Pearson R^2 between the ground truth and inferred TVPs. The R^2 performance metric varies between 0 and 1, with values approaching 1 indicating good performance and values approaching 0 indicating poor performance. Mean R^2 was computed across 20 trials for each of three different levels of additive, Gaussian observation noise: signal-to-noise ratios (SNRs) of 0 and 20 dB, in addition to a noise-free condition. In Appendix A 1, we detail the equations, parameter settings, initial conditions, and functional form of the TVPs that were used for each simulated process.
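The observation-noise conditions can be generated as follows (a minimal sketch; the signal here is a toy sinusoid rather than one of the simulated processes):

```python
import numpy as np

def add_noise_snr(x, snr_db, rng):
    """Add white Gaussian observation noise at a target SNR in dB,
    where SNR = 10 * log10(P_signal / P_noise)."""
    p_noise = np.var(x) / 10 ** (snr_db / 10)
    return x + np.sqrt(p_noise) * rng.standard_normal(len(x))

rng = np.random.default_rng(4)
x = np.sin(0.01 * np.arange(50_000))   # toy clean signal
y = add_noise_snr(x, snr_db=20, rng=rng)
# The empirical SNR, 10 * log10(var(x) / var(y - x)), is close to 20 dB.
```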

The results of our experiments are shown in Figs. 3(a)–3(f), which for the Lorenz and logistic map systems visualize the process, the TVP, and the TVP reconstruction accuracy of each PINUP method. Of the PINUP methods, the ESN error performed well on the non-stationary Lorenz process [Fig. 3(c)], and SFA2 and CD performed well on the non-stationary logistic map [Fig. 3(f)]. SFA2 also achieved near-perfect performance on the Lorenz process in the noise-free condition but failed under both noise conditions. In contrast, our simple baseline approach, which tracks either the mean or standard deviation across sliding windows, achieved high accuracy (R^2 > 0.9) across all noise conditions. We selected features from catch24 to demonstrate that some problems are solvable with a sliding-window approach, not to present it as a directly comparable alternative method. For completeness, the performance of all catch24 features is shown in Fig. S1 in the supplementary material. Additionally, we conducted sensitivity analyses to examine the robustness of our findings to variations in time-series length and TVP sinusoid frequency, including lengths of 10^4 and 10^3 and TVPs with 2, 4, and 8 periods (for the Lorenz process) and 8, 16, and 32 periods (for the logistic map), with reductions in statistical window size proportional to time-series length. These hyperparameter values were chosen to capture the range of experimental settings that are typical in the existing PINUP literature. Sliding-window mean and standard deviation, respectively, retained their high performance over this range of hyperparameters, supporting our claim that such Lorenz and logistic map PINUP problems are trivial (see Figs. S2–S3 in the supplementary material). However, we note that the performance of all methods deteriorated for non-stationary Lorenz time series of length 10^3.
In principle, all PINUP problems can be made more challenging by reducing the time-series length or by using fast TVPs, but we leave the problem of systematically determining which algorithms perform best and under what conditions for future work.

FIG. 3.

Individual time-series features match or exceed the TVP reconstruction accuracy of several PINUP methods across four chaotic systems. The processes (and varied parameters) are (a)–(c) the Lorenz process (parameter ρ); (d)–(f) the logistic map (parameter r); (g)–(i) the Langford process (parameter ω); and (j)–(l) the sine map (parameter r); each of which displays results from a parameter inference experiment. A noise-free example of each process is visualized [in panels (a), (d), (g), and (j)] and colored according to the corresponding TVP value at each point, noting that the flows are visualized as two-dimensional spatial projections, whereas the one-dimensional maps are visualized using space and time. The time course of the associated sinusoidal TVP is visualized for each system [(b), (e), (h), and (k)] with color corresponding to the TVP value for comparison to the corresponding process plots. The numerical experiment result panels [(c), (f), (i), and (l)] show the mean and standard deviation of TVP reconstruction accuracy (Pearson R^2) for each method across multiple trials using two levels of signal-to-noise ratio (SNR) (0 and 20 dB) and a noise-free condition. The PINUP methods were quadratic slow feature analysis (SFA2), a prediction-error-based method using echo state networks (ESN error), and characteristic distance (CD). The single time-series features that were used were mean, standard deviation (std), the first time lag at which the autocorrelation function falls below 1/e (feature name: acf_timescale124), and the centroid of the power spectrum (feature name: centroid_freq124).


To understand why a simple windowed distributional feature can track the non-stationary variation underlying each of these nonlinear chaotic systems, first observe that when parameter values are mapped onto a two-dimensional projection of the Lorenz phase space [Fig. 3(a)], we see that the TVP induces a translation of the attractor along the z-axis. Such a translation in space is easily tracked by computing a mean statistic. Similarly, it is visually evident that the variance of the logistic map is modulated by the corresponding sinusoidal TVP [Fig. 3(d)]. The success of this simple baseline method undermines the apparent difficulty of these classical benchmark problems, which are ubiquitous in the PINUP literature and which are often (and incorrectly) used as evidence of the usefulness of a given PINUP algorithm. Even though chaotic processes like the Lorenz process and logistic map under the influence of a TVP may exhibit variation in nonlinear properties, our experiments show that using nonlinear statistics is not required for strong PINUP performance. For these two systems, this is because the parameter variation does not affect only the nonlinear structure but also varies simple properties—namely, the first two moments of the distribution.

Given that the Lorenz process and logistic map PINUP problems can be trivially solved by tracking mean and variance, respectively, we next aimed to find more challenging alternative test problems that may be more suitable for demonstrating impressive PINUP performance. Specifically, we aimed to demonstrate the existence of non-stationary processes for which accurate TVP reconstruction is possible using some PINUP methods, but for which the simple baseline method of windowed computation of mean or variance fails. To do this, we manually searched through the chaotic processes collated by Sprott125 (comprising 29 maps and 33 flows) and Gilpin126 (comprising 131 flows). For each such system, we followed the comparative methodology of Sec. IV A, i.e., varying a single parameter sinusoidally by ±10% of the default value and then quantifying the performance of our selected PINUP methods using the Pearson R^2. To discover whether there exist problems for which other time-series features beyond mean and standard deviation can track TVPs, we also performed the same sliding-window approach using 22 additional time-series features from the CAnonical Time-series CHaracteristics (catch22) feature set.124

Here, we present two processes, the Langford process and sine map, for which we found TVP inference to fail using sliding-window mean or standard deviation, but for which at least one other method performs well (among SFA2, ESN error, CD, or one of the sliding-window catch22 features). Our comparative results for these two processes are shown in Figs. 3(g)–3(l). Weak performance on both the Langford and sine map processes was seen for all three of SFA2, ESN error, and CD [Figs. 3(i) and 3(l)], in addition to mean and standard deviation (not shown). The only exception to this was that SFA2 performed very well on the Langford process in the noise-free condition, similar to what was seen for the Lorenz process. Observe that neither process exhibits obvious translation under the influence of the TVP [see Figs. 3(g) and 3(j)], consistent with the weak performance of the sliding-window mean. These are problems that require more algorithmic sophistication than simply computing mean and variance. But these are not unsolvable problems—indeed, in each case, we identified a single statistic from the catch22 feature set that could track the statistical changes induced by underlying parameter variation with reasonable accuracy (i.e., R^2 > 0.8) over two or more noise levels. For the non-stationary Langford process, a statistical measure of timescale based on the decay of the autocorrelation function (catch22 feature name: acf_timescale) successfully tracked parameter variation, albeit with a steep decline in performance with increasing noise, and for the non-stationary sine map, a measure of the centroid of the power spectrum (catch22 feature name: centroid_freq) performed well over all three noise levels. A benefit of using simple time-series statistics is that they point toward an interpretable theory of what is changing in either process (i.e., autocorrelation and the power spectrum, respectively).

Taken together, these findings demonstrate the existence of more challenging PINUP problems, for which accurate TVP reconstruction is possible but is not achieved by the first two moments of the distribution or by a set of conventional methods. If new PINUP methods can be developed that are capable of handling a range of such challenging problems, this will enable the inference of parameter variation for a wider range of non-stationary phenomena.

Progress on the PINUP problem has been hindered by the fragmentation of methodological literature across multiple disciplinary settings and journals, to the extent that many of the published PINUP methods reviewed here were developed with minimal knowledge of the progress made on the same problem in different fields. We have made headway on this issue by providing the first overview of this interdisciplinary literature: formulating the PINUP problem class (in Sec. II) and organizing the literature into different categories of PINUP methods (in Sec. III). This shared terminology and set of reference algorithms will enable greater interaction and sharing of ideas between disciplines, while also fostering a systematic, comparative, methodological approach (across methods and problems) in future research. In turn, we hope that such interdisciplinary, comparative work will enable progress on the PINUP problem.

A similar lack of systematic comparison has been faced by other algorithmic fields, such as computer vision,127 time-series classification,128 forecasting,129,130 and clinical biomarker discovery,131,132 and has been addressed in a range of ways. For example, progress in time-series classification has been assisted by (1) developing libraries of time-series classification methods,133 (2) curating repositories of benchmark problems with agreed performance metrics,134 and (3) conducting systematic comparisons of methods on these particular problems.135,136 We echo the “call to arms” issued by Keogh and Kasetty128 that the culture of reviewers and scientists needs to move toward testing new methods on a broad set of challenging problems with comparison to a range of other methods, including simple, baseline approaches. Additionally, it is important for researchers to share open and clearly documented implementations of any PINUP methods that are developed, noting that some methods are complicated and can depend on specific implementation details that are not fully specified in the papers themselves. Indeed, in spite of the theoretical promise of the Bayesian inference and recurrence plot PINUP categories, reliable implementations of these methods eluded us, and so, no such method was included in our experiments.

The PINUP literature has been in need of simple, baseline methods, to serve as a benchmark against which the performance of new methods can be compared. The claimed usefulness of a new algorithm is justified by its superior performance relative to simpler baseline alternatives, and the field can only reasonably claim to be working on challenging problems when such simple methods fail. We propose that mean or standard deviation computed across sliding time-series windows provides such a baseline. Crucially, we found that this simple approach is comparable or even superior to several existing methods applied to the non-stationary Lorenz and logistic map processes (Sec. IV A). Indeed, for many of the non-stationary chaotic systems that we searched through in Sec. IV B, either mean or standard deviation performed well. A similar phenomenon has been observed in the time-series classification literature, where complicated algorithms sometimes fail to outperform classification using only mean and variance, even though these distributional properties are insensitive to the time-ordering of the data.137 Benchmarking on problems that can be solved using mean or variance has the downside of being unable to distinguish algorithms that track simple, distributional properties from those that can track more subtle forms of non-stationary variation. By evaluating performance on new, more challenging problems, researchers in the PINUP field can more convincingly demonstrate advances in algorithmic performance.

Addressing this need, we performed a wide search across time-varying systems and demonstrated the existence of difficult problems for which TVP inference is possible, but where several existing methods (in addition to sliding-window mean and standard deviation) fail (Sec. IV B). In particular, we highlight two such problems: the non-stationary Langford and sine map processes. Additional problems of this type could be found by applying our comparative methodology across a range of non-stationary processes. More challenging PINUP problems such as these provide a valuable starting point for future research. Harking back to the definitions that opened this paper, we speculate that an outstanding challenge for the PINUP field may be to achieve accurate TVP inference for processes that exhibit weak but not strong stationarity, i.e., where non-stationarity manifests through variation in joint probabilities but not through variation in mean or variance.

For each of these two more challenging PINUP problems, we identified a single time-series statistic from the catch24 feature set that achieved accurate TVP reconstruction using the same sliding-window approach that we used for mean and variance (Sec. IV B). This simple sliding-window time-series feature-based approach is straightforward to implement, parsimonious, and enables the interpretation of non-stationary dynamics in terms of the variation of statistical properties over time. The fact that this simple sliding-window approach can achieve strong performance where other methods fail presents a promising direction for future research; for example, one could refine feature selection or apply dimension reduction in the space of statistical features. A downside of using statistical features is that a highly sampled time series is needed to obtain robust statistical estimates. We expect the performance of PINUP methods to depend on a range of factors, including the sampling rate, the timescale of variation of the TVP relative to the observed process, the amplitude of parameter variation, and the contribution of dynamical and measurement noise. Future work could explore the strengths and weaknesses of different PINUP methods in the presence of variation in these factors. Additionally, since it is generally unknown a priori which statistics will track the TVPs of a given process, an outstanding problem is to determine how best to derive TVP estimates from multiple time-varying statistics. A promising approach is to combine time-series feature extraction with dimension reduction techniques such as PCA.78,79

By adopting a systematic approach and testing on appropriately difficult problems, PINUP researchers can both drive progress and help readers to discern genuine advances in the field. Improved performance on tracking non-stationarity from time-series data may enable advances on a range of related time-series analysis problems for which conventional methods assume stationarity, including forecasting138,139 and system identification.29,31 Taken together, our review and numerical results lay a foundation for interdisciplinary progress on the PINUP problem and the broader study of non-stationary phenomena.

See the supplementary material for three figures providing additional detail on some of our numerical experiments. These include the performance of all catch24 time-series features (only selected features are shown in Fig. 3) and sensitivity analyses of PINUP for the non-stationary Lorenz process and logistic map under variations in time-series length and TVP period.

The authors have no conflicts to disclose.

Kieran S. Owens: Conceptualization (equal); Methodology (equal); Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Ben D. Fulcher: Conceptualization (equal); Methodology (equal); Supervision (lead); Writing – review & editing (equal).

The data that support the findings of this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.12716817, Ref. 144.

Here, we detail the non-stationary processes and methods that were used for the numerical experiments described in Sec. IV and shown in Fig. 3.

1. Non-stationary processes

Non-stationary time series were generated either through integration of a flow dx/dt = f(x, θ), given some initial state x_0, or through iteration of a map x_{t+1} = f(x_t, θ_t). In either case, the functional form of the TVP was a two-period sinusoid,
θ(t) = p [1 + α sin(4πt/T)],
where p is the default parameter value, α sets the proportional amplitude of variation, and T is the total integration time or total number of iterations, respectively. For all systems, we used α = 0.1, corresponding to a sinusoidal variation of ±10% around the default parameter value p. For ODE integration, the Runge–Kutta–Fehlberg (RKF45) method was used.140 The models that were used are described below.
a. Iterative maps
For the logistic map141 with equation
x_{t+1} = r x_t (1 − x_t),  (A1)
we used a parameter value of r = 3.6 and an initial condition x_0 = 0.6. For the TVP, we varied r.
For the sine map125 with equation
(A2)
we used a parameter value of r = 3.0 and the initial condition x_0 = 0.6. For the TVP, we varied r.

For both systems, we iterated to simulate a time series of length 10^5 samples.
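A minimal sketch of the map simulation is given below. The explicit TVP form p(1 + α sin(4πt/T)) is our reading of the two-period sinusoid described above, and the code is illustrative rather than the paper's released implementation:

```python
import numpy as np

def simulate_logistic_tvp(T=100_000, p=3.6, alpha=0.1, x0=0.6):
    """Iterate the logistic map x_{t+1} = r_t x_t (1 - x_t) with a slowly
    varying parameter r_t (assumed two-period sinusoid of +/-10% around p)."""
    t = np.arange(T)
    r = p * (1 + alpha * np.sin(4 * np.pi * t / T))  # assumed TVP form
    x = np.empty(T)
    x[0] = x0
    for i in range(T - 1):
        x[i + 1] = r[i] * x[i] * (1 - x[i])
    return x, r

x, r = simulate_logistic_tvp()
```

With p = 3.6 and α = 0.1, the parameter stays below 4, so the trajectory remains confined to the unit interval throughout.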

b. Flows
The Lorenz process is a model of atmospheric convection142 with the following equations:
dx/dt = σ(y − x),  (A3)
dy/dt = x(ρ − z) − y,  (A4)
dz/dt = xy − βz,  (A5)
where parameters were set to values σ = 10, ρ = 28, and β = 8/3, and we used the initial condition (x_0, y_0, z_0) = (9.79, 15.04, 20.53) from Gilpin.126 For the TVP, we varied ρ.
The Langford process yields a torus-like attractor143 with the following equations:
dx/dt = (z − β)x − ωy,  (A6)
dy/dt = ωx + (z − β)y,  (A7)
dz/dt = λ + αz − z³/3 − (x² + y²)(1 + ρz) + εzx³,  (A8)
where parameters were set to values α = 0.95, β = 0.7, λ = 0.6, ω = 3.5, ρ = 0.25, and ε = 0.1, and we used the initial condition (x_0, y_0, z_0) = (0.78, 0.63, 0.18) from Gilpin.126 For the TVP, we varied ω.

For both the Lorenz and Langford systems, integration was performed over 1000 time units with a sampling interval of 0.01 time units, yielding time series of length 10^5 samples.
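The flow simulation can be sketched as follows for the Lorenz case. A fixed-step RK4 integrator stands in for the adaptive RKF45 scheme used in the paper, and the sinusoidal TVP form is the same assumed form as for the maps; a shorter run is shown for illustration (the full setup would use T = 1000.0):

```python
import numpy as np

def lorenz_rhs(state, rho, sigma=10.0, beta=8.0 / 3.0):
    """Lorenz vector field with a (possibly time-varying) rho."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def simulate_lorenz_tvp(T=1000.0, dt=0.01, p=28.0, alpha=0.1,
                        x0=(9.79, 15.04, 20.53)):
    """Integrate the Lorenz system with sinusoidally varying rho using a
    fixed-step RK4 scheme (a simple stand-in for adaptive RKF45)."""
    n = int(T / dt)
    traj = np.empty((n, 3))
    state = np.asarray(x0, dtype=float)
    for i in range(n):
        t = i * dt
        rho = p * (1 + alpha * np.sin(4 * np.pi * t / T))  # assumed TVP form
        k1 = lorenz_rhs(state, rho)
        k2 = lorenz_rhs(state + 0.5 * dt * k1, rho)
        k3 = lorenz_rhs(state + 0.5 * dt * k2, rho)
        k4 = lorenz_rhs(state + dt * k3, rho)
        state = state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[i] = state
    return traj

traj = simulate_lorenz_tvp(T=100.0)  # shorter run for illustration
```

Note that ρ is held fixed within each RK4 step here, which is adequate for a sketch given how slowly the TVP varies relative to dt.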

2. Methods

For each system, PINUP was performed using each of the methods below (SFA2, ESN, CD, and statistical time-series features) for two levels of additive Gaussian noise, yielding SNRs of 0 and 20 dB, in addition to a noise-free condition. SNR was calculated as 10 log₁₀(σ_s²/σ_n²), where σ_s² is the variance of the signal and σ_n² is the variance of the noise.
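Setting the noise level to hit a target SNR follows directly from the definition above: the noise variance is the signal variance divided by 10^(SNR/10). A small illustrative sketch (function name ours):

```python
import numpy as np

def add_noise_at_snr(x, snr_db, rng):
    """Add white Gaussian noise so that 10*log10(var(x)/var(noise)) = snr_db."""
    noise_var = x.var() / 10 ** (snr_db / 10)
    return x + rng.normal(0.0, np.sqrt(noise_var), size=x.shape)

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 20 * np.pi, 100_000))
y = add_noise_at_snr(x, 20.0, rng)

# Empirical SNR of the realized noise, in dB
snr_emp = 10 * np.log10(x.var() / (y - x).var())
```

For long series, the empirical SNR of the realized noise is within a small fraction of a decibel of the target.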

a. Quadratic slow feature analysis (SFA2)

This method is an example of a dimension reduction approach to PINUP (see Sec. III A). Following Wiskott,39 given a time series, we first applied time-delay embedding (TDE) with dimension m and delay τ and then applied a quadratic polynomial basis expansion, followed by single-component SFA, to obtain a parameter estimate θ̂_t. We set τ to the time delay corresponding to the first zero-crossing of the autocorrelation function (ACF) and, in the setting of multivariate data, selected the minimum such value computed across time-series variables. To set m, using Wiskott's heuristic,39 we performed a search over integers m ∈ {1, …, 20} and selected the value of m for which the mean squared value of the first temporal derivative of the estimated TVP, θ̂_t, was minimized.
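The linear SFA core of this pipeline can be sketched as below: whiten the data, then take the unit-variance projection whose first temporal difference has minimal variance. This sketch omits the TDE and quadratic basis expansion steps of the full SFA2 method, and the toy example and names are ours:

```python
import numpy as np

def sfa_slowest(X):
    """Linear SFA: whiten X (samples x dims), then return the unit-variance
    linear projection whose first temporal difference has minimal variance."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Z = Xc @ (evecs / np.sqrt(evals))  # whitened signals (identity covariance)
    # Slowest direction: smallest eigenvector of the difference covariance
    dvals, dvecs = np.linalg.eigh(np.cov(np.diff(Z, axis=0), rowvar=False))
    return Z @ dvecs[:, 0]

# Toy example: one channel carries a slow sinusoid plus fast noise, the
# other fast noise only; the slowest feature should recover the driver.
rng = np.random.default_rng(1)
n = 20_000
slow = np.sin(2 * np.pi * 3 * np.arange(n) / n)
X = np.column_stack([slow + 0.3 * rng.standard_normal(n),
                     rng.standard_normal(n)])
y = sfa_slowest(X)
r_slow = abs(np.corrcoef(y, slow)[0, 1])
```

The sign of the extracted feature is arbitrary, so agreement with the driver is assessed via the absolute correlation.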

b. Echo state network prediction error (ESN)

This method is an example of a prediction-error approach to PINUP (see Sec. III C). Since no out-of-the-box algorithm was available, we implemented a method inspired by the approach of Güntürkün:41 given a time series, we trained 50 echo state networks of internal dimension 30 and then computed the mean prediction error across the 50 networks. Rather than using adaptive filtering, we smoothed the mean prediction-error time series with a Gaussian convolutional filter to obtain a TVP estimate. We used σ = 166 for the Gaussian kernel standard deviation hyperparameter, yielding a filter window of approximately 1000 steps between ±3σ, to maintain consistency with the window sizes used in the CD and statistical time-series feature methods.
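A compact sketch of this pipeline is given below, with a ridge-regression readout and a truncated Gaussian smoother. This is an illustrative stand-in under assumed hyperparameters (spectral radius 0.9, ridge penalty 1e-6, and a small ensemble for speed), not the implementation used in the paper:

```python
import numpy as np

def esn_prediction_error(x, n_reservoir=30, n_nets=50, washout=100, seed=0):
    """Mean one-step-ahead squared prediction error across an ensemble of
    small echo state networks with ridge-regression readouts."""
    rng = np.random.default_rng(seed)
    n = len(x)
    errs = np.zeros(n)
    for _ in range(n_nets):
        Win = rng.uniform(-0.5, 0.5, n_reservoir)
        W = rng.normal(0.0, 1.0, (n_reservoir, n_reservoir))
        W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # echo state property
        states = np.zeros((n, n_reservoir))
        h = np.zeros(n_reservoir)
        for t in range(1, n):
            h = np.tanh(W @ h + Win * x[t - 1])  # state only sees x up to t-1
            states[t] = h
        # Ridge readout trained to predict x[t] from the reservoir state
        S, y = states[washout:], x[washout:]
        w_out = np.linalg.solve(S.T @ S + 1e-6 * np.eye(n_reservoir), S.T @ y)
        errs += (x - states @ w_out) ** 2
    return errs / n_nets

def gaussian_smooth(e, sigma=166):
    """Smooth with a truncated Gaussian kernel (window of roughly +/- 3 sigma)."""
    half = int(3 * sigma)
    k = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma) ** 2)
    return np.convolve(e, k / k.sum(), mode="same")

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 40 * np.pi, 3000)) + 0.1 * rng.standard_normal(3000)
e = esn_prediction_error(x, n_nets=5)  # small ensemble for speed
theta_hat = gaussian_smooth(e, sigma=50)
```

The smoothed error time series, rather than the raw one, serves as the TVP estimate, since pointwise prediction errors are noisy.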

c. Characteristic distance (CD)

This method is an example of a phase-space partitioning approach to PINUP (see Sec. III D). Following the general approach of Nguyen and Chelidze,97 given a time series x_t of length 10^5 samples, 50 random points were uniformly sampled from a phase-space hypercube bounding the points in x_t, with boundaries extended by ±10% in each dimension to allow for sampling of points “outside” the distribution. A vector of Euclidean distances from each of the 50 random points was computed for each of the points in x_t. A vector of 50 mean distances was then computed for contiguous, non-overlapping windows of length 10^3 samples. Single-component SFA was then applied to obtain a TVP estimate.
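The distance-feature step can be sketched as follows on a toy trajectory whose radius drifts slowly; a first principal component stands in here for the single-component SFA applied in the paper, and smaller sizes are used for speed (names and toy system are ours):

```python
import numpy as np

def characteristic_distance(X, n_points=50, window=200, pad=0.1, seed=0):
    """Mean Euclidean distance from each of n_points random reference points,
    computed over contiguous, non-overlapping windows of the trajectory X."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = hi - lo
    ref = rng.uniform(lo - pad * span, hi + pad * span,
                      size=(n_points, X.shape[1]))
    # Distance of every sample to every reference point: (n_samples, n_points)
    d = np.linalg.norm(X[:, None, :] - ref[None, :, :], axis=2)
    n_windows = len(X) // window
    d = d[: n_windows * window].reshape(n_windows, window, n_points)
    return d.mean(axis=1)  # one n_points-dim feature vector per window

# Toy 2D trajectory whose radius drifts slowly, mimicking a slow TVP
t = np.arange(20_000)
radius = 1 + 0.1 * np.sin(4 * np.pi * t / len(t))
X = np.column_stack([radius * np.cos(0.5 * t), radius * np.sin(0.5 * t)])
F = characteristic_distance(X)

# First principal component of the window features, standing in for SFA
Fc = F - F.mean(axis=0)
theta_hat = Fc @ np.linalg.svd(Fc, full_matrices=False)[2][0]
r_cd = abs(np.corrcoef(theta_hat, radius.reshape(-1, 200).mean(axis=1))[0, 1])
```

Because the attractor's geometry shifts with the drifting radius, the window-averaged distances vary coherently with the underlying parameter.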

d. Statistical time-series features

This simple baseline method is an example of a statistical time-series feature approach to PINUP (see Sec. III B); we note that it is not yet standard practice in the PINUP literature to use this method as a baseline comparison. For each variable of each process, the time-series features from the catch22 library124 were computed over sliding windows. Specifically, each time-series variable of length 10^5 samples was partitioned into contiguous, non-overlapping windows of length 10^3 samples, and the features were computed for each window, yielding a feature time series of length 100 samples.
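The windowing scaffold is generic with respect to the feature set. The sketch below uses a few simple stand-in features (mean, standard deviation, and lag-1 autocorrelation) in place of catch22; in practice, the catch22 features themselves can be computed with the pycatch22 package. The function name and feature choices here are illustrative:

```python
import numpy as np

def window_features(x, window=1000, features=None):
    """Compute a dictionary of feature time series over contiguous,
    non-overlapping windows (simple stand-ins for the catch22 set)."""
    if features is None:
        features = {
            "mean": np.mean,
            "std": np.std,
            "ac1": lambda w: np.corrcoef(w[:-1], w[1:])[0, 1],  # lag-1 AC
        }
    n_windows = len(x) // window
    blocks = x[: n_windows * window].reshape(n_windows, window)
    return {name: np.array([f(b) for b in blocks])
            for name, f in features.items()}

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
feats = window_features(x)  # each feature time series has 100 samples
```

Each feature time series can then be compared against the true TVP (e.g., via Spearman correlation), or combined across features using dimension reduction.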

1. T. Schreiber and A. Schmitz, “Classification of time series data with nonlinear similarity measures,” Phys. Rev. Lett. 79, 1475–1478 (1997).
2. B. Buča, J. Tindall, and D. Jaksch, “Non-stationary coherent quantum many-body dynamics through dissipation,” Nat. Commun. 10, 1730 (2019).
3. D. Chelidze and M. Liu, “Dynamical systems approach to fatigue damage identification,” J. Sound Vib. 281, 887–904 (2005).
4. D. Summers, J. G. Cranford, and B. P. Healey, “Chaos in periodically forced discrete-time ecosystem models,” Chaos, Solitons Fractals 11, 2331–2342 (2000).
5. F. Zhang, P. Yang, K. Fraedrich, X. Zhou, G. Wang, and J. Li, “Reconstruction of driving forces from nonstationary time series including stationary regions and application to climate change,” Physica A 473, 337–343 (2017).
6. J. Galadí, S. Silva Pereira, Y. Sanz Perl, M. Kringelbach, I. Gayte, H. Laufs, E. Tagliazucchi, J. Langa, and G. Deco, “Capturing the non-stationarity of whole-brain dynamics underlying human brain states,” NeuroImage 244, 118551 (2021).
7. T. A. Schmitt, D. Chetalova, R. Schäfer, and T. Guhr, “Non-stationarity in financial time series: Generic features and tail behavior,” Europhys. Lett. 103, 58003 (2013).
8. J. M. Shine, “Neuromodulatory control of complex adaptive dynamics in the brain,” Interface Focus 13, 20220079 (2023).
9. N. W. Bailey, B. D. Fulcher, B. Caldwell, A. T. Hill, B. Fitzgibbon, H. van Dijk, and P. B. Fitzgerald, “Uncovering a stability signature of brain dynamics associated with meditation experience using massive time-series feature extraction,” Neural Netw. 171, 171–185 (2024).
10. D. Ruelle, Chaotic Evolution and Strange Attractors, 1st ed. (Cambridge University Press, Cambridge, New York, 1989).
11. M. B. Kennel, “Statistical test for dynamical nonstationarity in observed time-series data,” Phys. Rev. E 56, 316–321 (1997).
12. A. Witt, J. Kurths, and A. Pikovsky, “Testing stationarity in time series,” Phys. Rev. E 58, 1800–1810 (1998).
13. R. Hegger, H. Kantz, L. Matassini, and T. Schreiber, “Coping with nonstationarity by overembedding,” Phys. Rev. Lett. 84, 4092–4095 (2000).
14. R. Manuca and R. Savit, “Stationarity and nonstationarity in time series analysis,” Physica D 99, 134–161 (1996).
15. C. Rieke, K. Sternickel, R. G. Andrzejak, C. E. Elger, P. David, and K. Lehnertz, “Measuring nonstationarity by analyzing the loss of recurrence in dynamical systems,” Phys. Rev. Lett. 88, 244102 (2002).
16. R. Serquina, Y.-C. Lai, and Q. Chen, “Characterization of nonstationary chaotic systems,” Phys. Rev. E 77, 026208 (2008).
17. E. M. Bollt, J. D. Skufca, and S. J. McGregor, “Control entropy: A complexity measure for nonstationary signals,” Math. Biosci. Eng. 6, 1–25 (2009).
18. A. Ray and A. Roy Chowdhury, “On the characterization of non-stationary chaotic systems: Autonomous and non-autonomous cases,” Physica A 389, 5077–5083 (2010).
19. T. Schreiber, “Detecting and analysing nonstationarity in a time series with nonlinear cross-predictions,” Phys. Rev. Lett. 78, 843–846 (1997).
20. L. A. Aguirre and C. Letellier, “Nonstationarity signatures in the dynamics of global nonlinear models,” Chaos 22, 033136 (2012).
21. A. G. Siapas and M. A. Wilson, “Coordinated interactions between hippocampal ripples and cortical spindles during slow-wave sleep,” Neuron 21, 1123–1128 (1998).
22. D. Moser, P. Anderer, G. Gruber, S. Parapatics, E. Loretz, M. Boeck, G. Kloesch, E. Heller, A. Schmidt, H. Danker-Hopfe, B. Saletu, J. Zeitlhofer, and G. Dorffner, “Sleep classification according to AASM and Rechtschaffen & Kales: Effects on sleep scoring parameters,” Sleep 32, 139–149 (2009).
23. A. Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation, Other Titles in Applied Mathematics (Society for Industrial and Applied Mathematics, 2005).
24. S. Särkkä and L. Svensson, Bayesian Filtering and Smoothing, 2nd ed. (Cambridge University Press, 2023).
25. U. Güntürkün, J. P. Reilly, T. Kirubarajan, and H. deBruin, “Recursive hidden input estimation in nonlinear dynamic systems with varying amounts of a priori knowledge,” Signal Process. 99, 171–184 (2014).
26. A. Arnold and A. L. Lloyd, “An approach to periodic, time-varying parameter estimation using nonlinear filtering,” Inverse Probl. 34, 105005 (2018).
27. N. Hauzenberger, F. Huber, G. Koop, and L. Onorante, “Fast and flexible Bayesian inference in time-varying parameter regression models,” J. Bus. Econ. Stat. 40, 1904–1918 (2022).
28. A. Arnold, “When artificial parameter evolution gets real: Particle filtering for time-varying parameter estimation in deterministic dynamical systems,” Inverse Probl. 39, 014002 (2023).
29. D. G. Luchinsky, V. N. Smelyanskiy, A. Duggento, and P. V. E. McClintock, “Inferential framework for nonstationary dynamics. I. Theory,” Phys. Rev. E 77, 061105 (2008).
30. S. L. Brunton, J. L. Proctor, and J. N. Kutz, “Discovering governing equations from data by sparse identification of nonlinear dynamical systems,” Proc. Natl. Acad. Sci. U.S.A. 113, 3932–3937 (2016).
31. J. S. North, C. K. Wikle, and E. M. Schliep, “A review of data-driven discovery for dynamic systems,” Int. Stat. Rev. 91, 464–492 (2023).
32. Z. G. Nicolaou, G. Huo, Y. Chen, S. L. Brunton, and J. N. Kutz, “Data-driven discovery and extrapolation of parameterized pattern-forming dynamics,” Phys. Rev. Res. 5, L042017 (2023).
33. A. Chatterjee, J. Cusumano, and D. Chelidze, “Optimal tracking of parameter drift in a chaotic system: Experiment and theory,” J. Sound Vib. 250, 877–901 (2002).
34. P. F. Verdes, P. M. Granitto, H. D. Navone, and H. A. Ceccatto, “Nonstationary time-series analysis: Accurate reconstruction of driving forces,” Phys. Rev. Lett. 87, 124101 (2001).
35. W. Gilpin, “Recurrences reveal shared causal drivers of complex time series,” arXiv:2301.13516 [nlin] (2023).
36. M. C. Casdagli, “Recurrence plots revisited,” Physica D 108, 12–44 (1997).
37. M. Szeliga, P. Verdes, P. Granitto, and H. Ceccatto, “Extracting driving signals from non-stationary time series,” in VII Brazilian Symposium on Neural Networks (SBRN 2002) (IEEE Comput. Soc., Pernambuco, Brazil, 2002), pp. 104–108.
38. M. Széliga, P. Verdes, P. Granitto, and H. Ceccatto, “Modelling nonstationary dynamics,” Physica A 327, 190–194 (2003).
39. L. Wiskott, “Estimating driving forces of nonstationary time series with slow feature analysis,” arXiv:cond-mat/0312317 (2003).
40. P. F. Verdes, P. M. Granitto, and H. A. Ceccatto, “Overembedding method for modeling nonstationary systems,” Phys. Rev. Lett. 96, 118701 (2006).
41. U. Güntürkün, “Sequential reconstruction of driving-forces from nonlinear nonstationary dynamics,” Physica D 239, 1095–1107 (2010).
42. A. R. Sloboda and B. I. Epureanu, “Sensitivity vector fields in time-delay coordinate embeddings: Theory and experiment,” Phys. Rev. E 87, 022903 (2013).
43. G. L. Grinblat, L. C. Uzal, P. F. Verdes, and P. M. Granitto, “Nonstationary regression with support vector machines,” Neural Comput. Appl. 26, 641–649 (2015).
44. H. Niknazar, A. M. Nasrabadi, and M. B. Shamsollahi, “Volumetric behavior quantification to characterize trajectory in phase space,” Chaos, Solitons Fractals 103, 294–306 (2017).
45. S. Wang, Y. Chen, Y. Mei, and W. He, “The robustness of driving force signals extracted by slow feature analysis,” Chaos, Solitons Fractals 171, 113447 (2023).
46. M. Tanio, Y. Hirata, and H. Suzuki, “Reconstruction of driving forces through recurrence plots,” Phys. Lett. A 373, 2031–2040 (2009).
47. D. Chelidze, J. P. Cusumano, and A. Chatterjee, “A dynamical systems approach to damage evolution tracking, part 1: Description and experimental application,” J. Vib. Acoust. 124, 250–257 (2002).
48. O. Alao, P. Y. Lu, and M. Soljacic, “Discovering dynamical parameters by interpreting echo state networks,” in NeurIPS 2021 AI for Science Workshop (Curran Associates, 2021).
49. S. Güttler, H. Kantz, and E. Olbrich, “Reconstruction of the parameter spaces of dynamical systems,” Phys. Rev. E 63, 056215 (2001).
50. T. L. Carroll, “Attractor comparisons based on density,” Chaos 25, 013111 (2015).
51. C. J. C. Burges, “Dimension reduction: A guided tour,” Found. Trends Mach. Learn. 2, 275–364 (2009).
52. I. Jolliffe, Principal Component Analysis, Springer Series in Statistics (Springer-Verlag, New York, 2002).
53. M. Udell, C. Horn, R. Zadeh, and S. Boyd, “Generalized low rank models,” Found. Trends Mach. Learn. 9, 1–118 (2016).
54. L. Wiskott and T. J. Sejnowski, “Slow feature analysis: Unsupervised learning of invariances,” Neural Comput. 14, 715–770 (2002).
55. H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, 2nd ed. (Cambridge University Press, Cambridge, 2003).
56. P. Song and C. Zhao, “Slow down to go better: A survey on slow feature analysis,” IEEE Trans. Neural Netw. Learn. Syst. 35, 1–21 (2022).
57. L. Molgedey and H. G. Schuster, “Separation of a mixture of independent signals using time delayed correlations,” Phys. Rev. Lett. 72, 3634–3637 (1994).
58. S. Schultze and H. Grubmüller, “Time-lagged independent component analysis of random walks and protein dynamics,” J. Chem. Theory Comput. 17, 5766–5776 (2021).
59. T. Blaschke, P. Berkes, and L. Wiskott, “What is the relation between slow feature analysis and independent component analysis?,” Neural Comput. 18, 2495–2508 (2006).
60. S. Klus, F. Nüske, P. Koltai, H. Wu, I. Kevrekidis, C. Schütte, and F. Noé, “Data-driven model reduction and transfer operator approximation,” J. Nonlinear Sci. 28, 985–1010 (2018).
61. A. A. Khan, J. Kuehl, and D. Chelidze, “Toward a unified interpretation of the ‘proper’/‘smooth’ orthogonal decompositions and ‘state variable’/‘dynamic mode’ decompositions with application to fluid dynamics,” AIP Adv. 10, 035225 (2020).
62. A. Mardt, L. Pasquali, H. Wu, and F. Noé, “VAMPnets for deep learning of molecular kinetics,” Nat. Commun. 9, 5 (2018).
63. C. Wehmeyer and F. Noé, “Time-lagged autoencoders: Deep learning of slow collective variables for molecular kinetics,” J. Chem. Phys. 148, 241703 (2018).
64. L. Bonati, G. Piccini, and M. Parrinello, “Deep learning the slow modes for rare events sampling,” Proc. Natl. Acad. Sci. U.S.A. 118, e2113533118 (2021).
65. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. (Springer, New York, 2017).
66. P. F. Verdes, “Global warming is driven by anthropogenic emissions: A time series analysis approach,” Phys. Rev. Lett. 99, 048501 (2007).
67. G. Wang, P. Yang, and X. Zhou, “Extracting the driving force from ozone data using slow feature analysis,” Theor. Appl. Climatol. 124, 985–989 (2016).
68. G. Wang, P. Yang, and X. Zhou, “Identification of the driving forces of climate change using the longest instrumental temperature record,” Sci. Rep. 7, 46091 (2017).
69. P. Yang, G. Wang, F. Zhang, and X. Zhou, “Causality of global warming seen from observations: A scale analysis of driving force of the surface air temperature time series in the Northern Hemisphere,” Clim. Dyn. 46, 3197–3204 (2016).
70. F. Zhang, P. Yang, K. Fraedrich, X. Zhou, G. Wang, and J. Li, “Reconstruction of driving forces from nonstationary time series including stationary regions and application to climate change,” Physica A 473, 337–343 (2017).
71. W. Lu, M. Duan, and G. Wang, “Case studies on driving factor with different scales: A modified Lorenz system and 500-hPa geopotential height,” Theor. Appl. Climatol. 141, 455–463 (2020).
72. Y. Naritomi and S. Fuchigami, “Slow dynamics in protein fluctuations revealed by time-structure based independent component analysis: The case of domain motions,” J. Chem. Phys. 134, 065101 (2011).
73. G. Pérez-Hernández, F. Paul, T. Giorgino, G. De Fabritiis, and F. Noé, “Identification of slow molecular order parameters for Markov model construction,” J. Chem. Phys. 139, 015102 (2013).
74. F. Paul, H. Wu, M. Vossel, B. L. De Groot, and F. Noé, “Identification of kinetic order parameters for non-equilibrium dynamics,” J. Chem. Phys. 150, 164120 (2019).
75. B. D. Fulcher, “Feature-based time-series analysis,” in Feature Engineering for Machine Learning and Data Analytics (CRC Press, 2018).
76. B. D. Fulcher and N. S. Jones, “hctsa: A computational framework for automated time-series phenotyping using massive feature extraction,” Cell Syst. 5, 527–531.e3 (2017).
77. N. B. Yanson, A. N. Pavlov, T. Kapitaniak, and V. S. Anishchenko, “Global reconstruction from nonstationary data,” Tech. Phys. Lett. 25, 412–414 (1999).
78. B. D. Fulcher, C. H. Lubba, G. F. Gilestro, S. R. Schultz, and N. S. Jones, “Inferring low-dimensional parametric variation underlying time-series datasets using comprehensive time-series feature extraction” (unpublished).
79. B. Harris, “Inferring parametric variation across non-stationary time series,” Bachelor’s thesis (School of Physics, The University of Sydney, 2021).
80. F. Takens, “Detecting strange attractors in turbulence,” in Dynamical Systems and Turbulence, Warwick 1980, edited by D. Rand and L.-S. Young (Springer, Berlin, 1981), Vol. 898, pp. 366–381.
81. T. Sauer, J. A. Yorke, and M. Casdagli, “Embedology,” J. Stat. Phys. 65, 579–616 (1991).
82. J. Stark, “Delay embeddings for forced systems. I. Deterministic forcing,” J. Nonlinear Sci. 9, 255–332 (1999).
83. G.-J. Jang and T.-W. Lee, “A probabilistic approach to single channel blind signal separation,” in Advances in Neural Information Processing Systems (MIT Press, 2002), Vol. 15.
84. D. Chelidze, “Identifying multidimensional damage in a hierarchical dynamical system,” Nonlinear Dyn. 37, 307–322 (2004).
85. D. Chelidze and M. Liu, “Dynamical systems approach to fatigue damage identification,” J. Sound Vib. 281, 887–904 (2005).
86. D. Chelidze and J. Cusumano, “Phase space warping: Nonlinear time-series analysis for slowly drifting systems,” Philos. Trans. R. Soc. A 364, 2495–2513 (2006).
87. D. Chelidze and M. Liu, “Reconstructing slow-time dynamics from fast-time measurements,” Philos. Trans. R. Soc. A 366, 729–745 (2008).
88. M. Song, D. B. Segala, J. B. Dingwell, and D. Chelidze, “Slow-time changes in human EMG muscle fatigue states are fully represented in movement kinematics,” J. Biomech. Eng. 131, 021004 (2009).
89. H.-W.-X. Li and D. Chelidze, “Geometry-informed phase space warping for reliable fatigue damage monitoring,” Struct. Health Monit. 23, 14759217231174894 (2023).
90. P. Berkes and L. Wiskott, “Slow feature analysis yields a rich repertoire of complex cell properties,” J. Vis. 5, 9 (2005).
91. W. Konen, “On the numeric stability of the SFA implementation sfa-tk,” arXiv:0912.1064 [stat] (2009).
92. B. I. Epureanu and A. Hashmi, “Parameter reconstruction based on sensitivity vector fields,” J. Vib. Acoust. 128, 732–740 (2006).
93. A. Hashmi and B. Epureanu, “Sensitivity resonance and attractor morphing quantified by sensitivity vector fields for parameter reconstruction,” Nonlinear Dyn. 45, 319–335 (2006).
94. S.-H. Yin and B. I. Epureanu, “Experimental enhanced nonlinear dynamics and identification of attractor morphing modes for damage detection,” J. Vib. Acoust. 129, 763–770 (2007).
95. S.-H. Yin and B. I. Epureanu, “Structural health monitoring based on sensitivity vector fields and attractor morphing,” Philos. Trans. R. Soc. A 364, 2515–2538 (2006).
96. A. R. Sloboda and B. I. Epureanu, “Maximizing sensitivity vector fields: A parametric study,” J. Comput. Nonlinear Dyn. 9, 021018 (2014).
97. S. H. Nguyen and D. Chelidze, “New invariant measures to track slow parameter drifts in fast dynamical systems,” Nonlinear Dyn. 79, 1207–1216 (2015).
98. J. Hawkins, Ergodic Dynamics: From Basic Theory to Applications, Graduate Texts in Mathematics Vol. 289 (Springer International Publishing, Cham, 2021).
99. A. R. Sloboda, “Boundary transformation representation of attractor shape deformation,” Chaos 31, 083133 (2021).
100. A. R. Sloboda and C. T. Kong, “Boundary transformation vectors: A geometric method of quantifying attractor deformation for structural health monitoring,” J. Comput. Nonlinear Dyn. 17, 121004 (2022).
101. A. R. Sloboda and C. T. Kong, “Damage assessment from attractor boundary deformation,” in ASME 2022 Conference on Smart Materials, Adaptive Structures and Intelligent Systems (American Society of Mechanical Engineers Digital Collection, 2022).
102. A. R. Sloboda and R. S. Sloboda, “Refinements to the boundary transformation vector representation of attractor shape deformation to enhance system parameter identification,” Chaos 32, 083146 (2022).
103. N. Gorjian Jolfaei, R. Rameezdeen, N. Gorjian, B. Jin, and C. W. K. Chow, “Prognostic modelling for industrial asset health management,” Saf. Reliab. 41, 45–97 (2022).
104. J. P. Cusumano, D. Chelidze, and A. Chatterjee, “A dynamical systems approach to damage evolution tracking, part 2: Model-based validation and physical interpretation,” J. Vib. Acoust. 124, 258–264 (2002).
105. D. B. Segala, D. H. Gates, J. B. Dingwell, and D. Chelidze, “Nonlinear smooth orthogonal decomposition of kinematic features of sawing reconstructs muscle fatigue evolution as indicated by electromyography,” J. Biomech. Eng. 133, 031009 (2011).
106. K. Bajelani, A. R. Arshi, and A. N. Akhavan, “Influence of compression garments on fatigue behaviour during running based on nonlinear dynamical analysis,” Sports Biomech. 24, 1–14 (2022).
107. J.-P. Eckmann, S. O. Kamphorst, and D. Ruelle, “Recurrence plots of dynamical systems,” Europhys. Lett. 4, 973 (1987).
108. Recurrence Quantification Analysis: Theory and Best Practices, Understanding Complex Systems, edited by C. L. Webber and N. Marwan (Springer International Publishing, Cham, 2015).
109. Y. Hirata, S. Horai, and K. Aihara, “Reproduction of distance matrices and original time series from recurrence plots and their applications,” Eur. Phys. J. Spec. Top. 164, 13–22 (2008).
110. E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numer. Math. 1, 269–271 (1959).
111. A. Mead, “Review of the development of multidimensional scaling methods,” J. R. Stat. Soc. D 41, 27–39 (1992).
112. Y. Hirata, T. Takeuchi, S. Horai, H. Suzuki, and K. Aihara, “Parsimonious description for predicting high-dimensional dynamics,” Sci. Rep. 5, 15736 (2015).
113. Y. Hirata and K. Aihara, “Dimensionless embedding for nonlinear time series analysis,” Phys. Rev. E 96, 032219 (2017).
114. A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin, Bayesian Data Analysis, 3rd ed. (Chapman and Hall/CRC, Boca Raton, 2013).
115. Z. Zhang, Q. Shen, and X. Wang, “Parameter identification framework of nonlinear dynamical systems with Markovian switching,” Chaos 33, 123117 (2023).
116. S. M. Hirsh, D. A. Barajas-Solano, and J. N. Kutz, “Sparsifying priors for Bayesian uncertainty quantification in model discovery,” R. Soc. Open Sci. 9, 211823 (2022).
117. K. Course and P. B. Nair, “State estimation of a physical system with unknown governing equations,” Nature 622, 261–267 (2023).
118. Y. Yuan, X. Li, L. Li, F. J. Jiang, X. Tang, F. Zhang, J. Goncalves, H. U. Voss, H. Ding, and J. Kurths, “Machine discovery of partial differential equations from spatiotemporal data: A sparse Bayesian learning framework,” Chaos 33, 113122 (2023).
119. V. Smelyanskiy, D. Luchinsky, D. Timuçin, and A. Bandrivskyy, “Reconstruction of stochastic nonlinear dynamical models from trajectory measurements,” Phys. Rev. E 72, 026202 (2005).
120. A. Duggento, D. G. Luchinsky, V. N. Smelyanskiy, I. Khovanov, and P. V. E. McClintock, “Inferential framework for nonstationary dynamics. II. Application to a model of physiological signaling,” Phys. Rev. E 77, 061106 (2008).
121. D. G. Luchinsky, V. N. Smelyanskiy, M. Millonas, and P. V. E. McClintock, “Dynamical inference of hidden biological populations,” Eur. Phys. J. B 65, 369–377 (2008).
122. R. D. Morris, V. N. Smelyanskiy, and M. Millonas, “Parameter and structure inference for nonlinear dynamical systems,” in AIP Conference Proceedings (AIP, San Jose, CA, 2005), Vol. 803, pp. 112–120.
123. J. L. Rodgers and W. A. Nicewander, “Thirteen ways to look at the correlation coefficient,” Am. Stat. 42, 59–66 (1988).
124. C. H. Lubba, S. S. Sethi, P. Knaute, S. R. Schultz, B. D. Fulcher, and N. S. Jones, “catch22: CAnonical time-series characteristics,” Data Min. Knowl. Discovery 33, 1821–1852 (2019).
125. J. C. Sprott, Chaos and Time-Series Analysis, illustrated ed. (Oxford University Press, Oxford, New York, 2001).
126. W. Gilpin, “Chaos as an interpretable benchmark for forecasting and data-driven modelling,” arXiv:2110.05266 [nlin] (2023).
127. J. Deng, R. Socher, L. Fei-Fei, W. Dong, K. Li, and L.-J. Li, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2009), pp. 248–255.
128. E. Keogh and S. Kasetty, “On the need for time series data mining benchmarks: A survey and empirical demonstration,” Data Min. Knowl. Discovery 7, 349–371 (2003).
129. S. Makridakis, M. Hibon, and C. Moser, “Accuracy of forecasting: An empirical investigation,” J. R. Stat. Soc. Ser. A 142, 97–145 (1979).
130. S. Makridakis, E. Spiliotis, and V. Assimakopoulos, “Statistical and machine learning forecasting methods: Concerns and ways forward,” PLoS One 13, e0194889 (2018).
131. E. E. Bron, S. Klein, A. Reinke, J. M. Papma, L. Maier-Hein, D. C. Alexander, and N. P. Oxtoby, “Ten years of image analysis and machine learning competitions in dementia,” NeuroImage 253, 119083 (2022).
132. N. Traut, K. Heuer, G. Lemaître, A. Beggiato, D. Germanaud, M. Elmaleh, A. Bethegnies, L. Bonnasse-Gahot, W. Cai, S. Chambon, F. Cliquet, A. Ghriss, N. Guigui, A. de Pierrefeu, M. Wang, V. Zantedeschi, A. Boucaud, J. van den Bossche, B. Kegl, R. Delorme, T. Bourgeron, R. Toro, and G. Varoquaux, “Insights from an autism imaging biomarker challenge: Promises and threats to biomarker discovery,” NeuroImage 255, 119171 (2022).
133. M. Löning, A. Bagnall, S. Ganesh, V. Kazakov, J. Lines, and F. J. Király, “sktime: A unified interface for machine learning with time series,” arXiv:1909.07872 (2019).
134. A. Bagnall, H. A. Dau, J. Lines, M. Flynn, J. Large, A. Bostrom, P. Southam, and E. Keogh, “The UEA multivariate time series classification archive, 2018,” arXiv:1811.00075 [cs, stat] (2018).
135. A. Bagnall, J. Lines, A. Bostrom, J. Large, and E. Keogh, “The great time series classification bake off: A review and experimental evaluation of recent algorithmic advances,” Data Min. Knowl. Discovery 31, 606–660 (2017).
136. A. P. Ruiz, M. Flynn, J. Large, M. Middlehurst, and A. Bagnall, “The great multivariate time series classification bake off: A review and experimental evaluation of recent algorithmic advances,” Data Min. Knowl. Discovery 35, 401–449 (2021).
137. T. Henderson, A. G. Bryant, and B. D. Fulcher, “Never a dull moment: Distributional properties as a baseline for time-series classification,” arXiv:2303.17809 [cs, stat] (2023).
138. G. Wang, P. Yang, and X. Zhou, “Nonstationary time series prediction by incorporating external forces,” Adv. Atmos. Sci. 30, 1601–1607 (2013).
139. G. Wang and X. Chen, “Nonstationary time series prediction combined with slow feature analysis,” Nonlinear Processes Geophys. 22, 377–382 (2015).
140. E. Fehlberg, “Low-order classical Runge-Kutta formulas with stepsize control and their application to some heat transfer problems,” Tech. Rep. NASA-TR-R-315 (NASA, 1969).
141. R. M. May, “Simple mathematical models with very complicated dynamics,” Nature 261, 459–467 (1976).
142. E. N. Lorenz, “Deterministic nonperiodic flow,” J. Atmos. Sci. 20, 130–141 (1963).
143. W. F. Langford, “Numerical studies of torus bifurcations,” in Numerical Methods for Bifurcation Problems: Proceedings of the Conference at the University of Dortmund, August 22–26, 1983, International Series of Numerical Mathematics, edited by T. Küpper, H. D. Mittelmann, and H. Weber (Birkhäuser, Basel, 1984), pp. 285–295.
144. KieranOwens (2024). “DynamicsAndNeuralSystems/pinup_paper: v0.1.0-alpha (PINUP),” Zenodo. https://doi.org/10.5281/zenodo.12716817
Published open access through an agreement with The University of Sydney