Complex and networked dynamical systems characterize the time evolution of most of the natural and human-made world. The dimension of their state space, i.e., the number of (active) variables in such systems, arguably constitutes their most fundamental property yet is hard to access in general. Recent work [Haehne et al., Phys. Rev. Lett. 122, 158301 (2019)] introduced a method of inferring the state space dimension of a multi-dimensional networked system from repeatedly measuring time series of only some fraction of observed variables, while all other variables are hidden. Here, we show how time series observations of one single variable are mathematically sufficient for dimension inference. We reveal how successful inference in practice depends on numerical constraints of data evaluation and on experimental choices, in particular the sampling intervals and the total duration of observations. We illustrate robust inference for systems of up to to variables by evaluating time series observations of a single variable. We discuss how the faithfulness of the inference depends on the quality and quantity of collected data and formulate some general rules of thumb on how to approach the measurement of a given system.
The dynamics of a complex system is fundamentally governed by the number of its active dynamical variables, the system’s state space dimension. However, identifying state space dimension constitutes a difficult task, in particular if the dimension is much larger than the number of variables observed. Here, we show that it is mathematically possible in principle to infer the dimension of the state space using time series observations of just one variable, for arbitrarily high state space dimensions. We discuss how in practice the success of this inference depends on numerical constraints of data evaluation and experimental choices, such as the sampling intervals and total duration of observations. We illustrate how the approach may be applied to high-dimensional systems, e.g., with 100 variables, and provide general rules of thumb for performing and evaluating measurements of a given system. Our results provide a novel approach for inferring the dimension of complex and networked dynamical systems from scalar time series data and may help to develop alternative methods, e.g., for the reconstruction of the dimensions of system attractors.
I. INTRODUCTION
II. BACKGROUND
Given an unknown time evolution rule, unknown interaction topology and most dynamical variables hidden, how may we infer the state space dimension?
Previous work has followed mainly three paths, one striving to infer the dimension of effective state space of an invariant set (attractor) after a transient,4–7 a second striving to infer the dimension of full state space by cleverly evaluating system responses to actively ongoing and well-controlled external driving signals,1 and a third striving to find from time series of some variables during transients in systems not externally driven.8 We briefly discuss all three paths.
First, consider methods of inferring the dimension of attractors after reconstructing those attractors from scalar time series. All dissipative dynamical systems in the long term, i.e., after possibly long transients, settle onto invariant sets of dimension . If , the dynamics on such attractors may be chaotic. A major branch of past research was to identify attractor dimension, i.e., the number of active dynamical degrees of freedom that remain after a transient. To estimate , one first finds suitable delay coordinates to embed a time series recorded from one variable 9 into some -dimensional space. One then estimates the dimension of the resulting geometric object, i.e., the set of sample points in the -dimensional state space obtained at many sampling times via multi-dimensional statistics, for instance, correlation dimension.10 Taken’s Theorem and its extensions11–15 guarantee that under mild constraints the attractor dimension is faithfully identifiable if the embedding dimension and delay parameter are chosen appropriately. There exist different approaches to identify such parameter choices, often based on empirical or heuristic arguments, but also using more strict statistical frameworks.5
However, both the delay-embedding technique and, more importantly, the statistical dimensional estimator based on geometric information limits the applicability of such techniques to low dimensions, with typical deduced values of the order of to . The core reason underlying this limitation is that the number of data points typically required for faithful dimension estimation scales exponentially with the dimension, thereby posing a major obstacle to dimension estimation.4 Moreover, the technique is applicable only to parts of the trajectory on or sufficiently close to the invariant sets, after any transients, and infers the attractor dimension , not the state space dimension which is often much larger.
The applicability of such probing schemes may be limited. In particular, the probing scheme is infeasible for systems that are not experimentally accessible in such a way that it allows applying controlled external driving signals. Moreover, e.g., for biological or other complex dynamical systems, their collective dynamics may not satisfy the conditions of symmetric odd coupling required for successful probing-based inference.
Third, consider the recent proposal by Haehne et al.16 to infer state space dimensions from time series data recorded from transients of a number of perceptible system variables. Porfiri17 showed that the method can be successful if all system variables are observable, in the control-theoretic sense, from the perceived variables experimentally recorded from. The two works jointly imply that under reasonably mild conditions, basically requiring local differentiability near some state space point and sufficiently many time series data recorded close to that point, the state space dimensions may be inferred even if .
We remark that this method relies on transients before reaching attractors and identifies the dimension of the entire state space, in contrast to attractor reconstruction methods discussed above which rely on trajectories on (or very close to) attractors and identify the dimension of the attractor. Early work18 had already considered a matrix rank approach for analyzing attractor in a way conceptually similar to that by Haehne et al.16 on identifying state space dimension.
III. RECONSTRUCTING SYSTEM DIMENSION FROM A FEW VARIABLES
A. Collecting and arranging time series data
We emphasize that the time evolution is obtained by formal solutions that yield but the actual time evolution encoded by is entirely unknown, because even and are unknown. Moreover, the initial conditions are also unknown up to those of the (typically few) observed variables . We only know that an equation of the form (9) exists and we will exploit this fact shortly.
B. State space dimension from the detection matrix—Basic theory
IV. DIMENSION ESTIMATE FROM A SINGLE VARIABLE—THEORY AND PRACTICE
In the following, we highlight that such inference is possible even if only variable is measured and explore practical constraints of such inference.
First, we remark that the derivation (5)–(14) is independent of the choice of the number of perceived variables. Core equation (14), in particular, also holds for , i.e., if time series of only one variable are available, as schematically shown in Fig. 1.
Identification of system dimension from a single unit. (a) Networked dynamical system composed from an unknown number of dynamical units . Depending on the experimental situation, only a number of variables can be measured. (b) The approach proposed in Ref. 16 introduces the detection matrix to reconstruct the number of active variables from multiple measurements of just a few perceptible dynamical variables (blue, purple, orange, yellow, and green, ). (c) The approach is equally valid also for only measured variable (blue).
Identification of system dimension from a single unit. (a) Networked dynamical system composed from an unknown number of dynamical units . Depending on the experimental situation, only a number of variables can be measured. (b) The approach proposed in Ref. 16 introduces the detection matrix to reconstruct the number of active variables from multiple measurements of just a few perceptible dynamical variables (blue, purple, orange, yellow, and green, ). (c) The approach is equally valid also for only measured variable (blue).
This statement is mathematically exact. Still, quantitative features of the time series data as well as numerical issues in determining the rank influence the practical feasibility of inferring state space dimension from time series recordings of only one variable. In particular, there are several options to numerically obtain the rank, for example, using the singular value spectrum (as discussed below).
Model system and rank extraction. (a)–(c) Typical realization of linear model as given by Eqs. (16) and (17), for , . (a) Network topology. Directed, weighted links encode the self- and inter-variable coupling strengths. (b) Plot of the adjacency matrix , same color code as in (a). (c) Eigenvalue spectrum of coupling matrix . In the limit , all eigenvalues are uniformly distributed on a disk in the complex plane with radius around point (d) and (g) The first component of trajectories consisting of equally spaced sampling points each are recorded and the resulting data points are (e) and (h) collated into a detection matrix . Sampling time steps in (d) and (g), respectively. (d) and (e) The prediction for the system dimension, is extracted performing a singular value composition and taking the number of singular values which lie above the largest gap in the log-scaled singular value spectrum as the matrix rank. (d)–(f) Successful reconstruction trial. The largest gap is in fact at the right place, , leading to a correct prediction . (g)–(i) Incorrect prediction. The logarithmic gap below the largest singular values is smaller than some other gap, , leading to a false prediction .
Model system and rank extraction. (a)–(c) Typical realization of linear model as given by Eqs. (16) and (17), for , . (a) Network topology. Directed, weighted links encode the self- and inter-variable coupling strengths. (b) Plot of the adjacency matrix , same color code as in (a). (c) Eigenvalue spectrum of coupling matrix . In the limit , all eigenvalues are uniformly distributed on a disk in the complex plane with radius around point (d) and (g) The first component of trajectories consisting of equally spaced sampling points each are recorded and the resulting data points are (e) and (h) collated into a detection matrix . Sampling time steps in (d) and (g), respectively. (d) and (e) The prediction for the system dimension, is extracted performing a singular value composition and taking the number of singular values which lie above the largest gap in the log-scaled singular value spectrum as the matrix rank. (d)–(f) Successful reconstruction trial. The largest gap is in fact at the right place, , leading to a correct prediction . (g)–(i) Incorrect prediction. The logarithmic gap below the largest singular values is smaller than some other gap, , leading to a false prediction .
Using the analytical solution to Eq. (16), we simulate the measurement of trajectories of the first variable , starting from different initial conditions, each chosen randomly from a uniform distribution on , see Figs. 2(d) and 2(g). The trajectories are evaluated at discrete time points, which for sake of simplicity are equally spaced, with a time step between successive measurements. For a single reconstruction trial, a detection matrix containing measurement series with sampling points each is constructed [Figs. 2(e) and 2(h)] and the rank is extracted by counting the number of singular values of which lie above the largest gap in the log-scaled singular value spectrum (as computed by the LAPACK routine dgejsv20). Figures 2(d)–2(i) exemplarily illustrate two dimension reconstruction trials for the same dynamical system, one leading to a correct prediction , and one leading to an incorrect prediction . The dimension of the detection matrix and even the different initial conditions are identical for both examples, the only difference lies in the choice of the time step between successive sampling points, which evidently may be important for successful reconstruction.
Reconstruction performance for linear system. (a) Predicted system dimension as a function of the (constant) time step between consecutive sampling points , . Typically, by varying , so that is maximized, we find a good predictor for the true system dimension . (b) Reconstruction quality in terms of the share of correct predictions as a function of the system dimension for different system parameters and . For each of system realizations for every choice of , , reconstruction is carried out for different time steps , with , and the maximum value of is taken as prediction for the specific system.
Reconstruction performance for linear system. (a) Predicted system dimension as a function of the (constant) time step between consecutive sampling points , . Typically, by varying , so that is maximized, we find a good predictor for the true system dimension . (b) Reconstruction quality in terms of the share of correct predictions as a function of the system dimension for different system parameters and . For each of system realizations for every choice of , , reconstruction is carried out for different time steps , with , and the maximum value of is taken as prediction for the specific system.
Figure 3(b) visualizes the reconstruction quality in terms of the share of correct predictions as a function of the system dimension , for different choices of and model realizations for each . Throughout, we set a graph density of . We find system dimension inference from a single variable to be quite robust up to dimension . Reconstruction quality generally is better for systems with smaller self-interaction strength , which for the considered model corresponds to Jacobian matrices with eigenvalues closer to the origin, which lead to eigenmodes decaying on relatively different time scales.
V. ROLE OF DATA COLLECTION
For successful reconstruction, it is necessary that both the number of trajectories and the number of time steps per trajectory are larger than the (unknown) system dimension . From a strictly mathematical viewpoint, these conditions are also sufficient, if the considered system is observable (cf. Ref. 17). However, in the last section, we saw that from a numerical perspective also the choice of the time step between successive sampling points is critical [see Fig. 3(a)]. Often, an optimal choice can be found by maximizing the prediction [see Eq. (18)] upon varying parameters. We explore in more detail how the number of time steps per trajectory , the time step , and the total measurement time co-act and together affect the inference quality.
Figure 4 illustrates the share of correct predictions as a function of the time steps and the number of trajectories (left column) as well as as a function of the time steps and the total measurement time (right column), for and different choices of the self-interaction strength model parameter . Though the exact details may depend on the specific model type and the system dimension , we can distinguish qualitatively different cases. Depending on the value of , the choice of time steps which leads to nearly perfect reconstruction is determined either by minimal and maximal values of either the time step itself, the total measurement time or combinations of both. For example, for [Figs. 4(a) and 4(b)], we find the choices of which lead to nearly perfect reconstruction to be determined only by the total measurement time , regardless of , while for [Figs. 4(c) and 4(d)], a minimum total measurement time has to be surpassed, while the time step should not exceed a maximum value. Finally, for larger [Figs. 4(e) and 4(f)], we find the choices of that lead to a high share of correct predictions to be entirely independent of (as long as ) and bounded by values of the time step itself. We point out that the situation changes qualitatively, if we do not require a high share of correct predictions but instead require only that (any correct predictions at all). In this case, for the considered class of linear systems, the lower boundary is typically defined in times of a minimal total measurement time , while the upper boundary is set by a maximum time step . We remark that the described effect of and on the share of correct predictions is inherently defined for ensembles. Across given model realizations, the effect of and may differ. We expect the observed behavior to be closely connected to the eigenvalue spectrum of the coupling matrix , which determines the time scales of the different decaying eigenmodes of the system. For successful reconstruction, all of these have to be appropriately resolved.
Effect of sampling time step and total covered time . Linear model as given by Eqs. (16) and (17), with parameters (a) and (b), (c) and (d), and (e) and (f). (a), (f), and (k) Share of correct predictions as a function of the time step between successive sampling points and the number of sampling points . (b), (g), and (e) Same data visualized as a function of the time step between successive sampling points and the total covered time per time series. (White regions: no data.) Blue/red lines: parameter choices with . Depending on , the range of time steps , total measurement times , and number of samples per trajectory which lead to robust reconstruction ( ) are determined by boundaries on either or or mixtures thereof (red lines). However, the possibility of having any correct predictions is always determined by a minimum total time , and a maximum time step , regardless of (blue lines). Throughout, , , , calculated for ensemble of realizations for each , , . Exemplaric eigenvalue spectra of coupling matrix shown as the inset for each respective choice of .
Effect of sampling time step and total covered time . Linear model as given by Eqs. (16) and (17), with parameters (a) and (b), (c) and (d), and (e) and (f). (a), (f), and (k) Share of correct predictions as a function of the time step between successive sampling points and the number of sampling points . (b), (g), and (e) Same data visualized as a function of the time step between successive sampling points and the total covered time per time series. (White regions: no data.) Blue/red lines: parameter choices with . Depending on , the range of time steps , total measurement times , and number of samples per trajectory which lead to robust reconstruction ( ) are determined by boundaries on either or or mixtures thereof (red lines). However, the possibility of having any correct predictions is always determined by a minimum total time , and a maximum time step , regardless of (blue lines). Throughout, , , , calculated for ensemble of realizations for each , , . Exemplaric eigenvalue spectra of coupling matrix shown as the inset for each respective choice of .
Thus, knowledge about the dependency of the optimal choice of sampling parameters and may allow a more effective calibration in terms of setting the time step . Without such knowledge, the reconstruction quality can still be optimized by varying it without any prior information.
In contrast to the number of trajectories , which indirectly might influence the reconstruction quality for fixed via the total measurement time , we find that the number of trajectories (i.e., the number of initial conditions) often does not influence the reconstruction quality at all, as long as it exceeds the actual system size, .
Effect of measurement precision. The resolution for recording the trajectories of the single accessible variable characterizes the maximum system dimension which can be robustly inferred. (a) Share of correct predictions as a function of for different numbers of decimal places and for floating precision with significant digits. (b) Share of correct predictions as a function of and the number of decimal places of the dynamical data. Note the approximately linear relation between and the number of variables which can be reliably reconstructed. averaged over linear model realizations per value of , with , . The time step is chosen to maximize , and , throughout.
Effect of measurement precision. The resolution for recording the trajectories of the single accessible variable characterizes the maximum system dimension which can be robustly inferred. (a) Share of correct predictions as a function of for different numbers of decimal places and for floating precision with significant digits. (b) Share of correct predictions as a function of and the number of decimal places of the dynamical data. Note the approximately linear relation between and the number of variables which can be reliably reconstructed. averaged over linear model realizations per value of , with , . The time step is chosen to maximize , and , throughout.
VI. ROBUSTNESS OF DIMENSION RECONSTRUCTION
So far, we only considered systems described by deterministic linear differential equations. This section addresses the effect of nonlinearities and stochastic noise on reconstruction quality.
Reconstruction quality for noisy and nonlinear dynamics. (a) Share of correct predictions as a function of system dimension for stochastic dynamics as given by Eq. (20) for different strengths of multiplicative noise. model realizations per value of , time step varied to maximize , as always. (b) Share of correct predictions as a function of system dimension for nonlinear networks of Kuramoto oscillators as described by Eq. (21) for different coupling strengths . Unlike with linear systems, maximizing across large ranges of may induce an overestimation of the true system dimension, see the inset for five system realizations (five colored lines), each with . For the main plot, we varied for , and for , as marked in the inset (black vertical lines). Throughout, we average over model realizations per value of .
Reconstruction quality for noisy and nonlinear dynamics. (a) Share of correct predictions as a function of system dimension for stochastic dynamics as given by Eq. (20) for different strengths of multiplicative noise. model realizations per value of , time step varied to maximize , as always. (b) Share of correct predictions as a function of system dimension for nonlinear networks of Kuramoto oscillators as described by Eq. (21) for different coupling strengths . Unlike with linear systems, maximizing across large ranges of may induce an overestimation of the true system dimension, see the inset for five system realizations (five colored lines), each with . For the main plot, we varied for , and for , as marked in the inset (black vertical lines). Throughout, we average over model realizations per value of .
VII. CONNECTION TO THE HANKEL-MATRIX APPROACH21
Depending on the experimental situation, one method may be chosen over the other. On one side, for the Hankel-matrix method,21 the experimental setup must allow measurement with a precise constant sample rate over a longer time—otherwise, the same data point cannot be used in different positions of the detection matrix. This also implies that the transient of the system dynamics is actually long enough, and that the measurement system’s resolution capabilities can sufficiently cover the whole observed dynamics. On the other side, for the more general method described in this manuscript and Ref. 16, a larger number of trajectories must be available. However, we point out that for practical applications also combinations of the approaches introduced in Refs. 21 and 16 may be an option. For example, suppose that experimental conditions allow measuring multiple trajectories but not enough to ensure . In such a case, additional series may be synthesized by splitting trajectories which are long enough into smaller parts, or by time-shifting them in a manner similar to Ref. 21. Of course, this approach can also be generalized to cases where multiple variables are perceivable, .
By performing simulations for , we find that generically using multiple shorter trajectories leads to a better dimension prediction performance than using a single longer trajectory, when the total number of sampled data points is the same, see Fig. 7. This might be explained by the fact that a single trajectory comes with higher levels of correlation and carries substantially less dynamical information, making numerical rank evaluation more ambiguous. Apart from this, long trajectories may be problematic because amplitudes get very small, deteriorating reconstruction quality. Hence, whenever possible, it might be preferable to distribute available measurement capacities on multiple time series starting from different initial conditions. We also exemplarily illustrate the inference quality for a mixed scenario, in which trajectories are recorded and split in the middle. As expected, its performance falls between the pure single-trajectory approach and the approach with , suggesting it as a viable alternative if the number of trajectories is constrained.
Comparison between multi-trajectory and single-trajectory approach. Share of correct predictions as a function of system dimension for different inference methods, using the same total number of data points, for noiseless dynamics (solid lines) and multiplicative noise ( , dashed lines). (a) The multi-trajectory approach as discussed in this manuscript with univariate time series consisting of samples typically performs better than the single-trajectory approach suggested in Ref. 21 with a single time series including a total of time points, leading to a Hankel-matrices . (b) Combined approach with time series consisting of data points each. From each of the time series, shorter time series with length are constructed by collating every th data point into the same series. This leads to a detection-matrix with an effective time step , with a total of data points. The reconstruction quality lies between the multi-trajectory approach with and the single-trajectory approach with , suggesting its use in cases when multiple trajectories are available, but not enough to guarantee that . shown for an ensemble of linear model realizations for each , with , . For each system realization and inference approach, is varied to maximize .
Comparison between multi-trajectory and single-trajectory approach. Share of correct predictions as a function of system dimension for different inference methods, using the same total number of data points, for noiseless dynamics (solid lines) and multiplicative noise ( , dashed lines). (a) The multi-trajectory approach as discussed in this manuscript with univariate time series consisting of samples typically performs better than the single-trajectory approach suggested in Ref. 21 with a single time series including a total of time points, leading to a Hankel-matrices . (b) Combined approach with time series consisting of data points each. From each of the time series, shorter time series with length are constructed by collating every th data point into the same series. This leads to a detection-matrix with an effective time step , with a total of data points. The reconstruction quality lies between the multi-trajectory approach with and the single-trajectory approach with , suggesting its use in cases when multiple trajectories are available, but not enough to guarantee that . shown for an ensemble of linear model realizations for each , with , . For each system realization and inference approach, is varied to maximize .
We point out that the Hankel-matrix method performs better In Ref. 21, than in our own simulations. This might be due to a thresholding scheme for the rank extraction which is more carefully calibrated to the model, while this work for sake of comparability and generality uses a completely model-free approach which is based on the largest logarithmic gap in the singular value spectrum (see Fig. 2). However, our results should qualitatively hold also for more sophisticated rank extraction schemes.
VIII. CONCLUSION
The dimension of a dynamical system, that is, its number of dynamical variables, is arguably one of its most fundamental properties. Knowing the exact number of variables of a network dynamical system is crucial for applying more specific reconstruction techniques23–28 as well as for estimating models of the system in the first place. While there is a notable amount of literature on inferring the network dimension from observable continuous time dynamics, all of these methods consider multiple perceivable variables,16 at most a few hidden units,8,29–31 or are limited to rather specific experimental situations.1,21 Our article addresses the challenge of inferring the dimension of a network dynamical system from univariate time series covering only the dynamics of a single variable.
Building on the detection-matrix method as introduced in Ref. 16, we show that from a mathematical perspective, reconstruction of the system dimension from dynamical data from only a single variable is not qualitatively different from reconstruction using multi-dimensional data. We demonstrate that dimension reconstruction from a single variable is not only possible in theory, but also numerically relatively robust. We explore which system properties and measurement decisions determine the prediction quality and point out that typically the inference quality is restricted only by the numerical precision of the measured data. In particular, we illustrate how the detection-matrix method comes out without any presumptions on the probed system, if the sampling rate is varied systematically, and that its performance may in fact improve with larger nonlinearities. Finally, we analytically demonstrate that another dimension-inference approach, which is based on Hankel-matrices build from single-trajectories, represents a special case of the detection matrix approach as treated in Ref. 16 and this work.
A possible future step would be to study the applicability of the proposed inference technique to real-world systems which consist of a small number of units, such as small biological networks, e.g., metabolic, genetic or neural, depending on available data quality and quantity. Even though in most realistic scenarios dimension reconstruction from a single node may not be yet practical due to unrealistic demands on the numerical precision of the observed data, a general bottleneck for detection-matrix approaches, our conceptual work may serve as a starting point for a systematic study of dimension reconstruction from a very small number of variables (e.g., ), in particular, regarding the choice of sampling time steps, and possibly the consideration of topological features. On the technical side, we point out that we chose the relative-thresholding approach for rank extraction primarily for sake of simplicity and generality, while other, more evolved approaches may increase the power of the inference method,32 in particular, if they are chosen based upon potential a priori information about the considered system.
Overall, we have demonstrated that inference of state space dimension, as inference of attractor dimension, from single variable time series is possible. It is also robust, with its accuracy mainly limited by the resolution of the time series data. By using the suggested calibrating scheme for the sampling time steps, the proposed approach is entirely model-free. A suitable, close to optimal sampling may be determined by either the time step or the total observed time, depending on the system type. Time series recorded from truly nonlinear dynamics, not only close-to-linear near fixed points, might increase predictive power because the recorded time series “look less similar,” i.e., fill a detection matrix with less aligned vectors. Moreover, as a rule of thumb, our results and previous works16,21 suggest that independent trajectories yield more robust results than inference based on a single long trajectory.
ACKNOWLEDGMENTS
We thank Ye Yuan for helpful discussions about the Hankel-matrix approach introduced in Ref. 21. This work was partially supported by the Development Bank of Saxony (Sächsische Aufbaubank, SAB) under Grant No. 100400118.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Georg Börner: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Resources (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Hauke Haehne: Conceptualization (equal); Formal analysis (equal). Jose Casadiego: Conceptualization (equal); Formal analysis (equal). Marc Timme: Conceptualization (equal); Funding acquisition (equal); Methodology (equal); Project administration (equal); Resources (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.