The ordinal pattern-based complexity–entropy plane is a popular tool in nonlinear dynamics for distinguishing stochastic signals (noise) from deterministic chaos. Its performance, however, has mainly been demonstrated for time series from low-dimensional discrete or continuous dynamical systems. In order to evaluate the usefulness and power of the complexity–entropy (CE) plane approach for data representing high-dimensional chaotic dynamics, we applied this method to time series generated by the Lorenz-96 system, the generalized Hénon map, the Mackey–Glass equation, the Kuramoto–Sivashinsky equation, and to phase-randomized surrogates of these data. We find that both the high-dimensional deterministic time series and the stochastic surrogate data may be located in the same region of the complexity–entropy plane, and their representations show very similar behavior with varying lag and pattern lengths. Therefore, the classification of these data by means of their position in the CE plane can be challenging or even misleading, while surrogate data tests based on (entropy, complexity) yield significant results in most cases.
I. INTRODUCTION
The term “complexity” is widely used across many fields, and there are many definitions of it. In dynamical systems theory, “complexity” is usually associated with a system that displays chaotic behavior, in contrast to periodic or stochastic systems. The characterization of such systems, and their distinction from purely (“boring”) random behavior, has long been a subject of study. Many (often very similar) approaches have been suggested for classifying a time series of unknown origin as either “complex” or random.1–8
A straightforward approach to characterizing the complexity of a time series is the use of dynamical entropies.3 Their disadvantage, however, is that they are large both for chaotic and for purely random systems, rendering them incapable of differentiating between the two.1,4 This has inspired several authors to look for an alternative or additional measure to better distinguish noise from chaos. The method investigated in this work is the complexity–entropy plane (CE plane, sometimes also referred to as the complexity–entropy–causality plane) introduced by Rosso et al.4 It is based on ordinal pattern statistics,3 an approach to the symbolization of time series. In addition to the widely used permutation entropy, it employs a second quantity, the so-called statistical complexity, to characterize the data.
The statistical complexity is defined by Rosso et al.4 via the Jensen–Shannon divergence between the distribution of symbols (patterns) and the uniform distribution. This way, a time series with maximum entropy has zero complexity. To ensure that regular data (with a non-uniform distribution of patterns and a low entropy) also display low complexity, the Jensen–Shannon divergence is multiplied by the permutation entropy. This results in two dependent quantities, the entropy and the complexity, both of which compare a distribution of patterns to the uniform distribution. The permutation entropy and the statistical complexity of a time series, i.e., its position in the CE plane, are widely used to characterize the complexity (or chaoticity) of time series or to distinguish between data of different origins.10–18
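In formulas, the definitions described above read as follows (following Rosso et al.4; P = {p_i} is the ordinal pattern distribution, P_e the uniform distribution over the m! patterns of length m, S[·] the Shannon entropy, and Q_0 a normalization constant chosen such that 0 ≤ Q_J ≤ 1, i.e., such that the divergence of a distribution concentrated on a single pattern equals one):

$$H[P] = \frac{S[P]}{\ln m!}, \qquad C_{JS}[P] = Q_J[P, P_e]\, H[P],$$

$$Q_J[P, P_e] = Q_0 \left\{ S\!\left[\tfrac{1}{2}(P + P_e)\right] - \tfrac{1}{2} S[P] - \tfrac{1}{2} S[P_e] \right\}.$$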
While several studies have examined the separation of stochastic time series and low-dimensional, often discrete, chaotic dynamical systems in the complexity–entropy (CE) plane, the analysis of data from dynamical systems exhibiting high-dimensional chaos has been somewhat neglected. An exception is a study by Zunino et al.,12 who investigated high-dimensional time series generated by the Mackey–Glass equation19 and found that the estimated complexity values depend strongly on the lag of the ordinal patterns used. In this work, we aim to investigate the influence of an increasing dimension of the attractor underlying the analyzed time series on the estimated complexity and entropy values and to develop a guide on how to analyze (possibly very high-dimensional) real-world time series using the CE plane.
To do so, we chose high-dimensional dynamical systems from four different categories, representing continuous, discrete, time-delay, and spatiotemporal chaos. For each system, we varied the attractor dimension, estimated here by the Kaplan–Yorke dimension. We compared the position of phase-randomized surrogates in the CE plane with that of the original hyperchaotic time series and found that a visual distinction between the two is often barely possible, even in cases where a surrogate data test yields significant results. Even more important for the interpretation of CE diagrams, however, is the fact that stochastic surrogate data and time series from deterministic systems occur close together in the same region of the CE plane. This suggests that the popular practice of visual classification via the position in the CE plane can be problematic.
II. DYNAMICAL SYSTEMS
We chose four different high-dimensional dynamical systems for our comparison: the Lorenz-96 system20 as a continuous system, the discrete generalized Hénon map,21–23 the Mackey–Glass equation19 as an example of a time-delay system, and the spatiotemporally chaotic Kuramoto–Sivashinsky equation.24,25 The dynamical rule of each system is given in the following. The sampling times of the continuous-time systems were adjusted such that a given number of samples shows approximately the same number of oscillations for comparability; see also the exemplary time series plots in Fig. 1. The sampling rates, as well as all other parameters necessary to reproduce the simulations, are given in Table I.
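For reference, the standard forms of these systems, as given in the cited references, are the following (sign and scaling conventions may differ slightly between sources; the symbols D, F, a, b, β, γ, ν, τ, and L correspond to the parameters listed in Table I):

$$\dot{x}_j = (x_{j+1} - x_{j-2})\, x_{j-1} - x_j + F, \quad j = 0, \dots, D-1 \quad \text{(Lorenz-96, cyclic indices } x_{j\pm D} = x_j\text{)},$$

$$x_1(n+1) = a - x_{D-1}^2(n) - b\, x_D(n), \quad x_j(n+1) = x_{j-1}(n), \; j = 2, \dots, D \quad \text{(generalized Hénon)},$$

$$\dot{x}(t) = \beta\, \frac{x(t-\tau)}{1 + x(t-\tau)^{\nu}} - \gamma\, x(t) \quad \text{(Mackey–Glass)},$$

$$\partial_t y = -y\, \partial_x y - \partial_x^2 y - \partial_x^4 y, \quad x \in [0, L] \text{ periodic} \quad \text{(Kuramoto–Sivashinsky)}.$$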
| System | Initial condition | Parameters | Integration | Dimensionality |
|---|---|---|---|---|
| Lorenz-96 | xj(0) = j ⋅ 0.1 for j = 0, …, D − 1 | F = 24 | Tsit5,9 adaptive time step, δt = 0.02 | Δ(KY) ≈ 43 for D = 50 |
| Generalized Hénon |  | a = 1.76, b = 0.1 | … | Δ(KY) ≈ 43 for D = 44 |
| Mackey–Glass | x(0) = 1.0, x(t) = 0 for t < 0 | β = 2, γ = 1, ν = 9.65 | RK4, Δt = 0.1, δt = 0.2 | Δ(KY) ≈ 43 for τ = 39 |
| Kuramoto–Sivashinsky | y(t = 0, x) = cos (x) + 0.1 sin (x/8) + 0.01 cos (2πx/L) | … | Tsit5,9 in spectral domain, adaptive time step, Δx = 0.2, δt = 1 | Δ(KY) ≈ 43 for L = 30 |
Each of these systems is capable of displaying high-dimensional chaos. We calculated the Lyapunov spectrum λ1 ≥ λ2 ≥ … of each system to estimate the Kaplan–Yorke (KY) dimension28 Δ(KY) = k + (λ1 + ⋯ + λk)/|λk+1|, where k is the largest index for which λ1 + ⋯ + λk ≥ 0. The KY dimensions and the Lyapunov spectra are displayed in Fig. 2. The simulations and the calculation of the KY dimension were performed in Julia using the DynamicalSystems.jl29 library.
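The paper's computations rely on DynamicalSystems.jl; purely to illustrate the Kaplan–Yorke formula given above, a minimal from-scratch sketch (function name and toy spectrum are ours, not taken from the paper's code) could look as follows:

```julia
# Kaplan–Yorke dimension from a Lyapunov spectrum (illustrative sketch).
function kaplan_yorke_dimension(λ::AbstractVector{<:Real})
    λs = sort(λ; rev = true)               # ensure λ1 ≥ λ2 ≥ …
    csum = cumsum(λs)
    k = findlast(≥(0), csum)               # largest k with λ1 + ⋯ + λk ≥ 0
    k === nothing && return 0.0            # all exponents negative: dimension 0
    k == length(λs) && return float(k)     # cumulative sum never turns negative
    return k + csum[k] / abs(λs[k + 1])
end

# Toy example: Δ(KY) = 2 + (0.5 − 0.3)/1.0 = 2.2
kaplan_yorke_dimension([0.5, -0.3, -1.0, -2.0])
```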
A. The surrogate data approach
To address the question whether (high-dimensional) chaotic time series can be distinguished from stochastic data by their position in the CE plane, we are interested in the most challenging cases, i.e., in time series that are random but share as many features as possible with the given observable from the deterministic system. Such stochastic time series are often called surrogate data, and there are many ways to generate them.30
The method of surrogate data was introduced by Theiler et al.31 in 1992 as a statistical approach for identifying nonlinearities in time series, based on the widely known and applied bootstrapping method.32 It is a way of generating new “surrogate” time series from original (measured) data to test a specific null hypothesis.30
Surrogate data are often generated by phase randomization. The original time series is transformed into Fourier space, where the phases of the spectrum are randomized. This keeps the power spectral density, and, thus, also the autocorrelation, unchanged.33,34 The spectrum with the randomized phases is then transformed back, which results in a time series that has the same autocorrelation as the original data, but could have been generated by a linear Gaussian process. This type of surrogates is usually referred to as FT surrogates.30
In addition to FT surrogates, we used surrogates generated by an amplitude adjusted Fourier transform31 (AAFT surrogates). Here, the amplitude distribution is rescaled to resemble a Gaussian distribution before randomizing the phases. After phase randomization, the amplitude distribution is rescaled again to match that of the original time series.
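As an illustration only (not the authors' implementation; function names are ours), FT and AAFT surrogates can be sketched in Julia with FFTW.jl as follows:

```julia
using FFTW, Random

# FT surrogate: keep the amplitude spectrum of the real-valued series x,
# randomize the Fourier phases, and transform back.
function ft_surrogate(x::AbstractVector{<:Real}; rng = Random.default_rng())
    n = length(x)
    X = rfft(x)                                      # one-sided spectrum of the real series
    S = abs.(X) .* cis.(2π .* rand(rng, length(X)))  # same magnitudes, random phases
    S[1] = X[1]                                      # keep the mean (zero-frequency) term
    iseven(n) && (S[end] = X[end])                   # keep the Nyquist component real
    return irfft(S, n)
end

# AAFT surrogate: Gaussianize the amplitudes by rank ordering, phase-randomize,
# then restore the original amplitude distribution by rank ordering again.
function aaft_surrogate(x::AbstractVector{<:Real}; rng = Random.default_rng())
    n = length(x)
    ranks(v) = invperm(sortperm(v))                  # rank of each sample within its series
    y = sort(randn(rng, n))[ranks(x)]                # rescale amplitudes to a Gaussian
    s = ft_surrogate(y; rng = rng)                   # randomize phases of the Gaussianized series
    return sort(x)[ranks(s)]                         # rescale back to the original amplitudes
end
```

In practice, one would typically use a maintained implementation, e.g., the TimeseriesSurrogates.jl package from the same Julia ecosystem, rather than hand-rolled code.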
Once the surrogate time series have been generated, a discriminatory quantity of choice is calculated for both the original and the surrogate data. If the values of the original data differ significantly from those of the surrogates in a statistical test, the null hypothesis is rejected. Assuming Gaussian distributed surrogate values, this test can be as simple as calculating the standard deviation of the surrogate distribution and rejecting the null hypothesis if the original data differ by several standard deviations from the mean of the surrogates.
Here, all pairwise distances between the surrogates in the CE plane are calculated to generate one distribution. A second distribution is estimated from the distances between the original data point and each surrogate. The significance is then calculated using a two-sample Kolmogorov–Smirnov test35 so as not to depend on any assumptions about the underlying distribution of the surrogates.
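A compact sketch of this test statistic, assuming the (entropy, complexity) pairs have already been computed (helper names are ours; a p value would then be obtained from the two-sample KS statistic, e.g., with HypothesisTests.jl):

```julia
# Euclidean distance between two points in the CE plane.
euclid(p, q) = hypot(p[1] - q[1], p[2] - q[2])

# Two-sample Kolmogorov–Smirnov statistic D = sup_x |F_a(x) − F_b(x)|,
# evaluated at all pooled sample points.
function ks_statistic(a::AbstractVector{<:Real}, b::AbstractVector{<:Real})
    F(sample, x) = count(≤(x), sample) / length(sample)
    return maximum(abs(F(a, x) - F(b, x)) for x in vcat(a, b))
end

# ce_orig: (entropy, complexity) of the original series;
# ce_surr: vector of (entropy, complexity) pairs, one per surrogate.
function surrogate_ks_statistic(ce_orig, ce_surr)
    d_ss = [euclid(ce_surr[i], ce_surr[j])            # surrogate–surrogate distances
            for i in eachindex(ce_surr) for j in 1:i-1]
    d_os = [euclid(ce_orig, s) for s in ce_surr]      # original–surrogate distances
    return ks_statistic(d_ss, d_os)
end
```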
III. THE COMPLEXITY–ENTROPY PLANE
The concept of ordinal patterns was introduced by Bandt and Pompe3 in 2002 and can be considered a subbranch of symbolic dynamics.36 Originally introduced to measure complexity in time series, ordinal pattern-based methods have, for example, been used successfully to classify physiological time series,17,37–44 to quantify complex networks and synchronization,45–47 and to find similarities between neuronal and optical (laser) spikes.48–51 All complexity–entropy values in this paper were calculated using the open-source library ComplexityMeasures.jl.52
A. Ordinal patterns and permutation entropy
In ordinal pattern analysis, a real-valued time series {x_t}, t = 1, …, N, where x_t = x(t ⋅ δt) with δt being the sampling time, is transformed into a sequence of symbols from a finite alphabet. Given a pattern length m and a lag ℓ (m, ℓ ∈ ℕ), a pattern is defined by the sample points (x_t, x_{t+ℓ}, …, x_{t+(m−1)ℓ}). For a specific length m, there are m! different possible patterns. Each pattern is assigned a unique permutation index. If a pattern contains two identical amplitudes, these amplitudes are ordered with respect to their occurrence in time in the original formulation in Ref. 3. However, this can lead to false conclusions,68 which is why here one of the two values is randomly chosen to be the “larger” one.
Once a time series has been translated into a sequence of symbols, this sequence can be analyzed statistically: The probabilities of occurrence of each pattern can be estimated from the series, and the Shannon entropy53 of the resulting probability distribution can be calculated.
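A from-scratch sketch of this pipeline in Julia (the function names are ours and are not the ComplexityMeasures.jl API used in this paper); ties are broken at random via a random secondary sort key, as described above:

```julia
using Random

# Estimate the ordinal pattern distribution of x for pattern length m and lag ℓ.
function ordinal_distribution(x::AbstractVector{<:Real}, m::Int, ℓ::Int;
                              rng = Random.default_rng())
    counts = Dict{Vector{Int},Int}()
    for t in 1:(length(x) - (m - 1) * ℓ)
        w = @view x[t:ℓ:t+(m-1)*ℓ]              # the m samples forming one pattern
        key = collect(zip(w, rand(rng, m)))     # random second key breaks ties
        perm = sortperm(key)                    # the ordinal pattern (a permutation)
        counts[perm] = get(counts, perm, 0) + 1
    end
    total = sum(values(counts))
    return [c / total for c in values(counts)]  # probabilities of the observed patterns
end

# Normalized permutation entropy: Shannon entropy of the pattern probabilities,
# divided by its maximum value ln(m!); unobserved patterns contribute zero.
permutation_entropy(p::AbstractVector{<:Real}, m::Int) =
    -sum(q * log(q) for q in p if q > 0) / log(factorial(m))

# Example: white noise of length 10^4, patterns of length m = 4, lag ℓ = 1
x = randn(10_000)
h = permutation_entropy(ordinal_distribution(x, 4, 1), 4)   # ≈ 1 for white noise
```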
For a signal generated by a deterministic dynamical system, only a subset of the possible patterns occurs, provided the pattern length is chosen large enough. Patterns that cannot occur due to the underlying deterministic dynamical rule are called forbidden patterns. This results in a non-uniform distribution of patterns and, thus, a lower entropy than for a more “complex” signal, where almost all patterns occur with similar probability.
It must be noted, though, that the term “complexity” needs to be interpreted carefully. In a (finite) time series containing only white noise, all possible patterns occur with (almost) the same probability, resulting in the highest possible entropy—that of the uniform distribution. Thus, a more unbiased interpretation of the permutation entropy (PE) would be that it quantifies the irregularity of a signal.
On the other hand, a highly complex time series, containing not (only) white noise but an underlying deterministic complex dynamics, can, for (too) short patterns, also have a uniform distribution of patterns, thus yielding the same entropy as a purely stochastic time series.
It is important to note that the above considerations are not directly applicable to continuous-time dynamical systems, because the lag can become very small with respect to the typical time scales of the time series if the sampling time is (very) small. A chaotic, oversampled time series can still display a comparatively low normalized permutation entropy and a high complexity, because some patterns are (extremely) unlikely to occur simply because they would violate the smoothness of the curve.57
IV. STATISTICAL COMPLEXITY OF HIGH-DIMENSIONAL DYNAMICAL SYSTEMS
While the separation of stochastic and deterministic time series in the complexity–entropy (CE) plane has been investigated by many authors for data from low-dimensional and often discrete dynamical systems,10,11,13–16 only very few studies for high-dimensional systems exist.12
The first problem to consider here is the general problem of the amount of available data: Eckmann and Ruelle58 argued that for a faithful estimation of the Lyapunov exponents of a dynamical system with attractor dimensionality Δ, on the order of 10^Δ data points are required. While, to our knowledge, no such estimate exists for the permutation entropy of a system, one can use it as an orientation for the amount of data needed for ordinal pattern-based quantities. For a time series consisting of 10^6 points, for example, this limits the resolvable attractor dimension to Δ ≈ 6. This is not to mention a number of other, similarly conservative estimates for the required number of data points.59–61
Viewing ordinal patterns from the delay embedding62–64 point of view, where classically the embedding dimension should exceed 2Δ for an attractor dimension Δ, a second, computational problem arises. If the embedding dimension in delay embedding were equivalent to the pattern length m, one would need very large m to resolve high-dimensional systems, for example, m ≥ 9 even for just a four-dimensional attractor. While such a case would technically be covered by the rule of Eckmann and Ruelle, the estimation of a histogram with m! = 9! = 362 880 bins cannot be done faithfully from the corresponding 10^4 data points only. Additionally, the resulting histogram would differ from a uniform distribution due to empty bins that are inevitable by construction, leading to possibly spuriously low entropy and high statistical complexity.
All of these considerations raise the question of whether analyses based on ordinal pattern quantities provide the expected results even for data from high-dimensional dynamical systems. Zunino et al.12 investigated the CE plane for high-dimensional data using the Mackey–Glass equation19,26 and found that the estimated complexity values depend strongly on the lag of the ordinal patterns used. In this work, we confirm this result for data from different dynamical systems as well as for stochastic time series obtained via phase randomization (FT and AAFT surrogates). Furthermore, we investigate the influence of an increasing attractor dimension on the estimated complexity and entropy values and demonstrate the potential and limitations of CE-plane analysis of (possibly very high-dimensional) real-world time series.
To investigate these issues in more detail, we studied the influence of parameters that can be chosen during the analysis, such as the pattern length m and the lag ℓ used for sampling the patterns, as well as of parameters that are given (and unknown), such as the dimensionality of the process generating the data or the amount of data available.
A. Control parameters: Pattern length and lag
The pattern length m and the lag ℓ used for sampling and composing the patterns can be chosen by the user performing the time series analysis, and both have a major impact on the results obtained. If the pattern length m is too small, the patterns obtained cannot encode the dynamics, similar to a too low embedding dimension. The ordinal pattern distributions are then (almost) uniform, resulting in high entropy and low complexity values, even in cases with a clear deterministic structure.65 If, on the other hand, m is chosen very large, the given length of the time series is no longer sufficient to fill all bins, and there is a systematic bias in the opposite direction, where distributions appear more non-uniform than the true distribution for this given m and ℓ. Therefore, there is a threshold m_min that has to be exceeded by the pattern length to unfold the relevant structure in the data and another threshold, m_max, which must not be exceeded to guarantee a sound estimation of the distribution (for a given length of the time series). As long as m_min < m < m_max, any choice of m should provide useful ordinal pattern distributions, but for data from high-dimensional systems this may become a major challenge, because there m_min can be larger than m_max.
Figure 3 shows the dependence of the complexity–entropy values on the pattern length m for a fixed lag ℓ, both for the original time series of the systems investigated and for their FT and AAFT surrogate data. As expected, for the small values of m shown there, the entropy decreases and the complexity increases with the pattern length m. No minimum and maximum complexity–entropy curves are plotted in this diagram because these curves depend on the pattern length. For the continuous-time systems, we fixed the lag to the same value, but we still find different behavior: While for the Lorenz-96 system a clear separation between original data and surrogates is visible, this is not the case for the Mackey–Glass system. It should be noted that one can tune the lag such that data from the Mackey–Glass system show the same behavior as the time series of the Lorenz-96 system, but we find it noteworthy that for similar time scales, the two data sets display very different results.
In fact, the lag ℓ is the other relevant control parameter with a major influence on the estimates of entropy and complexity. Large lags result in patterns composed of samples that are far apart in time and, therefore, (almost) statistically independent. The resulting ordinal pattern distributions are then (almost) uniform, with (almost) maximum permutation entropy and vanishing complexity. For densely sampled smooth time series from continuous-time systems, another pitfall exists. For small lags ℓ, a few patterns, such as monotonically increasing or decreasing sequences, occur with (very) high probability due to correlations on short time scales. The corresponding very non-uniform distributions result in low or medium entropies and medium or high complexity. If the lag is increased, the complexity reaches a maximum before it decreases to zero for very large ℓ. This characteristic dependence was first reported by Zunino et al.12 for the Mackey–Glass system and is shown in Fig. 4 for the systems investigated in this study. It is important to note that for the lags of highest complexity, original and surrogate data can lie close together and follow a similar path when the lag is increased, suggesting that the intermediate entropy and high complexity values stem from fine sampling and the resulting smoothness of the time series in combination with relatively small lags.
Interestingly, the CE values of the FT surrogates and the AAFT surrogates do not differ significantly. We conjecture that this agreement is due to the fact that ordinal patterns are invariant under monotonically increasing transformations of the data and, thus, the amplitude adjustment of the AAFT method has no major impact on the results (see the Appendix).
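As a quick check of this invariance, one can verify with the illustrative helpers sketched in Sec. III A (our own names, not the paper's code) that a strictly increasing transformation, here exp, leaves the ordinal patterns, and hence the permutation entropy, unchanged for data without ties:

```julia
x = randn(10_000)
h_raw = permutation_entropy(ordinal_distribution(x, 4, 1), 4)
h_exp = permutation_entropy(ordinal_distribution(exp.(x), 4, 1), 4)
h_raw ≈ h_exp   # true: the patterns depend only on the ordering of the samples
```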
In order to evaluate the combined impact of pattern length m and lag ℓ, Fig. 5 shows the significance of an AAFT-surrogate data test depending on both parameters. It can be seen that there is no common dependence of the significance on lag and pattern length across the different dynamical systems. While the distinction tends to be more prominent for the Mackey–Glass and Kuramoto–Sivashinsky systems with larger pattern lengths, time series of the Lorenz-96 system are always distinguishable from their surrogates for large amounts of data. The generalized Hénon map displays a dependence mostly on the chosen lag and not so much on the pattern length, presumably a result of the “cyclic” structure of the map. We find that the significance is high for most parameter combinations; only for very short data lengths (not shown here) are there a few cases in which significance is not reached for any chosen pattern length between 3 and 7. To be able to use patterns up to length 7, of course, one needs to be able to measure enough data; for this estimation, we used the largest data length considered.
B. Given parameters: Attractor dimensionality and the amount of data
In this study, we address the question of what happens in a CE analysis if the time series of interest was generated by a deterministic system with a chaotic attractor of (unknown) high dimensionality. We therefore investigated the influence of the attractor dimensionality on the position of points in the CE plane, as shown in Fig. 6. For a constant pattern length m and lag ℓ, the position of a time series in the CE plane quickly moves, with increasing attractor dimension, toward the lower right corner of the plane, where purely stochastic systems would be expected. It should be mentioned again that for the continuous-time systems, this depends significantly on the chosen lag ℓ for a given sampling time δt. For very small sampling times and lags, even systems with a high Kaplan–Yorke dimension display high complexity and low to intermediate permutation entropy, as can also be observed in Fig. 4. In this case, the high complexity values are mostly a result of the smoothness of the finely sampled continuous time series, which leads by construction to some patterns appearing significantly more often than others (e.g., strictly increasing or decreasing patterns). As already seen in Fig. 4, there is almost no difference between the CE values of FT- and AAFT-surrogate time series.
Figure 7 is an illustration of how statistical complexity and permutation entropy can be significantly over- and underestimated, respectively, if calculated from too little data. Both of these biases are expected due to the already discussed fact that the bins of the histograms must be properly filled.
Interestingly, the convergence to a limit appears to happen faster than one would naïvely expect when considering the number of possible ordinal patterns. This could be considered an indication of forbidden patterns66,67 in the chaotic systems considered here, meaning that there are far fewer truly possible ordinal patterns and, thus, significantly less data are needed to correctly estimate the ordinal probability distribution.
V. DISCUSSION
Previous work has shown that time series generated by low-dimensional dynamical systems can be distinguished from stochastic time series (noise) by means of their entropy and statistical complexity values, both computed using ordinal pattern statistics.4 In this work, we addressed the question of whether this approach is also feasible for data from systems with high-dimensional chaotic attractors. As examples, we used time series from the Lorenz-96 system, a generalized Hénon map, the Mackey–Glass delay-differential equation, and the spatially extended Kuramoto–Sivashinsky equation, a partial differential equation. For these systems, control parameters exist that allow one to increase the attractor dimension in a systematic manner, which we used to generate data with Kaplan–Yorke dimensions ranging from 1 up to 50. We studied the representation (i.e., location) of the time series in the complexity–entropy plane, not only for varying attractor dimensions but also in its dependence on the available length of the time series and on the length and lag of the ordinal patterns used. Furthermore, these results were compared with the corresponding values obtained for phase-randomized surrogate time series.
Both the entropy and the statistical complexity measure a deviation of the distribution of patterns from the uniform distribution, albeit in different ways. There are, however, at least two reasons why this deviation can be overestimated, resulting in spuriously low entropies and spuriously high values of the statistical complexity. In the case of very densely sampled smooth continuous time series, (highly) non-uniform distributions of ordinal patterns occur even for stochastic data, obtained by phase randomization, for example. In this case, any (complexity, entropy) curve in the CE plane parameterized by the lag exhibits a local maximum of the complexity for intermediate pattern lags, and it has been suggested to focus on the complexity values at this maximum for distinguishing chaotic and stochastic dynamics.12 Our results show, however, that phase-randomized surrogate data follow the same path along the CE plane and have a very similar maximum. This limitation and the characteristic shape of the lag curve in the CE plane (Fig. 4) also occur for time series from low-dimensional chaotic systems (not shown here). Interestingly, amplitude-adjusted phase-randomized (AAFT) and (just) phase-randomized (FT) surrogates provide essentially the same results because the amplitude adjustment has almost no impact on ordinal pattern distributions.
The second reason for observing (highly) non-uniform distributions is a number of data points that is too small for a sound estimation of the ordinal pattern distribution. This problem occurs, in particular, for high-dimensional time series. There, the pattern length m has to exceed some lower bound to resolve or unfold the dynamics. Below this critical pattern length, the resulting distribution is close to a uniform distribution and not suitable for distinguishing the data from noise. If, however, the pattern length is increased, the number of bins grows as m!, and the corresponding distribution has empty or poorly filled bins due to too few data points, even for long time series.
To avoid, or at least detect, both pitfalls resulting in spuriously low entropy and spuriously high complexity values, we suggest a comparison with the corresponding values of phase-randomized surrogate data, because both causes of highly non-uniform distributions have essentially the same effect for smooth stochastic data. If the null hypothesis that the given time series was generated by a linear stochastic process cannot be rejected using a combination of permutation entropy and statistical complexity as a discriminating statistic (see Sec. II A), any resulting position in the CE plane should be interpreted with caution. For this evaluation, diagrams such as those shown in Figs. 3–5 provide valuable information on whether the desired characterization of the data using the location in the CE plane can be considered meaningful at all. Keeping in mind that for very high-dimensional data, large pattern lengths would be needed to properly separate chaos from noise, the often limited amount of data makes the estimation of such finely partitioned distributions difficult or even impossible.
ACKNOWLEDGMENTS
We thank George Datseris, Stefan Luther, Alexander Schlemmer, and all members of the Research Group Biomedical Physics for continuous support and inspiring discussions and the reviewers for very constructive remarks. Furthermore, financial support by the German Centre for Cardiovascular Research (DZHK) e.V. and the Max Planck Society is gratefully acknowledged.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Inga Kottlarz: Conceptualization (supporting); Data curation (lead); Formal analysis (lead); Investigation (equal); Software (lead); Validation (equal); Visualization (lead); Writing – original draft (lead). Ulrich Parlitz: Conceptualization (lead); Investigation (equal); Project administration (lead); Supervision (lead); Validation (equal); Writing – review & editing (lead).
DATA AVAILABILITY
The code to produce the data that support the findings of this study is openly available on GitHub at https://github.com/ikottlarz/HighDimensionalComplexityEntropy.
APPENDIX: FT VS AAFT SURROGATES
Figure 8 shows the significance of an FT-surrogate data test depending on the pattern length m and the lag ℓ. It can be seen that the results are very similar to those shown in Fig. 5.
Figure 9 shows the distributions of the normalized permutation entropy for both AAFT and FT surrogates, one example for each of the dynamical systems considered in this paper. We find that the distributions of the normalized permutation entropy of FT and AAFT surrogates are indistinguishable. Presumably, this is due to the fact that ordinal patterns are invariant under monotonic amplitude transformations.