The fractal dimension is a central quantity in nonlinear dynamics and can be estimated via several different numerical techniques. In this review paper, we present a self-contained and comprehensive introduction to the fractal dimension. We collect and present various numerical estimators and focus on the three most promising ones: generalized entropy, correlation sum, and extreme value theory. We then perform an extensive quantitative evaluation of these estimators, comparing their performance and precision across different datasets and assessing the impact of features like data length, noise, embedding dimension, and falsifiability, among many others. Our analysis shows that for synthetic noiseless data, the correlation sum is the best estimator, with extreme value theory following closely. For real experimental data, we found the correlation sum to be more strongly affected by noise than the entropy and extreme value theory. The recent extreme value theory estimator seems powerful, as it combines some of the advantages of both alternative methods. However, using four different ways of checking for significance, we found that the method yielded “significant” low-dimensional results for inappropriate data, such as stock market timeseries. This fact, combined with some ambiguities we found in the literature on the method's applications, has implications for both previous and future real-world applications of the extreme value theory approach: for example, the argument for small effective dimensionality in the data cannot come from the method itself. All algorithms discussed are implemented as performant and easy-to-use open source code in the DynamicalSystems.jl library.

When chaotic dynamical systems evolve in time, they typically create sets in the state space that have fractal properties. One of the major ways to characterize these chaotic sets is through a computationally feasible version of a fractal dimension (FD). In the field of nonlinear dynamics, the correlation sum and the generalized (Rényi) entropy are the two most commonly used approaches. One attempts to find a scaling exponent of these quantities vs a size parameter, and this exponent approximates the fractal dimension. A third method, based on extreme value theory, is a promising alternative, but it has been developed only recently and, hence, has not undergone the same amount of scrutiny as the previous two. Here, we provide a comprehensive, up-to-date, and self-contained analysis of the available methods, comparing them across every conceivable scenario. We also provide open source implementations of each method.

Fractal geometry deals with geometric objects (or sets) called fractals,1,2 which are “irregular” in terms of traditional Euclidean geometry. Their most striking property arguably is that they possess structure at all scales, which typically remains invariant in one form or another, no matter how much one zooms into the set. Because of this, the traditional topological dimension is not fit to describe these sets adequately (Chap. 5 of Ref. 3). The concept of a fractal dimension, a generally non-integer number, is, therefore, employed to characterize such objects.2,4 The fractal dimension, which will be shortened to FD in the rest of the article, can be used to quantify the complexity of the geometry, its scaling properties and self-similarity, and the effective ratio of surface areas to volumes.2 It has been applied in a vast array of different scenarios, from the archetypal measurement of coastlines5,6 to being suggested as a tool for validating abstract art, e.g., that of Pollock.7,8

The evolution of chaotic dynamical systems results in sets in the state space that are typically fractal4,9 and, thus, can be characterized by a FD,10–16 done to our knowledge for the first time by Russell et al.17 For dissipative systems, these sets are called strange or chaotic attractors17,18 (boundaries of basins of attraction and chaotic saddles can also be fractal19,20). For conservative systems, they often have special properties and are (typically) called fat fractals.21 For fractal sets resulting from evolving dynamical systems, an estimate of FD provides the additional crucial information of the effective degrees of freedom of the time evolution. This means that calculating the FD for an experimentally obtained dataset can be used to guide the modeling process,3,22,23 as the ceiling of the FD is the minimum number of independent variables that can model the system. Alternatively, if one has a model that confidently approximates the real system, due to physical arguments, then the FD of the model output can be used to tune model parameters: since FD is a dynamic invariant,3,22 one can compare FDs of the model simulations to the FD obtained by observed data and change parameters until those two values match.

These reasons have motivated many researchers to compute the FD of many real-world systems by delay embedding measured timeseries.22 Examples include the global climate,24–26 physiology,27 and lasers,28 but many more exist. Unfortunately, some of these studies have been challenged, because calculating the fractal dimension in practice is a difficult, error-prone process, with limited means of providing confidence in the obtained result. This is, for example, highlighted well by the controversy over estimating FDs of “climate attractors” (see, e.g., Ref. 26, and references therein), where FD estimates were incorrectly used to claim a very low FD of 3 for the whole climate system. Researchers, thus, often need to interpret results partly subjectively, due to the lack of objective measures.

In this paper, we compare as many computationally feasible estimators of FDs as possible, in as objective a manner as possible, and across as many scenarios as possible. With this, we provide an objective baseline against which researchers can check their results, reducing the amount of interpretation and subjectivity. It is important to stress the separation between computing the FD of “the deterministic dynamics,” i.e., the dynamic invariant characterizing the flow in the state space, and the FD of “the graph of a timeseries f(t) vs t.” The latter is relevant for stochastic perspectives, is often related to the Hurst exponent,29,30 and is typically used as a quantifier of timeseries, more suitable for classification tasks than typical statistics-based quantifiers (see, e.g., Refs. 31 and 32). These two FD versions are very different and, in general, unrelated. Our paper focuses exclusively on the first version.

Hence, our main operating assumption is that we have a multivariate (sometimes also called multidimensional) dataset, obtained from a dynamical system whose dynamic rule (equations of motion) we do not know. If one has a timeseries, it first must be reconstructed into a state space set via delay embedding or other means.3,22,23 Our operating assumption is motivated, on the one hand, by the increased interest in observed or measured multivariate data and the higher accessibility of sensors and experimental data in preparation for a nonlinear-dynamics-based analysis.33 On the other hand, there is also noticeable recent progress regarding better attractor reconstruction techniques based on delay embeddings, which can also utilize multivariate measurements, yielding a higher-quality reconstruction overall (see, e.g., Refs. 34 and 35 by Kraemer et al., and references therein). We point out that our goal here is estimating FDs of chaotic sets knowing that these exist, in order to separate the problem of FD estimation from the scientific questions surrounding the interpretation of the data. The latter requires following best practices, e.g., various data pre- and post-processing, de-noising, or surrogate tests comparing the FD value of the real data with those of surrogates. These steps are entirely skipped here; the reader can find more information in standard timeseries analysis textbooks or review articles, such as Refs. 23 and 28.
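For concreteness, the standard delay-coordinates construction that lifts a scalar timeseries into a d-dimensional state space set can be sketched in Python as follows. This is a bare-bones illustration of ours (the function name is hypothetical); choosing the embedding dimension d and the delay τ well is precisely the subject of Refs. 34 and 35:

```python
import numpy as np

def delay_embed(x, d, tau):
    """Delay-coordinates embedding of a scalar timeseries x into
    d-dimensional vectors (x_t, x_{t+tau}, ..., x_{t+(d-1)*tau})."""
    n = len(x) - (d - 1) * tau          # number of embedded vectors
    return np.column_stack([x[i * tau : i * tau + n] for i in range(d)])
```

For example, `delay_embed(x, 3, 2)` maps each sample x_t to the vector (x_t, x_{t+2}, x_{t+4}).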

Since the first review on the topic of FD by Theiler in 1990,16 computers and software have improved and several new algorithms to estimate fractal dimensions have been proposed. It is, thus, timely to revisit the subject, and, here, we will provide a comparison and evaluation of FD estimators across a range of topics much larger than what has been done so far. Beyond comparing and evaluating various fractal dimension estimators, we provide optimized, easy-to-use, extensively tested, open source implementations for all algorithms discussed in this review in the DynamicalSystems.jl software library.36 

This review and comparison paper is structured as follows (see also Fig. 1 for a summary of the paper). In Sec. II, we provide a self-contained, concise definition of the major methods used to compute FDs and how they relate to the natural density and to each other. Connected with this section is Appendix A, which presents all (computationally feasible) algorithms we have found for estimating a fractal dimension. The core of the paper is Secs. III and IV, which compare in detail the three best and most popular estimators of FDs: scaling of entropy, scaling of correlation sum, and an extreme value theory approach. We compare across data dimensionality, data length, different kinds of dynamical systems, noise levels, real-world data, various embeddings of timeseries, and the order q of the fractal dimension, among others. We close the paper with a summary of our findings in Sec. VI. Every result, plot, and method that we present in this paper is fully reproducible via open source code and adjustable to other input datasets (see Appendix B for details).

FIG. 1.

Summary and overview of this paper. Section II A discusses the top-left panel. Sections II B–II D discuss the bottom-left panel. Sections III and IV discuss the top-right panel and lead to the results summarized in the table of the bottom-right panel (which are also stated in Sec. VI).


An easy-to-understand introduction and review regarding the fractal dimension was given by Theiler in 1990.16 It summarizes the theoretical background of the fractal dimension and the methods (known until 1990) to compute it, and gives a plethora of historic references. A more recent publication on fractal dimensions in the style of a review was given by Lopes and Betrouni in 2009.37 However, it is focused on image analysis and pattern recognition and does not do any quantitative comparison. The most recent detailed source providing quantitative information on the limitations and pitfalls of fractal dimension estimates is the well-known textbook by Kantz and Schreiber.22

A number of ways to estimate a FD, which are applicable to dynamical systems, have been devised in the literature. For this comparison, we found, implemented, and compared the methods that we briefly summarize in Table I. From these, our main analysis focuses on the three most prominent ones. In the rest of this section, we will provide a concise, yet self-contained, summary of the concept of a fractal dimension in dynamical systems. Then, we introduce the three main estimators that we use in the extensive comparisons done in Secs. III and IV and illustrate how the different estimators connect to the natural density and to each other. In the following, we assume that we have a set X that contains N D-dimensional points, representing a (possibly observed) multivariate timeseries of a dynamical system. We will use the letter Δ to denote various versions of a fractal dimension and a superscript in parentheses, such as Δ^(S), to denote the particular estimator used to estimate Δ.

TABLE I.

Description of various estimators for fractal dimension considered in this paper (see also  Appendix A).

Estimator | Brief description | Main Refs.
Natural measure entropy | Scaling of generalized entropy H_q of amplitude binning vs box size ε | 17
Molteno’s histogram optimization | Optimized algorithm for amplitude binning with restricted size ε | 38
Correlation sum | Scaling of correlation sum C_q vs radius ε | 10 and 39
Box/prism-assisted correlation sum | Optimized algorithm to calculate the correlation sum | 40
Performance-optimized box size | Box-assisted algorithm optimized for performance | 41
Logarithmic correction | Better-converging fit instead of the standard least squares fit of log(C_2) vs log(ε) | 42
Takens’ estimator | Maximum likelihood estimation of the scaling exponent, C_2(ε) ∝ ε^Δ for ε ∈ (0, ε_max] | 13 and 43
Judd’s estimator | Binned MLE with additional degrees of freedom from a polynomial | 44 and 45
Mean return times | Logarithmic scaling of the mean return time to an ε-sphere vs ε | 46 and 16
Lyapunov dimension | Kaplan and Yorke’s linear interpolation for where the sum of Lyapunov exponents = 0 | 47
Lyapunov dimension via fits | Higher-order interpolations and other fits for where the sum of Lyapunov exponents = 0 | 48
Extreme value theory based | Rare events follow a Pareto distribution whose parameter is the fractal dimension | 49
Persistent homology | Quantify how the topology of a shape changes as it is thickened | 50

In the introduction, we discussed how the FD is useful for practical matters and, hence, worthy of being estimated. However, before discussing any estimators for a FD, it is useful to conceptualize what the FD truly characterizes, as this is the origin of all practical estimations.

For this discussion, we ignore the presence of noise in real data and the fact that observed timeseries need to be delay embedded to yield higher-dimensional data. The starting assumption, therefore, is that the set X = {x_1, …, x_N} we have at hand is a faithful sampling of some sort of D-dimensional invariant set (typically a chaotic attractor) of a dynamical system, i.e., X ⊂ A. A is itself characterized by its natural, or invariant, measure μ(x), which defines a probability space on top of A by requiring μ(A) = 1. Equivalently, we may use the natural density ρ(x) dx = dμ, which in practical terms is the D-dimensional histogram of X. To learn more about the natural measure, one should consult various textbooks on nonlinear dynamics3,18,21 or the review by Theiler.16 In essence, while the samples x_i ∈ X form a time sequence of points with deterministic origin, they can also be thought of as points sampled “at random” from A according to the measure μ. Note that throughout this FD description, we make the fundamental assumption that A (with measure μ) is ergodic.

A FD is a number characterizing how μ scales with the scale at which one observes it. Let B(x, ε) be a D-dimensional sphere51 centered at x with radius ε. For sufficiently small ε, and for almost all x_i ∈ A, it is assumed that the scaling of μ with ε is exponential, so that
μ(B(x_i, ε)) ≈ ε^{Δ_i}.   (1)
Here, Δ_i is labeled the local dimension at x_i, and by construction Δ_i ≤ D. The assumption in Eq. (1) reflects our intuitive notion of dimension, which describes that the “bulk” or the “amount” of a set scales with a linear size (scale) raised to the dimension; e.g., the amount of a cube scales with its side length to the third power. Here, we measure the “amount” of a subset of A by its natural measure, i.e., its relative probability mass. From the Δ_i, a fractal dimension characterizing A as a whole can be obtained as the mean,
Δ = (1/N) Σ_{i=1}^{N} Δ_i.   (2)

This intuition-based definition of a local and attractor dimension is further motivated by Theiler in his review article on the basis of the Hausdorff dimension.16 Equation (1), however, is noncomputable because μ is unknown: in practice, we have only partial knowledge of μ due to the finite observations X ⊂ A. In addition, for the overwhelming majority of dynamical systems, μ does not have an analytic expression anyway, irrespective of finite data. An estimator of a FD, therefore, attempts to measure either the local scaling of Eq. (1) or the global scaling obtained by averaging in Eq. (2).

Before going into specific estimators, we need to stress that none of the estimators we considered in this review yield the Hausdorff dimension.2,52 Often in the literature, researchers use the term “Hausdorff dimension” to refer to the output of the estimators, but no one has provided any formal proof of the equivalence between the estimates and the Hausdorff dimension (which itself has a very precise and rigorous mathematical definition). This is especially so for the box-counting dimension, where it is easy to prove that it cannot be the same as the Hausdorff dimension.53 

The first way to define and compute Δ is based on an approximation of the natural density ρ of the set. By discretizing the state space into boxes of size ε, one can assign a probability p_i, i = 1, …, M, to each box, which is simply the count of points in the box divided by the total number of points N. These probabilities approximate ρ, and from them, the Rényi (also called generalized) entropy can be obtained as54
H_q = (1/(1 − q)) log( Σ_{i=1}^{M} p_i^q ).   (3)
Here, q is the order of the entropy, and it allows putting more (q < 1) or less (q > 1) weight on boxes with relatively smaller visitation frequency by the trajectory (and, hence, smaller p_i). For q = 1, H_q reduces to the well-known Shannon entropy, while for q = 0, it becomes log(M), with M the minimum number of non-overlapping boxes needed to cover the set X. More than one way exists for estimating the probabilities p_i (see Appendix A). In the main comparison of Sec. III, we will use the algorithm of Sec. 1 of Appendix A, which works for any values of D, N, ε while having a performance scaling of O(D N log N).

To connect H_q with μ, we re-write Σ_i p_i^q = Σ_i p_i p_i^{q−1}. This is a weighted average, equaling ⟨p^{q−1}⟩, since Σ_i p_i = 1. We note that p is our notion of “amount,” as we measure the amount by the probability mass. Regularizing the expression by its exponent, the quantity ⟨p^{q−1}⟩^{1/(q−1)} = exp(−H_q) is the “average bulk” or “average amount” in a hypercube of linear size (i.e., scale) ε. The number q sets the way we average: for q = 2, we have the arithmetic (typical) average; for q = 3, a root mean square; and for q → 1, a geometric average.

By following the same intuition that led to Eq. (1) (that “amount” ∼ scale^exponent), the so-called generalized dimension of order q,55,56 is defined as
Δ_q^{(H)} = lim_{ε→0} lim_{N→∞} H_q / log(1/ε).   (4)
This definition was used, to our knowledge, for the first time for q = 1 by Russell et al.17 Δ_0^{(H)} is called the box-counting or capacity dimension, while Δ_1^{(H)} is called the information dimension. The box-counting dimension can also be thought of as the fractal dimension of the support of the attractor, as it disregards the values of μ.

In Eq. (4), both limits are theoretical and cannot be realized in practice. As a result, Δ_q^{(H)} is estimated by plotting H_q vs log(ε) and estimating the slope of a linear scaling region (for sufficiently large N; more on this in Sec. III I).
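As a concrete (if naive) illustration of this procedure, the following Python sketch estimates H_q by direct amplitude binning and fits a single least squares slope over user-chosen box sizes. It is an illustration of ours, not the optimized algorithm of Appendix A (which also detects the linear scaling region automatically); all function names are hypothetical:

```python
import numpy as np

def renyi_entropy(X, eps, q=2):
    """Rényi (generalized) entropy H_q of a point set X (N x D),
    estimated by binning the state space into boxes of size eps."""
    boxes = np.floor(X / eps).astype(np.int64)      # box index of each point
    _, counts = np.unique(boxes, axis=0, return_counts=True)
    p = counts / len(X)                             # visitation probabilities p_i
    if q == 1:
        return -np.sum(p * np.log(p))               # Shannon limit
    return np.log(np.sum(p ** q)) / (1 - q)

def entropy_dimension(X, eps_values, q=2):
    """Slope of H_q vs log(1/eps) over the given box sizes."""
    H = [renyi_entropy(X, e, q) for e in eps_values]
    slope, _ = np.polyfit(-np.log(eps_values), H, 1)
    return slope
```

For 10^5 points uniformly distributed in the unit square, `entropy_dimension(X, [0.1, 0.05, 0.025])` yields a value close to 2, as expected.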

If the fractal dimension depends on q, the set is called multi-fractal. This is the case when the natural measure underlying the attractor is strongly non-uniform. Large positive values of q then put more emphasis on regions of the attractor with a higher natural measure. In dynamical systems theory, almost all chaotic sets are multi-fractal.14 The dependence of Δ_q^{(H)} on q is connected with another concept, the so-called singularity spectrum, or multifractality spectrum, f(α); however, we will not be calculating f(α) here and refer to other sources for more (see, e.g., Chap. 9 of Ref. 18 by Ott).

The second way of defining and estimating a fractal dimension uses the correlation sum.10 The correlation sum is an alternative way of estimating the average amount at a given scale, based on the points near a given point. Specifically, we associate an amount S with point x_i ∈ X at scale ε as
S(x_i, ε) = Σ_{j≠i} B(||x_i − x_j|| < ε),   (5)
with ||·|| some distance norm (here always the Euclidean; different norms have little practical impact and theoretically exactly none) and B(·) = 1 if its argument is true and 0 otherwise, i.e., we count the ε-close neighbors of x_i in X. The exponential scaling of S(x_i, ε) vs ε is called the pointwise (local) dimension.57 The average “amount” S over X is called the correlation sum, given by C(ε) = Σ_i S(x_i, ε)/N.
Using the same reasoning as in Sec. II B, this average amount is expected to scale with the linear size ε exponentiated to the FD. Again, similarly to Sec. II B, we do not have to limit ourselves to the typical arithmetic average when defining the correlation sum, but can introduce an order q that adjusts how we average the “amount.” This leads to the definition of the q-order correlation sum,11,22,39
C_q(ε) = [ (1/N) Σ_{i=1}^{N} ( (1/N_i) Σ_{j: |i−j|>w} B(||x_i − x_j|| < ε) )^{q−1} ]^{1/(q−1)},   (6)
with N_i the number of indices j satisfying |i − j| > w.
Here, we have also added w ≥ 0, the correction by Theiler58 (known as the Theiler window), which excludes as neighbors points that are temporally close. This removes spurious correlations due to dense sampling of continuous-time dynamical systems. We choose w as follows: for each timeseries present in the multi-dimensional input dataset X, we calculate the first minimum of its self-mutual information.3 The maximum of the time shifts corresponding to these minima is chosen as w.

Originally, the version with q = 2 was used to define the correlation dimension,10 and the process of defining a FD is, in fact, the same as for H_q, for any q. A linear scaling region is estimated from the curve of log(C_q) vs log(ε). Then, the correlation-sum-based FD Δ_q^{(C)} is the slope of that linear region.
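A brute-force Python sketch of C_2 and the corresponding slope estimate reads as follows; it is our own naive O(N²) illustration (with a plain least squares fit over user-chosen radii), not the box-assisted method or the automated linear-region detection used in the paper:

```python
import numpy as np

def correlation_sum(X, eps, w=0):
    """C_2(eps): fraction of point pairs with |i - j| > w that are
    closer than eps (Euclidean norm)."""
    N = len(X)
    count, total = 0, 0
    for i in range(N):
        d = np.linalg.norm(X[i + 1 + w:] - X[i], axis=1)  # pairs j > i + w
        count += np.sum(d < eps)
        total += len(d)
    return count / total

def correlation_dimension(X, eps_values, w=0):
    """Slope of log C_2 vs log eps over the given radii."""
    C = np.array([correlation_sum(X, e, w) for e in eps_values])
    mask = C > 0                          # drop radii with no pairs
    slope, _ = np.polyfit(np.log(eps_values)[mask], np.log(C[mask]), 1)
    return slope
```

For a few thousand points uniformly distributed in the unit square, `correlation_dimension(X, [0.02, 0.04, 0.08])` gives a slope close to 2 (slightly below, due to boundary effects).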

We leverage two potential improvements here. First, to calculate C_q(ε), we used a box-assisted method.40,41 We modify this method as discussed in Sec. 4 of Appendix A, because otherwise it fails for data with even a small amount of noise (see discussions in Sec. III F and Sec. 4 of Appendix A). Second, in addition to the standard least squares fit log(C_q) ≈ a + Δ_q^{(C)} log(ε), we also used the correction by Sprott and Rowlands42 when possible (i.e., when at least half the range has log(ε) < 0). Reference 42 optimizes the fit log(C_2) ≈ a + Δ_2^{(C)} log(ε) + b log(−log(ε)), where a and b are parameters optimized in parallel with Δ_2^{(C)}. It is intended to give better fits for sets with a slowly converging fractal dimension estimate, but as we will show later, it is best not to use it in practice.

In the rest of the manuscript, we will use H_q or C_q to refer to the methods of estimating FD via the (generalized) entropy or the (generalized) correlation sum, respectively. We will explicitly write H_2, C_2 when we make statements that apply only to the particular order q = 2.

The third major way of defining and estimating a FD from a set X is based on extreme value theory (EVT) applied to dynamical systems.49 The method estimates a local dimension Δ_i^{(E)}, as in Eq. (1), and then provides the FD as their average. Interestingly, the method utilizes exactly the same information as the correlation sum: all inter-point distances. However, in contrast to the generalized entropy and correlation sum methods, Δ_i^{(E)} is not estimated directly via an exponential scaling relationship.

To the best of our knowledge, the method has been developed incrementally across several papers59–64 (see also Chaps. 4 and 9 of Ref. 49). Even though relatively recent, this method has already been applied to a plethora of real-world cases (see, e.g., Refs. 65–74) and many more (see Ref. 75 for a summary of recent applications). Despite this plethora of applications, we have noticed that some applications have ambiguities with the basic theory connecting EVT with FD, and we discuss these issues in Sec. 10 of Appendix A.

First, let us summarize the computational algorithm to estimate a FD via EVT and then discuss how it connects to the natural measure μ. Let g(ε) = −log(ε) be a function of a distance (or radius) in the state space. For the ith point in X, we estimate
g_i = { −log(||x_i − x_j||) : j = 1, …, N, j ≠ i },   (7)
with ||·|| the Euclidean distance. Note that g_i is a real-valued vector of length N − 1. Next, we choose an “extreme” probability p for a quantile of the distribution of g_i (e.g., p = 0.99). We found no reference that clarifies what “extreme” means in a mathematically precise way, but we discuss in detail how the choice of p impacts the results in Sec. IV B.
In any case, we then compute g_p as the p-quantile of g_i. Then, we collect the exceedances of g_i, defined as
E_i = { g − g_p : g ∈ g_i, g ≥ g_p },   (8)
i.e., all values of g_i larger than or equal to g_p, shifted by g_p. E_i is also a real-valued vector, with n = N(1 − p) values in total. Now, according to extreme value theory,62,76 in the limit N → ∞, p → 1, and for the particularly chosen form of the function g, the values E_i follow a generalized Pareto distribution (GPD) with parameters σ, ξ and shift parameter 0 (see Sec. 10 of Appendix A for more on the GPD). However, if the measure μ and attractor A satisfy the criterion of Eq. (1), then the GPD reduces to an exponential distribution (EXPD) with parameter σ, i.e., with density ∝ exp(−E/σ). Within this extreme value theory approach, the local dimension Δ_i^{(E)} assigned to the state space point x_i is given by the inverse of the σ_i parameter of the EXPD fit to the exceedances, i.e.,
Δ_i^{(E)} = 1/σ_i,  σ_i = (1/n) Σ_{E ∈ E_i} E.   (9)
In the above expression, we explicitly used the maximum likelihood estimator for the σ parameter, which for an exponential distribution is simply the sample mean. Additionally, the above expression places one more assumption on the local scaling of the natural measure μ; see the discussion below around Eq. (13). The FD for A follows as the arithmetic mean,
Δ^{(E)} = (1/N) Σ_{i=1}^{N} Δ_i^{(E)},   (10)
which means that the EVT FD corresponds to a FD of order q = 1.
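Combining Eqs. (7)–(10), a compact Python sketch of the EVT estimator reads as follows. This is an illustration of ours using plain NumPy and the exponential-MLE (sample mean of the exceedances), not the implementation of Ref. 49 or of DynamicalSystems.jl; sampling a subset of reference points is an optional shortcut we add for speed:

```python
import numpy as np

def evt_local_dimension(X, i, p=0.98):
    """Local dimension at X[i]: exceedances of g = -log(distance)
    above the p-quantile are fit by an exponential distribution,
    whose MLE parameter sigma satisfies Delta_i = 1/sigma."""
    d = np.linalg.norm(X - X[i], axis=1)
    g = -np.log(d[d > 0])           # -log distances to all other points
    gp = np.quantile(g, p)          # threshold defining "extreme" closeness
    E = g[g >= gp] - gp             # exceedances, Eq. (8)
    return 1.0 / E.mean()           # Eq. (9)

def evt_dimension(X, p=0.98, sample=None, seed=0):
    """FD as the mean local dimension over all (or sampled) points."""
    idx = range(len(X))
    if sample is not None:
        idx = np.random.default_rng(seed).choice(len(X), sample, replace=False)
    return float(np.mean([evt_local_dimension(X, i, p) for i in idx]))
```

For points uniformly distributed in the unit square, `evt_dimension(X)` returns a value close to 2, with per-point fluctuations of order Δ/√n for n exceedances.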

Let us now discuss how this theory connects to Eq. (1) and, hence, relates Δ_i to the σ_i parameter. Due to the invertibility of the function g, the exceedances E_i correspond to the times when the orbit of the dynamical system comes closer than ε to the reference point x_i, with ε = exp(−g_p). This clarifies why we focus on extremes of g_i: we are interested in what happens within a small radius ε around a reference point, in order to connect with the fundamental definition of the local dimension of Eq. (1).

Now, let us discuss the probability π_i(E) that, given that an exceedance has occurred (i.e., a state x_j is at least ε-close to x_i), there is an exceedance of E. This probability is by construction given by 1 minus the cumulative distribution function of the fitted EXPD, i.e., π_i(E) = exp(−E/σ_i). However, the same probability can be constructed in terms of the natural measure μ to be
π_i(E) = μ(B(x_i, ε′)) / μ(B(x_i, ε)),   (11)
with
ε = exp(−g_p),  ε′ = exp(−(g_p + E)) = ε e^{−E}.   (12)
We can relate Δ_i with σ_i if we place one more assumption on μ. We assume that, locally around x_i,
μ(B(x_i, ε)) ≈ f_i(ε) ε^{Δ_i},   (13)
where f_i(ε) is a slowly varying function of ε as ε → 0, which may or may not depend on the reference point x_i. With this assumption, both the numerator and the denominator of Eq. (11) scale exponentially with Δ_i, and the remaining factors cancel out even though ε ≠ ε′. By taking into account also the expressions of Eq. (12), and that π_i(E) = exp(−E/σ_i), we put everything together and have
exp(−E/σ_i) = π_i(E) ≈ (ε′/ε)^{Δ_i} = exp(−Δ_i E),   (14)
from which Δ_i = 1/σ_i.
Last, a FD estimate proposed by Kaplan and Yorke47 is based on the Lyapunov exponents characterizing an attractor. Let {λ_i} denote the Lyapunov spectrum, ordered as λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_D. Then, the Lyapunov dimension is defined as
Δ^{(L)} = k + (1/|λ_{k+1}|) Σ_{i=1}^{k} λ_i,  with k the maximum index such that Σ_{i=1}^{k} λ_i ≥ 0.   (15)
In simple terms, it is the (linearly interpolated) index value where the cumulative sum of the Lyapunov exponents first crosses zero. Table II provides estimates for the systems we use in this paper. It is conjectured that Δ^{(L)} ≈ Δ_1^{(H)} (see also Ref. 77), and one generally does find similar values in practice; however, there is no formal proof yet. An extension to Eq. (15) has been proposed in Ref. 48 that is not included in the main comparison but is discussed in Appendix A. Keep in mind that Eq. (15) is defined only for dissipative systems, where Σ_i λ_i < 0. Applying it to conservative systems does not make much sense; e.g., Hamiltonian systems satisfy λ_i = −λ_{D−i+1} and obtain Δ^{(L)} = D always, even though the motion might happen on a lower-dimensional manifold. For the 4D Hénon–Heiles system (Fig. 3), energy conservation limits the dynamics to a 3D manifold, and, thus, its FD cannot be greater than 3.
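Equation (15) translates directly into code; the following is a minimal Python sketch of ours:

```python
def lyapunov_dimension(lyapunovs):
    """Kaplan-Yorke (Lyapunov) dimension: k + (sum of first k exponents)
    / |lambda_{k+1}|, with k the largest index whose partial sum of the
    sorted spectrum is non-negative."""
    lam = sorted(lyapunovs, reverse=True)   # lambda_1 >= ... >= lambda_D
    s = 0.0
    for k, l in enumerate(lam):
        if s + l < 0:                       # partial sum first turns negative
            return k + s / abs(l)
        s += l
    return float(len(lam))                  # sums never negative: Delta = D
```

For instance, a spectrum resembling that of a 3D dissipative flow, such as the hypothetical values (0.9, 0, −14.5), gives Δ^{(L)} = 2 + 0.9/14.5 ≈ 2.06.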
TABLE II.

Lyapunov (Kaplan–Yorke) dimensions for systems with known dynamic rule (listed in Table III).

System | D | Δ(L)
Lorenz96 | 4 | 2.99
Lorenz96 | 6 | 4.93
Lorenz96 | 8 | 6.91
Lorenz96 | 10 | 8.59
Lorenz96 | 12 | 10.35
Lorenz96 | 14 | 12.10
Lorenz96 | 32 | 27.68
Rössler (chaotic) | 3 | 1.9
Hénon map | 2 | 1.26
Kaplan–Yorke map | 2 | 1.43
Towel map | 3 | 2.24
Coupled logistic maps | |
Kuramoto–Sivashinsky | 101 | 31.76

Δ^{(L)} has a huge advantage when compared to the previous definitions of fractal dimensions: it can be computed with very high precision, even for high-dimensional systems (where the other methods typically suffer in accuracy, as we will show below). However, it also has a huge disadvantage: practically, it can be computed only if the dynamic rule (equations of motion) is known. Only by using the dynamical rule and its linearization can one estimate the entire Lyapunov spectrum with satisfactory precision, for example, by means of the well-known algorithms due to Shimada and Nagashima78 and Benettin et al.79 From a finite, and often noisy, real-world dataset, calculating the entire spectrum of exponents is a very challenging task that, for higher-dimensional attractors, requires very large data sets.80 Therefore, one is in most cases better off calculating a fractal dimension directly from the data, or instead fitting an explicit model to the data, e.g., Ref. 81.
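To make the preceding point concrete, the following Python sketch implements a QR-based, Benettin-type computation of the full Lyapunov spectrum for the Hénon map, for which the dynamic rule and its Jacobian are known in closed form. The initial condition and iteration count are illustrative choices of ours:

```python
import numpy as np

def lyapunov_spectrum_henon(n=100000, a=1.4, b=0.3):
    """Benettin-style QR algorithm for the Lyapunov spectrum of the
    Henon map x' = 1 - a*x^2 + y, y' = b*x."""
    x, y = 0.1, 0.1
    Q = np.eye(2)                           # orthonormal tangent vectors
    sums = np.zeros(2)
    for _ in range(n):
        J = np.array([[-2 * a * x, 1.0],    # Jacobian at the current state
                      [b, 0.0]])
        x, y = 1 - a * x * x + y, b * x     # advance the orbit
        Q, R = np.linalg.qr(J @ Q)          # re-orthonormalize tangent space
        sums += np.log(np.abs(np.diag(R)))  # accumulate local growth rates
    return sums / n
```

For the classic parameters a = 1.4, b = 0.3, the leading exponent converges to roughly 0.42, while the sum of the two exponents equals log(b) by construction (constant Jacobian determinant).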

Unfortunately, it is not straightforward to connect Δ^{(L)} to Eq. (1). One may find some intuition in Sec. 11.5.2 of Ref. 22, but this discussion relies on expansion and contraction rates in the state space and, hence, falls outside the scope of this review.

In this section, we perform a quantitatively rigorous and exhaustive comparison of the methods based on the entropy H_q and the correlation sum C_q. Even though fundamentally different, both rely on estimating the scaling of some quantity vs some size ε. For a given q, they both (in theory) approximate the same quantity, the exponential scaling of the q-average of the “amount” of measure vs the scale, as we illustrated in Sec. II. To compute H_q, we use the method of Sec. 1 of Appendix A; for C_q, we use the method of Sec. 4 of Appendix A in most cases and the straightforward implementation of Sec. 3 of Appendix A for very high-dimensional data. For C_2, we also tested the logarithmic correction of Ref. 42. The motivation for choosing these methods, and why an exhaustive comparison of the other methods is not presented, is explained in Appendix A. Before any numerical analysis, we normalize the input data X so that each of its columns is transformed to have mean 0 and standard deviation 1. This linear transformation leaves dynamic invariants (like the FD) unaffected but makes all numerical methods more accurate and faster to converge.
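The pre-processing step just described amounts to column-wise standardization; for concreteness, in Python:

```python
import numpy as np

def standardize(X):
    """Transform each column of X to zero mean and unit standard
    deviation; a linear map, so dynamic invariants such as the FD
    are unchanged."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```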

In Secs. III A–III I, all results will be presented with the same plot type, as, e.g., shown in Fig. 2. The legend shows the different datasets used in the plot and a description of the plot’s purpose. The top panel is the entropy estimate, while the bottom is the correlation sum. To estimate Δ q ( H ) and Δ q ( C ) for each curve, we identify automatically, and objectively, a linear scaling region as discussed in Sec. III I. This region is denoted by markers of the same color on each curve. The secondary legends inside the panels provide the 5%–95% confidence intervals for the estimated slopes of these segments. Unless otherwise stated, all datasets used have length N = 10^5, which is for many experiments a typical upper bound for the amount of data one has access to. Continuous time systems are sampled with approximately 10 points per characteristic oscillation period.

FIG. 2.

Fractal dimension estimates of Δ 2 ( H ) and Δ 2 ( C ) for sets with analytically known fractal dimension. The figure title maps colors to set names. The legends within each panel map colors to the (5%, 95%) confidence interval of the estimated fractal dimension (curve slope). The markers on the curves denote the start and end of the estimated linear region.


The best place to start is a simple sanity and accuracy test of the two main estimators for sets whose fractal dimension can be computed analytically in a straightforward manner. This is shown in Fig. 2. We use a periodic orbit from the Rössler system ( Δ = 1), a quasiperiodic orbit of order 2 from the Hénon–Heiles system ( Δ = 2), the Koch snowflake ( Δ = log ( 4 ) / log ( 3 ) ≈ 1.262), the Kaplan–Yorke map ( Δ = 1 + log ( 2 ) / | log ( 0.2 ) | ≈ 1.4306), a uniform filling of the 3D sphere ( Δ = 3), and a chaotic trajectory of the Standard Map (SM) for very high k = 64, which covers the state space uniformly and, thus, has Δ = 2.

Some results that will be seen repeatedly throughout this section become clear. For noiseless data, the correlation sum method is clearly better for three reasons. First, its linear scaling region covers a wider range of scales, while H 2 saturates much more quickly to flatness for small ε. Notice that log ( C 2 ) cannot saturate for small ε, but instead diverges to −∞; however, we never reach this point due to the process that chooses the appropriate overall range of ε (Sec. III I). Both curves would in principle saturate to flatness for very large ε, specifically for ε exceeding the total size of X, but we again do not reach this threshold based on the choice of the range of ε.

Second, within the linear scaling region, the curves of log ( C 2 ) fluctuate less than those of H 2, resulting in a narrower confidence interval. Comparing confidence intervals is only meaningful if the same method is used to extract them. That is why specifically for Fig. 2, we used the standard least squares fit for log ( C 2 ) instead of the aforementioned correction of Ref. 42. Third, the actual numbers we obtain for Δ from the correlation sum method are closer to the analytically expected values than those of the entropy-based method. In particular, for sets that should have an integer fractal dimension, the correlation sum method is much closer to the actual result.

In the following, we cross-compare with the value obtained from the Lyapunov (Kaplan–Yorke) dimension Δ ( L ), which for all systems of interest used in this paper is found in Table II. We note that Δ ( L ) is conjectured to equal the q = 1 version of dimensions; however, here we use q = 2. This is because the correlation sum algorithm does not apply to q = 1 unless one uses a fixed-mass approach; we explain in Sec. 5 of  Appendix A why we do not. We assume that the differences between the two estimators, when compared to Δ ( L ), should not depend much on changing q = 1 to q = 2; hence, the following results remain valid.

In Fig. 3, we compare three discrete and three continuous dynamical systems of different input dimensionality (systems are defined in detail in  Appendix C). We confirm the result of Sec. II, i.e., the correlation sum method is more accurate than the entropy one, because it is much closer to the values expected from the Lyapunov dimension (Kaplan–Yorke conjecture). Furthermore, the entropy method seems to underestimate the fractal dimension more strongly as the state space dimensionality increases. This is expected given the fact that the entropy method works via a histogram approximation of the natural density, and it is well known that the higher dimensional the data, the less accurate a histogram of them becomes (i.e., space is covered more sparsely by N points as D increases).

FIG. 3.

Fractal dimension estimates Δ 2 ( H ) and Δ 2 ( C ) for sets coming from different dynamical systems with known dynamical rule. CLM stands for Coupled Logistic Maps. We are not sure why H 2 of CLM yields a jagged curve instead of a clear linear slope, but the focus of this paper is not individual systems, so we leave this for further study.


This figure also allows us to indeed confirm that the logarithmic correction to the correlation sum by Sprott and Rowlands42 can be impactful, especially for high-dimensional sets where the convergence is the slowest. For example, the standard linear regression fit would give confidence intervals ( 6.44 , 6.6 ) for the eight-dimensional Lorenz96 model and ( 5.78 , 6 ) for the eight coupled logistic maps. These values are closer to the entropy-based estimates but further away from the Δ ( L ) estimates of Table II.

Keeping the dimensionality constant but varying the data length leads us to an interesting observation in Fig. 4. The entropy-based method performs very poorly for small N and clearly should never be used with small datasets. In contrast, the correlation dimension is much more robust and yields useful lower bounds for Δ 2 ( C ) already for very short timeseries like N = 500. Eckmann and Ruelle82 discussed the data requirements for estimating the correlation dimension and pointed out that the minimal number N min of data points needed to estimate a dimension Δ 2 ( C ) quickly increases like log ( N min ) ∝ Δ 2 ( C ). Assuming linear scaling down to ε = ρ E, where E denotes the diameter of the set (largest pointwise distance) and ρ is a small number like 0.1, Eckmann and Ruelle state that for N data points the estimation of Δ 2 ( C ) will not provide values larger than
Δ 2 max ( C ) ( N ) = 2 log ( N ) / log ( 1 / ρ ). (16)
For ρ = 0.1, the upper limits for N = 500, 1000, 10 000, 100 000 are Δ 2 max ( C ) ( N ) = 5.4, 6, 8, 10, respectively. In this context, other choices of ρ may also be justified and will decrease (for ρ < 0.1) or increase (for ρ > 0.1) the upper limit Δ 2 max ( C ) ( N ). The method for detecting linear scaling regions used in this study, for example, starts searching at ε-values corresponding to ρ = 1 / e ≈ 0.368 (see Sec. III I). Therefore, depending on the data, in some cases, linear scaling may occur for ρ > 0.1, resulting in larger upper limits Δ 2 max ( C ) ( N ).
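The Eckmann–Ruelle bound Δ 2 max ( C ) ( N ) = 2 log ( N ) / log ( 1 / ρ ) is simple to evaluate and reproduces the numbers quoted above; a small Python sketch (function name ours):

```python
import math

def er_upper_bound(N, rho=0.1):
    """Eckmann-Ruelle upper bound on an estimable correlation dimension:
    with N points and linear scaling assumed down to eps = rho * E,
    slopes larger than 2*log(N)/log(1/rho) cannot be trusted."""
    return 2.0 * math.log(N) / math.log(1.0 / rho)
```

For example, er_upper_bound(500) recovers the value 5.4 cited above, and er_upper_bound(100_000, rho=1/math.e) the value ≈ 23 used later in Sec. III H.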
FIG. 4.

Dependence of estimators on data length N. The plot in the top panel shows results for data from an eight-dimensional Lorenz96 system and the bottom plot for the experimental dataset “electroch. 1” from Fig. 8. The curves have been vertically offset for visual clarity. For these plots, we have used the automatic ε range estimation described in Sec. III I only for the largest datasets and use the same range for the rest. As a result, log ( C 2 ) cuts off to −∞ as N decreases, and curves are truncated correspondingly.


The results for the 8D Lorenz96 model shown in Fig. 4 are essentially consistent with the estimates of Eckmann and Ruelle. For N = 500, the ( 5 % , 95 % ) confidence interval of the slope is ( 5.57 , 5.94 ), i.e., slightly larger than Δ 2 max ( C ) ( N ) = 5.4 assuming ρ = 0.1 (a value of ρ = 0.113, for example, would provide Δ 2 max ( C ) ( N ) = 5.7 ). With increasing N, the estimated slopes converge toward the value Δ ( L ) = 6.91 of the Kaplan–Yorke dimension of the 8D Lorenz96 system, a value that is already included in the interval ( 5.51 , 7.46 ) obtained with N = 10 000, a number of points large enough to estimate correlation dimensions up to Δ 2 max ( C ) ( N ) = 8 according to the Eckmann–Ruelle limit.

The bottom panel of Fig. 4 shows log–log plots of C 2 for real-world experimental data (see Sec. III G for a description). The dimension estimates Δ 2 ( C ) for different lengths N are all about 3, and only the corresponding confidence intervals shrink for increasing N. This is also in agreement with the Eckmann–Ruelle bound, because dimensions Δ 2 ( C ) ≈ 3 can, in principle, be achieved with N = 500 or larger.

On the other hand, Tsonis et al.,26 using the results of Nerenberg and Essex,83 argue that the minimum number of points N min needed to estimate a dimension x with 95% confidence scales like N min ≈ 10^( 2 + 0.4 x ). In our example, assuming a true value x ≈ 7, it would require at least N min = 63 095 points, which is too high compared to what we can estimate from Fig. 4. However, the estimate presented by Tsonis et al. is surrounded by its fair share of ambiguity, because it involves deciding a priori the extent of a scaling region, and it does not say whether the dimension x in the expression should be the embedding dimension or the actual fractal dimension (which is unknown). We will discuss this topic again in Sec. III H.

If only small data sets are available, finite sample corrections derived by Grassberger84 could be used to improve estimates of H q or C q.

Here, we examine how well the estimators capture multi-fractal properties, i.e., the dependence of Δ q on q, or the absence of multi-fractality, i.e., results that should be invariant to q. This dependence on q was discussed in the review by Theiler,16 but we think an even better reference is the book by Rosenberg.85 A known theoretical result is that Δ q is a non-increasing function of q, i.e., Δ q 2 ≤ Δ q 1 for q 2 > q 1 (see, e.g., Ref. 11).

Here, we will use two examples. The first is the Koch snowflake, which has an (approximately) uniform density and, thus, its FD should have no dependence on q whatsoever. The second is the Hénon map, which has a strongly non-uniform natural measure, giving the expectation of a clear decrease of Δ q with increasing q. The results are shown in Fig. 5.

FIG. 5.

Impact of order q on the fractal dimension. We used the standard least squares fit for log ( C q ), as Ref. 42 does not discuss q ≠ 2. The curves have been vertically offset for visual clarity.


Both the entropy and correlation sum approaches perfectly capture the absence of multi-fractality, giving identical curves for all q for the Koch snowflake. For the Hénon map estimates, both methods satisfy the criterion of a decreasing Δ q with q. But there is a problem. For the correlation sum method and for q ≠ 2, the function log ( C q ) vs log ( ε ) is no longer a straight line but is composed of two linear regions with significantly different slope, making the results ambiguous. It is also not obvious why the slopes change at the given ε value, or why the slopes below it indicate a significantly lower dimension value. The slopes change at log ε ≈ −10, and for q = 3 the left slope becomes 0.58, while for q = 4, the left slope becomes 0.39 (right slopes are shown in the figure legend). We observed this behavior of log ( C q ) having two slopes in practically all example sets with a non-uniform measure.

We could not find anything in the literature about this observation, and, in fact, we could not find a single figure in the literature plotting C q vs ε for q ≠ 2, even though the correlation sum for q ≠ 2 is provided in several publications,11,22,39 which report a value for Δ q ≠ 2 ( C ) (but it is unclear whether they have encountered the same problem as we have or not). We have extensively tested our code, and we are confident that the implementation of Eq. (6) is correct.

For multi-fractal analysis, we are (typically) interested in quantifying the finest properties of the fractal. Perhaps then one should determine the slope of the linear region at the smallest ε values, instead of the slope of the linear region covering the largest range of ε (but for very small ε, the statistics become worse for finite data sets). However, the slopes at the smallest ε are clearly incorrect; the correct slopes are the ones at the largest ε values (those also highlighted in Fig. 5). In any case, using the slope at the largest ε values gives the correct results, but this strong dependence of the slope on ε when q ≠ 2 is worrisome and indicates that more clarity regarding C q ≠ 2 must be established in the literature.

This section examines the impact of varying the state space dimensionality D of the input data, which is a common case when delay-embedding timeseries22,86,87 (because there one increases D and searches for convergence of Δ). In principle, provided the condition d > Δ 0 is met, with d ∈ ℕ the embedding dimension, the reconstructed set has the same fractal dimension as the original set the timeseries was recorded from.88 In Fig. 6, we check how the methods fare with this statement, and whether their accuracy decreases with increasing input dimensionality, i.e., embedding dimension.
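A delay-coordinate embedding itself is straightforward to construct once the dimension d and delay τ are chosen (choosing them well is the hard part; see the method of Kraemer et al. used in Sec. III G). A minimal Python sketch (function name ours):

```python
import numpy as np

def delay_embed(x, d, tau):
    """Delay-coordinate embedding of a scalar timeseries x into d
    dimensions with delay tau: row i of the output is
    (x[i], x[i + tau], ..., x[i + (d - 1) * tau])."""
    N = len(x) - (d - 1) * tau  # number of complete delay vectors
    return np.column_stack([x[i * tau : i * tau + N] for i in range(d)])
```

The resulting (N, d) matrix can then be fed to the same H q or C q estimators as any other state space set.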

FIG. 6.

Effect of increasing input dimensionality of an input timeseries from a set of known fractal dimension utilizing delay embeddings of dimension d. The timeseries used is the first variable of a chaotic Hénon–Heiles trajectory.


In Fig. 6, we used a chaotic timeseries from the Hénon–Heiles system. This has Δ = 3, and as such, we expect convergence of the fractal dimension estimates to a value around 3 for d ≥ 4. We see in Fig. 6 that this is indeed the case. Δ 2 ( C ) does not seem to drop in performance with increasing d, besides a very small decrease in the overall range of orders of magnitude the linear scaling region covers. The same results were obtained using a timeseries from the chaotic Rössler system with Δ ≈ 2 or a chaotic Lorenz96 system with D = 4, which also has Δ ≈ 3. Δ 2 ( H ) seems to perform more poorly with increasing d, by either significantly reducing the linear scaling region or being less accurate in the dimension estimates. However, how much more poorly it performs depends on the dataset: for the Hénon–Heiles, it is still decent, while for Lorenz96, it performs much worse (not shown). Hence, we conclude that Δ 2 ( C ) performs much better than Δ 2 ( H ) as the state space dimensionality of the data increases (while keeping data length and other aspects constant), which is somewhat already evident from Fig. 3.

Real-world data are always accompanied by noise, and, therefore, the impact of noise on the calculation is highly important for the choice of the method. As is well known, the presence of noise in the data makes estimating a fractal dimension harder, as the fractal dimension of the noise is equal to that of the state space and, hence, almost always larger than the fractal dimension of the clean data. Figure 7 shows results for various kinds, and amounts, of noise added to the (normalized) chaotic Rössler attractor. On purpose for this plot, we have used ζ = 1 (see Sec. III I) and the standard linear regression method for log ( C 2 ), because the logarithmic correction of Ref. 42 yields overestimates Δ > 3 for noisy data (the input dataset is three-dimensional and, thus, cannot have Δ > 3).

FIG. 7.

Impact of noise using the (chaotic) Rössler system as an example. The noise percentages approximately indicate the ratio of the std. of the noise divided by the std. of the deterministic dynamics. See Sec. 4 of  Appendix A for a discussion of the vertical dotted line.


We start with the case of additive noise. There, it is known that there is some distance ε σ called the “noise level,” below which the slope of log ( C 2 ) changes from being the fractal dimension of the chaotic set to that of the noise.89 This fact can be used to actually estimate the noise level of the data.22 In Fig. 7, we see how quickly this really happens for log ( C 2 ). Even for 5% additive noise, the curve is already dominated by the noise slope (which is Δ = 3 for three-dimensional additive noise), with only a small segment of the curve at large ε values where the slope takes the deterministic value. The entropy curve does not have this property and retains a single slope throughout (except for the saturation part, of course), whose value is an average between the purely deterministic Δ and that of the noise. On one hand, this may be considered a downside, because it does not allow estimation of the noise level. On the other hand, we should note that the majority of the log ( C 2 ) curve already reflects the noise slope. Thus, if we estimated the average slope of the log ( C 2 ) curve, it would be much larger than that of H 2; i.e., it puts much more weight on the noise dimension than on the deterministic data. These results regarding additive noise are typical and do not seem to strongly depend on the system considered. Note that in the case of delay reconstruction, the slope of the noise should increase with increasing embedding dimension. The slope corresponding to the FD of the deterministic set should remain constant for embedding dimensions larger than a minimum value required for successful reconstruction of the state space of the dynamical system generating the data.

For dynamic noise, we turned the ordinary differential equation (ODE) of the Rössler system into a stochastic differential equation by adding a Wiener process term η d W in the second equation. For a small amount of dynamic noise (here η is proportional to the expected magnitude of the x variable), the fractal dimension increases slightly, as expected, but does not show any noticeable change in its numerical value up to 10% noise. When one turns up the dynamic noise further, the dynamics collapse and there is no “chaotic” attractor anymore (not shown). Both the entropy and correlation sum methods perform equally well vs this kind of noise, and there is no noise radius or change of slope discernible in the correlation sum case. We also looked at low-resolution data, by rounding the Rössler timeseries to two digits after the decimal. This obviously decreases the valid ε-range over which one can do the computation (see Sec. III I). For Δ 2 ( C ), this also significantly changes the result to a value smaller than the “correct” one, while for Δ 2 ( H ), it has no impact. This means that Δ 2 ( H ) performs better for rounded data, which makes sense given the way it is computed (as long as points are in the same box, it does not matter how close they are to each other).

In Fig. 8, we show fractal dimension estimates for real-world experimental data. We focused specifically on experiments that are relatively clean (large signal-to-noise ratio) and where the underlying dynamics is well known to display low-dimensional deterministic chaos. This is important, because for this review, we do not want to mix the scientific question of whether an observed system accommodates a low-dimensional deterministic representation with the technical/computational question of whether an estimator would actually detect that.

FIG. 8.

Fractal dimension estimates for experimental systems known to be underpinned by low-dimensional nonlinear dynamics.


In Sec. III H, we further discuss what happens with real-world data where neither of these two conditions applies. We limited densely sampled experimental data to at most N = 50 000, with a sampling of about ten samples per characteristic timescale. Because of the observations of Sec. III H, we have used ζ = 1 and the standard linear regression method for log ( C 2 ) instead of the logarithmic correction of Ref. 42.

The datasets are as follows: two electrochemical oscillator datasets (the second being more chaotic than the first);34 timeseries from a circuit replicating the dynamics of the Shinriki oscillator;90 the mean field of a network of 28 circuits following Rössler dynamics from Ref. 91; data from a mechanical double pendulum from Ref. 92; and ECG recordings during a pacing experiment of a healthy individual from Ref. 32. All experimental timeseries were delay embedded using the recent automated method due to Kraemer et al.34 (see  Appendix D for the embedding parameters). The method yielded embedding dimensions of seven or less for all experimental timeseries, giving even more confidence that the data may display low-dimensional deterministic chaos.

For dataset “Rössler Net,” the curve ( log ( ε ) , log ( C 2 ) ) continuously changes its slope instead of having one, or at most two, constant-slope segments. This makes deducing a single fractal dimension ambiguous. Slight curving can be observed also in “electroch. 1,” “electroch. 2,” and “ECG IBI,” but the effect is weak enough that two scaling regions can nevertheless be extracted (for “ECG IBI,” Fig. 8 reports the slope of the noise). On the other hand, the entropy-based approach does not suffer from this problem and, besides the expected saturation for small ε, seems to be described quite accurately by a single slope and, thus, a specific FD value. The results shown in Fig. 8 are from five- to six-dimensional embeddings, yet both H 2 and C 2 yield FDs that are less than the embedding dimensions (excluding the “Rössler Net” case for C 2, where a FD cannot be estimated). Hence, we can assume that the FD of the underlying dynamics has to be somewhere between the lower bounds of Δ 2 ( H ) and Δ 2 ( C ).

The continuous slope curving of C 2 is something we have not seen before with synthetic data. From the discussion of Sec. III F, the problem may arise because realistic noise may be neither white nor stationary, or because the noise level in the data is too high. Indeed, we saw that for 5%–10% relative noise, the slopes of C 2 already reflect the noise FD. In Fig. 8, C 2 yields consistently higher FDs than H 2, even though we know that an underlying low-dimensional representation exists. To extract this lower FD value from C 2, one, therefore, must focus on the larger scales ε and try to find this consistent smaller slope by increasing the embedding dimension (as is standard practice22). We show such an analysis in  Appendix F. Nevertheless, it appears that C 2 is more strongly affected by noise when compared to H 2.

In this subsection, we examine the result of applying the aforementioned methods to ill-conditioned data, which may be non-deterministic or non-stationary, or to extremely high-dimensional data, for which there exists the notion in the literature that the methods used so far are unlikely to succeed. For the first dataset, we used data from the Lorenz-96 model with D = 6, while having the F parameter increase linearly during the time evolution from 1.0 (periodic motion) to 24.0 (chaotic motion with Δ ≈ 5). The second dataset is the concatenation of a periodic trajectory from the Rössler system with noise uniformly distributed on the 3D sphere. The third dataset is a paleoclimate temperature timeseries from the Vostok ice core, embedded in eight-dimensional space, which is unlikely to be stationary or to accommodate a low-dimensional representation.25 The fourth dataset is a stock market timeseries for the “nifty50” index embedded in six-dimensional space, which is definitely non-stationary and rather unlikely to be deterministic. The last two datasets are extremely high-dimensional data of the Lorenz96 and the Kuramoto–Sivashinsky spatiotemporal systems (the latter having 101 dimensions after discretization). The results of the dimension estimation are shown in Fig. 9.

FIG. 9.

Fractal dimension estimates for extreme cases.


Generally speaking, in Fig. 9, the results of the first four datasets show that something is “wrong.” There is a large mismatch between the estimates of the entropy and correlation sum methods, and the curves do not seem to be composed of a single slope. Oddly, for the non-stationary Lorenz96 data, the correlation sum has a clear straight slope with a fractal dimension somewhere between the extreme values. The Vostok data, in particular, are plagued by a continuous change in slope, especially in log ( C 2 ), and, in addition, the resulting FD values do not converge when increasing the embedding dimension (not shown). Additionally, the FD values obtained from H 2 or C 2 are very different. Notice that in all cases, our automated algorithm finds a value for Δ nevertheless. This only serves to highlight how careful one should be, and that one should always plot the curves of H 2 and log ( C 2 ).

Although in the literature there exist several tests for non-stationarity using, e.g., permutation entropy or other methods, we will now describe a simple, fractal-dimension-based scheme. One can divide the data into m equal parts in two ways: making m segments of N / m successive points, or choosing every mth point, with starting offsets from 1 to m. For these subsets, the same fractal dimension estimation is performed. If there is non-stationarity, the first kind of selection will show significantly different estimates across its sub-datasets, while the second will show approximately the same estimates.
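The two splitting schemes of this non-stationarity check are easy to implement; a minimal Python sketch (function name ours), to whose outputs one would then apply the same FD estimator:

```python
import numpy as np

def stationarity_segments(x, m):
    """Split timeseries x into m equal parts in the two ways described
    above: `consecutive` holds m blocks of N//m successive points, and
    `strided` holds m subsets taking every m-th point with offsets
    0..m-1.  For stationary data an FD estimator should agree across
    both kinds; for non-stationary data the consecutive blocks will
    disagree while the strided subsets still roughly agree."""
    n = len(x) // m
    consecutive = [x[i * n:(i + 1) * n] for i in range(m)]
    strided = [x[i::m][:n] for i in range(m)]
    return consecutive, strided
```

One would then compare, e.g., the spread of Δ 2 ( C ) over the consecutive blocks against the spread over the strided subsets.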

Let us now consider the last two datasets of Fig. 9 (32-dimensional Lorenz96 and Kuramoto–Sivashinsky). Surprisingly, C 2 shows a rather clear linear scaling region that has a very high slope, although not as high as the expected dimension values Δ ( L ) (28 and 32, respectively). This is consistent with the upper bound equation (16) of Eckmann and Ruelle, with Δ 2 max ( C ) ( 100 000 ) ≈ 10 for ρ = 0.1 or Δ 2 max ( C ) ( 100 000 ) ≈ 23 for ρ = 1 / e, because even in cases where linear scaling occurs already at relatively large values ε = ρ E, one cannot expect to obtain slopes of size 28 or 32 with only N = 100 000 data points. Furthermore, Fig. 9 shows that for these data the range of scales covered by the linear region is very small: only one e-fold. Increasing the amount of data also increases the linear scaling region, and the resulting confidence intervals shrink as we get more data points in the linear region (not shown here).

As discussed already in Sec. III C, these examples show again that with high-dimensional data, one needs much longer timeseries to properly cover the (typically high-dimensional) chaotic set, and this affects any kind of estimate of dynamical properties; it is not a problem specific to the correlation dimension. Additionally, one needs a very high signal-to-noise ratio, because, due to the coverage of a very small range of scaling factors ε, even a small amount of noise may ruin the estimation. However, if one does have such a clean high-dimensional dataset, Δ 2 ( C ) may still provide a useful estimator, at least for a lower bound of the correlation dimension of that data set, where the value obtained should always be compared with Δ 2 max ( C ) ( N ) for the available amount of data N and with dimension estimates of surrogate data (see Sec. V) to avoid wrong conclusions.

To estimate the value of Δ ( H q ) or Δ ( C q ), we need to find the slope of H q or log ( C q ) vs log ( ε ). This matter is typically resolved in a context-specific manner, where each plot is carefully examined and the “linear region” is decided by the practitioner by eye. This approach cannot work in an objective comparison. Here, we formulate an entirely objective and sensible (but not flawless) automated process that is separated into two parts: the choice of which sizes ε to calculate H q or log ( C q ) for, and how to estimate a linear scaling region from the respective curves. Once the linear scaling region is identified, actually obtaining the fractal dimension is a simple least squares fit. Note that small deviations from a straight line for densely sampled values of ε may occur due to lacunarity of fractal sets.126–130 While we can also observe this for very densely sampled ε, this effect is so minuscule that we consider it irrelevant for almost all data sets in practice. These oscillations are not suitable for the quantification of lacunarity in fractal sets, but other measures exist for this purpose.131,132

The range of ε is always decided with the generating formula ε = e^x, where x takes k linearly spaced values from log ( ε min ) + ψ to log ( ε max ) − ζ, i.e., the values of ε are exponentially ranged in base e. Here, ε min is the smallest inter-point distance existing in the set, ε max is the average of the lengths along each of the variables of the set, and ψ , ζ are constants. Unless stated otherwise, we have used k = 16 , ψ = 1 , ζ = 1 in Sec. III. In essence, we are limiting ε to be one order of magnitude (in base e) larger than the smallest inter-point distance and one order of magnitude smaller than ε max. If the resulting ε range does not cover at least two orders of magnitude (a common case in high-dimensional data), we use ζ = 0 instead. This choice of ε max brings very good performance in the box-assisted algorithm for the correlation sum (Sec. 4 of  Appendix A), but it is not so small as to make the computation meaningless for realistic and/or noisy data. Notice that the automated fractal dimension estimates can be sensitive to k , ζ , ψ. In practice, we would recommend producing several estimates by varying these parameters and taking the median of Δ.
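A minimal Python sketch of this ε-range construction (function name ours; the minimum-distance search here is naive O(N²), unlike the tree-based searches one would use in practice):

```python
import numpy as np

def epsilon_range(X, k=16, psi=1.0, zeta=1.0):
    """k epsilon values exponentially spaced in base e, from
    log(eps_min) + psi to log(eps_max) - zeta, where eps_min is the
    smallest inter-point distance in the (N, D) set X and eps_max is
    the average extent of X along each of its variables."""
    dmin = min(np.linalg.norm(X[i + 1:] - X[i], axis=1).min()
               for i in range(len(X) - 1))
    emax = np.mean(X.max(axis=0) - X.min(axis=0))
    return np.exp(np.linspace(np.log(dmin) + psi, np.log(emax) - zeta, k))
```

The returned values would then be fed to the H q or C q estimators before searching for a linear scaling region.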

To estimate the linear region, we proceed as follows. We scan the local slopes of each one of the k − 1 segments of the curve y vs log ( ε ), starting from the leftmost one (here y = H q or log ( C q )). If the local slope of the preceding segment is approximately equal to that of the next one with relative tolerance tol, i.e., | s i − 1 − s i | ≤ tol ⋅ max ( s i − 1 , s i ), then these two segments belong to the same linear region. We move to the next segment and compare it in the same way with the first segment belonging to the same linear region. When we find a mismatch, we start a new linear region. This way, we have segmented the curve y vs log ( ε ) into approximately linear regions. We then choose the linear region that spans the largest amount of the log ( ε ) axis and label it “the” linear region. We finally perform a least squares fit there and report the 5%–95% confidence interval of the fitted slope. In Fig. 10, we visually demonstrate the process. We also compare it with another standard way fractal-dimension-related plots are presented: the successive local slopes δ 1 of each point of the curves and the same slopes δ 5 but fitted in a five-point data window. Our linear regions approach is equivalent to finding the largest plateau in the local slopes plots.
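A minimal Python sketch of this linear-region segmentation (function and parameter names ours, not the paper's exact implementation; absolute values are used in the tolerance test for robustness, and only the fitted slope is returned rather than its confidence interval):

```python
import numpy as np

def largest_linear_region(logeps, y, tol=0.25):
    """Segment the curve y vs logeps into approximately linear regions
    by comparing each new local slope against the slope of the first
    segment of the current region (relative tolerance `tol`), then
    return (start, stop) indices of the region spanning the widest
    range of logeps together with the least-squares slope fitted there."""
    slopes = np.diff(y) / np.diff(logeps)
    regions = [[0]]  # each region is a list of segment indices
    for i in range(1, len(slopes)):
        s_first, s_i = slopes[regions[-1][0]], slopes[i]
        if abs(s_first - s_i) <= tol * max(abs(s_first), abs(s_i)):
            regions[-1].append(i)  # same linear region
        else:
            regions.append([i])    # slope mismatch: start a new region
    # pick the region spanning the widest log(eps) interval
    best = max(regions, key=lambda r: logeps[r[-1] + 1] - logeps[r[0]])
    i0, i1 = best[0], best[-1] + 1
    slope = np.polyfit(logeps[i0:i1 + 1], y[i0:i1 + 1], 1)[0]
    return i0, i1, slope
```

For a curve with two distinct slopes, the function returns the segment covering the larger log ( ε ) span, mirroring how "the" linear region is chosen above.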

FIG. 10.

Demonstration of the algorithm estimating fractal dimensions from the curves of H2, log(C2). Shaded intervals are the estimated linear scaling regions. See text for the definition of δ1, δ5 (differences of the top panels).


Notice that there is a clear pitfall here: this algorithm will deduce a linear region no matter what. In many scenarios, this region might be meaningless, being too small to be of actual value, or it could even reflect the scaling of the noise in noisy data, as shown in Sec. III F. Therefore, careful consideration of the result is always necessary.

In addition to the algorithm presented here, recent work by Deshmukh et al.93 is also worth mentioning, as it offers an alternative way to estimate a slope. All possible slopes that can be obtained from the curve (by choosing all possible segments longer than a specified minimum length) are estimated; these are weighted by their length and by their inverse error and compose a distribution. The mean of the distribution is reported as the slope, while the quantiles of the distribution can be used as confidence intervals. For the work presented here, we believe our approach is more fitting, because how large a scaling region is, is also part of the accuracy of an FD estimator. Furthermore, we still wanted to display problems arising in the presence of two scaling regions (as in, e.g., Sec. III F), while the approach of Deshmukh et al. transforms the problem of "looking at the curve y(x) for two scaling regions" into "looking at the distribution of all possible slopes for two peaks," which, while easier to resolve, would add yet more information content to our already extensive article. Nevertheless, the approach of Deshmukh et al. is most likely better suited for use in practice, when estimating a fractal dimension from experimental data, as the authors have extended their method to also provide convergence criteria for fractal dimension estimates in Ref. 94.

In this section, we thoroughly analyze the power and shortcomings of the extreme value theory based FD estimator Δ(E) introduced in Sec. II D, under a lens similar to the comparison of Sec. III. Unless stated otherwise, we use N = 10^5 as the data length, standardize all input sets before any computations, and use p = 0.99 as the quantile probability of Sec. II D (due to the discussion in Sec. IV B). Since Δ(E) is obtained via an arithmetic mean, it can be compared to Δ2(H) and Δ2(C), which is what most plots in Sec. III use.

We start with Fig. 11, which shows Δ(E) computed for exemplary sets. The figure should be compared with Figs. 2 and 3. The figure style is typical for the rest of this section and shows the dimension estimates as distributions, with dashed white lines indicating the mean and dotted white lines indicating the "expected" value, for which we use Δ(L) if possible and Δ2(C) otherwise. Small horizontal red lines mark the strict limit that the dimension estimates should not exceed, i.e., the state space dimensionality. The inner legend in the plot displays information about the distribution: the mean, and in parentheses the percentage of values that exceed the dimensionality limit (the red line).

FIG. 11.

Fractal dimension estimates for exemplary systems using the extreme value theory method. Shown are the distributions of Δ_i(E) for each system. The inner legends display the mean of the distribution and, in brackets, the percentage of the distribution outside a strict cutoff value (the state space dimensionality). For the two eight-dimensional systems (Lorenz96, coupled logistic maps), we divide dimension values by 4 for visual purposes. More details in Sec. IV A.


All in all, the Δ(E) estimates match well those obtained by Δ2(C), with some notable exceptions: the method performs "poorly" for a quasiperiodic (Δ = 2) trajectory of the Hénon–Heiles system, for the chaotic attractor of the Hénon map, and for the Lorenz96 chaotic attractor. Here, "poorly" means significant inaccuracy in the first decimal digit. It is not clear exactly why the EVT method is not particularly accurate for these systems; however, we can speculate on the respective reasons. For quasiperiodic trajectories, the sampling time may be near-commensurate with one of the two periods, making some points very rarely visited in the finite set, even if they would be uniformly visited in the limit N → ∞. As the method assigns a higher dimension value to a point according to its visitation frequency, these rarely visited points on the quasiperiodic orbit get higher dimension values than they should (within the EVT framework, the lower the visitation frequency of a state space point, the higher its local FD). For the Hénon map, the attractor natural density is extremely singular, and the assumption of Eq. (13) likely does not hold. For the Lorenz96 system and the coupled logistic maps, the attractors have a high "expected" (here Lyapunov) FD, which suggests that, for a given amount of data and a high underlying FD, the EVT method underestimates the FD more than C2 does.

Unlike the entropy and correlation sum methods for computing the FD presented so far, the EVT one is parametric:95 it requires choosing an "extreme" probability value p for which to extract the quantile of g_i when calculating the exceedances E_i in Eq. (8). Therefore, before performing any further evaluation of the method, we must examine how it depends on its parameter p. We have found no formal mathematical definition of an "extreme" in the literature of this EVT methodology, nor a way to practically compute an "optimal" value for p, nor whether an optimal value exists at all. Reference 65 provides some methodology for checking whether a chosen p is inappropriate, which we evaluate in Sec. IV C.

In this subsection, we examine the impact of the choice of p. This choice is somewhat linked with the data length N, as the local dimension estimation for each state space point is based on N(1 − p) points. In Fig. 12, we therefore vary p with fixed N, but also co-vary N, p with fixed N(1 − p).
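To make the role of p concrete, the following minimal Python sketch implements the local-dimension estimator as we understand it from Sec. II D (Eqs. (7) and (8)): for each point, the exceedances of the observable g_i above its p-quantile are fitted by an exponential via maximum likelihood, and the local dimension is the inverse of the fitted scale. This is an illustrative reconstruction with names of our choosing, not the DynamicalSystems.jl implementation:

```python
import numpy as np

def evt_local_dims(X, p=0.99):
    """EVT local fractal dimensions (sketch). For each point x_i:
    g_i(x_j) = -log ||x_i - x_j||; the exceedances above the p-quantile of
    g_i are assumed exponentially distributed, and the local dimension is
    1/sigma, with sigma the MLE exponential scale (the mean exceedance)."""
    N = len(X)
    dims = np.empty(N)
    idx = np.arange(N)
    for i in range(N):
        d = np.linalg.norm(X - X[i], axis=1)
        g = -np.log(d[idx != i])        # exclude the point itself
        thresh = np.quantile(g, p)
        E = g[g > thresh] - thresh      # exceedances, Eq. (8)
        dims[i] = 1.0 / np.mean(E)      # MLE exponential fit -> local FD
    return dims
```

Note that each local estimate uses only about N(1 − p) exceedances, which makes the link between p and N discussed above explicit.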

FIG. 12.

Impact of the choice of extreme probability p when estimating a quantile for Eq. (7). Top: under fixed data length; bottom: under fixed length of exceedances. D = 8 is used for Lorenz96, and Δ(E) for it is divided by 4 as in other plots.


The results show that increasing p increases Δ(E). It also appears that not only the mean of the distribution of Δ_i(E) depends on p, but the shape of the distribution as well. On one hand, it is somewhat reassuring that once a fixed N(1 − p) is chosen, the results do not vary as wildly as when N is fixed, provided that N(1 − p) is large enough. However, we also noticed that under fixed N(1 − p), the estimated dimension seems to monotonically decrease further and further away from the expected value as p is decreased.

On the other hand, in most real-world applications, it is N that is fixed and, hence, the results of the top panel of Fig. 12 are of most interest. In addition, even if it were possible to co-vary N, p in a realistic application, we still cannot provide instructions on the best choice for p: while Fig. 12 reveals the dependence on p, it does not lead to any obvious conclusions on what p should be. A saving grace here is that while there is a clear dependence on p, the mean value Δ(E) does not change significantly (i.e., differences span less than one integer, which is in any case the accuracy of practical interest), provided that p remains in a range such that N(1 − p) ≈ 100–1000.

The entropy and correlation sum approaches provide a relatively straightforward way to check for the significance of results: there should be a single slope covering several orders of magnitude of ε. The larger the range of magnitudes, the more significant the results. Such a simple visual significance check does not exist for the EVT approach. In this section, we examine possible ways to test for the significance of the EVT results, based on what has been suggested in the literature or with alternative means we devised while composing this review.

In practical applications as in Ref. 65, the authors identify a range of appropriate p values using a statistical hypothesis test of whether the E_i follow an exponential distribution (EXPD). Examples of such statistical tests are the Anderson–Darling96 or the Kolmogorov–Smirnov.97 Here, we used the exact one-sample Kolmogorov–Smirnov test. The test proceeds as follows: for a given p, each of the E_i is first fitted to an EXPD. Then, the fitted EXPD is used in a statistical test of the null hypothesis "the data come from the fitted EXPD." The test yields a p-value. Typically, if the p-value is very small, e.g., below 0.05, the null hypothesis can be rejected, which may mean unsuitable data altogether or not enough data (e.g., the quantile probability p was chosen too high). In practice, one hopes that the majority of the p-values (one for each E_i) is significantly larger than some low threshold of, e.g., 0.05.
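This check can be sketched with scipy as follows (a minimal illustration, not the implementation used for our figures; note that estimating the EXPD parameters from the same data formally alters the null distribution of the KS statistic, one of the caveats discussed below):

```python
import numpy as np
from scipy import stats

def expd_fit_pvalue(E):
    """Fit an exponential distribution to the exceedances E and run a
    one-sample Kolmogorov-Smirnov test against the fitted EXPD.
    Returns the p-value of the test (small -> reject 'data are EXPD')."""
    loc, scale = stats.expon.fit(E, floc=0)   # MLE with fixed loc: scale = mean(E)
    return stats.kstest(E, "expon", args=(loc, scale)).pvalue
```

For example, exceedances that really are exponential should typically yield a large p-value, while, say, uniformly distributed values should be rejected.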

Formally speaking, however, a p-value greater than some threshold does not mean we can accept the null hypothesis; it only means we fail to reject it. Any other distribution may have generated the data equally well. Hence, the convincing power of this line of argumentation (checking for large p-values) is weak from a statistical inference point of view. An alternative test mentioned in the literature is to check how stable the distributions of the fitted EXPD scale parameter σ are when varying p; but we did not find this argument convincing (stability of parameters does not imply a significant fit), so we ignored it.

A third way that one may judge the significance of the results is to directly estimate the error of the EXPD fit for each exceedances vector E_i. To do this, we use a form of normalized root mean squared error given by

NRMSE = [ Σ_j (P_j − G_j)² / Σ_j (P_j − U)² ]^(1/2).   (17)

Here, P_j is the empirical probability density of the measured E_i values at their jth bin (i.e., the jth bin's histogram height), and G_j is the fitted EXPD evaluated at the bin's midpoint. U is the same as G_j but assuming a uniform distribution fitted to the data instead of an EXPD (hence, U does not depend on j). The denominator normalizes the measure so that the error of the EXPD fit is measured with respect to the error of the uniform distribution (and, hence, it is meaningful to compare NRMSE values across different E_i). If NRMSE > 1, the uniform distribution is the better distribution model. We have used this error measure in the past in varying scenarios and noticed that values below 0.5 correspond to a visually relatively correct fit98,99 (without any mathematical guarantee, of course).
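A minimal sketch of this error measure follows, assuming (as an illustration; the bin count and the maximum-likelihood scale σ = mean(E) are our choices) that the EXPD is fitted by maximum likelihood:

```python
import numpy as np

def nrmse_expd(E, nbins=20):
    """NRMSE of Eq. (17): error of the fitted EXPD relative to the error of
    a uniform distribution fitted to the same exceedances E.
    Values > 1 mean the uniform distribution is the better model."""
    P, edges = np.histogram(E, bins=nbins, density=True)  # P_j: bin heights
    mids = 0.5 * (edges[:-1] + edges[1:])
    sigma = np.mean(E)                                    # MLE exponential scale
    G = np.exp(-mids / sigma) / sigma                     # fitted EXPD at midpoints
    U = 1.0 / (E.max() - E.min())                         # uniform density (no j)
    return np.sqrt(np.sum((P - G) ** 2) / np.sum((P - U) ** 2))
```

Genuinely exponential exceedances yield a small NRMSE, while uniformly distributed values yield NRMSE > 1, matching the interpretation above.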

In Fig. 13, we show distributions of p-values and NRMSEs for a chaotic trajectory of the towel map. We provide more such plots in Appendix E, which establish that our observations do not depend on the input set X.

FIG. 13.

Distributions of p-values of the statistical hypothesis test of whether the data come from an EXPD, and of the relative error of the EXPD fits normalized by the error of a uniform distribution (NRMSE). Inner legends show the distribution medians. The figure is divided into three rows, one for each of three quantile probabilities p. Next to the distributions of p-values and NRMSE are three exemplary distributions of the exceedances E_i, with the fitted EXPD shown as a black line (the text indicates the corresponding p-value and NRMSE). The bottom panel has the NRMSE distribution clamped to [0, 1]. The input dataset X is a chaotic trajectory of the towel map. See the discussion in Sec. IV C for more.


The results are very surprising. It appears that the p-value based test is unhelpful and/or misleading. For example, it shows that the quantile probability p = 0.95 is a bad choice, because the overwhelming majority of p-values are below 0.05, pointing to a rejection of the hypothesis "the data come from an EXPD." Yet, if we look at the actual fitted data to the right of the top panel, we do not observe "bad quality" fits at all. This is further established by the distribution of NRMSE values, which has all of its mass at low values. We are not sure why the hypothesis test behaves this way in this scenario.

At the other extreme of p = 0.999, the p-value based test is again misleading. This is the case where most of the p-values are above 0.05 and, hence, it would be the most trustworthy case in terms of the p-value test. Yet clearly, this is the case where the actual fits are by far the worst. We have performed extensive numerical tests and are confident that the code implementations yielding the p-value are correct.100,132

This leads us to conclude that the NRMSE based test is much more trustworthy. If a practitioner wants to transform the NRMSE data into a Boolean decision "is this p okay?," we would suggest checking whether the vast majority (e.g., 99%) of the mass of the NRMSE distribution is less than 0.5. Even so, the NRMSE test may only provide a range for p where the EXPD fits are of sufficiently high quality; it cannot instruct how to pick a p from that range. Thankfully, from what we have seen in Sec. IV B, the fluctuations of Δ(E) with p are relatively small if p is within an appropriate range.

What made the discussion of this subsection difficult is that we have not found any information regarding the p-values of this test in the literature, despite the plethora of real-world applications (see Appendix A 10). While it has been mentioned that the p chosen in these real-world applications "satisfies" this p-value test,65 the actual p-value distributions were not shown.

In fact, we have not found any discussion in general regarding the significance of the EVT results. The main argument the EVT FD literature has used in favor of the significance of results is that the Δ_i(E) distributions do not change much within a range of appropriate p. However, by itself, this argument is not convincing of the validity of the results, only of their stability. We will discuss these aspects again in Sec. IV H and in the conclusions. For the rest of the article, we will be using p = 0.98–0.99, as these values seem to yield correct results for synthetic X with lengths 10^4–10^5.

In Fig. 14, we again compute Δ(E) for exemplary sets as in Sec. IV A, but now we compare it with the pointwise dimension, i.e., the scaling of the inner sum of Eq. (6) vs ε. Our goal with this comparison is to see how well either method captures the "spread" of dimension values. We expect the distribution to be narrow for rather uniform fractal sets, and wide for sets with a highly non-uniform natural measure. Note that the average of the pointwise dimensions of the correlation sum does not coincide with Δ2(C). As is also made clear in Ref. 16, Δ2(C) gives a more accurate result for the FD of the whole attractor because it utilizes more points to estimate the scaling behavior.

FIG. 14.

Comparison of local fractal dimension estimates using the extreme value theory (EVT) method and the pointwise dimension. Left, shaded distributions are EVT; right, transparent ones are pointwise. Dashed white lines are means of EVT; dotted colored lines are means of pointwise.


The most important result here is that for the chosen p = 0.99, the two methods yield very similar results, hence establishing the overall accuracy of the EVT method for synthetic data. The pointwise dimension estimates, however, are more accurate for the Hénon map attractor and a quasiperiodic orbit, which we already discussed in Sec. IV A. Hence, C q is slightly more accurate for noiseless deterministic sets.

We now test the EVT approach while varying several aspects of the input data: length, state space dimensionality (using delay embeddings of increasing dimension, as in Sec. III E), and sampling time. The reason to judge the quality of the EVT approach vs the sampling time is that, unlike the correlation sum approach, which explicitly takes dense time sampling into account via the Theiler window w, the EVT approach is typically presented as agnostic to the sampling time. The analysis is presented in Fig. 15.

FIG. 15.

Fractal dimension estimates using the extreme value theory (EVT) when varying data aspects: length (top panel, using a chaotic Lorenz96 D = 8 trajectory), dimensionality (middle panel, using delay embeddings of a chaotic Hénon–Heiles timeseries), or sampling time (bottom panel, using a chaotic Rössler trajectory).


Regarding data length, EVT scales well with decreasing N up to a threshold. When N becomes too low, such that N(1 − p) becomes less than about 50, the results lose accuracy significantly, with the estimated Δ(E) increasing rapidly. Interestingly, with decreasing N, the EVT method overestimates the FD instead of underestimating it like H2 or C2. Note that one cannot simply fix this problem by reducing p because, as illustrated in Sec. IV B, this has its own downside of decreasing the estimated dimension. Still, we may conclude that for the data considered here, the EVT performs well for N > 10^3, which is a better scaling with N than H2, but worse than C2. As with C2, this minimum data length should scale with the FD; however, there is no analytic treatment as to how (while for C2, analytic bounds are discussed in Sec. III C). Given that real-world data are typically short, one has to be particularly aware of this point.

As far as input dimensionality is concerned, we observe results similar to those of Δ2(C) in Fig. 6: Δ(E) seems quite robust when increasing dimensionality.

When it comes to the sampling time, the correlation sum approach utilizes the Theiler window w, which in practice shortens the distance calculations from N to N − w points. This has a negligible impact on data length but a significant positive impact on the FD estimate in cases where data are sampled densely in time.3 Similar results are obtained for Δ(E): for very small sampling times, the FD is biased toward lower values. Hence, it would make sense to include a Theiler window in Eq. (7), i.e., to only include j with absolute index distance from i greater than some w, as in Eq. (6). We note that Ref. 66 also considered the impact of "temporal neighbors" and reached the same negative-bias conclusion.
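The suggested modification can be sketched as follows. This Theiler-windowed variant of the exceedance computation is our own proposal for illustration (names and structure are ours), not an implementation from the EVT literature:

```python
import numpy as np

def evt_exceedances(X, i, p=0.99, w=0):
    """Exceedances of g_i = -log(distances) above its p-quantile, with a
    Theiler window w: temporal neighbors |j - i| <= w are excluded before
    the quantile is taken (w = 0 drops only j = i itself)."""
    N = len(X)
    keep = np.abs(np.arange(N) - i) > w          # Theiler exclusion
    d = np.linalg.norm(X[keep] - X[i], axis=1)
    g = -np.log(d)
    thresh = np.quantile(g, p)
    return g[g > thresh] - thresh
```

For densely sampled flows, excluding the w temporal neighbors removes the spuriously close (in space) points that bias the local FD toward lower values.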

We repeat here the analysis of Sec. III F in Fig. 16, using the Rössler system combined with various forms of noise. For additive noise, Δ(E) behaves much more similarly to Δ2(H), because the mean FD value of EVT is an average without focus on a specific scale. This means that the FD value in the presence of additive noise lies between 1.9 (deterministic) and 3 (state space dimensionality). Additionally, the FD values of Δ(E) are the smallest (and, hence, closest to the deterministic FD value) out of the three (E, C2, H2). Δ(E) also seems to be completely unaffected by rounding.

FIG. 16.

Impact of noise in the fractal dimension estimates of extreme value theory.


These observations make sense if one considers how Δ ( E ) is computed (Sec. II D). The logarithms of all inter-point distances are taken into account for the computation of the quantile of g i. Rounding will have a negligible effect on the distribution of inter-point distances, and additive noise will have a diminished effect due to being “averaged out” in some sense when computing the quantile. These properties make Δ ( E ) preferable in the presence of noise, unless one wants to identify the noise radius, in which case C 2 is more suited.

Last, we mention that in the case of dynamic noise, Δ(E) provides slightly higher values than H2 or C2 (though, of course, we do not know which of the three is the more "correct" number). It is nevertheless worth noting that the distribution of Δ(E) for dynamic noise is much narrower than for other types of noise, which we found unexpected. We did not examine this further, however, as the results with dynamic noise depend strongly on the system used and the exact form in which the noise was added and, hence, cannot lead to any general statements.

In Fig. 17, we use the same datasets as in Sec. III G. Because the datasets are smaller in size than the typical length we used before ( N = 10 5), we used p = 0.98 instead of p = 0.99. This value for p also satisfies the “NRMSE test” we described in Sec. IV C, in the sense of most NRMSE values being less than 0.5.

FIG. 17.

Fractal dimension estimates of extreme value theory for experimental data known to accommodate a low-dimensional deterministic representation.


Apart from the large extent of some of the distributions of Δ_i(E), we do not notice any downside or incorrectness in the mean FD values: they are comparable with those coming from H2 (and we cannot know whether H2 or EVT is more correct in its estimate). Given the results of the preceding Sec. IV F, this is expected due to the better response EVT has to noise (vs C2).

In Fig. 18, we apply Δ(E) to various extreme cases as in Sec. III H. Δ(E) does a good job of detecting that two sets of very different dimensions have been artificially merged with each other in the first two cases of Fig. 18, because it yields bi-modal distributions of Δ_i(E). However, this is mainly due to the way the sets were created. In the case of adding noise to the data, in contrast, we did not see bimodal distributions with peaks at dimension values 2 and 3, i.e., Δ(E) cannot be used to identify a noise radius, unlike Δ2(C). Nevertheless, Δ(E) could be a good tool for detecting non-stationarity in an observed set, if that non-stationarity significantly changes the FD value over time.

FIG. 18.

Application of extreme value theory fractal dimension estimates on extreme cases. Values for Lorenz96 and Kuramoto–Sivashinsky have been divided by 4 for visual purposes. We used p = 0.95 for the “nifty50” and “vostok” data, because they are much smaller (otherwise p = 0.98). However, mean FD values depend only weakly on p.


Unfortunately, for the sets that do not accommodate a low-valued FD description, like the Vostok and nifty50 data, a straightforward application of Δ(E) gives a rather "clear" picture that a low-dimensional FD value describes the data. This is especially obvious in the nifty50 set (which, we remind, is a stock market timeseries), where Δ(E) gives the lowest dimension values and a narrow distribution. Surrogate testing did not help here either: generated random-Fourier surrogates101 had consistently higher Δ(E) than the original data, reinforcing the wrong conclusion that the estimated Δ(E) of the data is valid. Additionally, performing the significance tests of Sec. IV C made things worse: the resulting plots (shown in Fig. 23) look very similar to those obtained from systems with a legitimate low-dimensional representation, again reinforcing the wrong conclusion. Last, we attempted to perform the standard analysis of checking whether the FD result converges with increasing embedding dimension of the timeseries. In Appendix F, we find that, in contrast to the H2, C2 methods, the EVT method shows a convergence of the "nifty50" timeseries FD: already at embedding dimension d = 5, it shows a constant mean Δ(E) for any d ≥ 5, which, again, reinforces the wrong conclusion. The same results were obtained for the "Vostok" timeseries.

It appears that four different significance testing methods (p-values, NRMSEs, surrogate timeseries, convergence with increasing embedding dimension) all indicated with confidence that inappropriate data like "vostok" or "nifty50" have a small FD, which is incorrect. These results have major implications for both previous and future applications of the EVT method to real-world data. We have not found a way to test whether the FD values yielded by the EVT method actually represent a low-dimensional deterministic system or not, i.e., there is no way to falsify the method. Hence, extreme care must be taken when applying the EVT method to arbitrary real-world datasets, and whether the data accommodate a deterministic representation must be confirmed by other means (e.g., a self-consistent physical theory, or using the correlation sum with increasing embedding dimension, as is standard practice in nonlinear timeseries analysis22).

For extremely high-dimensional but deterministically chaotic data, Δ(E) does an excellent job of identifying a very high FD (note the FD values are divided by 4 in the figure), which is also very close to the expected value Δ(L). This was expected, as C2 also does an excellent job of identifying a very high dimension for clean data and, in general, EVT and C2 perform very similarly for deterministic noiseless data. For the Kuramoto–Sivashinsky example, we see that EVT estimates a higher FD than C2 (while typically we noticed that, for high-dimensional data, it underestimates the FD compared to C2). However, in this example, we are not sure whether the results arise because EVT is indeed more accurate, or because EVT overestimates the FD due to N being too small compared to the expected FD value (see the discussion in Sec. IV E).

Surrogate timeseries were recommended by Theiler et al. in the early 1990s101 as a means to test for nonlinearity in noisy timeseries. It was suggested there that the FD computed via the correlation sum can serve as the discriminatory statistic for the test. That requires a bit of care. If one uses the algorithm we described here, i.e., using Eq. (6) and then deducing the slope of the largest linear scaling region, then one risks estimating the fractal dimension of the noise (already existing in the original data) instead of that of the "underlying deterministic nonlinear dynamics" (if any exist), thus invalidating the hypothesis testing approach in the first place. Other discriminatory statistics should be used instead (see Ref. 102, for example).

If a fractal dimension is nevertheless chosen as a discriminatory statistic, we propose the following alternatives for computing it: (i) use Takens' estimator (Sec. 6 of Appendix A) with the empirically good estimate ε_max = 0.25 std(x), with x the original timeseries. Because Takens' estimator performs a maximum likelihood estimation instead of a linear fit and, thus, considers all regions of the correlation sum up to some ε_max, it produces an "average" fractal dimension of the noise and the underlying data. It should be clear, however, that the Takens estimator is used in this context (just) as a discriminating statistic and its results must not be interpreted as meaningful dimension estimates; or (ii) use the EVT version Δ(E), which similarly produces an "average" of the fractal dimensions of noise and deterministic set.
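Alternative (i) admits a compact sketch. The following is our illustrative Python re-implementation of Takens' maximum-likelihood estimator as we understand it (Δ = −1 / ⟨log(d/ε_max)⟩ over all inter-point distances d below ε_max); the function name is ours, and the O(N²) distance computation is kept naive for clarity:

```python
import numpy as np

def takens_estimator(X, eps_max):
    """Takens' maximum-likelihood fractal dimension estimator (sketch):
    Delta = -1 / mean(log(d / eps_max)) over all inter-point distances
    d with 0 < d < eps_max."""
    N = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d = d[np.triu_indices(N, k=1)]          # unique pairs only
    d = d[(d > 0) & (d < eps_max)]
    return -1.0 / np.mean(np.log(d / eps_max))
```

Because it averages over all scales below ε_max rather than fitting one linear region, noise and deterministic scaling both contribute, which is exactly the "average" behavior described above.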

In this paper, we have analyzed many different practically relevant fractal dimension (FD) estimators found in the literature. From all these estimators, we focused on an extensive quantitative comparison between the entropy-from-histogram method Hq (Sec. II B and Sec. 1 of Appendix A), the (potentially box-assisted) correlation sum method C2 (Sec. II C) without any logarithmic corrections, and the extreme value theory method (EVT, Sec. II D). Based on our review, these three estimators are the ones most worth using in practice (Appendix A).

To keep this paper within sensible limits, we have used a relatively small set of possible dynamical systems and real-world data. We cross-checked our results with different systems (not shown), and we are confident that the results presented are robust. Nevertheless, it is impossible to guarantee their universality for all possible input datasets. To mitigate this, the reader can repeat our extensive analysis for any other dynamical system or input dataset of choice by changing only a couple of lines of code in the provided code base (see Appendix B).

Our conclusions are as follows (see also Fig. 1). When comparing H2 with C2, we found that for synthetic (i.e., noiseless) data, C2 is clearly superior to H2, retaining much better accuracy for decreasing N (amount of points), increasing D (state space dimension), or decreasing ε (size scale). C2 can also be used to detect the "noise radius," as, e.g., illustrated by Kantz and Schreiber.22 For real data, this can become a downside, making the FD estimation using C2 ambiguous due to either an almost continuous change of slope or the majority of the slope reflecting the slope of the noise (Sec. III G). This can be partly alleviated by examining the behavior of C2 with increasing embedding dimension d. In Sec. III C, we saw that, provided that a deterministic chaotic attractor is known to generate the data, it is still quite sensible to estimate an FD of relatively small datasets using C2, as the estimated FD remains very accurate even for relatively small data lengths. On the other hand, Hq is very sensitive to data length and underestimates the FD strongly even for moderately small data (which is often the case in experiments). We had difficulties making sense of the correlation sum for order q ≠ 2, even though it has been mentioned several times in the literature. Cq for q ≠ 2 gives correct results only when one considers the slope at the largest ε values, and it is unclear why the slopes (i.e., FD values) at small ε values are incorrect. As such, we suggest that when treating multifractality, the entropy method should be preferred, or, if one has too few data points (where Hq performs poorly), then using Cq and recording the slope of log(Cq) at the largest ε is the best alternative.

We then compared C 2 with EVT. In deterministic datasets, we found a very high degree of agreement between C 2 and EVT, confirming EVT’s accuracy. EVT performed equally well for high-dimensional data but worse than C 2 for decreasing N. Still, EVT performs better with decreasing N, increasing D, or increasing underlying FD when compared to H 2. When it comes to noise, EVT appears to give similar results to H 2, i.e., the results are an average of the deterministic (smaller) FD and that of the noise. EVT reports the smallest FD values when contaminated by noise and, hence, is affected less strongly by noise than C 2 or H 2. One more advantage of EVT vs C 2 , H 2 is that it forgoes the identification, fitting, and extraction of a slope from a scaling region and, hence, does not face the same limitations when the data can cover only a very small range of magnitudes of ε. This advantage is balanced by the disadvantage of EVT being much more of a black box method than C 2 , H 2.

It appears that the EVT is a promising method that combines benefits from both H 2 , C 2: it scales well with decreasing N or increasing D and is more tolerant to noise than C 2. However, we also observed that it has a huge downside: it cannot be falsified. Or, at least, at the moment, there does not exist a method (visual or statistical) in the literature that can confidently falsify it, nor one that can quantify its significance meaningfully. All four methods we utilized and discussed in Sec. IV C (p-values, root-mean-squared errors, surrogate tests, and increasing embedding dimension) failed to indicate that, e.g., stock market timeseries are an inappropriate input (despite being non-stationary and satisfying practically none of the assumptions underlying the EVT method). Instead, all ways to test for significance gave high confidence that stock market timeseries are described by a very small FD of 2.5.

Surprisingly, we found practically no discussion in the literature about falsifiability, despite the plethora of real-world applications to very varied input datasets (Sec. II D). Nevertheless, we believe that falsifiability is important, because with it one can resolve controversies like that of the fractal dimension of “global climate attractors” (Ref. 26). The lack of falsifiability has major implications for both previous and future real-world applications using the extreme value theory approach: the argument for low-dimensional determinism in the data cannot come from the EVT method itself, at least not at the moment.

It is clear, as it was before this work, that estimating a FD is not an easy task and, hence, focusing on only a single number can mislead. The best practice, we feel, is to calculate several versions of Δ, from different methods and with varying parameters for each method (including the range of ε or the quantile probability p), and produce a median of the results. In addition, given the software implementation we provide here, calculating all FD variants only necessitates a couple of lines of code (see  Appendix B). Furthermore, plotting the appropriate quantities H q , log ( C q ) vs ε is a must and can hint at whether the methods are applied to inappropriate data. Last, expecting more than one decimal point of accuracy is unrealistic in most practical applications.

As an outlook, we believe there is still future research to be done regarding estimating FDs. Apart from the Lyapunov dimension, which is not easily applicable to observed data, every other estimator disregards the time-ordering information in observed data (i.e., that the sequence of points follows the flow in state space instead of being randomly drawn samples on the attractor). Perhaps there is a way to make a more powerful estimator by using this discarded time-ordering of the points in the dataset. For example, time-ordering information has been used in, e.g., estimating the transfer operator,103,104 which can also yield the natural density, and perhaps this operator can be utilized to create a FD estimator of higher accuracy or with better scaling with the number of points N.

Regarding the EVT approach, we believe future research can improve on two fronts: (1) developing a mathematically rigorous framework for choosing p and (2) developing statistical tests for the correctness and significance of the method’s results that can successfully falsify the method on inappropriate data. With new statistical indicators, the method may be applied with more confidence to data of unknown dynamical origin.

We thank Ignacio Del Amo for an initial draft code implementation of the extreme value theory dimension estimator and for providing the simple proof of non-equivalence between the box-counting and Hausdorff dimensions; three independent reviewers for constructive criticism that greatly improved the quality of the manuscript; Nils Bertschinger for helpful discussions regarding distributions of p-values; and Gabrielle Messori and Davide Faranda for discussions and clarifications regarding the usage of extreme value theory for estimating a fractal dimension, including its p-value test of Ref. 65.

The authors have no conflicts to disclose.

George Datseris: Conceptualization (lead); Data curation (lead); Formal analysis (lead); Methodology (lead); Software (lead); Supervision (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (lead). Inga Kottlarz: Formal analysis (supporting); Software (supporting); Writing – original draft (supporting). Anton P. Braun: Formal analysis (supporting); Software (supporting); Writing – original draft (supporting). Ulrich Parlitz: Conceptualization (lead); Supervision (lead); Writing – review & editing (lead).

 Appendix B discusses software implementations and a code base for reproducing the submitted paper.

1. Optimized histograms for arbitrary ε

Here, we describe an optimized method to calculate histograms which, to our knowledge, has not been published before; nor could we find a faster method that works for arbitrary ε. The method has memory allocation scaling as D N and performance scaling as D N log ( D N ), neither of which depends on ε. The process is as follows. Every point in the dataset is first mapped to its corresponding bin via the operation b i = ⌊ ( x i − x min ) / ε ⌋, where x min is a vector containing the minimum value of the dataset along each dimension and ⌊ ⋅ ⌋ is the floor operation. The resulting b i ( N in total) are then sorted with a quick sorting algorithm, which results in all equal b i being successive to each other. The sorting is lexicographic, i.e., sorting by the first dimension, then by the second, and so on. We then count the successive occurrences of equal b i = b i + 1, which gives the number of points present in the corresponding bin, and move on to the next bin (no actual bins are created in memory; a bin is conceptually defined as a group of successively equal b i). Dividing by the total number of points gives the probabilities that can then be plugged into Eq. (3) to yield the entropy H q.
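As a concrete illustration, the procedure above can be sketched in a few lines of Python with NumPy (the paper's actual implementation is the Julia one in ComplexityMeasures.jl; the function names here are our own):

```python
import numpy as np

def amplitude_binning_probabilities(X, eps):
    """Probabilities of occupied eps-boxes via the sorting trick:
    map each point to an integer bin vector, sort lexicographically,
    and count runs of identical bin vectors (no boxes stored in memory)."""
    X = np.asarray(X, dtype=float)
    # b_i = floor((x_i - x_min) / eps), one integer vector per point
    bins = np.floor((X - X.min(axis=0)) / eps).astype(np.int64)
    # lexicographic sort: first dimension is the primary key
    order = np.lexsort(bins.T[::-1])
    sorted_bins = bins[order]
    # a new (conceptual) box starts wherever the bin vector changes
    change = np.any(np.diff(sorted_bins, axis=0) != 0, axis=1)
    boundaries = np.concatenate(([0], np.flatnonzero(change) + 1, [len(X)]))
    counts = np.diff(boundaries)
    return counts / len(X)

def renyi_entropy(probs, q):
    """Generalized (Renyi) entropy H_q from box probabilities, Eq. (3)."""
    if q == 1:
        return -np.sum(probs * np.log(probs))
    return np.log(np.sum(probs ** q)) / (1 - q)
```

Note that no array of boxes is ever allocated; the run-length counting over the sorted bin labels is what gives the ε-independent memory footprint.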

2. The box-counting algorithm by Molteno

This box-counting algorithm for the generalized dimension was introduced by Molteno38 as an improvement over the method introduced by Grassberger.105 Its runtime is claimed to be of order D N. The algorithm partitions the data into boxes and counts the number of points in each box to retrieve the probabilities p i necessary to calculate the generalized dimension Δ q ( H ). The faster runtime of the algorithm is due to the use of integer manipulations for the division of the data points into boxes.

To perform these manipulations, all data values are converted to unsigned integers. These are calculated by finding the maxima x max and minima x min in each dimension and then identifying the dimension that covers the largest range X,
(A1)
where N bits is the bit size of the unsigned integer type that was used and ⌊ ⋅ ⌋ is the floor function.

The first box contains all indices to the array that stores the data points, where a box refers to an array of indices. The boxes are subsequently partitioned, until the mean number of points per filled box falls below a threshold k 0, recommended by Molteno to be 10.

The kth step of partitioning divides each previous box into 2^D new boxes, with D the dimension of the data. For a data point x, the index i of its new box is calculated as
(A2)
Here, >> is the logical shift operation, shifting the bit representation of the first value to the right by the second value, and & is the bit-wise “and” operation. Thereby, ( x j  > >  ( N bits − k ) ) & 1 translates to checking whether the bit at position N bits − k + 1 is one. In a partitioning cycle, all boxes that contain more than one data point are split up and empty ones are deleted. To retrieve the probabilities p i, the number of data points contained in each box is counted and divided by the total number of data points.
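The bit manipulation of Eq. (A2) can be illustrated with a small Python sketch (a hypothetical helper of our own, not part of the published implementation):

```python
def child_box_index(point_ints, k, nbits=32):
    """Sub-box index (0 .. 2^D - 1) of a point at partition step k,
    following Eq. (A2): bit j of the index is (x_j >> (nbits - k)) & 1,
    i.e., the k-th most significant bit of the j-th integer coordinate."""
    idx = 0
    for j, xj in enumerate(point_ints):
        idx |= ((xj >> (nbits - k)) & 1) << j
    return idx
```

Because each partition step only inspects one bit per coordinate, sorting the points of a box into its 2^D children costs D integer operations per point, which is the source of the claimed O(DN) scaling.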

The algorithm provides exponentially scaled sets of values that suit the approximation of the limit in Eq. (4), and it only needs to compute operation (A2) D N times per partitioning process. The main disadvantage is the static choice of box sizes. It works well for low-dimensional data sets, but for larger dimensions the number of boxes increases exponentially with dimension, thereby exponentially increasing the number of points necessary to calculate the dimension. This drastically limits the ε range over which the algorithm can be applied, and for some sets makes the computation of low accuracy or even outright impossible. Given how fast our histogram algorithm already is (all H q curves in this paper took on average 0.1 s to compute), we saw no reason to use the Molteno method. Also, the Molteno method does not allow the user to choose the ε values, making it less flexible.

3. Correlation sum

The original correlation sum can be calculated as given in Eq. (6). For q = 2, the second sum can be changed to only include indices j higher than the current one ( j > i ),
(A3)
which makes the computation twice as fast. In all cases, the calculation time scales with D N^2, and, therefore, this method is exceptionally slow for large N. Note that the version of the q-order correlation sum given in Eq. (6) differs from the version provided by Kantz and Schreiber22 in the exponent of 1 / ( q − 1 ). If the q-order correlation sum given by Kantz and Schreiber is called C q ¯ ( ε ), it scales as C q ¯ ( ε ) ∝ ε^{ ( q − 1 ) Δ q ( C ) }; then C q ¯ ( ε )^{ 1 / ( q − 1 ) } = C q ( ε ) ∝ ε^{ Δ q ( C ) }, and both versions are equivalent. In addition to the exponent, the formulation of C q also differs in the indices of the outer sum, which range from 1 to N, requiring a slightly changed normalization.
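For reference, a naive Python sketch of the correlation sum with the q = 2 shortcut of Eq. (A3) follows. This is only an O(N²) illustration with our own normalization choices (conventions for the inner sum vary in the literature), not the box-assisted implementation used in the paper:

```python
import numpy as np

def correlation_sum(X, eps, q=2):
    """Naive q-order correlation sum with Chebyshev distance; O(D N^2).
    For q = 2, only pairs with j > i are counted and doubled (Eq. (A3))."""
    X = np.asarray(X, dtype=float)
    N = len(X)
    if q == 2:
        total = 0
        for i in range(N - 1):
            d = np.max(np.abs(X[i + 1:] - X[i]), axis=1)
            total += np.count_nonzero(d < eps)
        return 2 * total / (N * (N - 1))
    # generic q: inner sum over j != i, outer average, exponent 1/(q - 1)
    inner = np.empty(N)
    for i in range(N):
        d = np.max(np.abs(X - X[i]), axis=1)
        inner[i] = (np.count_nonzero(d < eps) - 1) / (N - 1)  # exclude j = i
    return np.mean(inner ** (q - 1)) ** (1 / (q - 1))
```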

4. The box and prism-assisted correlation sum

Theiler40 proposed an improvement over the calculation of the correlation dimension by Grassberger and Procaccia106 that divides the data into boxes before calculating the distances between points. Thereby, the number of distance calculations is reduced and the scaling becomes faster than N 2 (by how much depends on the box size r 0). After division into boxes, the formula given in Eq. (6) is used to calculate the correlation sum; therefore, an extension to the q-order correlation sum is possible.

The integer representations of the boxes b ( x ) with side length r 0 can be calculated by
(A4)
for each point x and the minimum of each dimension x min. This method is based on  Appendix A 1, as Theiler does not specify a method for the calculation of the boxes. The permutation needed to sort all representations in lexicographic order is calculated with a quick sort algorithm. The representations have the same ordering as the points they were calculated from; therefore, sorting the points into boxes reduces to iterating through the permutation and searching for changing integer representations. Whenever the integer representation changes, all previous permutation indices are stored in an array.

For a point inside a box, there may be points outside the box that are within the given distance of that point. Therefore, the boxes neighboring the current box also have to be found and checked. The distances between the points in the box and its neighbors are calculated as given in Eq. (6). The first sum runs over all points of the initial box and the second sum uses all points in the box and adjacent boxes. For q = 2, only boxes with equal or larger indices are included in the neighbor search, and the optimization presented in the previous section is used.

A point of criticism for this algorithm is its poor runtime for higher dimensions. The advantage of distributing the points into boxes beforehand diminishes as the number of boxes increases considerably with dimension. To reduce the number of boxes, Theiler proposed a prism algorithm, where only the first P dimensions are used to distribute the data into boxes. These new boxes, where P sides are of side length r 0 and the other D − P sides cover the whole range, are called prisms. The best choice given by Theiler is P = 0.5 log 2 N and should be used when D exceeds 0.75 log 2 N. A downside of this prism approach is that for any P < D, some point pairs that should have been discarded may be included, due to having small distances in the first P dimensions but a larger distance in at least one of the remaining dimensions.

According to Theiler, the size of the boxes r 0 can be computed as
(A5)
where R is the size of the chaotic attractor and the dimension Δ 2 ( C ) is estimated by computation of the correlation sum for N points. Since these points are chosen randomly, the estimated dimension, and in consequence the box size and even the final dimension, can vary strongly. A downside is that for some sets, the proposed box size can drop below the minimum interpoint distance of the set. Furthermore, our tests showed an irregular output value of r 0, with differences as high as two orders of magnitude between two box size estimates on the Hénon map.
Bueno-Orovio and Pérez-García41 proposed a different, and more stable, algorithm for the calculation of the optimal box size r 0. They optimized the expected calculation time with respect to the number of filled boxes, obtaining an optimal number η opt.
Here, P is the number of dimensions used for the boxing, with P = D for boxes and P < D for prisms. Introducing the effective length of the chaotic attractor ℓ = r′ η′^{ 1 / Δ 2 ( C ) } and solving η opt = ( ℓ / r 0 )^{ Δ 2 ( C ) } for the box size r 0 yields
(A6)
Here, ℓ is the effective length, r′ is the box size used to calculate the effective length with r′ = R / 10, and η′ is the number of boxes filled at box size r′. η′ is calculated by distributing N / 10 points into boxes of size r′. The dimension Δ 2 ( C ) used for the calculation of r 0, η opt, and ℓ is again estimated by computation of the correlation sum for N points.

The box size estimator still varies in its choice of box size but shows fluctuations of smaller amplitude than the Theiler estimator. The choice of a box size smaller than the smallest interpoint distance only occurs for high dimensional data sets with a low number of points. Furthermore, Bueno-Orovio and Pérez-García always chose a prism dimension of P = 2 (for D > 2). However, a prism dimension of 2 can result in box size estimates smaller than the minimal interpoint distance for high dimensional datasets of comparably small size. In this paper, we used P = 2 but with a different r 0 (see below).

The main benefit of a small r 0 is that it makes the computations much faster. Unfortunately, both suggestions, and especially that of Ref. 41, often fail in practice. They give much too small r 0 values, and for data with any amount of noise whatsoever this value is well below the noise radius. This is displayed in Fig. 7. The vertical dotted line shows the r 0 estimated by Eq. (A6) (the calculation of C 2 would be limited up to this r 0). In fact, r 0 is so small that even for only 5% additive noise the computation would only show the noise dimension and no hint of the deterministic (and smaller) slope of log ( C 2 ) vs log ( ε ). That is why in this paper we decided to use the box-assisted version for better performance, but with r 0 = ε max / e 2 as discussed in Sec. III I. The performance is not too bad; e.g., for the typical data length N = 10 5 considered here, computing the entire C 2 curve takes about a minute on an average computer. Notice that even with the optimized-for-performance r 0 of Ref. 41, the entropy method is still massively faster (in fact, it is even faster than just computing r 0).

5. Fixed mass correlation dimension

The correlation dimension stems from the assumption that the probabilities p i scale as Σ i ( p i^q / δ i^{ ( q − 1 ) Δ q ( M ) } ) → 1 for ε → 0, with δ i being the diameter of the i-th element of the partition. In most methods that compute the dimension, the diameter is fixed and the probabilities are determined by the set. Termonia and Alexandrowicz107–109 proposed a method using a fixed mass algorithm that does the opposite: the probabilities are fixed by defining p i = n / N, where n is a chosen number of points and N is the total number of points in the set. The diameter is then chosen to include n points. Following the explanation by Grassberger,84 the scaling can be rewritten to assume the form of Eq. (A7),
(A7)
Here, the overbar in ( r ( j ) )^τ ¯ denotes the mean over all radii that contain j points, τ = ( q − 1 ) Δ q, M is the number of points of the set considered for the calculation, and Γ is the gamma function. Equation (A7) is not solvable for Δ q in general. Equation (A8) can be obtained from Eq. (A7) by taking the limit q → 1 and applying L’Hôpital’s rule,
(A8)
Ψ is the digamma function, Ψ ( x ) = d log Γ ( x ) / d x. Since the general form is not solvable, this algorithm is restricted to q = 1. We compared this version with the traditional correlation sum with q = 2, and we found very similar accuracy in the estimated FD, provided that the maximum j was large enough. However, we also noticed that the fixed mass results spanned far fewer orders of magnitude in the estimated log r ( j ) ¯, even for very high j (see Fig. 19). Furthermore, it is not clear to us how to estimate up to which maximum j the calculations should be performed; i.e., it is not something that could be straightforwardly extracted from the data as in Sec. III I. Regarding computational performance, the fixed mass algorithm is implemented using KDTrees,110 giving very high performance for low-dimensional datasets but scaling poorly for very high-dimensional ones. Hence, we decided to use only the traditional correlation sum version in the main comparison of Sec. III.
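A minimal fixed-mass sketch in Python uses the j-th nearest-neighbor distances of reference points and fits log j against the mean of log r(j). This is a brute-force illustration under our own parameter choices; it omits the Ψ(j) finite-sample correction discussed by Grassberger and the k-d tree acceleration of the actual implementation:

```python
import numpy as np

def fixed_mass_dim(X, jmax=16, n_ref=200, seed=7):
    """Fixed-mass sketch: for reference points, take the distance r(j)
    to the j-th nearest neighbor, average log r(j) over the reference
    points, and fit log(j) ~ Delta * <log r(j)>."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    refs = rng.choice(len(X), size=min(n_ref, len(X)), replace=False)
    js = np.arange(2, jmax + 1)
    logr = np.empty((len(refs), len(js)))
    for row, i in enumerate(refs):
        d = np.sort(np.linalg.norm(X - X[i], axis=1))
        logr[row] = np.log(d[js])  # d[0] = 0 is the reference point itself
    mean_logr = logr.mean(axis=0)
    # slope of log(j) vs <log r(j)> approximates the (information) dimension
    return np.polyfit(mean_logr, np.log(js), 1)[0]
```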
FIG. 19.

Fractal dimension estimates for sets coming from different dynamical systems with known dynamical rule to compare the correlation sum to the fixed mass formulation.


6. Takens’ estimator

The estimator introduced by Takens13 aims to estimate the correlation dimension using the method of maximum likelihood estimation (MLE).111 From m interpoint distances ρ j < ε max, one can estimate the correlation dimension as
(A9)
The derivation of this formula starts with the assumption that there exists an ε max > 0 such that for all 0 ≤ ε ≤ ε max, the correlation sum obeys
(A10)
without any higher-order terms, for some proportionality constant c.
The transformed variable y i = − log ( ρ i / ε max ) is distributed exponentially with parameter Δ ( T ). The log-likelihood function of this distribution is given by
l is then maximized with respect to Δ ( T ) to obtain the most likely value of the correlation dimension. As Takens correctly noted, just taking the maximizer of l is biased,13,43 which can be easily corrected by writing

For a Gaussian distributed random variable, the log-likelihood function is a parabola that at 1 σ has fallen by 0.5 from its maximum and at 2 σ by 2. By invariance,112 this also holds for a non-Gaussian random variable, letting us easily estimate the variance of Δ ( T ).

When testing the algorithm and its dependency on ε max (Fig. 20) on different dynamical systems, we found that the variation of Δ ( T ) exceeds the confidence intervals at any fixed ε max for low-dimensional systems.

FIG. 20.

Dependency of Δ ( T ) for different systems on the parameter ε max. The shaded 5%–95% confidence intervals around the curves are not visible in most cases. Clearly, the variation of Δ ( T ) over different values of ε max exceeds the confidence intervals.


These variations occur because the estimation assumes that Eq. (A10) holds. Thus, the estimated Δ ( T ) and its confidence intervals are of no use as long as the validity of assumption (A10) has not been established.

While Takens’ estimator does not provide a significant advantage in the precise estimation of the fractal dimension, compared to correlation-sum-based methods, it can be useful in the case of surrogate timeseries as described in Sec. V.

7. Judd’s estimator

Judd’s “improved” estimator for the correlation dimension44,45 also uses maximum likelihood estimation. To account for deviations from uniform scaling, it allows a polynomial a of degree t so that the assumed correlation sum is
(A11)
It is stated that a degree t ≤ 2 is “usually sufficient.”
The estimator performs a binned maximum likelihood estimation of Δ 2 ( J ) alongside the coefficients of the polynomial. Judd, therefore, introduces a logarithmic binning, where the bins are defined by B 0 = [ ε 0 , ∞ ), ε i = λ^i ε 0, and B i = [ ε i , ε i − 1 ) for i > 0 and λ < 1. The parameter ε 0 is called the cutoff and w = log ( 1 / λ ) the bin width. Now, the probability of observing a distance in bin B i becomes p i = P i − P i + 1, with
If the bin contents b i are distributed multinomially, the negative log-likelihood function for the bin contents { b i } is
(A12)
which must be minimized under the constraints
(A13)
Equation (A12) does not necessarily have a unique minimum because it depends on t + 2 parameters, and it must be minimized numerically.
The optimal bin width w for the estimator minimizes
(A14)
where m is the index of the last bin for which the bin content b m ≠ 0 and n is the number of considered interpoint distances. Because the computation time of Eq. (A14) becomes unreasonably large for large numbers of distances, the estimator is restricted to small sample sizes.

Once the optimal bin width is found, the cutoff ε 0 is chosen as the right edge of the fullest bin of the histogram. All bins to the right of this bin are joined to form B 0 = [ ε 0 , ∞ ).

The minimization of Eq. (A12) is subject to two difficulties that were already noted by Judd. First, an optimizer cannot understand the idea that the exponential ε^{ Δ ( J ) } is the essence of the model, while the polynomial is only a device to correct for deviations from the scaling law. Second, the optimizer is highly sensitive to the initial condition of the optimization. This sensitivity could be reduced by a tailored optimizer,113 but one still observes a very broad distribution of Δ 2 ( J ) values over different samples of a long trajectory, especially for higher-dimensional systems, as shown in Fig. 21.

FIG. 21.

Estimates of Δ 2 ( J ) for different dynamical systems. For each box, 100 different samples have been drawn from a trajectory. Each individual sample contained 100 points. The degree of the polynomial was deg ( a ) = 1.


Due to these problems, we decided not to include the estimator in the main comparison.

8. Dimension from Lyapunov exponents

In Sec. II E, we described the Lyapunov dimension Δ ( L ) due to Kaplan and Yorke. We do not have anything to add here regarding Δ ( L ), but we want to mention Ref. 48 by Chlouverakis and Sprott. They suggest that instead of a linear interpolation of the sum of λ i, a polynomial interpolation should be performed. However, as we found no theoretical foundation for this proposal, we decided to skip it (and we also did not notice any significant improvement in the numeric results).

9. Mean return times

According to the Poincaré recurrence theorem, any trajectory within an ergodic set114 will return arbitrarily often to any neighborhood within the ergodic set.99 We represent this neighborhood as a hypersphere of radius ε, centered at some point x 0 in the ergodic set, and define as γ the mean return time to this hypersphere. Then, one expects that log ( γ ) ≈ − Δ ( γ ) log ( ε ), with Δ ( γ ) the fractal dimension as estimated from the mean return times. A more formal discussion of this fact and an explicit connection with the natural measure of the ergodic set and the fractal dimension obtained via the generalized entropy (4) is given by Theiler.16 The earliest reference we found using return times to estimate fractal dimensions was Ref. 46.
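A naive Python illustration of this scaling, using the Hénon map of Table III (the parameter choices and ε values below are our own and purely illustrative):

```python
import numpy as np

def henon_orbit(n, a=1.4, b=0.3):
    """Orbit of the Henon map of Table III after a short transient."""
    x, y = 0.08, 0.12
    out = np.empty((n, 2))
    for i in range(n + 100):
        x, y = 1 - a * x * x + y, b * x
        if i >= 100:
            out[i - 100] = (x, y)
    return out

def return_time_dim(orbit, center, eps_values):
    """Slope of log(mean return time) vs log(eps): gamma ~ eps^(-Delta)."""
    log_gamma = []
    for eps in eps_values:
        inside = np.max(np.abs(orbit - center), axis=1) < eps
        # entry events into the eps-ball: outside at step t, inside at t + 1
        entries = np.flatnonzero(inside[1:] & ~inside[:-1])
        log_gamma.append(np.log(np.mean(np.diff(entries))))
    return -np.polyfit(np.log(eps_values), log_gamma, 1)[0]
```

Even this toy version makes the practical problems visible: the estimate is local to `center`, and reliable statistics for small ε require orders of magnitude more points than the correlation sum.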

Unfortunately, the method using mean return times cannot be recommended at all. A fundamental limitation is that knowledge of the dynamic rule f is necessary; otherwise, the results of the method for measured data are too inaccurate to be considered seriously. Even for a known rule f, the method converges slowly (numerically). Furthermore, it provides an estimate of the local dimension around the point of return, similarly to the point-wise dimension. Thus, it has to be further averaged over several state-space points, requiring several orders of magnitude more computation time than the correlation sum method or the Lyapunov dimension method. Last, we found its numeric output (not shown, see online repository) to be quite far from the results of the correlation dimension and, thus, we do not consider it accurate enough.

10. Extreme value theory

We introduced the algorithm of estimating a FD via extreme value theory (EVT) in Sec. II D. Here, we will expand more on how the algorithm has been used in the literature and highlight potential ambiguities we have noticed in its real-world applications.

Recall from the discussion of Sec. II D that the exponential distribution (EXPD) followed by the extremes of the g ( ε ) function (Sec. II D) is, in fact, a simplification of a generalized Pareto distribution (GPD). The cumulative function of the GPD is
(A15)
and is valid for x ≥ 0 if ξ ≥ 0 and for 0 ≤ x ≤ − σ / ξ if ξ < 0. It reduces to the EXPD for ξ = 0. The first ambiguity we found in the literature concerns the recently published work of Pons et al.,75 as well as real-world applications that explicitly fit a GPD to data (i.e., allowing ξ ≠ 0 instead of enforcing ξ = 0). It is correct that the exceedances E in a dataset may follow a GPD, but if one wants to make the claim that the σ parameter is connected with the fractal dimension, then one must assume that the data follow the reduced EXPD instead. That is because of Eq. (14). If one uses the GPD cumulative function instead of the EXPD one, one would get the expression exp ( − E Δ i ) to be equated to ( 1 + ξ E / σ i )^{ − 1 / ξ }, from which it is impossible to claim Δ i = 1 / σ i as in Eq. (14). Hence, when using GPD fits instead of EXPD fits, one cannot simply equate the (local) fractal dimension with the σ parameter of the fit, in contrast to what has been done in the literature.75
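To make the EXPD-based procedure concrete, here is a minimal Python sketch of the local dimension estimate of Eq. (14); the quantile parameter p and the function name are our own illustrative choices:

```python
import numpy as np

def evt_local_dimension(X, i, p=0.98):
    """Local dimension at point x_i under the EXPD assumption of Eq. (14):
    exceedances E of g = -log(distance to x_i) over the p-quantile of g
    are fit by an exponential distribution, whose maximum likelihood rate
    gives Delta_i = 1 / mean(E). With a general GPD fit (shape xi != 0),
    this identification of Delta_i with 1/sigma_i is not valid."""
    X = np.asarray(X, dtype=float)
    g = -np.log(np.linalg.norm(np.delete(X, i, axis=0) - X[i], axis=1))
    u = np.quantile(g, p)
    E = g[g > u] - u  # exceedances over the threshold
    return 1.0 / np.mean(E)
```

Note that 1/mean(E) is exactly the exponential-rate MLE, which is why the estimate is only meaningful when ξ = 0 is enforced.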

The second ambiguity we encountered is the lack of application, and even discussion, of delay coordinate embeddings. Many real-world applications, e.g., Ref. 65, analyze a single dynamic variable (such as sea level pressure in the case of climate applications). This dynamic variable is a spatiotemporal field and, hence, provides a multi-dimensional input dataset. Despite the high input dimensionality, this single variable is likely coupled to many other dynamic variables in a coupled dynamical system (which is especially true in the case of climate dynamics). The theory of delay embeddings is supposed to reconstruct the missing dynamic variables and, as a result, provide a more correct representation of the dynamical flow. Delay embedding is missing from almost all applications of EVT we reviewed and cited, even though they all use the timeseries of only one dynamic variable. Given that delay embedding is a well established analysis step28 that is completely separate from estimating fractal dimensions, we are not sure why there is a lack of discussion of it.

The third ambiguity we want to highlight is the report of relatively small values for the fractal dimensions of spatiotemporal, and highly complex, real-world data. For example, Ref. 70 reports dimensions of Δ ≈ 3.5 for the dynamics of slow earthquakes in the Cascadia region, Ref. 72 reports Δ ≈ 15 for spatiotemporal atmospheric flow (of daily resolution; hence, large scale turbulence is considered), and Ref. 73 reports a difference of at most δ Δ i ≈ 2 between the states of largest and smallest local fractal dimension of the 500-hPa geopotential height (Z500) dynamic variable for the weather of the European-Atlantic region. In particular, in the last case, this would mean that, out of the potentially millions of available degrees of freedom in this (discretized) spatiotemporal system, there is a difference of at most two additional degrees accessed by the state space flow between the least and most stable regions in state space. Given that in this review we noticed much higher differences of local dimensions in much lower-dimensional systems (see, e.g., Fig. 11 or 18), we find this reported small number difficult to grasp. In general, given the discussion on falsifiability of Secs. IV C and IV H, as well as limitations that come from the length of input data that we discussed in Sec. III C, we believe that the absolute values of the fractal dimensions reported in these publications should be taken with a grain of salt and not be equated with the available degrees of freedom in the state space. Whether the relative values of the local dimensions Δ i ( E ) (in the sense that a relatively higher value means higher local state instability for the real system) can be used to draw conclusions depends on the confidence one has in the stability of the distributions of Δ i ( E ) (see Sec. IV C).

11. Persistent homology

Persistent homology methods are based on topological timeseries analysis and applications in dynamical systems. These techniques have been relatively recently applied to estimate fractal dimensions, and a quantitative review was recently published by Jaquette and Schweinhart.50 

The methodology is based on tracking how d-dimensional holes form or disappear as the point cloud that composes X is “inflated” or “thickened.” This means that each point in X is taken as a sphere with radius initially 0, and this radius is increased as the point cloud is inflated. Estimators based on zero-dimensional persistent homology using minimal spanning trees were proposed already in the early 1990s by van de Weygaert et al.115 and Martínez et al.,116 who stress that this approach provides an estimate of a (generalized) fractal dimension and also works with relatively small data sets. For more information about implementations of the method, see Ref. 50.

The review of Ref. 50 compared fluctuations in the output values and the distance of the output values themselves from the reference “true” values of the test sets it was applied to. The results showed that, compared to the correlation sum, the persistent homology method has similar performance for d = 0 and dramatically worse performance for d > 0. Unfortunately, for d = 0, the method performs poorly when noise is present, i.e., it does not distinguish two slopes (that of the noise and that of the deterministic set) and instead shows only that of the noise, while it does find two slopes for d = 1. Additionally, the method’s output depends strongly on its meta-parameter α, whose value cannot be deduced from the input data. For these reasons, and because the methodology itself is more complicated to both explain and implement than the correlation sum of Sec. II C, we deem the method worse than the correlation sum and considered only the correlation sum for the more in-depth comparison in Sec. III.

The work done in this paper and the figures produced are available as a fully reproducible code base, which can be found on GitHub.117 It is written in the Julia language118 and uses the software packages DynamicalSystems.jl,36 DifferentialEquations.jl,119 BenchmarkTools.jl,120 ComplexityMeasures.jl,104 LsqFit.jl, and DrWatson.121 Figures were produced with Makie.122 All methods, with the exception of Judd’s algorithm and the persistent homology method, are implemented, documented, and tested extensively in the FractalDimensions.jl123 submodule of DynamicalSystems.jl. The implementations follow best practices in scientific code124 and are highly optimized, utilizing multi-threading whenever possible. The following code is a simple example of calculating Δ_2^(C), Δ_1^(H), and Δ_i^(E) with DynamicalSystems.jl and the Julia language:
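A minimal sketch of such a computation follows, written against the documented DynamicalSystems.jl / FractalDimensions.jl v3 interface; exact function names and signatures may differ between package versions, so consult the package documentation before use:

```julia
using DynamicalSystems

# Generate a trajectory of the Hénon map (Table III)
henon_rule(u, p, n) = SVector(1.0 - p[1]*u[1]^2 + u[2], p[2]*u[1])
henon = DeterministicIteratedMap(henon_rule, [0.08, 0.12], [1.4, 0.3])
X, _ = trajectory(henon, 10_000)

# Δ_2^(C): correlation-sum (Grassberger–Procaccia) dimension
ΔC = grassberger_proccacia_dim(X)

# Δ_1^(H): generalized-entropy (information) dimension
ΔH = generalized_dim(X; q = 1)

# Δ_i^(E): local EVT dimensions (and persistences θ) at quantile p = 0.99
ΔE, θ = extremevaltheory_dims_persistences(X, 0.99)
```

Each estimator returns a scalar dimension except the EVT one, which returns one local dimension per point of X.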

All dynamical systems used for generating data are listed in Table III.

TABLE III.

Description of various systems considered in this paper.

System | Dynamical rule | Initial conditions | Parameters
Hénon map | x_{n+1} = 1 − a x_n² + y_n, y_{n+1} = b x_n | (0.08, 0.12) | a = 1.4, b = 0.3
Kaplan–Yorke map | x_{n+1} = 2x_n mod 1, y_{n+1} = λ y_n + cos(4π x_n) | (0.15, 0.2) | λ = 0.2
Towel map | x_{n+1} = 3.8 x_n (1 − x_n) − 0.05 (y_n + 0.35)(1 − 2 z_n), y_{n+1} = 0.1((y_n + 0.35)(1 + 2 z_n) − 1)(1 − 1.9 x_n), z_{n+1} = 3.78 z_n (1 − z_n) + b y_n | (0.085, −0.121, 0.075) |
Hénon–Heiles | ẋ = p_x, ẏ = p_y, ṗ_x = −x − 2xy, ṗ_y = −y − (x² − y²) | (0, 0.25, 0.42081, 0) |
Coupled logistic maps | u_{n+1}^(i) = 4 v_n^(i) (1 − v_n^(i)), v_n^(i) = u_n^(i) + k (u_n^(i−1) − 2 u_n^(i) + u_n^(i+1)) | (0.1, …, 0.9) | k = 0.1
Rössler | ẋ = −y − z, ẏ = x + a y, ż = b + z(x − c) | (0.1, −0.2, 0.1) | a = b = 0.2, c = 5.7; periodic parameters: a = b = 0.2, c = 3
Lorenz-96 | ẋ_i = (x_{i+1} − x_{i−2}) x_{i−1} − x_i + F | (j × 0.1 for j ∈ 0, …, D − 1) | F = 24
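The discrete maps of Table III can be iterated directly to generate datasets. As a plain-Julia illustration (the paper's datasets were generated with DynamicalSystems.jl; this standalone sketch uses the Hénon map row with its listed initial condition and parameters):

```julia
# Iterate the Hénon map of Table III:
#   x_{n+1} = 1 - a x_n^2 + y_n,  y_{n+1} = b x_n
function henon_trajectory(N; a = 1.4, b = 0.3, x0 = 0.08, y0 = 0.12)
    X = Matrix{Float64}(undef, N, 2)
    x, y = x0, y0
    for n in 1:N
        X[n, 1], X[n, 2] = x, y
        x, y = 1 - a*x^2 + y, b*x   # apply the map
    end
    return X
end

X = henon_trajectory(10_000)  # 10_000 points on the Hénon attractor
```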

The method of Kraemer et al.34 finds optimal delay times that may not be equispaced. The number of delay times found equals the embedding dimension. The delay times found for each system are listed below (delay times are always integers, in units of the sampling time). For the “vostok” and “nifty50” data, we used traditional techniques of optimizing delay time and embedding dimension individually, via Cao’s method and the minimum of the mutual information, because the method of Ref. 34 (correctly) yields that the data do not accommodate a proper embedding:

  1. “electroch. 1”: (0, 26, 13, 5, 20),

  2. “electroch. 2”: (0, 25, 16, 148, 138, 87, 60, 105),

  3. “Shinriki”: (0, 19, 38, 57),

  4. “nifty50”: (0, 43, 86, 129, 172, 215),

  5. “vostok”: (0, 50, 100, 150, 200, 250, 300),

  6. “double pendulum”: (0, 51, 25, 39, 12),

  7. “Roessler”: (0, 6, 3, 14), and

  8. “EEG IBI”: (0, 13, 26, 39, 7).
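Given such a delay vector, the reconstruction itself is straightforward: column j of the embedded set is the timeseries shifted by the j-th delay. A minimal plain-Julia sketch (the helper `delay_embed` and the sine timeseries are our own illustration, not the implementation of Ref. 34, which additionally *selects* the delays):

```julia
# Delay-coordinates embedding with possibly non-equispaced integer delays.
# Returns an N × d matrix, d = length(τs), N = length(s) - maximum(τs).
function delay_embed(s::AbstractVector, τs)
    N = length(s) - maximum(τs)
    return [s[t + τ] for t in 1:N, τ in τs]
end

s = sin.(0.01 .* (1:2000))              # placeholder timeseries
E = delay_embed(s, (0, 26, 13, 5, 20))  # delays listed for "electroch. 1"
```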

See Figs. 22 and 23.

FIG. 22.

Same as Fig. 13 but now for the experimental dataset “electrochemical 1.”

FIG. 23.

Same as Fig. 13 but now for the “nifty50” stock market timeseries (embedded in six-dimensional space). Note that this timeseries is only 3125 samples long, hence we used much smaller p when compared to other plots. The proportion N(1 − p), however, remains similar.


In Figs. 24 and 25, we perform what is known as standard practice when estimating FDs: increasing the embedding dimension iteratively until a convergence of FD appears at the largest scales of ε.22 In particular, for this subsection, we estimate the slope of the right-most linear scaling region (i.e., the one at the largest ε), as opposed to the slope of the largest linear region. That is because in real data, it is this slope that should indicate the FD of the underlying deterministic dynamics, if any exist (see Sec. III F).
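The slope extraction at the largest ε can be sketched as a least-squares fit over the last few points of the log–log curve. This is a simplified stand-in (the fixed window size k is an assumption; the paper detects the right-most *linear* region rather than using a fixed window):

```julia
using Statistics

# Slope of the right-most (largest-ε) part of a log-log curve,
# e.g. log C(ε) vs log ε, via least squares over the last k points.
function rightmost_slope(logε, logC; k = 5)
    x, y = logε[end-k+1:end], logC[end-k+1:end]
    return cov(x, y) / var(x)
end

logε = log.(10 .^ range(-3, 0; length = 20))
logC = 1.26 .* logε                 # idealized curve with slope 1.26
Δ = rightmost_slope(logε, logC)     # recovers ≈ 1.26
```

In practice one computes this slope for each embedding dimension d and looks for the value at which it stops changing.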

FIG. 24.

Analysis of behavior of FD estimators as we increase embedding dimension of real world data (delay time was estimated as the minimum of self-mutual information), here for the “electrochemical 2” dataset.

FIG. 25.

Same as in Fig. 24 but for the “nifty50” dataset. Note the convergence of the EVT FD into a small value of Δ^(E) ≈ 2.5 already for d ≥ 5. Practically identical results are obtained with the “vostok” dataset: C_2 fails to converge to any FD with increasing d, while EVT converges to Δ^(E) ≈ 3.3 for d ≥ 5.


Indeed, for the “electrochemical 2” dataset, the convergence of Δ_2^(C) becomes apparent very quickly. On the other hand, for “nifty50,” there is no convergence (we computed C_2 up to d = 13, not shown). The results of H_2 for “nifty50” are inaccurate because the timeseries has only 3125 points and should not be trusted (in any case, they do not show any convergence either).

We also perform the same analysis for the EVT approach, which once again reinforces our concerns: for the stock market timeseries, which the method should fail on, one would expect no convergent dimension. Instead, the dimension estimates converge very rapidly with increasing d.

1. B. B. Mandelbrot, The Fractal Geometry of Nature (W.H. Freeman, New York, 1982), Vol. 2.
2. K. Falconer, Fractal Geometry: Mathematical Foundations and Applications (Wiley, 2003).
3. G. Datseris and U. Parlitz, “Nonlinear dynamics,” in Undergraduate Lecture Notes in Physics, 1st ed. (Springer Nature, Cham, 2022).
4. E. Ott, “Attractor dimensions,” Scholarpedia 3, 2110 (2008), revision #91015.
5. L. F. Richardson, “The problem of contiguity: An appendix to statistics of deadly quarrels,” Gen. Syst. Yearbook 6, 139–187 (1961); available at https://cds.cern.ch/record/434833.
6. B. Mandelbrot, “How long is the coast of Britain? Statistical self-similarity and fractional dimension,” Science 156, 636–638 (1967).
7. R. P. Taylor, A. P. Micolich, and D. Jonas, “Fractal analysis of Pollock’s drip paintings,” Nature 399, 422 (1999).
8. R. P. Taylor, R. Guzman, T. P. Martin, G. D. R. Hall, A. P. Micolich, D. Jonas, B. C. Scannell, M. S. Fairbanks, and C. A. Marlow, “Authenticating Pollock paintings using fractal geometry,” Pattern Recognit. Lett. 28, 695–702 (2007).
9. Dimensions and Entropies in Chaotic Systems, edited by G. Mayer-Kress (Springer, Berlin, 1986).
10. P. Grassberger and I. Procaccia, “Characterization of strange attractors,” Phys. Rev. Lett. 50, 346–349 (1983).
11. P. Grassberger and I. Procaccia, “Measuring the strangeness of strange attractors,” Physica D 9, 189–208 (1983).
12. P. Frederickson, J. L. Kaplan, E. D. Yorke, and J. A. Yorke, “The Lyapunov dimension of strange attractors,” J. Differ. Equ. 49, 185–207 (1983).
13. F. Takens, “On the numerical determination of the dimension of an attractor,” in Dynamical Systems and Bifurcations, edited by B. L. J. Braaksma, H. W. Broer, and F. Takens (Springer, Berlin, 1985), pp. 99–106.
14. G. Paladin and A. Vulpiani, “Anomalous scaling laws in multifractal objects,” Phys. Rep. 156, 147–225 (1987).
15. D. Ruelle, Chaotic Evolution and Strange Attractors (Cambridge University Press, 1989).
16. J. Theiler, “Statistical precision of dimension estimators,” Phys. Rev. A 41, 3038–3051 (1990).
17. D. A. Russell, J. D. Hanson, and E. Ott, “Dimension of strange attractors,” Phys. Rev. Lett. 45, 1175–1178 (1980).
18. E. Ott, Chaos in Dynamical Systems (Cambridge University Press, 2012); see arXiv:1011.1669.
19. C. Grebogi, S. W. McDonald, E. Ott, and J. A. Yorke, “Final state sensitivity: An obstruction to predictability,” Phys. Lett. A 99, 415–418 (1983).
20. S. Banerjee, “Coexisting attractors, chaotic saddles, and fractal basins in a power electronic circuit,” IEEE Trans. Circuits Syst. I 44, 847–849 (1997).
21. T. Tel and M. Gruiz, Chaotic Dynamics: An Introduction Based on Classical Mechanics (Cambridge University Press, 2007).
22. H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, 2nd ed. (Cambridge University Press, 2003).
23. H. D. I. Abarbanel, Analysis of Observed Chaotic Data (Springer, New York, 1996).
24. C. Nicolis and G. Nicolis, “Is there a climatic attractor?,” Nature 311, 529–532 (1984).
25. P. Grassberger, “Do climatic attractors exist?,” Nature 323, 609–612 (1986).
26. A. A. Tsonis, J. B. Elsner, and K. P. Georgakakos, “Estimating the dimension of weather and climate attractors: Important issues about the procedure and interpretation,” J. Atmos. Sci. 50, 2549–2555 (1993).
27. A. Babloyantz, “Some remarks on nonlinear data analysis of physiological time series,” in Measures of Complexity and Chaos, edited by N. B. Abraham, A. M. Albano, A. Passamante, and P. E. Rapp (Springer, Boston, MA, 1989), pp. 51–62.
28. H. Kantz, T. Schreiber, I. Hoffmann, T. Buzug, G. Pfister, L. G. Flepp, J. Simonet, R. Badii, and E. Brun, “Nonlinear noise reduction: A case study on experimental data,” Phys. Rev. E 48, 1529–1538 (1993).
29. H. E. Hurst, “Long-term storage capacity of reservoirs,” T. Am. Soc. Civ. Eng. 116, 770–799 (1951).
30. J. W. Kantelhardt, “Fractal and multifractal time series,” in Encyclopedia of Complexity and Systems Science, edited by R. A. Meyers (Springer, New York, NY, 2009), pp. 3754–3779.
31. D. Mayor, D. Panday, H. K. Kandel, T. Steffert, and D. Banks, “CEPS: An open access MATLAB graphical user interface (GUI) for the analysis of complexity and entropy in physiological signals,” Entropy (Basel) 23, 321 (2021).
32. D. Mayor, T. Steffert, G. Datseris, A. Firth, D. Panday, H. Kandel, and D. Banks, “Complexity and entropy in physiological signals (CEPS): Resonance breathing rate assessed using measures of fractal dimension, heart rate asymmetry and permutation entropy,” Entropy 25, 301 (2023).
33. E. Bradley and H. Kantz, “Nonlinear time-series analysis revisited,” Chaos 25, 097610 (2015); arXiv:1503.07493.
34. K. H. Kraemer, G. Datseris, J. Kurths, I. Z. Kiss, J. L. Ocampo-Espindola, and N. Marwan, “A unified and automated approach to attractor reconstruction,” New J. Phys. 23, 033017 (2021).
35. K. H. Kraemer, M. Gelbrecht, I. Pavithran, R. I. Sujith, and N. Marwan, “Optimal state space reconstruction via Monte Carlo decision tree search,” Nonlinear Dyn. 108, 1525–1545 (2022).
36. G. Datseris, “DynamicalSystems.jl: A Julia software library for chaos and nonlinear dynamics,” J. Open Source Softw. 3, 598 (2018).
37. R. Lopes and N. Betrouni, “Fractal and multifractal analysis: A review,” Med. Image Anal. 13, 634–649 (2009).
38. T. C. A. Molteno, “Fast O(N) box-counting algorithm for estimating dimensions,” Phys. Rev. E 48, R3263–R3266 (1993).
39. P. Grassberger, “Grassberger-Procaccia algorithm,” Scholarpedia 2, 3043 (2007), revision #91330.
40. J. Theiler, “Efficient algorithm for estimating the correlation dimension from a set of discrete points,” Phys. Rev. A 36, 4456–4462 (1987).
41. A. Bueno-Orovio and V. M. Pérez-García, “Enhanced box and prism assisted algorithms for computing the correlation dimension,” Chaos Soliton. Fract. 34, 509–518 (2007).
42. J. C. Sprott and G. Rowlands, “Improved correlation dimension calculation,” Int. J. Bifurcation Chaos 11, 1865–1880 (2001).
43. S. Borovkova, R. Burton, and H. Dehling, “Consistency of the Takens estimator for the correlation dimension,” Ann. Appl. Probab. 9, 376–390 (1999).
44. K. Judd, “An improved estimator of dimension and some comments on providing confidence intervals,” Physica D 56, 216–228 (1992).
45. K. Judd, “Estimating dimension from small samples,” Physica D 71, 421–429 (1994).
46. M. H. Jensen, L. P. Kadanoff, A. Libchaber, I. Procaccia, and J. Stavans, “Global universality at the onset of chaos: Results of a forced Rayleigh-Bénard experiment,” Phys. Rev. Lett. 55, 2798–2801 (1985).
47. J. L. Kaplan and J. A. Yorke, “Chaotic behavior of multidimensional difference equations,” in Functional Differential Equations and Approximation of Fixed Points, edited by H.-O. Peitgen and H.-O. Walther (Springer, Berlin, 1979), pp. 204–227.
48. K. E. Chlouverakis and J. C. Sprott, “A comparison of correlation and Lyapunov dimensions,” Physica D 200, 156–164 (2005).
49. V. Lucarini, D. Faranda, A. C. G. M. Moreira de Freitas, J. M. M. de Freitas, M. Holland, T. Kuna, M. Nicol, M. Todd, and S. Vaienti, “Extremes and recurrence in dynamical systems,” in Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts (John Wiley & Sons, Nashville, TN, 2016).
50. J. Jaquette and B. Schweinhart, “Fractal dimension estimation with persistent homology: A comparative study,” Commun. Nonlinear Sci. Numer. Simul. 84, 105163 (2020).
51. A sphere is used due to the usage of the Euclidean norm to calculate distances. If using the maximum norm, we would use a hypercube instead. The fractal dimension is invariant to the choice of the distance norm.
52. F. Hausdorff, “Dimension und äußeres Maß,” Math. Ann. 79, 157–179 (1918).
53. Proof: Consider the rationals in the interval [0, 1]. The rationals are dense in this interval, and as such, the box-counting dimension of the rationals is exactly 1, as, no matter the box size ε, the whole number line will always be covered by boxes (i.e., there will never be a box without a rational in it). The rationals, however, are a countable set, and the Hausdorff dimension of any countable set is 0.
54. A. Rényi, “On the dimension and entropy of probability distributions,” Acta Math. Acad. Scient. Hungar. 10, 193–215 (1959).
55. P. Grassberger, “Generalized dimensions of strange attractors,” Phys. Lett. A 97, 227–230 (1983).
56. H. Hentschel and I. Procaccia, “The infinite number of generalized dimensions of fractals and strange attractors,” Physica D 8, 435–444 (1983).
57. S. Hidaka and N. Kashyap, “On the estimation of pointwise dimension,” arXiv:1312.2298 [physics.data-an] (2014).
58. J. Theiler, “Spurious dimension from correlation algorithms applied to limited time-series data,” Phys. Rev. A 34, 2427–2432 (1986).
59. M. Felici, V. Lucarini, A. Speranza, and R. Vitolo, “Extreme value statistics of the total energy in an intermediate-complexity model of the midlatitude atmospheric jet. Part II: Trend detection and assessment,” J. Atmos. Sci. 64, 2159–2175 (2007).
60. A. C. M. Freitas, J. M. Freitas, and M. Todd, “Hitting time statistics and extreme value theory,” Probab. Theory Relat. Fields 147, 675–710 (2010).
61. D. Faranda, V. Lucarini, G. Turchetti, and S. Vaienti, “Numerical convergence of the block-maxima approach to the generalized extreme value distribution,” J. Stat. Phys. 145, 1156–1180 (2011).
62. V. Lucarini, D. Faranda, and J. Wouters, “Universal behaviour of extreme value statistics for selected observables of dynamical systems,” J. Stat. Phys. 147, 63–73 (2012); arXiv:1110.0176.
63. T. Caby, D. Faranda, G. Mantica, S. Vaienti, and P. Yiou, “Generalized dimensions, large deviations and the distribution of rare events,” Physica D 400, 132143 (2019); arXiv:1812.00036.
64. F. M. E. Pons, G. Messori, M. C. Alvarez-Castro, and D. Faranda, “Sampling hyperspheres via extreme value theory: Implications for measuring attractor dimensions,” J. Stat. Phys. 179, 1698–1717 (2020).
65. D. Faranda, G. Messori, and P. Yiou, “Dynamical proxies of North Atlantic predictability and extremes,” Sci. Rep. 7, 41278 (2017).
66. S. Buschow and P. Friederichs, “Local dimension and recurrent circulation patterns in long-term climate simulations,” Chaos 28, 083124 (2018).
67. D. Faranda, G. Messori, and S. Vannitsem, “Attractor dimension of time-averaged climate observables: Insights from a low-order ocean-atmosphere model,” Tellus, Ser. A: Dyn. Meteorol. Oceanogr. 71, 1–11 (2019).
68. M. Brunetti, J. Kasparian, and C. Vérard, “Co-existing climate attractors in a coupled aquaplanet,” Clim. Dyn. 53, 6293–6308 (2019).
69. D. Rodrigues, M. C. Alvarez-Castro, G. Messori, P. Yiou, Y. Robin, and D. Faranda, “Dynamical properties of the North Atlantic atmospheric circulation in the past 150 years in CMIP5 models and the 20CRV2C reanalysis,” J. Clim. 31, 6097–6111 (2018).
70. A. Gualandi, J. P. Avouac, S. Michel, and D. Faranda, “The predictable chaos of slow earthquakes,” Sci. Adv. 6, 1–11 (2020).
71. G. Messori and D. Faranda, “Technical note: Characterising and comparing different palaeoclimates with dynamical systems theory,” Clim. Past 17, 545–563 (2021).
72. K. Giamalaki, C. Beaulieu, S. A. Henson, A. P. Martin, H. Kassem, and D. Faranda, “Future intensification of extreme Aleutian low events and their climate impacts,” Sci. Rep. 11, 1–12 (2021).
73. A. Hochman, G. Messori, J. F. Quinting, J. G. Pinto, and C. M. Grams, “Do Atlantic-European weather regimes physically exist?,” Geophys. Res. Lett. 48, e2021GL095574, https://doi.org/10.1029/2021GL095574 (2021).
74. F. Falasca and A. Bracco, “Exploring the tropical Pacific manifold in models and observations,” Phys. Rev. X 12, 021054 (2022).
75. F. Pons, G. Messori, and D. Faranda, “Statistical performance of local attractor dimension estimators in non-Axiom A dynamical systems,” Chaos 33, 073143 (2023).
76. M. R. Leadbetter, G. Lindgren, and H. Rootzen, “Extremes and related properties of random sequences and processes,” in Springer Series in Statistics, 1983rd ed. (Springer, New York, NY, 2012).
77. F. Ledrappier, “Some relations between dimension and Lyapunov exponents,” Commun. Math. Phys. 81, 229–238 (1981).
78. I. Shimada and T. Nagashima, “A numerical approach to ergodic problem of dissipative dynamical systems,” Prog. Theor. Phys. 61, 1605–1616 (1979).
79. G. Benettin, L. Galgani, A. Giorgilli, and J.-M. Strelcyn, “Lyapunov characteristic exponents for smooth dynamical systems and for Hamiltonian systems; A method for computing all of them, Part 1: Theory,” Meccanica 15, 9–20 (1980).
80. U. Parlitz, “Estimating Lyapunov exponents from time series,” in Chaos Detection and Predictability, Lecture Notes in Physics Vol. 915, edited by C. H. Skokos, G. A. Gottwald, and J. Laskar (Springer, Berlin, 2016), pp. 1–34.
81. J. Pathak, Z. Lu, B. R. Hunt, M. Girvan, and E. Ott, “Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data,” Chaos 27, 121102 (2017).
82. J. P. Eckmann and D. Ruelle, “Fundamental limitations for estimating dimensions and Lyapunov exponents in dynamical systems,” Physica D 56, 185–187 (1992).
83. M. A. H. Nerenberg and C. Essex, “Correlation dimension and systematic geometric effects,” Phys. Rev. A 42, 7065–7074 (1990).
84. P. Grassberger, “Finite sample corrections to entropy and dimension estimates,” Phys. Lett. A 128, 369–373 (1988).
85. E. Rosenberg, “Generalized dimensions and multifractals,” in Fractal Dimensions of Networks (Springer International Publishing, Cham, 2020), pp. 325–364.
86. N. H. Packard, J. P. Crutchfield, J. D. Farmer, and R. S. Shaw, “Geometry from a time series,” Phys. Rev. Lett. 45, 712–716 (1980).
87. F. Takens, “Detecting strange attractors in turbulence,” in Dynamical Systems and Turbulence, Warwick 1980, edited by D. Rand and L.-S. Young (Springer, Berlin, 1981), pp. 366–381.
88. T. Sauer, J. Yorke, and M. Casdagli, “Embedology,” J. Stat. Phys. 65, 579–616 (1991).
89. V. Deshmukh, E. Bradley, J. Garland, and J. D. Meiss, “Using curvature to select the time lag for delay reconstruction,” Chaos 30, 063143 (2020).
90. M. Shinriki, M. Yamamoto, and S. Mori, “Multimode oscillations in a modified van der Pol oscillator containing a positive nonlinear conductance,” Proc. IEEE 69, 394–395 (1981).
91. V. P. Vera-Ávila, R. Sevilla-Escoboza, A. A. Lozano-Sánchez, R. R. Rivera-Durón, and J. M. Buldú, “Experimental datasets of networks of nonlinear oscillators: Structure and dynamics during the path to synchronization,” Data Br. 28, 105012 (2019).
92. A. Asseman, T. Kornuta, and A. Ozcan, “Learning beyond simulated physics,” in Modeling and Decision-making in the Spatiotemporal Domain Workshop, 2018.
93. V. Deshmukh, E. Bradley, J. Garland, and J. D. Meiss, “Toward automated extraction and characterization of scaling regions in dynamical systems,” Chaos 31, 123102 (2021).
94. V. Deshmukh, R. Meikle, E. Bradley, J. D. Meiss, and J. Garland, “Using scaling-region distributions to select embedding parameters,” Physica D 446, 133674 (2023).
95. While one could argue that estimating the slope of, e.g., the entropy vs a size requires a choice of sizes, we counter-argue that the choice of sizes comes objectively and naturally from the dataset itself, as we illustrate in Sec. III I.
96. T. W. Anderson and D. A. Darling, “Asymptotic theory of certain ‘Goodness of fit’ criteria based on stochastic processes,” Ann. Math. Stat. 23, 193–212 (1952).
97. N. Smirnov, “Table for estimating the goodness of fit of empirical distributions,” Ann. Math. Stat. 19, 279–281 (1948).
98. J. Isensee, G. Datseris, and U. Parlitz, “Predicting spatio-temporal time series using dimension reduced local states,” J. Nonlinear Sci. 30, 713–735 (2019).
99. G. Datseris, J. Blanco, O. Hadas, S. Bony, R. Caballero, Y. Kaspi, and B. Stevens, “Minimal recipes for global cloudiness,” Geophys. Res. Lett. 49, e2022GL099678, https://doi.org/10.1029/2022GL099678 (2022).
100. We have repeated the numerical experiments with the approximate one sample Kolmogorov–Smirnov and the Cramer Von Mises tests and got practically identical results. We also tested obtaining p-values from data directly sampled from a constructed EXPD and found the p-values uniformly distributed as is theoretically expected.
101. J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, and J. Doyne Farmer, “Testing for nonlinearity in time series: The method of surrogate data,” Physica D 58, 77–94 (1992).
102. G. Lancaster, D. Iatsenko, A. Pidde, V. Ticcinelli, and A. Stefanovska, “Surrogate data for hypothesis testing of physical systems,” Phys. Rep. 748, 1–60 (2018).
103. D. Diego, K. A. Haaga, and B. Hannisdal, “Transfer entropy computation using the Perron-Frobenius operator,” Phys. Rev. E 99, 042212 (2019).
104. K. A. Haaga and G. Datseris, Zenodo (2023).
105. P. Grassberger, “An optimized box-assisted algorithm for fractal dimensions,” Phys. Lett. A 148, 63–68 (1990).
106. P. Grassberger and I. Procaccia, “Measuring the strangeness of strange attractors,” Physica D 9, 189–208 (1983).
107. Y. Termonia and Z. Alexandrowicz, “Fractal dimension of strange attractors from radius versus size of arbitrary clusters,” Phys. Rev. Lett. 51, 1265–1268 (1983).
108. P. Asvestas, G. Matsopoulos, and K. Nikita, “Estimation of fractal dimension of images using a fixed mass approach,” Pattern Recognit. Lett. 20, 347–354 (1999).
109. P. Grassberger, “Generalizations of the Hausdorff dimension of fractal measures,” Phys. Lett. A 107, 101–105 (1985).
110. K. Carlsson, Zenodo (2022).
111. A. Mallet, “A maximum likelihood estimation method for random coefficient regression models,” Biometrika 73, 645–656 (1986).
112. P. W. Zehna, “Invariance of maximum likelihood estimators,” Ann. Math. Stat. 37, 744 (1966).
113. The optimization is split into two parts, where first Δ_2^(J) is optimized for a fixed a, and afterward, the coefficients of a are optimized. This process is repeated until a convergence of the entire parameter vector (Δ_2^(J), a)^T is observed.
114. For the purposes of this paper, we assume that all attractors are ergodic.
115. R. van de Weygaert, B. J. Jones, and V. J. Martínez, “The minimal spanning tree as an estimator for generalized dimensions,” Phys. Lett. A 169, 145–150 (1992).
116. V. J. Martínez, R. Domínguez-Tenreiro, and L. J. Roy, “Hausdorff dimension from the minimal spanning tree,” Phys. Rev. E 47, 735–738 (1993).
117. G. Datseris, “Fractal dimension code base,” see https://github.com/datseris/fractaldimension (last accessed September 13, 2021).
118. J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, “Julia: A fresh approach to numerical computing,” SIAM Rev. 59, 65–98 (2017).
119. C. Rackauckas and Q. Nie, “DifferentialEquations.jl—A performant and feature-rich ecosystem for solving differential equations in Julia,” J. Open Res. Softw. 5, 15 (2017).
120. J. Chen and J. Revels, “Robust benchmarking in noisy environments,” arXiv:1608.04295 [cs.PF] (2016).
121. G. Datseris, J. Isensee, S. Pech, and T. Gál, “DrWatson: The perfect sidekick for your scientific inquiries,” J. Open Source Softw. 5, 2673 (2020).
122. S. Danisch and J. Krumbiegel, “Makie.jl: Flexible high-performance data visualization for Julia,” J. Open Source Softw. 6, 3349 (2021).
123. G. Datseris, Zenodo (2023).
124. G. Datseris, Zenodo (2023).
125. R. Badii and A. Politi, “Intrinsic oscillations in measuring the fractal dimension,” Phys. Lett. A 104, 303–305 (1984).
126. L. A. Smith, “Lacunarity and intermittency in fluid turbulence,” Phys. Lett. 114, 465–468 (1986).
127. A. Arneodo, G. Grasseau, and E. J. Kostelich, “Fractal dimensions and f(α) spectrum of the Hénon attractor,” Phys. Lett. A 124(8), 426–432 (1987).
128. L. A. Smith, “Intrinsic limits on dimension calculations,” Phys. Lett. A 133, 283–288 (1988).
129. J. Theiler, “Lacunarity in a best estimator of fractal dimension,” Phys. Lett. A 133, 195–200 (1988).
130. Y. Gefen, Y. Meir, B. B. Mandelbrot, and A. Aharony, “Geometric implementation of hypercubic lattices with noninteger dimensionality by use of low lacunarity fractal lattices,” Phys. Rev. Lett. 50, 145–148 (1983).
131. C. Allain and M. Cloitre, “Characterizing the lacunarity of random and deterministic fractal sets,” Phys. Rev. A 44, 3552–3558 (1991).
132. N. Altman and M. Krzywinski, “Interpreting p values,” Nat. Methods 14, 213–214 (2017).