The fractal dimension is a central quantity in nonlinear dynamics and can be estimated via several different numerical techniques. In this review paper, we present a self-contained and comprehensive introduction to the fractal dimension. We collect and present various numerical estimators and focus on the three most promising ones: generalized entropy, correlation sum, and extreme value theory. We then perform an extensive quantitative evaluation of these estimators, comparing their performance and precision using different datasets and comparing the impact of features like length, noise, embedding dimension, and falsify-ability, among many others. Our analysis shows that for synthetic noiseless data, the correlation sum is the best estimator with extreme value theory following closely. For real experimental data, we found the correlation sum to be more strongly affected by noise vs the entropy and extreme value theory. The recent extreme value theory estimator seems powerful as it has some of the advantages of both alternative methods. However, using four different ways for checking for significance, we found that the method yielded “significant” low-dimensional results for inappropriate data like stock market timeseries. This fact, combined with some ambiguities we found in the literature of the method applications, has implications for both previous and future real-world applications using the extreme value theory approach, as, for example, the argument for small effective dimensionality in the data cannot come from the method itself. All algorithms discussed are implemented as performant and easy to use open source code via the DynamicalSystems.jl library.

When chaotic dynamical systems evolve in time, they typically create sets in the state space that have fractal properties. One of the major ways to characterize these chaotic sets is through a computationally feasible version of a *fractal dimension* (FD). In the field of nonlinear dynamics, the correlation sum and the generalized (Rényi) entropy are the two most commonly used approaches. One attempts to find a scaling exponent of these quantities vs a size parameter, and this exponent approximates the fractal dimension. A third method based on extreme value theory is a promising alternative, but it has been developed only recently and, hence, has not undergone the same amount of scrutiny as the previous two methods. Here, we provide a comprehensive, up to date, and self-contained analysis of available methods, comparing across every conceivable scenario. We also provide open source implementations to compute each method.

## I. INTRODUCTION

Fractal geometry deals with geometric objects (or sets) called *fractals*,^{1,2} which are “irregular” in terms of traditional Euclidean geometry. Their most striking property arguably is that they possess structure at all scales, which typically remains invariant in one form or another, no matter how much one zooms into the set. Because of this, the traditional topological dimension is not fit to describe these sets adequately (Chap. 5 of Ref. 3). The concept of a *fractal dimension*, a generally non-integer number, is, therefore, employed to characterize such objects.^{2,4} The fractal dimension, which will be shortened to FD in the rest of the article, can be used to quantify the complexity of the geometry, its scaling properties and self-similarity, and the effective ratio of surface areas to volumes.^{2} It has been applied in a vast array of different scenarios, from the archetypal measurement of coastlines^{5,6} to being suggested as a tool for validating abstract art, e.g., that of Pollock.^{7,8}

The evolution of chaotic dynamical systems results in sets in the state space that are typically fractal^{4,9} and, thus, can be characterized by a FD,^{10–16} done to our knowledge for the first time by Russell *et al*.^{17} For dissipative systems, these sets are called *strange* or *chaotic attractors*^{17,18} (boundaries of basins of attraction and chaotic saddles can also be fractal^{19,20}). For conservative systems, they often have special properties and are (typically) called *fat fractals*.^{21} For fractal sets resulting from evolving dynamical systems, an estimate of FD provides the additional crucial information of the effective degrees of freedom of the time evolution. This means that calculating the FD for an experimentally obtained dataset can be used to guide the modeling process,^{3,22,23} as the ceiling of the FD is the minimum number of independent variables that can model the system. Alternatively, if one has a model that confidently approximates the real system, due to physical arguments, then the FD of the model output can be used to tune model parameters: since FD is a dynamic invariant,^{3,22} one can compare FDs of the model simulations to the FD obtained by observed data and change parameters until those two values match.

These reasons have motivated many researchers to compute the FD of many real-world systems by delay embedding measured timeseries.^{22} Examples include global climate,^{24–26} physiology,^{27} and lasers,^{28} but many more exist. Unfortunately, some of these studies have been challenged, because calculating the fractal dimension in practice is a difficult, error prone process, with limited means of providing confidence to the obtained result. This is, for example, highlighted well in the controversy of estimating FDs of “climate attractors” (see, e.g., Ref. 26, and references therein), where FD estimates were incorrectly used to claim very low FD of $\u22483$ for the whole climate system. Researchers, thus, often need to interpret results partly subjectively, due to the lack of objective measures.

In this paper, we compare as many computationally feasible estimators of FDs as possible, in as objective manner as possible, and across as many scenarios as possible. With this, we provide an objective baseline with which researchers can check against their results, reducing the amount of interpretation and subjectivity. It is important to stress the separation between computing the FD of “the deterministic dynamics,” i.e., the dynamic invariant characterizing the flow in the state space, and the FD of “the graph of a timeseries $f(t)$ vs $t$.” The latter is relevant for stochastic perspectives, is often related with the Hurst exponent,^{29,30} and is typically used as a quantifier of timeseries more suitable for classification tasks vs typical statistics-based quantifiers (see, e.g., Refs. 31 and 32). These two FD versions are very different and unrelated in the general sense. Our paper focuses exclusively on the first version.

Hence, our main operating assumption is that we have a *multivariate* (sometimes also called *multidimensional*) dataset, obtained from a dynamical system from which we do not know the dynamic rule (equations of motion). If one has a timeseries, it first must be reconstructed into a state space set via delay embedding or other means.^{3,22,23} Our operating assumption is on the one hand motivated by the increase of interest in observed or measured multivariate data, and higher accessibility of sensors and experimental data in preparation for a nonlinear-dynamics-based analysis.^{33} On the other hand, there is also noticeable recent progress regarding better attractor reconstruction techniques based on delay embeddings, which can also utilize multivariate measurements, yielding a higher quality reconstruction overall (see, e.g., Refs. 34 and 35 by Kraemer *et al.,* and references therein). We point out that our goal here is estimating FDs of chaotic sets *knowing that these exist*, in order to separate the problem of the FD estimation from the scientific questions surrounding the interpretation of the data. The latter requires following best practices, e.g., various data pre- and post- processing, de-noising, or surrogate tests comparing the FD value of the real data with those of surrogates. These steps are entirely skipped here, and the reader can find more information in standard timeseries analysis textbooks or review articles, such as Refs. 23 and 28.

Since the first review on the topic of FD by Theiler in 1990,^{16} computers and software have improved and several new algorithms to estimate fractal dimensions have been proposed. It is, thus, timely to revisit the subject, and, here, we will provide a comparison and evaluation of FD estimators across a range of topics much larger than what has been done so far. Beyond comparing and evaluating various fractal dimension estimators, we provide optimized, easy-to-use, extensively tested, open source implementations for all algorithms discussed in this review in the DynamicalSystems.jl software library.^{36}

This review and comparison paper is structured as follows (and see also Fig. 1 for a summary of the paper). In Sec. II, we provide a self-contained concise definition of the major methods used to compute FDs and how they relate with the natural density and with each other. Connected with this section is Appendix A, which presents all (computationally feasible) algorithms we have found for estimating a fractal dimension. The core of the paper are Secs. III and IV, which compare in detail the three best and most popular estimators of FDs: scaling of entropy, scaling of correlation sum, and an extreme value theory approach. We compare across data dimensionality, data length, different kinds of dynamical systems, noise levels, real-world data, various embeddings of timeseries, and the order $q$ of the fractal dimension, among others. We close the paper with a summary of our findings in Sec. VI. Every result, plot, and method that we present in this paper is fully reproducible via open source code and adjustable to other input datasets (see Appendix B for details).

## II. STATE OF THE ART

An easy to understand introduction and review regarding the fractal dimension was given by Theiler in 1990.^{16} The theoretical background of the fractal dimension and the methods (known until 1990) to compute it are summarized, and a plethora of historic references is given there. A more recent publication on fractal dimensions in a style of a review is given by Lopes and Betrouni in 2009.^{37} However, it is focused on image analysis and pattern recognition and does not do any quantitative comparison. The most recent detailed source that provides quantitative information on the limitations and pitfalls of fractal dimension estimates is the well known textbook by Kantz and Schreiber.^{22}

A number of ways to estimate a FD, which are applicable to dynamical systems, have been devised in the literature. For this comparison, we found, implemented, and compared the methods that we briefly summarize in Table I. From these, our main analysis focuses on the three most prominent ones. In the rest of this section, we will provide a concise, yet self-contained, summary of the concept of a fractal dimension in dynamical systems. Then, we introduce the three main estimators that we use in the extensive comparisons done in Secs. III and IV and illustrate how the different estimators connect to the natural density and with each other. In the following, we assume that we have a set $X$ that contains $N$ $D$-dimensional points representing a (possibly observed) multivariate timeseries of a dynamical system. We will use the letter $\Delta $ to denote various versions of a fractal dimension, and we will use a superscript in parenthesis, such as $ \Delta ( S )$ to denote the particular *estimator* used to estimate $\Delta $.

Estimator . | Brief description . | Main Refs. . |
---|---|---|

Natural measure entropy | Scaling of generalized entropy H_{q} of amplitude binning, vs box size | 17 |

Molteno’s histogram optimization | Optimized algorithm for amplitude binning with restricted size $\epsilon $ | 38 |

Correlation sum | Scaling of correlation sum C_{q} vs radius $\epsilon $ | 10 and 39 |

Box/prism-assisted correlation sum | Optimized algorithm to calculate the correlation sum | 40 |

Performance-optimized box size | Optimized for performance box-assisted algorithm | 41 |

Logarithmic correction | Better converging fit instead of the standard least square fit of log (C_{2}) vs $log\u2061(\epsilon )$ | 42 |

Takens’ estimator | Maximum likelihood estimation of scaling exponent, $ C 2(\epsilon )\u221d \epsilon \Delta $ for $\epsilon \u2208(0, \epsilon max]$ | 13 and 43 |

Judd’s estimator | Binned MLE with additional degrees of freedom from polynomial | 44 and 45 |

Mean return times | Logarithmic scaling of mean return time to an $\epsilon $-sphere, vs $\epsilon $ | 46 and 16 |

Lyapunov dimension | Kaplan and Yorke’s linear interpolation for sum of Lyapunov exponents = 0 | 47 |

Lyapunov dimension via fits | Higher-order interpolations and other fits for sum of Lyapunov exponents = 0 | 48 |

Extreme value theory based | Rare events follow a Pareto distribution whose parameter is the fractal dim. | 49 |

Persistent homology | Quantify how the topology of shape changes as it is thickened | 50 |

Estimator . | Brief description . | Main Refs. . |
---|---|---|

Natural measure entropy | Scaling of generalized entropy H_{q} of amplitude binning, vs box size | 17 |

Molteno’s histogram optimization | Optimized algorithm for amplitude binning with restricted size $\epsilon $ | 38 |

Correlation sum | Scaling of correlation sum C_{q} vs radius $\epsilon $ | 10 and 39 |

Box/prism-assisted correlation sum | Optimized algorithm to calculate the correlation sum | 40 |

Performance-optimized box size | Optimized for performance box-assisted algorithm | 41 |

Logarithmic correction | Better converging fit instead of the standard least square fit of log (C_{2}) vs $log\u2061(\epsilon )$ | 42 |

Takens’ estimator | Maximum likelihood estimation of scaling exponent, $ C 2(\epsilon )\u221d \epsilon \Delta $ for $\epsilon \u2208(0, \epsilon max]$ | 13 and 43 |

Judd’s estimator | Binned MLE with additional degrees of freedom from polynomial | 44 and 45 |

Mean return times | Logarithmic scaling of mean return time to an $\epsilon $-sphere, vs $\epsilon $ | 46 and 16 |

Lyapunov dimension | Kaplan and Yorke’s linear interpolation for sum of Lyapunov exponents = 0 | 47 |

Lyapunov dimension via fits | Higher-order interpolations and other fits for sum of Lyapunov exponents = 0 | 48 |

Extreme value theory based | Rare events follow a Pareto distribution whose parameter is the fractal dim. | 49 |

Persistent homology | Quantify how the topology of shape changes as it is thickened | 50 |

### A. What does a fractal dimension really quantify?

In the introduction, we discussed how the FD is useful for practical matters and, hence, worthy of being estimated. However, before discussing any estimators for a FD, it is useful to conceptualize what the FD truly characterizes, as this is the origin of all practical estimations.

For this discussion, we ignore the presence of noise existing in real data, and the fact that observed timeseries need to be delay embedded to yield higher dimensional data. The starting assumption, therefore, is that the set $X={ x 1,\u2026, x N}$ we have at hand is a faithful sampling of some sort of $D$-dimensional invariant set (typically a chaotic attractor) of a dynamical system, i.e., $X\u2282A$. $A$ is itself characterized by its *natural, or invariant, measure* $\mu ( x)$, which defines a probability space on top of $A$ by requiring $\mu (A)=1$. Equivalently, we may use the natural density $\rho ( x)d x=d\mu $, which in practical terms is the $D$-dimensional histogram of $X$. To learn more on the natural measure, one should consult various textbooks on nonlinear dynamics^{3,18,21} or the review by Theiler.^{16} In essence, while the samples $ x i\u2208X$ form a time sequence of points with deterministic origin, they can be also be thought of as points sampled “at random” from $A$ according to the measure $\mu $. Note that throughout this FD description, we make the fundamental assumption that $A$ (with measure $\mu $) is ergodic.

^{51}centered at $ x$ with radius $\epsilon $. For sufficiently small $\epsilon $, and for almost all $ x i\u2208A$, it is assumed that the scaling of $\mu $ with $\epsilon $ is exponential, so that

*local dimension*at $ x i$, and by construction $ \Delta i\u2264D$. The assumption in Eq. (1) is a reflection of our intuitive notion of dimension, which describes that the “bulk” or the “amount” of a set scales with a linear size (scale) exponentiated to the dimension, e.g., the amount of a cube scales with its side length to the third power. Here, we measure “amount” of a subset of $A$ by its natural measure, i.e., its relative probability mass. From $ \Delta i$ a fractal dimension characterizing $A$ as a whole can be obtained as the mean,

This intuition-based definition of a local and attractor dimension is further motivated by Theiler in his review article on the basis of the Hausdorff dimension.^{16} Equation (1), however, is noncomputable because $\mu $ is unknown, since in practice, we have only partial knowledge of $\mu $ due to the finite observations of $X\u2282A$. In addition, for the overwhelming majority of dynamical systems, $\mu $ does not have an analytic expression anyway, irrespectively of finite data. An estimator of a FD, therefore, attempts to measure either the local scaling of Eq. (1) or the global scaling by averaging in Eq. (2).

Before going into specific estimators, we need to stress that none of the estimators we considered in this review yield the Hausdorff dimension.^{2,52} Often in the literature, researchers use the term “Hausdorff dimension” to refer to the output of the estimators, but no one has provided any formal proof of the equivalence between the estimates and the Hausdorff dimension (which itself has a very precise and rigorous mathematical definition). This is especially so for the box-counting dimension, where it is easy to prove that it cannot be the same as the Hausdorff dimension.^{53}

### B. Entropy of natural density

^{54}

To connect $ H q$ with $\mu $, we re-write $ \u2211 i p i q= \u2211 i p i p i ( q \u2212 1 )$. This is a weighted average, equaling to $\u27e8 p ( q \u2212 1 )\u27e9$, since $\u2211 p i=1$. We note that $\u27e8p\u27e9$ is our notion of “amount,” as we measure the amount by the probability mass. Regularizing the expression by its exponent, the quantity $\u27e8 p ( q \u2212 1 ) \u27e9 1 / ( q \u2212 1 )\u2261exp\u2061(\u2212 H q)$ is the “average bulk” or “average amount” in a hypercube of linear size (i.e., scale) $\epsilon $. The number $q$ settles the way we average: for $q=2$, we have the arithmetic (typical) average; for $q=3$, a root mean square; and for $q\u21921$, a geometric average.

*generalized dimension of order $q$*,

^{55,56}is defined as

*et al*.

^{17}$ \Delta 0 ( H )$ is called the

*box counting or capacity dimension*, while $ \Delta 1 ( H )$ is called the

*information dimension*. The box-counting dimension can also be thought as the fractal dimension of the

*support*of the attractor, as it disregards the values of $\mu $.

In Eq. (4), both limits are theoretical and cannot be realized in practice. As a result, $ \Delta q ( H )$ is estimated by plotting $\u2212 H q$ vs $log\u2061(\epsilon )$ and estimating the slope of a linear scaling region (for sufficiently large $N$, more on this in Sec. III I).

If the fractal dimension depends on $q$, the set is called multi-fractal. This is the case when the natural measure underlying the attractor is strongly non-uniform. Large positive values of $q$ then put more emphasis on regions of the attractor with a higher natural measure. In dynamical systems theory, almost all chaotic sets are multi-fractal.^{14} The dependence of $ \Delta q ( H )$ on $q$ is connected with another concept, the so-called singularity spectrum, or multifractality spectrum $f(\alpha )$; however, we will not be calculating $f(\alpha )$ here and refer to other sources for more (see, e.g., Chap. 9 of Ref. 18 by Ott).

### C. Correlation sum

^{10}The correlation sum is an alternative way of estimating the average amount at a given scale, based on the points nearby a given point. Specifically, we associate an amount $S$ at point $ x i\u2208X$ and at scale $\epsilon $ as

*pointwise (local) dimension*.

^{57}The average “amount” $S$ over $X$ is called the

*correlation sum*, given by $C(\epsilon )= \u2211 iS( x i,\epsilon )/N$.

^{11,22,39}

^{58}(known as Theiler window), which excludes as neighbors points that are temporally close. This removes spurious correlations due to dense sampling of continuous dynamical systems. We choose $w$ as follows: for each timeseries present in the multi-dimensional input dataset $X$, we calculate the first minimum of its self-mutual information.

^{3}The maximum of the time shifts corresponding to these minima is chosen as $w$.

Originally, the version explicitly having $q=2$ was used to define the correlation dimension,^{10} and the process of defining a FD was, in fact, the same as in $ H q$ for any $q$. A linear scaling region is estimated from the curve of $log\u2061( C q)$ vs $log\u2061(\epsilon )$. Then, the correlation-sum-based FD $ \Delta q ( C )$ is the slope of that linear region.

We may leverage two potential improvements here. First, to calculate $ C q(\epsilon )$, we used a box-assisted method.^{40,41} We modify this method as discussed in Sec. 4 of Appendix A, because otherwise it fails for data with even a small amount of noise (see discussions in Sec. III F and Sec. 4 of Appendix A). Second, in addition to the standard least squares fit $log\u2061 ( C q )\u223ca+ \Delta q ( C )log\u2061(\epsilon )$, we also used the correction by Sprott and Rowlands^{42} when possible (i.e., when at least half the range has $log\u2061(\epsilon )<0$). Reference 42 optimizes the fit $log\u2061( C 2)\u223ca+ \Delta 2 ( C )log\u2061(\epsilon )+blog\u2061(\u2212log\u2061(\epsilon ))$ $ ( a , b \Delta 2 ( C ) )$ are parameters to be optimized in parallel with $ \Delta 2 ( C ) )$. It is intended to give better fits for sets that have a slowly converging fractal dimension estimate, but as we will show later, it is best to not use it in practice.

In the rest of the manuscript, we will use $ H q$ or $ C q$ to refer to the methods of estimating FD via the (generalized) entropy or (generalized) correlation sum. We will explicitly use a subscript $ H 2, C 2$ when we make statements that apply only to this particular order $q=2$.

### D. Extreme value theory

The third major way of defining and estimating a FD from a set $X$ is based on extreme value theory (EVT) applied to dynamical systems.^{49} The method estimates a local dimension $ \Delta i ( E )$, as in Eq. (1), and then provides the FD as the average. Interestingly, the method utilizes exactly the same information as the correlation sum: all inter-point distances. However, $ \Delta i ( E )$ is not estimated directly via an exponential scaling relationship in contrast to the generalized entropy and correlation sum methods.

To the best of our knowledge, the method has been developed using progress across several papers^{59–64} (see also Chaps. 4 and 9 of Ref. 49). Even though relatively recent, this method has been applied already to a plethora of real-world cases (see, e.g., Refs. ^{65–74}) and many more (see Ref. 75 for a summary of recent applications). Despite this plethora of applications, we have noticed that some applications have ambiguities with the basic theory that connects EVT with FD, and we discuss these issues in Sec. 10 of Appendix A.

*exceedances*of $ g i$, defined as

^{62,76}in the limit $N\u2192\u221e,p\u21921$, and for the particularly chosen form of the function $g$, the values $ E i$ follow a generalized Pareto distribution (GPD) with parameters $\sigma ,\xi $ and shift parameter 0 (see Sec. 10 of Appendix A for more on GPD). However, if the measure $\mu $ and attractor $A$ satisfy the criterion of Eq. (1), then the GPD is reduced to an exponential distribution (EXPD) with parameter $\sigma $, $E\u223cexp\u2061(E/\sigma )$. Within this extreme value theory approach, the local dimension $ \Delta i ( E )$ assigned to state space point $ x i$ is given by the inverse of the $ \sigma i$ parameter of the EXPD fit to the exceedances, i.e.,

Let us now discuss how this theory connects to Eq. (1) and, hence, relates $ \Delta i$ to the $ \sigma i$ parameter. Due to the invertibility of the $g$ function, the exceedances $ E i$ correspond to when the orbit of the dynamical system comes closer than $ \epsilon \u2217$ to the reference point $ x 0$, with $ \epsilon \u2217=exp\u2061(\u2212 g p)$. This clarifies why we focus on extremes of $ g i$: we are interested in what happens around a small radius $\epsilon $ around a reference point $ x 0$, in order to connect with the fundamental definition of the local dimension of Eq. (1).

*given*that an exceedance has occurred (i.e., a state $ x j$ is at least $ \epsilon \u2217$-close to $ x$), there is an exceedance of $E$. This probability is by construction given by 1 minus the cumulative distribution function of the fitted EXPD, i.e., $ \pi i(E)=exp\u2061(\u2212E/ \sigma i)$. However, the same probability can be constructed in terms of the natural measure $\mu $ to be

*slowly varying function of $\epsilon $*as $\epsilon \u21920$, which may or may not depend on reference point $ x i$. With this assumption, both numerator and denominator of Eq. (11) scale exponentially with $ \Delta i$ and the remaining factors cancel out even though $\epsilon \u2260 \epsilon \u2217$. By taking into account also the expressions of Eq. (12), and that $ \pi i(E)=exp\u2061(\u2212E/ \sigma i)$, we put everything together and have

### E. Lyapunov (Kaplan–Yorke) dimension

^{47}is based on the Lyapunov exponents characterizing an attractor. Let ${ \lambda i}$ denote the Lyapunov spectrum, with $ \lambda 1\u2265 \lambda 2\u2265\cdots \u2265 \lambda D$. Then, the Lyapunov dimension is defined as

System . | D
. | Δ^{(L)}
. |
---|---|---|

Lorenz96 | 4 | 2.99 |

Lorenz96 | 6 | 4.93 |

Lorenz96 | 8 | 6.91 |

Lorenz96 | 10 | 8.59 |

Lorenz96 | 12 | 10.35 |

Lorenz96 | 14 | 12.10 |

Lorenz96 | 32 | 27.68 |

Rössler (chaotic) | 3 | 1.9 |

Hénon map | 2 | 1.26 |

Kaplan–Yorke map | 2 | 1.43 |

Towel map | 3 | 2.24 |

Coupled logistic maps | 8 | 8 |

Kuramoto–Sivashinsky | 101 | 31.76 |

System . | D
. | Δ^{(L)}
. |
---|---|---|

Lorenz96 | 4 | 2.99 |

Lorenz96 | 6 | 4.93 |

Lorenz96 | 8 | 6.91 |

Lorenz96 | 10 | 8.59 |

Lorenz96 | 12 | 10.35 |

Lorenz96 | 14 | 12.10 |

Lorenz96 | 32 | 27.68 |

Rössler (chaotic) | 3 | 1.9 |

Hénon map | 2 | 1.26 |

Kaplan–Yorke map | 2 | 1.43 |

Towel map | 3 | 2.24 |

Coupled logistic maps | 8 | 8 |

Kuramoto–Sivashinsky | 101 | 31.76 |

$ \Delta ( L )$ has a huge advantage when compared to the previous definitions of fractal dimensions: it can be computed with very high precision, even for high-dimensional systems (where the other methods typically suffer from accuracy, as we will show below). However, it also has a huge disadvantage: practically, it can be computed only if the dynamic rule (equations of motion) is known. Only using the dynamical rule and its linearization one can estimate the entire Lyapunov spectrum with satisfactory precision, for example, by means of the known algorithm due Shimada and Nagashima^{78} and Benettin *et al.*^{79} From a finite, and often noisy real-world dataset, calculating the entire spectrum of exponents is a very challenging task that requires for higher-dimensional attractors very large data sets.^{80} Therefore, one is in most cases better off calculating a fractal dimension directly from data, or instead fit an explicit model to the data, e.g., Ref. 81.

## III. CORRELATION SUM VS ENTROPY

In this section, we perform a quantitatively rigorous and exhaustive comparison of the methods based on entropy $ H q$ and correlation sum $ C q$. Even though fundamentally different, both rely on estimating the scaling of some quantity vs some size $\epsilon $. For a given $q$, they both (in theory) approximate the same quantity, the exponential scaling of the $q$-average of “amount” of measure vs the scale, as we illustrated in Sec. II. To compute $ H q$, we use the method Sec. 1 of Appendix A, and for $ C q$ the method Sec. 4 of Appendix A for most cases, and the straightforward implementation Sec. 3 of Appendix A for very high-dimensional data. For $ C 2$, we also tested the logarithmic correction of Ref. 42. The motivation of choosing these methods, and why an exhaustive comparison of other methods is not presented, is explained in Appendix A. Before any numerical analysis, we normalize input data $X$ so that each of its columns is transformed to have 0 mean and standard deviation of 1. This linear transformation leaves dynamic invariants (like the FD) unaffected, however, makes all numerical methods more accurate and faster to converge.

In Secs. III A–III I, all results will be presented with the same plot type, as, e.g., shown in Fig. 2. The legend shows the different datasets used in the plot and a description of the plot’s purpose. The top panel is the entropy estimate while the bottom is the correlation sum. To estimate $ \Delta q ( H ), \Delta q ( C )$, for each curve, we identify automatically, and objectively, a linear scaling region as discussed in Sec. III I. This region is denoted by markers of the same color on each curve. The secondary legends inside the panels provide the 5%–95% confidence intervals for the estimated slopes of these segments. Unless otherwise stated, all datasets used have length $N= 10 5$, which is for many experiments a typical upper bound for the amount of data one has access to. Continuous time systems are sampled with approximately 10 points per characteristic oscillation period.

### A. Benchmark sets with known dimensions

The best place to start is a simple sanity and accuracy test of the two main estimators for sets whose fractal dimension can be computed analytically in a straightforward manner. This is shown in Fig. 2. We use a periodic orbit from the Rössler system ( $\Delta =1$), a quasiperiodic orbit of order 2 from the Hénon–Heiles system ( $\Delta =2$), the Koch snowflake ( $\Delta =log\u2061(4)/log\u2061(3)\u22481.262$), the Kaplan–Yorke map ( $\Delta =1\u2212log\u2061(2)/log\u2061(0.2)\u22481.4306$), a uniform filling of the 3D sphere ( $\Delta =3$) and a chaotic trajectory of the Standard Map (SM) for very high $k=64$, which covers uniformly the state space and, thus, has $\Delta =2$.

Some results that will be repeatedly seen throughout this section become clear. For *noiseless data*, the correlation sum method is clearly better for three reasons. First, its linear scaling region covers a wider range of scales, while $ H 2$ saturates much more quickly to flatness for small $\epsilon $. Notice that $log\u2061( C 2)$ cannot saturate for small $\epsilon $, but instead diverges to $\u2212\u221e$, but we can never reach this point due to the process that chooses the appropriate overall range of $\epsilon $ (Sec. III I). Both curves would in principle saturate to flatness for very large $\epsilon $, specifically exceeding the total size of $X$, but we again do not reach this threshold based on the choice of the range of $\epsilon $.

Second, within the linear scaling region, the curves of $log\u2061( C 2)$ fluctuate less than those of $ H 2$, resulting in a narrower confidence interval. Comparing confidence intervals is only meaningful if the same method is used to extract them. That is why specifically for Fig. 2, we used the standard least squares fit for $log\u2061( C 2)$ instead of the aforementioned correction of Ref. 42. Third, the actual numbers we obtain for $\Delta $ from the correlation sum method are closer to the analytically expected values than those of the entropy-based method. In particular, for sets that should have an integer fractal dimension, the correlation sum method is much closer to the actual result.

### B. Different dynamical systems

In the following, we cross-compare with the value obtained from the Lyapunov (Kaplan–Yorke) $ \Delta ( L )$ dimension, which for all systems of interest used in this paper is found in Table II. We note that $ \Delta ( L )$ is conjectured to equate to the $q=1$ version of dimensions; however, here we use $q=2$. That is, because the correlation sum algorithm does not apply to $q=1$, unless one uses a fixed mass approach, but we explain in Sec. 5 of Appendix A why we do not. We assume that the differences between the two estimators, when compared to $ \Delta ( L )$, should not depend much when changing $q=1$ to $q=2$; hence, the following results remain valid.

In Fig. 3, we compare three discrete and three continuous dynamical systems of different input dimensionality (systems are defined in detail in Appendix C). We confirm the result of Sec. II, i.e., the correlation sum method is more accurate than the entropy one because it is much closer to the values expected from the Lyapunov dimension (Kaplan–Yorke conjecture). Furthermore, the entropy method seems to underestimate the fractal dimension more strongly as the state space dimensionality increases. This is expected given the fact that the entropy method works via a histogram approximation of the natural density, and it is well known that the higher dimensional the data, the less accurate producing a histogram for them becomes (i.e., space is covered more sparsely by $N$ points as $D$ increases).

This figure also allows us to indeed confirm that the logarithmic correction to the correlation sum by Sprott and Rowlands^{42} can be impactful, especially for high-dimensional sets where the convergence is the slowest. For example, the standard linear regression fit would give confidence intervals $(6.44,6.6)$ for the eight-dimensional Lorenz96 model and $(5.78,6)$ for the eight coupled logistic maps. These values are closer to the entropy-based estimates but further away from the $ \Delta ( L )$ estimates of Table II.

### C. Data length

^{82}discussed the data requirements for estimating the correlation dimension and pointed out that the minimal number $ N min$ of data points needed to estimate a dimension $ \Delta 2 ( C )$ quickly increases like $log\u2061( N min)\u221d \Delta 2 ( C )$. Assuming linear scaling for $\epsilon =\rho E$, where $E$ denotes the diameter of the set (largest pointwise distance) and $\rho $ is a small number like 0.1, Eckmann and Ruelle state that for $N$ data points the estimation of $ \Delta 2 ( C )$ will not provide values larger than

The results for the 8D Lorenz96 model shown in Fig. 4 are essentially consistent with the estimates of Eckmann and Ruelle. For $N=500$ the $(5%,95%)$ confidence interval of the slope is $(5.57,5.94)$, i.e., slightly larger than $ \Delta 2 max ( C )(N)=5.4$ assuming $\rho =0.1$ $ ( \Delta 2 max ( C ) ( N ) = 5.7 )$a value of $\rho =0.113$, for example, would provide $ \Delta 2 max ( C ) ( N ) = 5.7 )$. With increasing $N$, the estimated slopes converge toward the value $ \Delta ( L )=6.91$ of the Kaplan–Yorke dimension of the 8D Lorenz96 system, a value that is already included in the interval $(5.51,7.46)$ obtained with $N=10000$, a number of points large enough to estimate correlation dimensions up to $ \Delta 2 max ( C )(N)=8$ according to the Eckmann–Ruelle limit.

The bottom panel of Fig. 4 shows log–log plots of $ C 2$ for real-world experimental data (see Sec. III G for a description). The dimension estimates $ \Delta 2 ( C )$ for different lengths $N$ are all about 3 and only the corresponding confidence intervals shrink for increasing $N$. This is also in agreement with the Eckmann–Ruelle bound, because dimensions $ \Delta 2 ( C )\u22483$ can, in principle, be achieved with $N=500$ or larger.

On the other hand, Tsonis *et al.*,^{26} using the results of Nerenberg and Essex,^{83} argue that the minimum number of points $ N min$ to estimate a dimension $x$ with 95% confidence is scaling like $ N min\u223c 10 2 + 0.4 x$. In our example, assuming true value $x\u22487$, it would require at least $ N min=63095$ points, which is too high compared to what we can estimate from Fig. 4. However, the estimate presented by Tsonis *et al.* is surrounded by its fair share of ambiguity, because it involves deciding *a priori* a scaling region extent, and it does not say whether the dimension $x$ in the expression should be the embedding dimension or the actual fractal dimension (that is unknown). We will discuss this topic again in Sec. III H.

If only small data sets are available, finite sample corrections derived by Grassberger^{84} could be used to improved estimates of $ H q$ or $ C q$.

### D. *q*-order dimensions (multi-fractality)

Here, we examine how well the estimators capture multi-fractal properties, i.e., the dependence of $ \Delta q$ on $q$, or the absence of multi-fractality, i.e., results that should be invariant to $q$. This dependence on $q$ was discussed in the review by Theiler,^{16} but we think an even better reference is in the book by Rosenberg.^{85} A known theoretical result is that $ \Delta q$ is a non-increasing function of $q$, $ \Delta q 2\u2264 \Delta q 1$ for $ q 2> q 1$ (see, e.g., Ref. 11).

Here, we will use two examples. The first is the Koch snowflake, which has an (approximately) uniform density and, thus, its FD should have no dependence on $q$ whatsoever. The second is the Hénon map, which has a strongly non-uniform natural measure, giving the expectation of a clear decrease of $ \Delta q$ with increasing $q$. The results are shown in Fig. 5.

Both the entropy and correlation sum approaches perfectly capture the absence of multi-fractality, giving identical curves for all $q$ for the Koch snowflake. For the Hénon map estimates, both methods satisfy the criterion of a decreasing $ \Delta q$ with $q$. But there is a problem. For the correlation sum method and for $q\u22602$, the function $log\u2061( C q)$ vs $log\u2061(\epsilon )$ is no longer a straight line but is composed of two linear regions with significantly different slope, making the results ambiguous. It is also not obvious why the slopes change at the given $\epsilon $ value, or why the slopes below that indicate a significantly lower dimension value. The slopes change at $log\u2061\epsilon \u2248\u221210$ and for $q=3$ the left slope becomes $\u22480.58$ while for $q=4$, the left slope becomes $\u22480.39$ (right slopes are shown in figure legend). We observed this behavior of $log\u2061( C q)$ having two slopes in practically all example sets with a non-uniform measure.

We could not find anything in the literature about this observation, and, in fact, we could not find a single figure in the literature plotting $ C q$ vs $\epsilon $ for $q\u22602$, even though the correlation sum for $q\u22602$ is provided in several publications,^{11,22,39} which report a value for $ \Delta q \u2260 2 ( C )$ (but it is unclear whether they have encountered the same problem as we have or not). We have extensively tested our code and we are confident that the implementation of Eq. (6) is correct.

For multi-fractal analysis, we are (typically) interested in quantifying the most fine properties of the fractal. Perhaps then one should determine the slope of the linear region at the *smallest* $\epsilon $ values, instead of the slope of the linear region covering the largest range of $\epsilon $ (but for very small $\epsilon $, the statistics becomes worse for finite data sets). However, the slopes of the smallest $\epsilon $ are clearly incorrect; the correct slopes are the ones of the largest $\epsilon $ values (those also highlighted in Fig. 5). In any case, using the slope of the largest $\epsilon $ values gives the correct results, but this strong dependence of slope with $\epsilon $ when $q\u22602$ is worrisome and indicates that more clarity regarding $ C q \u2260 2$ must be established in the literature.

### E. Dimension (delay embedding)

This section examines the impact of varying state space dimensionality $D$ of input data, which is common case when delay-embedding timeseries^{22,86,87} (because there one increases $D$ and searches for convergence of $\Delta $). In principle, provided the condition $d> \Delta 0 ( H )$ is met, with $d\u2208 N$ the embedding dimension, then the reconstructed set has the same fractal dimension as the original set the timeseries was recorded from Ref. 88. In Fig. 6, we check how the methods fare with this statement, and whether their accuracy decreases with increasing input dimensionality, i.e., embedding dimension.

In Fig. 6, we used a chaotic timeseries from the Hénon–Heiles system. This has $\Delta =3$, and as such, we expect convergence of the fractal dimension estimates to a value around 3 for $d\u22654$. We see in Fig. 6 that this is indeed the case. $ \Delta ( C 2 )$ does not seem to drop in performance with increasing $d$, besides a very small decrease in the overall range of order of magnitudes the linear scaling region covers. The same results were obtained using a timeseries from the chaotic Rössler system with $\Delta \u22482$ or a chaotic Lorenz96 system with $D=4$, which also has $\Delta \u22483$. $ \Delta 2 ( H )$ seems to perform poorer with increasing $d$ by either significantly reducing the linear scaling region or being less accurate in the dimension estimates. However, how much poorer it performs depends on the dataset: for the Hénon–Heiles, it is still decent, while for Lorenz96, it performs much worse (not shown). Hence, we conclude that $ \Delta 2 ( C 2 )$ performs much better as the state space dimensionality of data increases (while keeping data length and other aspects constant) vs $ \Delta 2 ( H )$, which is somewhat already evident from Fig. 3.

### F. Noise

Real-world data are always accompanied by noise, and, therefore, the impact of noise on the calculation is highly important for the choice of the method. As it is well known, the presence of noise in the data makes estimating a fractal dimension harder, as the fractal dimension of the noise is equal to that of the state space and, hence, almost always larger than the fractal dimension of the clean data. Figure 7 shows results for various kinds, and amount, of noise added to the (normalized) chaotic Rössler attractor. On purpose for this plot, we have used $\zeta =1$ (see Sec. III I), and the standard linear regression method for $log\u2061( C 2)$, because the logarithmic correction of Ref. 42 overestimates $\Delta >3$ for noisy data (the input dataset is three-dimensional and, thus, cannot have $\Delta $ > 3).

We start with the case of additive noise. There, it is known that there is some distance $ \epsilon \sigma $ called the “noise level,” below which the slope of $log\u2061( C 2)$ changes from being the fractal dimension of the chaotic set to that of the noise,^{89} This fact can be used to actually estimate the noise level of the data.^{22} In Fig. 7, we see how quickly this really happens for $log\u2061( C 2)$. Even for 5% additive noise, the curve is already dominated with the noise slope (which has $\Delta \u22483$ for three-dimensional additive noise), and only a small segment of the curve at large $\epsilon $ values where the slope becomes the deterministic value. The entropy curve does not have this property and remains having a single slope throughout (except the saturation part of course), while the slope value is an average between the purely deterministic $\Delta $ and that of the noise. On one hand, this may be considered a downside, because it does not allow estimation of noise level. On the other hand, we should note the majority of the $log\u2061( C 2)$ curve is already reflecting the noise slope. Thus, if we estimated the average slope of the $log\u2061( C 2)$ curve, it would be much larger than that of $ H 2$, i.e., it puts much more weight on the noise dimension than the deterministic data. These results regarding additive noise are typical and do not seem to strongly depend on the system considered. Note that in the case of delay reconstruction, the slope of the noise should increase with increasing embedding dimension. The slope corresponding to the FD of the deterministic set should remain constant for embedding dimensions larger than a minimum value required for successful reconstruction of the state space of the dynamical system generating the data.

For dynamic noise, we turned the ordinary differential equation (ODE) of the Rössler system into a stochastic differential equation by adding a Wiener process term $\eta dW$ in the second equation. For small amount of dynamic noise (which here reflects a proportionality of $\eta $ with the expected size of the $x$ variable), the fractal dimension increases slightly as expected, but does not have any noticeable change in its numerical value up to 10% noise. When one turns up the dynamic noise more, the dynamics collapse and there is no “chaotic” attractor anymore (not shown). Both entropy and correlation sum methods perform equally well vs this kind of noise, and there is no noise radius or change of slope discernible in the correlation sum case. We also looked at low-resolution data, by rounding Röessler timeseries to 2 digits after the decimal. This obviously decreases the valid $\epsilon $-ranges one can do the computation for (see Sec. III I). For $ \Delta 2 ( C )$, this also significantly changes the result to a value smaller than “correct,” while for $ \Delta 2 ( H )$, it has no impact. This means that $ \Delta 2 ( H )$ performs better for rounded data, which makes sense given the way it is computed (as long as points are in the same box, it does not matter how close they are to each other).

### G. Real-world data

In Fig. 8, we show fractal dimension estimates for real-world experimental data. We focused specifically in experiments that are relatively clean (large signal-to-noise ratio) and where *the underlying dynamics is well known to display low dimensional deterministic chaos*. This is important, because for this review, we do not want to mix the scientific question of whether an observed system accommodates a low-dimensional deterministic representation, with the technical/computational question of whether an estimator would actually detect that.

In Sec. III H, we further discuss what happens with real-world data where neither of these two conditions apply. We limited densely sampled experimental data to at most $N=50000$, with sampling of about ten samples per characteristic timescale. Because of the observations of Sec. III H, we have used $\zeta =1$ and the standard linear regression method for $log\u2061( C 2)$ instead of the logarithmic correction of Ref. 42.

The datasets are as follows: two electrochemical oscillator datasets (the second being more chaotic than the first);^{34} timeseries from a circuit replicating the dynamics of the Shinriki oscillator;^{90} the mean field of a network of 28 circuits following Rössler dynamics from Ref. 91; data from a mechanical double pendulum from Ref. 92; and ECG recordings during a pacing experiment of healthy individual from Ref. 32. All experimental timeseries were delay embedded using the recent automated method due to Kraemer *et al.*^{34} (see Appendix D for the embedding parameters). The method yielded embedding space of seven or less for all experimental timeseries, giving even more confidence that the data may display low-dimensional deterministic chaos.

For dataset “Rössler Net,” the curve $(log\u2061(\epsilon ),log\u2061( C 2))$ continuously changes its slope instead of having one, or at most two, constant-slope segments. This makes deducing a single fractal dimension ambiguous. Slight curving can be observed also in “electroch. 1,” “electroch. 2” and “ECG IBI” but it is a weak enough effect that two scaling regions can nevertheless be extracted (for “ECG IBI,” Fig. 8 reports the slope of the noise). On the other hand, the entropy-based approach does not suffer from this problem and, besides the expected saturation for small $\epsilon $, seems to be described quite accurately by a single slope and, thus, a specific FD value. The results shown in Fig. 8 are from five to six-dimensional embeddings yet both $ H 2, C 2$ yield FDs that are less than the embedding dimensions (excluding the “Rössler Net” case for $ C 2$ where a FD cannot be estimated). Hence, we can assume that the FD of the underlying dynamics has to be somewhere between the low bounds of $ \Delta 2 ( H )$ and $ \Delta 2 ( C )$.

The “constant slope curving” of $ C 2$ is something we have not seen before with synthetic data. From the discussion of Sec. III F, the problem may be because realistic noise may be neither white nor stationary, or because a too high of a noise level in the data. Indeed, we saw that for 5%–10% relative noise, the slopes of $ C 2$ already reflect the noise FD. In Fig. 8, $ C 2$ yields consistently higher FD than $ H 2$, even though we know that an underlying low-dimensional representation exists. To extract this lower FD value from $ C 2$, one, therefore, must focus on the larger scales $\epsilon $ and try to find this consistent smaller slope by increasing embedding dimension (as is standard practice^{22}). We show such an analysis in Appendix F. Nevertheless, it appears true that $ C 2$ is more strongly affected by noise when compared to $H$.

### H. Extreme cases

In this subsection, we examine the result of applying the aforementioned methods to ill-conditioned data, which may be non-deterministic or non-stationary, or to extremely high-dimensional data, where there exists this notion in the literature that the methods used so far are unlikely to succeed. For the first dataset, we used data from the Lorenz-96 model, with $D=6$, while having the $F$ parameter increase linearly during the time evolution from 1.0 (periodic motion) to 24.0 (chaotic motion with $\Delta \u22485$). The second dataset is the concatenation of a periodic trajectory from the Rössler system with noise uniformly distributed on the 3D sphere. The third dataset is paleoclimate temperature timeseries from the Vostok Ice core, embedded in eight-dimensional space, which is unlikely to be stationary or to accommodate a low-dimensional representation.^{25} The fourth dataset is a stock market timeseries for the “nifty50” index embedded in six-dimensional space, which is definitely non-stationary and rather unlikely to be deterministic. The last two datasets are extremely high-dimensional data of the Lorenz96 and the Kuramoto–Sivashinsky spatiotemporal system (the latter having 101 dimensions after discretization). The results of the dimension estimation are shown in Fig. 9.

Generally speaking, in Fig. 9, the results of the first four datasets show that something is “wrong.” There is a large miss-match between the estimates of the entropy and correlation sum methods and the curves do not seem to be composed of a single slope. Oddly, for the non-stationary Lorenz96 data, the correlation sum has a clear straight slope with fractal dimension somewhere between the extreme values. The Vostok data, in particular, are plagued both by a continuous change in slope, especially in $log\u2061( C 2)$, but also, the resulting FD values do not converge when increasing the embedding dimension (not shown). Additionally, the FD values obtained from $ H 2$ or $ C 2$ are very different. Notice that in all cases, our automated algorithm finds a value for $\Delta $ nevertheless. This only serves to highlight how careful one should be, and to always plot the curves of $ H 2$, $log\u2061( C 2)$.

Although in the literature there exist several tests for non-stationarity using, e.g., permutation entropy or other methods, we will now describe a simple, fractal-dimension-based scheme. One can divide the data into $m$ equal parts in two ways: Making $m$ segments of length $N/m$ of successive points, or by choosing every $m$th point, each time starting from point 1 to $m\u22121$. For these subsets, the same fractal dimension estimation is done. If there is non-stationarity, the first kind of selection will show significantly different estimates across its sub-datasets, while the second will show approximately the same estimates.

Let us now consider the last two datasets of Fig. 9 (32-dimensional Lorenz96 and Kuramoto–Sivashinsky). Surprisingly, $ C 2$ shows a rather clear linear scaling region that has a very high slope, but not as high as the expected dimension values $ \Delta ( L )$ (28 and 32, respectively). This is consistent with the upper bound equation (16) of Eckmann and Ruelle with $ \Delta 2 max ( C )(100000)\u224810$ for $\rho =0.1$ or $ \Delta 2 max ( C )(100000)\u224823$ for $\rho =1/e$, because even in cases where linear scaling occurs already for relatively large values $\epsilon =\rho E$ it cannot be expected to obtain slopes of size 28 or 32 with $N=100000$ data points, only. Furthermore, Fig. 9 shows that for these data the range of scales covered by the linear region is very small: only one $e$. Increasing the amount of data also increases the linear scaling region and the resulting confidence intervals shrink as we get more data points in the linear region (not shown here).

As discussed already in Sec. III C, these examples show again that with high-dimensional data, one needs much longer timeseries to properly cover the (typically high-dimensional) chaotic set, and this affects any kind of estimate of dynamical properties; it is not a problem specific to the correlation dimension. Additionally, one needs a very low signal-to-noise ratio, because due to the coverage of a very small range of scaling factors $\epsilon $, even a small amount of noise may ruin the estimation. However, if one does have such a clean high-dimensional dataset, $ \Delta 2 ( C )$ may still provide a useful estimator at least for a lower bound of the correlation dimension of that data set, where the value obtained should always be compared with $ \Delta 2 max ( C )(N)$ for the available amount of data $N$ and dimension estimates of surrogate data (see Sec. V) to avoid wrong conclusions.

### I. Estimation of slopes and sizes $\epsilon $

To estimate the value of $ \Delta ( H q )$ or $ \Delta ( C q )$ we need to find the slope of $\u2212 H q$ or $log\u2061( C q)$ vs $log\u2061(\epsilon )$. This matter is typically resolved in a context-specific manner, where each plot is carefully examined and the “linear region” is decided by the practitioner by eye. This approach cannot work in an objective comparison. Here, we formulate an entirely objective and sensible (but not flawless) automated process that is separated into two parts: the choice of which sizes $\epsilon $ to calculate $\u2212 H q$ or $log\u2061( C q)$ for, and how to estimate a linear scaling region from the respective curves. Once the linear scaling region is identified, actually obtaining the fractal dimension is a simple least squares fit. Note that small deviations from a straight line for densely sampled values of $\epsilon $ may occur due to *lacunarity* of fractal sets.^{126–130} While we can also observe this for very densely sampled $\epsilon $, this effect is so miniscule that we consider it irrelevant for almost all data sets in practice. These oscillations are not suitable for the quantification of lacunarity in fractal sets, but other measures exist for this purpose.^{131,132}

The range of $\epsilon $ is always decided with generating formula $\epsilon = e x$ where $x$ are $k$ linearly spaced values from $log\u2061( \epsilon min)+\psi $ to $log\u2061( \epsilon max)\u2212\zeta $, i.e., the values of $\epsilon $ are exponentially ranged in base $e$. $ \epsilon min$ is the smallest inter-point distance existing in the set and $ \epsilon max$ the average of the lengths along each of the variables of the set, and $\psi ,\zeta $ constants. Unless stated otherwise, we have used $k=16,\psi =1,\zeta =1$ in Sec. III. In essence, we are limiting $\epsilon $ to be one order of magnitude (in base $e$) larger than the smallest inter-point distance and one order of magnitude smaller than $ \epsilon max$. If the resulting $\epsilon $ range does not cover at least two orders of magnitude (common case in high-dimensional data), we use $\zeta =0$ instead. This choice of $ \epsilon max$ brings very good performance in the box-assisted algorithm for the correlation sum (Sec. 4 of Appendix A) but it is not so small as to make the computation meaningless for realistic and/or noisy data. Notice that the automated fractal dimension estimates can be sensitive on $k,\zeta ,\psi $. In practice, we would recommend producing several estimates by varying these parameters and obtaining the median of $\Delta $.

To estimate the linear region, we proceed as follows. We scan the local slopes of each one of the $k\u22121$ segments of the curve $y$ vs $log\u2061(\epsilon )$ starting from the leftmost one (here $y=\u2212 H q$ or $log\u2061( C q)$). If the local slope of the preceding segment is approximately equal to that of the next one with relative tolerance tol, i.e., $ | s i \u2212 1\u2212 s i |\u2264 tol\u22c5 max( s i \u2212 1, s i)$, then these two segments belong to the same linear region. We move to the next segment and compare it in the same way with the first segment belonging to the same linear region. When we find a mismatch, we start a new linear region. This way we have segmented the curve $y$ vs $log\u2061(\epsilon )$ into approximately linear regions. We then choose the linear region which spans the largest amount of the $log\u2061(\epsilon )$ axis, and label it “the” linear region. We finally perform a least squares fit there and report the 5-95% confidence interval of the fitted slope. In Fig. 10, we visually demonstrate the process. We also compare it with another standard way fractal dimension related plots are presented: the successive local slopes $ \delta 1$ of each point of the curves and the same slopes $ \delta 5$ but fitted in a five-long data window. Our linear regions approach is equivalent with finding the largest plateau in the local slopes plots.

Notice that there is a clear pitfall here. This algorithm will deduce a linear region no matter what. In many scenarios, this region might be meaningless, being too small to be of actual value, or could even be the scaling of the noise in noisy data, as shown in Sec. III F. So after all, careful consideration of the result is always necessary.

In addition to the algorithm presented here, also worth mentioning is recent work by Deshmukh *et al.*^{93} that offers an alternative way to estimate a slope. All possible slopes that could be estimated from the curve (by choosing all possible segments of length more than a specified minimum) are estimated. These are weighted by their length and by their inverse error and compose a distribution. The mean of the distribution is presented as the slope, while the quantiles of the distribution can be used as confidence intervals. For the work presented here, we believe our approach is more fitting, because how large a scaling region is also part of the accuracy of a FD estimator. Furthermore, we still wanted to display problems in the presence of two scaling regions (as in, e.g., Sec. III F), while the approach of Deshmukh *et al.* transforms the “looking at the curve $y(x)$ for two scaling regions” problem into “looking at the distribution of all possible slopes for two peaks,” which, while easier to resolve, still would add more information content in our already extensive article. Nevertheless, the approach Deshmukh *et al.* is most likely better suited to use in practice, when estimating a fractal dimension from experimental data, as the authors have extended their method to also provide convergence criteria of fractal dimension estimates in Ref. 94.

## IV. EXTREME VALUE THEORY ANALYSIS

In this section, we thoroughly analyze the power and shortcomings of the extreme value theory based FD $ \Delta ( E )$ introduced in Sec. II D for computing FDs under a similar lens as the comparison of Sec. III. Unless stated otherwise, we use $N= 10 5$ as the data length, standardize all input sets before any computations, and use $p=0.99$ as the quantile probability of Sec. II D (due to the discussion in Sec. IV B). Since $ \Delta ( E )$ is obtained via an arithmetic mean, it can be compared to $ \Delta 2 ( H )$ and $ \Delta 2 ( C )$, which is what is used by most plots in Sec. III.

### A. Exemplary sets

We start with Fig. 11, which shows $ \Delta ( E )$ computed for exemplary sets. The figure should be compared with Figs. 2 and 3. The figure style is typical for the rest of this section and shows the dimension estimates as distributions, with dashed white lines indicating the mean, and dotted white lines indicating the “expected” value, for which we use $ \Delta ( L )$ if possible, otherwise $ \Delta 2 ( C )$. Small horizontal red lines cap the strict limit that the dimension estimates should not exceed, i.e., the state space dimensionality. The inner legend in the plot displays information about the distribution: the mean, and in parenthesis the percentage of values that exceed the dimensionality limit cap (the red line).

All in all, the $ \Delta ( E )$ estimates seem to match well those obtained by $ \Delta 2 ( C )$ with two notable exceptions: the method performs “poorly” for a quasiperiodic ( $\Delta =2$) trajectory of the Hénon–Heiles system, for the chaotic attractor of the Hénon map and for the Lorenz-96 chaotic attractor. Here, “poorly” means significant inaccuracy in the first decimal digit. It is not clear exactly why the EVT method is not particularly accurate for these systems; however, we can speculate the respective reasons. For quasiperiodic trajectories, the sampling time may be near-commensurate with one of the two periods, making some points very rarely visited in the finite set, even if they would be uniformly visited in the limit $N\u2192\u221e$. As the method assigns a higher dimension value to a point according to its visitation frequency, these rarely visited points on the quasiperiodic orbit get higher dimension values than they should (within the EVT framework, the lower the visitation frequency of a state space point, the higher its local FD). For the Hénon map, the only thing to note is that the attractor natural density is extremely singular and the assumption of Eq. (13) likely does not hold. For the Lorenz96 and the coupled logistic maps, the thing to note is that the attractors have a high “expected” (here Lyapunov) FD, which means that for given amount of data and high underlying FD, the EVT underestimates the FD more than $ C 2$.

### B. Quantile probability

Unlike the entropy and correlation sum methods for computing FD presented so far, the EVT one is *parametric*:^{95} it requires the choice of an “extreme” probability value $p$ for which to extract the quantile of $ g i$ when calculating the exceedances $ E i$ in Eq. (8). Therefore, before performing any further evaluation of the method, we must examine how it depends on its parameter $p$. We have found no formal mathematical definition of an “extreme” in the literature of this EVT methodology, or how to practically compute an “optimal” value for $p$, or whether an optimal value exists at all. Reference 65 provides some methodology for checking whether the chosen $p$ is inappropriate that we evaluate in Sec. IV C.

In this subsection, we examine the impact of choices of $p$. This choice is somewhat linked with the data length $N$, as the local dimension estimation for each state space point is done based on $N(1\u2212p)$ points. In Fig. 12 we, therefore, vary $p$ with fixed $N$ but also co-vary $N,p$ with fixed $N(1\u2212p)$.

The results show that increasing $p$ increases $ \Delta ( E )$. It also appears that not only the mean of the distribution of $ \Delta i ( E )$ depends on $p$ but the shape of the distribution as well. On one hand, it is somewhat reassuring than once a fixed $N(1\u2212p)$ is chosen, the results do not vary as wildly as when $N$ is fixed, provided that $N(1\u2212p)$ is large enough. However, we also noticed that under fixed $N(1\u2212p)$, the estimated dimension seems to monotonically decrease further and further away from the expected value when $p$ is decreased.

On the other hand, in most real-world applications, it is $N$ that is fixed, and, hence, the results of the top panel of Fig. 12 are what is of most interest. In addition, even if it was possible to co-vary $N,p$ in a realistic application, we still cannot provide instructions of what the best choice should be for $p$: while Fig. 12 reveals the dependence on $p$, it does not lead to any obvious conclusions on what $p$ should be. A saving grace here is that while there is a clear dependence on $p$, the mean value $ \Delta ( E )$ does not change significantly (i.e., differences span less than one integer, which is anyways the accuracy we are interested in practice), provided that $p$ remains in a range so that $N(1\u2212p)\u2265100--1000$.

### C. Quantifying significance

The entropy or correlation sum approaches provide a relatively straightforward way to check for the significance of results: there should a single slope covering several orders of magnitude $\epsilon $. The larger the range of magnitudes, the more significant the results. Such a simple visual significant check does not exist for the EVT approach. In this section, we examine possible ways to test for the significance of the EVT results based on what has been suggested in the literature or with alternative means we devised while composing this review.

In practical applications as in Ref. 65, the authors identify a range of $p$ values that are appropriate using a statistical hypothesis test of whether $ E i$ follows an exponential distribution (EXPD). Examples of such statistical tests are the Anderson–Darling^{96} or the Kolmogorov–Smirnov.^{97} Here, we used the Kolmogorov–Smirnov exact one sample test. The test proceeds as follows: for a given $p$, each of the $ E i$ are first fitted to a EXPD. Then, the fitted EXPD is used in a statistical test for the null hypothesis: “the data coming from the given GPD.” The test yields a p-value. Typically, if the p-value is very small, e.g., p<0.05, the null hypothesis can be rejected, which may mean unsuitable data altogether, or not enough data (e.g., quantile probability $p$ was chosen too high). In practice, one hopes that the majority of the p-values (each for each $ E i$) is significantly larger than some low threshold of, e.g., 0.05.

Formally speaking however, a p-value greater than some threshold does not mean we can accept the null hypothesis; only that we fail to reject it. Any other distribution may have generated the data equally well. Hence, the convincing power of this line of argumentation (checking for large p-values) is weak from a statistical inference point of view. An alternative test mentioned in the literature is to check how stable the distributions for $\sigma $ of the fitted EXPDs are when varying $p$; but we did not find this argument convincing (stability of parameters does not mean significant fit), so we ignored it.

^{98,99}(without any mathematical guarantee of course).

In Fig. 13, we show distributions of p-values and NRMSEs for a chaotic trajectory of the towel map. We provide more similar such plots in Appendix E, which establish that our observations do not depend at all on the input set $X$.

The results are very surprising. It appears that the p-value based test is unhelpful and/or misleading. For example, it shows that quantile $p=0.95$ is a bad choice, because the overwhelming majority of p-values are $\u2264$0.05, eluding to a rejection of the hypothesis “the data come from a EXPD.” Yet, if we look at the actual fitted data to the right of the top panel, we do not observe “bad quality” fits at all. This is further established by the distribution of NRMSE values, which has all of its mass in low values. We are not sure why the hypothesis test behaves this way in this scenario.

On the other extreme of $p=0.999$, the p-value based test is again misleading. This is the case where most of the p-values are >0.05; hence, it would be the most trustworthy in terms of the p-value test. Yet clearly, this is the case where the actual fits are the worst by far. We have performed extensive numerical tests and are confident that the code implementations yielding the p-value are correct.^{100}^{,}^{132}

This leads us to conclude that the NRMSE based test is much more trustworthy. If a practitioner wants to transform the NRMSE data into a Boolean decision “is this $p$ okay?,” we would argue to check if the majority (i.e., 99%) of the mass of the NRMSE distribution is less than 0.5. Even so, the NRMSE test may only provide a range for $p$ where the EXPD fits are of sufficiently high quality. It cannot instruct how to pick a $p$ from that range. Thankfully, from what we have seen in Sec. IV B, the fluctuations of $ \Delta ( E )$ with $p$ are relatively small if $p$ is in an appropriate range.

What made the discussion of this subsection difficult is that we have not found any information regarding the $p$-values of this test in the literature, despite the plethora of real-world applications (see Appendix A 10). While it has been mentioned that the $p$ chosen in these real-world applications “satisfy” this p-value test,^{65} the actual p-value distributions were not shown.

In fact, we have not found any discussions in general regarding the significance of the EVT results. The main argument the EVT FD literature has used in favor of significance of results is that the $ \Delta i ( E )$ distributions did not change much within a range of appropriate $p$. However, by itself, this argument is not convincing of the *validity* of the results, only of their *stability*. We will discuss these aspects again in Sec. IV H and in the conclusions. For the rest of the article, we will be using $p=0.98--0.99$, as these values seem to yield correct results for synthetic $X$ with lengths $ 10 4-- 10 5$.

### D. Comparison with pointwise dimension

In Fig. 14, we again compute $ \Delta ( E )$ for exemplary sets as in Sec. IV A, but now we compare it with the pointwise dimension, i.e., the scaling of the inner sum of Eq. (6) vs $\epsilon $. Our goal with this comparison is to see how well either method captures the “spread” of dimension values. We expect that for rather uniform fractal sets the distribution should be narrow, and wide for sets with highly non-uniform natural measure. Note that the average of the pointwise dimensions of the correlation sum *does not coincide* with $ \Delta 2 ( C )$. As is also made clear in Ref. 16, $ \Delta 2 ( C )$ gives a more accurate result for the FD of the whole attractor because it utilizes more points to estimate the scaling behavior.

The most important result here is that for the chosen $p=0.99$, the two methods yield very similar results, hence establishing the overall accuracy of the EVT method for synthetic data. The pointwise dimension estimates, however, are more accurate for the Hénon map attractor and a quasiperiodic orbit, which we already discussed in Sec. IV A. Hence, $ C q$ is slightly more accurate for noiseless deterministic sets.

### E. Length, dimension, sampling time

We now test the EVT approach while varying various aspects of the input data: length, state space dimensionality (using delay embeddings of increasing dimension as in Sec. III E) and sampling time. The reason to judge the quality of the EVT approach vs sampling time is because, unlike the correlation sum approach, which explicitly takes into account dense time sampling via the Theiler window $w$, the EVT approach is typically presented as agnostic to the sampling time. The analysis is presented in Fig. 15.

Regarding data length, EVT scales well with decreasing $N$ up to a threshold. When $N$ becomes too low, so that $N(1\u2212p)$ becomes less than 50, the results significantly lose accuracy, making the estimated $ \Delta ( E )$ increase rapidly. Interestingly, with decreasing $N$, the EVT *overestimates* the FD instead of underestimating it like $ H 2$ or $ C 2$. Note that one cannot simply fix this problem by reducing $p$ because, as illustrated in Sec. IV B, this has its own downside of decreasing the estimated dimension. Still, we may conclude that for the data considered here, with $ N max> 10 3$ the EVT performs well, which is a better scaling with $N$ than $ H 2$, but worse than $ C 2$. Like with $ C 2$, this data length $ N max$ should scale with the FD; however, there is no analytic treatment as to how (while for $ C 2$ analytic bounds are discussed in Sec. III C). Given that real-world data are typically small in length, one has to be particularly aware about this point.

As far as input dimensionality is concerned, we observe similar results as with $ \Delta 2 ( C )$ in Fig. 6: $ \Delta ( E )$ seems quite robust when increasing dimensionality.

When it comes to sampling time, the correlation sum approach utilizes the Theiler window, which in practice shortens the distance calculations from $N$ to $N\u2212w$. This has a negligible impact on data length but a significant positive impact on the FD estimate in cases where data are sampled densely in time.^{3} Similar results are obtained for $ \Delta ( E )$: for very small sampling times, the FD is biased toward lower values. Hence, it would make sense to include a Theiler window in Eq. (7), i.e., only include $j$ with absolute distance from index $i$ greater than some $w$ like in Eq. (6). We note that Ref. 66 also considered the impact of “temporal neighbors” and reached the same negative bias conclusion.

### F. Noise

We repeat here the analysis of Sec. III F in Fig. 16 using the Rössler system combined with various forms of noise. For additive noise, $ \Delta ( E )$ behaves much more similarly to $ \Delta 2 ( H )$ because the mean FD value of EVT is an average without focus at a specific scale. This means that the FD value in the presence of additive noise is between 1.9 (deterministic) and 3 (state space dimensionality). Additionally, the FD values of $ \Delta ( E )$ are the smallest (and, hence, closest to the deterministic FD value) out of the three ( $E, C 2, H 2$). $ \Delta ( E )$ also seems to be completely unaffected by rounding.

These observations make sense if one considers how $ \Delta ( E )$ is computed (Sec. II D). The logarithms of all inter-point distances are taken into account for the computation of the quantile of $ g i$. Rounding will have a negligible effect on the distribution of inter-point distances, and additive noise will have a diminished effect due to being “averaged out” in some sense when computing the quantile. These properties make $ \Delta ( E )$ preferable in the presence of noise, unless one wants to identify the noise radius, in which case $ C 2$ is more suited.

Last, we mention that in the case of dynamic noise, $ \Delta ( E )$ provides slightly higher values than $ H 2$ or $ C 2$ (but, of course, we do not know whether any of the three is the more “correct” number). It is worth noting nevertheless that the distribution of $ \Delta ( E )$ for dynamic noise is much narrower than for other types of noise, which we found unexpected. We examined no further however, as the results with dynamic noise depend strongly on the system used and the exact form the noise was added, and, hence, cannot lead to any general statements.

### G. Real-world data

In Fig. 17, we use the same datasets as in Sec. III G. Because the datasets are smaller in size than the typical length we used before ( $N= 10 5$), we used $p=0.98$ instead of $p=0.99$. This value for $p$ also satisfies the “NRMSE test” we described in Sec. IV C, in the sense of most NRMSE values being less than 0.5.

In addition to the large extent of some of the distributions of $ \Delta i ( E )$, we do not notice any downside or incorrectness in the mean FD values: they are comparable with those coming from $ H 2$ (and we cannot know whether $ H 2$ or EVT are more correct in their estimate). Given the results of the preceding Sec. IV F, this is expected due to the better response EVT has to noise (vs $ C 2$).

### H. Extreme cases

In Fig. 18, we apply $ \Delta ( E )$ in various extreme cases as in Sec. III H. $ \Delta ( E )$ does a good job distinguishing that two sets of very different dimensions have been artificially merged with each other in the first two cases of Fig. 18, because it yields bi-modal distributions for $ \Delta ( E )$. However, this is mainly due to the way that the sets were created. When compared to the case of adding noise to the data, we did not see bimodal distributions with peaks at 2 and 3 dimension values, i.e., $ \Delta ( E )$ cannot be used to identify a noise radius unlike $ \Delta 2 ( C )$. Nevertheless, $ \Delta ( E )$ could be a good tool detecting non-stationarity in an observed set, if that non-stationarity significantly changed the FD value over time.

Unfortunately, for the sets that do not accommodate a low-valued FD description, like the Vostok and nifty50, a straightforward application of $ \Delta ( E )$ gives a rather “clear” picture that a low-dimensional FD value describes the data. This is especially obvious in the nifty50 set (which we remind is a stock market timeseries), where $ \Delta ( E )$ gives the lowest dimension values and a narrow distribution. Surrogate testing did not help here either: generated random-Fourier surrogates^{101} had consistently higher $ \Delta ( E )$ than the original data, enhancing the wrong conclusion that the estimated $ \Delta ( E )$ of the data is valid. Additionally, performing the significance tests of Sec. IV C made things worse: the resulting plots (shown in Fig. 23) look very similar to those obtained from systems with a legitimate low-dimensional representation, again reinforcing the wrong conclusion. Last, we attempted to perform the standard analysis of checking whether the FD result converges with increasing embedding dimension of the timeseries. In Appendix F, we find that, in contrast to the $ H 2, C 2$ methods, the EVT method shows a convergence of the “nifty50” timeseries FD. Already in embedding dimension $d=5$, it shows a constant mean $ \Delta ( E )$ for any $d\u22655$, which, again, re-enforces the wrong conclusion. The same results were obtained for the “Vostok” timeseries.

It appears that four different significance testing methods (p-values, NRMSEs, surrogate timeseries, convergence with increasing embedding dimension) all yielded with confidence that inappropriate data like “vostok” or “nifty50” have a small FD, which is incorrect. These results have major implications for both previous and future applications of the EVT method in real-world data. We have not found a way to test whether the FD values yielded from the EVT method actually represent a low-dimensional deterministic system or not, i.e., there is no way to *falsify* the method. Hence, extreme care must be taken when applying the EVT method to arbitrary real-world datasets, and whether the data accommodate a deterministic representation must be confirmed by other means (e.g., a self-consistent physical theory or using the correlation sum with increasing embedding dimension as is standard practice in nonlinear timeseries analysis^{22}).

For extremely high-dimensional but deterministically chaotic data, $ \Delta ( E )$ does an excellent job in identifying a very high FD (note the FD values are divided by 4 in the figure), which is also very close to the expected value $ \Delta ( L )$. This was expected, as $ C 2$ also does an excellent job identifying a very high dimension for clean data, and, in general, EVT and $ C 2$ perform very similarly overall when it comes to deterministic noiseless data. For the Kuramoto–Sivashinsky example, we see that EVT estimates higher FD than $ C 2$ (while typically we noticed that for high-dimensional data, it underestimates the FD when compared to $ C 2$). However, in this example, we are not sure what the results are because EVT is indeed more accurate or because EVT overestimates the FD due to having too small $N$ when compared to the expected FD value (see discussion in Sec. IV E).

## V. A NOTE ON SURROGATE TIMESERIES

Surrogate timeseries have been recommended by Theiler *et al.* in the early 1990s^{101} as a means to test for nonlinearity in noisy timeseries. It was suggested there that a discriminatory statistic for the test can be the FD computed via the correlation sum. That requires a bit of care. If one uses the algorithm we described here, i.e., using Eq. (6) and then deducing the slope of the largest linear scaling region, then the user risks estimating the fractal dimension of the noise (already existing in the original data) instead of the “underlying deterministic nonlinear dynamics data” (if any exist), thus invalidating the hypothesis testing approach in the first place. Other discriminatory statistics should be used instead (see Ref. 102, for example).

If a fractal dimension is chosen as a discriminatory statistic nevertheless, we propose the following alternatives for computing it: (i) use Takens’ estimator (Sec. 6 of Appendix A) with the empirically good estimate $ \epsilon max=0.25std(x)$ with $x$ the original timeseries. Because Takens’ estimator performs a maximum likelihood estimation instead of a linear fit and, thus, considers *all* regions of the correlation sum up to some $ \epsilon max$, it produces an “average” fractal dimension of the noise and the underlying data. It should be clear, however, that the Takens estimator is used in this context (just) as a discriminating statistics and its results must not be interpreted as meaningful dimension estimates, or (ii) use the EVT version $ \Delta ( E )$, which also similarly produces an “average” of fractal dimensions of noise and deterministic set.

## VI. CONCLUSIONS

In this paper, we have analyzed many different practically relevant fractal dimension (FD) estimators we found in the literature. From all these estimators, we focused on an extensive quantitative comparison between the entropy-from-histogram method $ H q$ (Sec. II B and Sec. 1 of Appendix A), the (potentially box-assisted) correlation sum method $ C 2$ (Sec. II C) without any logarithmic corrections, and the extreme value theory method (EVT, Sec. II D). Based on our review, these three estimators are the ones most worthy using in practice ( Appendix A).

To keep this paper within sensible limits, we have used a relatively small set of possible dynamical systems and real-world data. We cross-checked our results with different systems (not shown), and we are confident that the results presented are robust. Nevertheless, it is impossible to guarantee their universality for all possible input datasets. To amend this, the reader can repeat our extensive analysis for any other dynamical system or input dataset of choice by only changing a couple of lines of code in the provided code base (see Appendix B).

Our conclusions are as follows (see also Fig. 1). When comparing $ H 2$ with $ C 2$, we found that for *synthetic (i.e., noiseless) data*, $ C 2$ is clearly superior to $ H 2$, retaining much better accuracy for decreasing $N$ (amount of points), increasing $D$ (state space dimension), or decreasing $\epsilon $ (size scale). $ C 2$ can also be used to detect the “noise radius,” as, e.g., illustrated by Kantz and Schreiber.^{22} For real data, this can become a downside, making the FD estimation using $ C 2$ ambiguous due to either an almost continuous change of slope or the majority of the slope reflecting the slope of the noise (Sec. III G). This can be partly alleviated by examining the behavior of $ C 2$ with increasing embedding dimension $d$. In Sec. III C, we saw that, provided that a deterministic chaotic attractor is known to generate the data, it is still quite sensible to estimate a FD of relatively small datasets using $ C 2$, as the estimated FD remains very accurate even for relatively small data lengths. On the other hand, $ H q$ is very sensitive to data length and underestimates the FD strongly even for moderately small data (which is often the case in experiments). We had difficulties making sense of correlation sum for order $q\u22602$, even though it has been mentioned several times in the literature. $ C q$ for $q\u22602$ gives correct results only when one considers the slope at the largest $\epsilon $ values and it is unclear why the slopes (i.e., FD values) at small $\epsilon $ values are incorrect. As such, we suggest that when treating multifractality, the entropy method should be preferred, or, if one has too few data points (where $ H q$ performs poorly), then using $ C q$ and recording the slope of $log\u2061( C q)$ at largest $\epsilon $ is the best alternative.

We then compared $ C 2$ with EVT. In deterministic datasets, we found a very high degree of agreement between $ C 2$ and EVT, confirming EVT’s accuracy. EVT performed equally well for high-dimensional data but worse than $ C 2$ for decreasing $N$. Still, EVT performs better with decreasing $N$, increasing $D$, or increasing underlying FD when compared to $ H 2$. When it comes to noise, EVT appears to have similar results with $ H 2$, i.e., the results are an average of the deterministic (smaller) FD and that of the noise. EVT reports the smallest FD values when contaminated by noise, and hence, EVT is affected less strongly by noise than $ C 2$ or $ H 2$. One more advantage of EVT vs $ C 2, H 2$ is that it foregoes the identification, fitting, and extraction of slope from a scaling region and, hence, does not face the same limitations when the data can cover only very small range of magnitudes of $\epsilon $. This advantage is balanced by the disadvantage of EVT being much more of a black box method than $ C 2, H 2$.

It appears that the EVT is a promising method that combines benefits from both $ H 2, C 2$: it scales well with decreasing $N$ or increasing $D$ and is more tolerant to noise than $ C 2$. However, we also observed that it has a huge downside: *it cannot be falsified*. Or, at least, at the moment, there does not exist a method (visual or statistical) in the literature that can confidently falsify it, nor that it can quantify its significance meaningfully. All four methods we utilized and discussed in Sec. IV C (p-values, root-mean-squared errors, surrogate tests, and increasing embedding dimension) failed to indicate that, e.g., stock market timeseries are an inappropriate input (despite being non-stationary and not satisfying practically any of the assumptions underlying the EVT method). Instead, all ways to test for significance gave big confidence that stock market timeseries are described by a very small FD of $\u22482.5$.

Surprisingly, we found practically no discussion in the literature about falsifiability, despite the plethora of real-world applications in very varied input datasets (Sec. II D). Nevertheless, we believe that falsifiability is important, because with it, one resolves controversies like the fractal dimension of “global climate attractors” (Ref. 26). The lack of falsifiability has major implications for both previous and future real-world applications using the extreme value theory approach: the argument for low-dimensional determinism in the data cannot come from the EVT method itself, at least not at the moment.

It is clear, as it was before this work, that estimating a FD is not an easy task, and, hence, focusing on only a single number can mislead. The best practice we feel is to calculate several versions of $\Delta $, from different methods and with varying the parameters of each method (including the range of $\epsilon $ or the quantile probability $p$) and produce a median of the results. In addition, given the software implementation we provide here, calculating all FD variants only necessitates a couple of lines of code (see Appendix B). Furthermore, plotting of the appropriate quantities $\u2212 H q,log\u2061( C q)$ vs $\epsilon $ is a must and can hint whether the methods are applied at inappropriate data. In addition, expecting more than one decimal point of accuracy is unrealistic in most practical applications.

As an outlook, we believe there is still some future research to be done regarding estimating FDs. In addition to the Lyapunov dimension, which is not easily applicable to observed data, every other estimator disregards the time-ordering information in observed data (i.e., that the sequence of points follows the flow in state space instead of being randomly drawn samples on the attractor). Perhaps here is a way to make a more powerful estimator by using this discarded information of the time-ordering of the points in the dataset. For example, this time-ordering information has been used in, e.g., estimating the transfer operator,^{103,104} which can also yield the natural density, and perhaps this operator can be utilized to create a FD estimator of higher accuracy or with better scaling with the number of points $N$.

Regarding the EVT approach, we believe future research can improve in two fronts: (1) developing a mathematically rigorous framework for choosing $p$ and (2) developing statistical tests for correctness and significance of the method’s results that can successfully falsify the method with inappropriate data. With new statistical indicators, the method may be applied with more confidence to data of unknown dynamical origin.

## ACKNOWLEDGMENTS

We thank Ignacio Del Amo for an initial draft code implementation of the extreme value theory dimension estimator and for providing the simple proof of non-equivalence between the box-counting and Hausdorff dimensions; three independent reviewers for constructive criticism that greatly improved the quality of the manuscript; Nils Bertschinger for helpful discussions regarding distributions of p-values; and Gabrielle Messori and Davide Faranda for discussions and clarifications regarding the usage of extreme value theory for estimating a fractal dimension, including its p-value test of Ref. 65.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**George Datseris:** Conceptualization (lead); Data curation (lead); Formal analysis (lead); Methodology (lead); Software (lead); Supervision (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (lead). **Inga Kottlarz:** Formal analysis (supporting); Software (supporting); Writing – original draft (supporting). **Anton P. Braun:** Formal analysis (supporting); Software (supporting); Writing – original draft (supporting). **Ulrich Parlitz:** Conceptualization (lead); Supervision (lead); Writing – review & editing (lead).

## DATA AVAILABILITY

Appendix B discusses software implementations and a code base for reproducing the submitted paper.

### APPENDIX A: ALGORITHMS FOR ESTIMATING A FRACTAL DIMENSION

#### 1. Optimized histograms for arbitrary $\epsilon $

Here, we describe an optimized method to calculate histograms, which to our knowledge neither has been published before nor could we find a faster method that works for arbitrary $\epsilon $. The process has memory allocation scaling of $D\u22c5N$ and performance scaling of $D\u22c5Nlog\u2061(D\u22c5N)$, neither of which depends on $\epsilon $. The process is as follows. Every point in the dataset is first mapped to its corresponding bin via the operation $ b i=\u230a( x i\u2212 x min)/\epsilon \u230b$, where $ x min$ is a vector containing the minimum value of the dataset along each dimension and $\u230a\u22c5\u230b$ is the floor operation. The resulting $ b$ ( $N$ in total) are then sorted with a quick sorting algorithm, which results in all equal $ b i$ being successive to each other. The sorting is lexicographic, i.e., sorting by first dimension, then by second, and so on. We then count the successive occurrences of equal $ b i\u2261 b i + 1$, which gives the amount of points present in the corresponding bin and move on to the next bin (there are no actual bins being created in memory, a bin is conceptually defined as the group of successively equal $ b i$). Dividing by the total amount of points gives the probabilities that can then be plugged into Eq. (3) to yield the entropy $ H q$.

#### 2. The box-counting algorithm by Molteno

This box-counting algorithm for the generalized dimension was introduced by Molteno^{38} as an improvement for the method introduced by Grassberger.^{105} It claims to be of order $D\u22c5N$. The algorithm partitions the data into boxes and counts the number of points in each box to retrieve the probabilities $ p i$ necessary to calculate the generalized dimension $ \Delta q ( H )$. The faster runtime of the algorithm is due to the use of integer manipulations for the division of the data points into boxes.

The first box contains all indices to the array that stores the data points, where a box refers to an array of indices. The boxes are subsequently partitioned, until the mean number of points per filled box falls below a threshold $ k 0$, recommended by Molteno to be 10.

The algorithm provides exponentially scaled sets of values that suit the approximation of the limit in Eq. (4) and it only needs to compute the operation (A2) $D\u22c5N$ times per partitioning process. The main disadvantage is the static box size choice. It works well for low-dimensional data sets, but for larger dimensions, the number of boxes increases exponentially with dimension, thereby increasing the number of points necessary to calculate the dimension exponentially. This drastically limits the $\epsilon $ range to which the algorithm is applied over, and for some sets makes the computation of low accuracy or even straight-out impossible. Given how fast our histogram algorithm already is (all $ H q$ curves in this paper took on average 0.1 s to compute), we saw no reason to use the Molteno method. Also, the Molteno method does not allow the user to decide $\epsilon $ values, making it less flexible.

#### 3. Correlation sum

^{22}in the exponent of $1/(q\u22121)$. If the $q$-order correlation given by Kantz and Schreiber is called $ C q \xaf(\epsilon )$, it scales as $ C q \xaf(\epsilon )\u223c \epsilon ( q \u2212 1 ) \Delta q ( C )$, then $ C q \xaf ( \epsilon ) 1 / ( q \u2212 1 )= C q(\epsilon )\u223c \epsilon \Delta q ( C )$ and both versions are equal. In addition to the exponent, the formulation of $ C q$ also differs in the indices of the outer of the sums, which range from 1 to $N$, requiring a slightly changed normalization.

#### 4. The box and prism-assisted correlation sum

Theiler^{40} proposed an improvement over the calculation of the correlation dimension by Grassberger and Procaccia^{106} that divides the data into boxes before calculating the distances between points. Thereby, the number of distance calculations is reduced and the scaling becomes faster than $ N 2$ (how much, it depends on the box size $ r 0$). After division into boxes, the formula given in (6) is used to calculate the correlation sum, therefore an extension to the $q$-order correlation sum is possible.

For a point inside a box, there may be points outside the box that are within the given distance of the point. Therefore, the neighboring boxes with respect to the current box also have to be found and checked. The distances between the points in the box and its neighbors are calculated as given in Eq. (6). The first sum runs over all points of the initial box and the second sum uses all points in the box and adjacent boxes. For $q=2$, only boxes with equal or larger indices are included in the neighbor search and the optimization, presented in the previous chapter, is used.

A point of criticism for this algorithm is its poor runtime for higher dimensions. The advantage of distributing the points into boxes beforehand diminishes as the number of boxes increases considerably with dimension. To reduce the amount of boxes, Theiler proposed a prism algorithm, where only the first $P$ dimensions are used to distribute the data into boxes. These new boxes, where $P$ sides are of side length $ r 0$ and the other $D\u2212P$ sides cover the whole range, are called prisms. The best choice given by Theiler is $P=0.5 log 2\u2061N$ and should be used when $D$ exceeds $0.75 log 2\u2061N$. A downside of this prism approach is that for any $P<D$, some point pairs that should have been discarded may be included due to having small distances in the first $P$ dimensions but a larger distance in at least one of the remaining dimensions.

^{41}proposed a different, and more stable, algorithm for the calculation of the optimal box size $ r 0$. They optimized the expected calculation times for the optimal number of filled boxes to

The box size estimator still varies in its choice of box size but shows fluctuations of smaller amplitude than the Theiler estimator. The choice of a box size smaller than the smallest interpoint distance only occurs for high dimensional data sets with a low number of points. Furthermore, Bueno-Orovio and Pérez-García chose a prism dimension of always $P=2$ (for $D>2$). However, a prism dimension of 2 can result in box size estimates smaller than the minimal interpoint distance for high dimensional datasets with comparably low size. In this paper, we used $P=2$ but we have different $ r 0$ (see below).

The main benefit of a small $ r 0$ is that it makes the computations much faster. Unfortunately, both suggestions, and especially that of Ref. 41, often fail in practice. They give much too small $ r 0$ values and for data with any amount of noise whatsoever this value is well below the noise radius. This is displayed in Fig. 7. The vertical dotted line shows the $ r 0$ estimated by Eq. (A6) (the calculation of $ C 2$ would be limited up to this $ r 0$). In fact, $ r 0$ is so small that even for only 5% additive noise the computation would only show the noise dimension and no hint of the deterministic (and smaller) slope of $log\u2061( C 2)$ vs $log\u2061(\epsilon )$. That is why in this paper, we decided to use the box-assisted version for better performance, but with $ r 0= \epsilon max/ e 2$ as discussed in Sec. III I. The performance is not too bad, e.g., for the typical data lengths $N= 10 5$ considered here computing the entire $ C 2$ curve takes about a minute on an average computer. Notice that even with the optimized-for-performance $ r 0$ of Ref. 41, the entropy method is still massively faster (in fact, it is even faster than just computing $ r 0$).

#### 5. Fixed mass correlation dimension

^{107–109}proposed a method using a fixed mass algorithm that does the opposite: the probabilities are fixed by defining $ p i=n/N$, where $n$ is a chosen number of points and $N$ is the total number of points in the set. The diameter is chosen to include $n$ points. Following the explanation by Grassberger

^{84}the scaling can be rewritten to assume the form of Eq. (A7),

^{110}giving a very high performance for low-dimensional dataset, scaling poorly for very high-dimensional ones. Hence, we decided to use only the traditional correlation sum version in the main comparison of Sec. III.

#### 6. Takens’ estimator

^{13}aims to estimate the correlation dimension using the method of maximum likelihood estimation (MLE).

^{111}From $m$ interpoint distances $ \rho j< \epsilon max$, one can estimate the correlation dimension as

^{13,43}which can be easily corrected by writing

For a Gaussian distributed random variable, the log-likelihood function is a parabola, that at $1\sigma $ has fallen by 0.5 from its maximum and at $2\sigma $ by 2. By invariance,^{112} this is also the case for a non-Gaussian random variable, letting us easily estimate the variance of $ \Delta ( T )$.

When testing the algorithm and its dependency on $ \epsilon max$ (Fig. 20) on different dynamical systems, we found that the variation of $ \Delta ( T )$ exceeds the confidence intervals at any fixed $ \epsilon max$ for low-dimensional systems.

These variations occur because for the estimation, it is assumed that Eq. (A10) holds. Thus, the estimated $ \Delta ( T )$ and its confidence intervals are of no use as long as the validity of the assumption (A10) is not known.

While Takens’ estimator does not provide a significant advantage in the precise estimation of the fractal dimension, compared to correlation-sum-based methods, it can be useful in the case of surrogate timeseries as described in Sec. V.

#### 7. Judd’s estimator

^{44,45}also uses maximum likelihood estimation. To account for deviations from uniform scaling, it allows a polynomial $a$ of degree $t$ so that the assumed correlation sum is

*cutoff*and $w=log\u2061(1/\lambda )$ the bin width. Now, the probability of observing a distance in bin $ B i$ becomes $ p i= P i\u2212 P i + 1$, with

Once the optimal bin width is found, the cutoff $ \epsilon 0$ is chosen as the right edge of the fullest bin of the histogram. All bins to the right of this bin are joined to form $ B 0=[ \epsilon 0,\u221e)$.

The minimization of Eq. (A12) is subject to two difficulties that are already noted by Judd. First, an optimizer cannot understand the idea that the exponential $ \epsilon \Delta ( J )$ is the essence of the model, while the polynomial is only a device to correct for deviations from the scaling law. Second, the optimizer is highly sensitive to the initial condition of the optimization, which could be reduced by a tailored optimizer,^{113} but still one can observe a very broad distribution of $ \Delta 2 ( J )$ values over different samples of a long trajectory, especially for higher-dimensional systems, as is shown in Fig. 21.

Due to these problems, we decided not to include the estimator in the main comparison.

#### 8. Dimension from Lyapunov exponents

In Sec. II E, we described the Lyapunov dimension $ \Delta ( L )$ due to Kaplan and Yorke. We do not have anything to add here regarding $ \Delta ( L )$, but we want to mention Ref. 48 by Chlouverakis and Sprott. They suggest that instead of a linear interpolation to the sum of $ \lambda i$, a polynomial interpolation should be done instead. However, as we found no theoretical foundation for this proposal, we decided to skip it (and we also did not notice any significant improvement with the numeric results).

#### 9. Mean return times

According to the Poincaré recurrence theorem, any trajectory within an ergodic set^{114} will return arbitrarily often and arbitrarily close to any neighborhood in the ergodic set.^{99} We represent this neighborhood as a hypersphere of radius $\epsilon $, centered at some point $ x 0$ in the ergodic set, and define as $\gamma $ the mean return time to this hypersphere. Then, one expects that $log\u2061(\gamma )\u2248\u2212 \Delta ( \gamma )log\u2061(\epsilon )$ with $ \Delta ( \gamma )$ the fractal dimension, as estimated from the mean return times. A more formal discussion of this fact and explicit connection with the natural measure of the ergodic set and the fractal dimension obtained via the generalized entropy (4) is given by Theiler.^{16} The earliest reference we found using return times to estimate fractal dimensions was Ref. 46.

Unfortunately, the method using mean return times is not recommended at all. A fundamental limitation is that knowledge of the dynamic rule $f$ is necessary, otherwise the results of the method for measured data are too inaccurate to be considered seriously. Even for known rule $f$, the method converges slowly (numerically). Furthermore, it provides an estimate of the local dimension around the point of return, similarly with the point-wise dimension. Thus, it has to be further averaged over several state-space points, requiring several orders of magnitude more computation time than the correlation sum method or the Lyapunov dimension method. Last, we found its numeric output (not shown, see online repository) to be quite far from the results of the correlation dimension and, thus, we do not consider it accurate enough.

#### 10. Extreme value theory

We introduced the algorithm of estimating a FD via extreme value theory (EVT) in Sec. II D. Here, we will expand more on how the algorithm has been used in the literature and highlight potential ambiguities we have noticed in its real-world applications.

*et al.*,

^{75}as well as real-world applications that explicitly fit a GPD to data (i.e., allowing $\xi \u22600$ instead of enforcing $\xi =0$). It is correct that the exceedances $E$ in a dataset may follow a GPD, but if one wants to make the claim that the $\sigma $ parameter is connected with the fractal dimension, then one must assume that the data follow the reduced EXPD instead. That is because of Eq. (14). If one uses the GPD cumulative function instead of the EXPD one, one would get the expression $exp\u2061(\u2212E \Delta i)$ to be equated to $ ( 1 + \xi E / \sigma i ) \u2212 1 / \xi $, from which it is impossible to claim $ \Delta i=1/ \sigma i$ like in Eq. (14). Hence, when using GPD fits instead of EXPD, one cannot simply equate the (local) fractal dimension with the $\sigma $ parameter of the fit, in contrast to what has been done in the literature.

^{75}

The second ambiguity we encountered is the lack of application, and even discussion, of delay coordinates embeddings. Many real-world applications, e.g., Ref. 65, analyze a single dynamic variable (such as sea level pressure in the case of climate applications). This dynamic variable is a spatiotemporal field and, hence, provides a multi-dimensional input dataset. Despite the high input dimensionality, this single variable is likely coupled to many other dynamic variables in a coupled dynamical system (which is especially true in the case of climate dynamics). The theory of delay embeddings is supposed to re-construct the missing dynamic variables and as a result provide a more correct representation of the dynamical flow. Delay embedding is missing from almost all applications of EVT we reviewed and cited, even though they all use the timeseries of only one dynamic variable. Given that delay embedding is a well established analysis step^{28} that is completely separate from estimating fractal dimensions, we are not sure why there is a lack of discussion of it.

The third ambiguity we want to highlight is the report of relatively small values for the fractal dimensions of spatiotemporal, and highly complex, real-world data. For example, Ref. 70 reports dimensions of $\Delta \u22483.5$ for the dynamics of slow earthquakes in the Cascadia region, Ref. 72 reports $\Delta \u224815$ for spatiotemporal atmospheric flow (of daily resolution; hence, large scale turbulence is considered), Ref. 73 report a difference of at most $\delta \Delta i\u22482$ from the states of largest and smallest local fractal dimension of the 500-hPa geopotential height (Z500) dynamic variable for the weather of the European-Atlantic region. In particular, in the last case, this would mean that, out of the potentially millions of available degrees of freedom in this (discretized) spatiotemporal system, there is a difference of at most two additional degrees accessed by the state space flow between the least and most stable regions in state space. Given that in this review, we noticed much higher differences of local dimensions in much lower-dimensional systems (see, e.g., Fig. 11 or 18), we find this reported small number difficult to grasp. In general, given the discussion on falsifiability of Secs. IV C and IV H, as well as limitations that come from the length of input data that we discussed in Sec. III C, we believe that the absolute value of the fractal dimensions reported in these publications should be taken with a grain of salt and not be equated with the available degrees of freedom in the state space. Whether or not the *relative* values of the local dimensions $ \Delta i ( E )$(in the sense that a relatively higher value means higher local state instability for the real system) can be used to draw conclusions or not depends on the confidence one has on the stability of the distributions of $ \Delta i ( E )$ (see Sec. IV C).

#### 11. Persistent homology

Persistent homology methods are based on topological timeseries analysis and applications in dynamical systems. These techniques have been relatively recently applied to estimate fractal dimensions, and a quantitative review was recently published by Jaquette and Schweinhart.^{50}

The methodology is based on tracking how $d$-dimensional holes form or disappear as the point cloud that composes $X$ is “inflated” or “thickened.” This means that each point in $X$ is taken as a sphere with radius initially 0, and this radius is increased as the point cloud is “inflated.” Estimators based on zero-dimensional persistent homology using minimal spanning trees were proposed already in the early 1990s by van de Weygaert *et al.*^{115} and Martínez *et al.*^{116} who stress that this approach provides an estimate of a (generalized) fractal dimension and also works with relative small data sets. For more information about implementations of the method, see Ref. 50.

The review of Ref. 50 compared fluctuations in the output values and the distance of the output values themselves from the reference “true” values of the test sets applied to. The results showed that the persistent homology method has similar performance for $d=0$, and dramatically worse performance for $d>0$, when compared to the correlation sum. Unfortunately, for $d=0$, the method performs poorly when noise is present, i.e., it does not distinguish two slopes (of the noise and of the deterministic set) and instead shows that of the noise, while it does find two slopes for $d=1$. Additionally, the method output depends strongly on its meta-parameter $\alpha $, whose value cannot be deduced from input data. For these reasons, and because the methodology itself is more complicated to both explain and implement than the correlation sum of Sec. II C, we deem the method worse than the correlation sum and only considered the correlation sum for a more in-depth comparison in Sec. III.

### APPENDIX B: SOFTWARE IMPLEMENTATIONS AND CODE BASE

The work done in this paper and the figures produced are available as a fully reproducible code base, which can be found on GitHub.^{117} It is written in the Julia language^{118} and is using the software: DynamicalSystems.jl,^{36} DifferentialEquations.jl,^{119} BenchmarkTools.jl,^{120} ComplexityMeasures.jl,^{104} LsqFit.jl, and DrWatson.^{121} Figures were produced with Makie.^{122} All methods, with the exception of Judd’s algorithm and the persistent homology method, are implemented, documented, and tested extensively in the FractalDimensions.jl^{123} submodule of DynamicalSystems.jl. The implementations follow best practices in scientific code^{124} and are highly optimized, utilizing multi-threading whenever possible. The following code is a simple example of calculating $ \Delta 2 ( C )$, $ \Delta 1 ( H )$, and $ \Delta i ( E )$ with DynamicalSystems.jl and the Julia language:

### APPENDIX C: DYNAMIC RULES OF KNOWN SYSTEMS

All dynamical systems used for generating data are listed in Table III.

System . | Dynamical rule . | Initial conditions . | Parameters . |
---|---|---|---|

Hénon map | $ x n + 1=1\u2212a x n 2+ y n, y n + 1=b x n$ | (0.08, 0.12) | a = 1.4, b = 0.3 |

Kaplan–Yorke map | $ x n + 1=2 x n%1, y n + 1=\lambda y n+cos\u2061(4\pi x n)$ | (0.15, 0.2) | λ = 0.2 |

Towel map | $ x n + 1 = 3.8 x n ( 1 \u2212 x n ) \u2212 0.05 ( y n + 0.35 ) ( 1 \u2212 2 z n ) , y n + 1 = 0.1 ( ( y n + 0.35 ) ( 1 + 2 z n ) \u2212 1 ) ( 1 \u2212 1.9 x n ) , z n + 1 = 3.78 z n ( 1 \u2212 z n ) + b y n$ | (0.085, − 0.121, 0.075) | |

Hénon–Heiles | $ x \u02d9= p x, y \u02d9= p y, p \u02d9 x=\u2212x\u22122xy, p \u02d9 y=\u2212y\u2212( x 2\u2212 y 2)$ | $(0,\u22120.25,0.42081,0)$ | |

Coupled logistic maps | $ u n + 1 ( i ) = 4 v n ( i ) ( 1 \u2212 v n ( i ) ) , v n ( i ) = u n ( i ) + k ( u n ( i \u2212 1 ) \u2212 2 u n ( i ) + u n ( i + 1 ) )$ | $(0.1,\u2026,0.9)$ | k = 0.1 |

Rössler | $ x \u02d9=\u2212y\u2212z, y \u02d9=x+ay z \u02d9=b+z(x\u2212c)$ | (0.1, − 0.2, 0.1) | a = b = 0.2, c = 5.7 |

Periodic parameters: | a = b = 0.2, c = 3 | ||

Lorenz-96 | $ x i \u02d9= ( x i + 1 \u2212 x i \u2212 2 ) x i \u2212 1\u2212 x i+F$ | (j × 0.1 for j ∈ 0, …, D − 1) | F = 24 |

System . | Dynamical rule . | Initial conditions . | Parameters . |
---|---|---|---|

Hénon map | $ x n + 1=1\u2212a x n 2+ y n, y n + 1=b x n$ | (0.08, 0.12) | a = 1.4, b = 0.3 |

Kaplan–Yorke map | $ x n + 1=2 x n%1, y n + 1=\lambda y n+cos\u2061(4\pi x n)$ | (0.15, 0.2) | λ = 0.2 |

Towel map | $ x n + 1 = 3.8 x n ( 1 \u2212 x n ) \u2212 0.05 ( y n + 0.35 ) ( 1 \u2212 2 z n ) , y n + 1 = 0.1 ( ( y n + 0.35 ) ( 1 + 2 z n ) \u2212 1 ) ( 1 \u2212 1.9 x n ) , z n + 1 = 3.78 z n ( 1 \u2212 z n ) + b y n$ | (0.085, − 0.121, 0.075) | |

Hénon–Heiles | $ x \u02d9= p x, y \u02d9= p y, p \u02d9 x=\u2212x\u22122xy, p \u02d9 y=\u2212y\u2212( x 2\u2212 y 2)$ | $(0,\u22120.25,0.42081,0)$ | |

Coupled logistic maps | $ u n + 1 ( i ) = 4 v n ( i ) ( 1 \u2212 v n ( i ) ) , v n ( i ) = u n ( i ) + k ( u n ( i \u2212 1 ) \u2212 2 u n ( i ) + u n ( i + 1 ) )$ | $(0.1,\u2026,0.9)$ | k = 0.1 |

Rössler | $ x \u02d9=\u2212y\u2212z, y \u02d9=x+ay z \u02d9=b+z(x\u2212c)$ | (0.1, − 0.2, 0.1) | a = b = 0.2, c = 5.7 |

Periodic parameters: | a = b = 0.2, c = 3 | ||

Lorenz-96 | $ x i \u02d9= ( x i + 1 \u2212 x i \u2212 2 ) x i \u2212 1\u2212 x i+F$ | (j × 0.1 for j ∈ 0, …, D − 1) | F = 24 |

### APPENDIX D: DELAY EMBEDDING PARAMETERS FOR EXPERIMENTAL DATA

The method of Kraemer *et al.*^{34} finds optimal delay times that may not be equispaced. The amount of delay times found is equal to the embedding dimension. The delay times found are listed in the below list for each system (delay times are always integers, in units of the sampling time). For Vostok and “nifty50,” data we used traditional techniques of optimizing delay time and embedding dimension individually via Cao’s method and minimum of mutual information, because the method of Ref. 34 (correctly) yields that the data do not accommodate a proper embedding:

“electroch. 1”: (0, 26, 13, 5, 20),

“electroch. 2”: (0, 25, 16, 148, 138, 87, 60, 105),

“Shinriki”: (0, 19, 38, 57),

“nifty50”: (0, 43, 86, 129, 172, 215),

“vostok”: (0, 50, 100, 150, 200, 250, 300),

“double pendulum”: (0, 51, 25, 39, 12),

“Roessler”: (0, 6, 3, 14), and

“EEG IBI”: (0, 13, 26, 39, 7).

### APPENDIX E: MORE PLOTS FOR SIGNIFICANCE OF EVT

### APPENDIX F: INCREASING EMBEDDING DIMENSION OF REAL-WORLD DATA

In Figs. 24 and 25, we perform what is known as standard practice when estimating FDs: increasing the embedding dimension iteratively until a convergence of FD appears at the largest scales of $\epsilon $.^{22} In particular, for this subsection, we estimate the slope of the *right-most* linear scaling region (i.e., the one at the largest $\epsilon $), as opposed to the slope of the *largest* linear region. That is because in real data, it is this slope that should indicate the FD of the underlying deterministic dynamics, if any exist (see Sec. III F).

Indeed, for the “electrochemical 2” dataset, the convergence of $ \Delta 2 ( C )$ becomes apparent very quickly. On the other hand, for “nifty50,” there is no convergence (we computed $ C 2$ up to $d=13$, not shown). The results of $ H 2$ for “nifty50” are inaccurate because the timeseries has only 3125 points; they should not be trusted (but anyways they do not show any convergence either).

We also perform the same analysis for the EVT approach, which once again reinforces that the method which fails to understand the stock market timeseries should not have a convergent dimension. Instead, the dimension estimates converge very rapidly with increasing $d$.

## REFERENCES

*The Fractal Geometry of Nature*

*Fractical Geometry Mathematical Foundations and Applications*

*Undergraduate Lecture Notes in Physics*, 1st ed. (Springer Nature, Cham, 2022).

*Dimensions and Entropies in Chaotic Systems*, edited by G. Mayer-Kress (Springer, Berlin, 1986).

*Dynamical Systems and Bifurcations*, edited by B. L. J. Braaksma, H. W. Broer, and F. Takens (Springer, Berlin, 1985), pp. 99–106.

*Chaotic Evolution and Strange Attractors*

*Chaotic Dynamics, An Introduction Based on Classical Mechanics*

*Nonlinear Time Series Analysis*

*Analysis of Observed Chaotic Data*

*Measures of Complexity and Chaos*, edited by N. B. Abraham, A. M. Albano, A. Passamante, and P. E. Rapp (Springer New York, Boston, MA, 1989), pp. 51–62.

*Encyclopedia of Complexity and Systems Science*, edited by R. A. Meyers (Springer, New York, NY, 2009), pp. 3754–3779.

*Functional Differential Equations and Approximation of Fixed Points*, edited by H.-O. Peitgen and H.-O. Walther (Springer, Berlin, 1979), pp. 204–227.

*Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts*(John Wiley & Sons, Nashville, TN, 2016).

*Springer Series in Statistics*, 1983rd ed. (Springer, New York, NY, 2012).

*Chaos Detection and Predictability*, Lecture Notes in Physics Vol. 915, edited by C. H. Skokos, G. A. Gottwald, and J. Laskar (Springer, Berlin, 2016), pp. 1–34.

*Fractal Dimensions of Networks*(Springer International Publishing, Cham, 2020), pp. 325–364.

*Dynamical Systems and Turbulence, Warwick 1980*, edited by D. Rand and L.-S. Young (Springer, Berlin, 1981), pp. 366–381.