Two dynamical indicators, the local dimension and the extremal index, used to quantify persistence in phase space have been developed and applied to different data across various disciplines. These are computed using the asymptotic limit of exceedances over a threshold, which turns to be a generalized Pareto distribution in many cases. However, the derivation of the asymptotic distribution requires mathematical properties, which are not present even in highly idealized dynamical systems and unlikely to be present in the real data. Here, we examine in detail the issues that arise when estimating these quantities for some known dynamical systems. We focus on how the geometry of an invariant set can affect the regularly varying properties of the invariant measure. We demonstrate that singular measures supported on sets of the non-integer dimension are typically not regularly varying and that the absence of regular variation makes the estimates resolution dependent. We show as well that the most common extremal index estimation method is not well defined for continuous time processes sampled at fixed time steps, which is an underlying assumption in its application to data.
Extreme value theory shows that under fairly general conditions, the distribution of exceedances over a high threshold of a given process follows a generalized Pareto distribution (GPD). When the process is generated by a dynamical system, the distribution encodes some dynamical properties of the process that allow for the estimation of indicators of persistence. These indicators are the local dimension and the extremal index, and they are being employed to analyze all sorts of environmental data. However, the algorithms used to estimate them require properties, which are seldom present in any realistic system. In this paper, we explore numerically different commonly used dynamical systems to show that the conditions required for the exceedances to converge to a GDP distribution are typically not met. We also show flaws in the statistical method used to infer the extremal index in continuous time systems. These results should be taken into consideration when applying these methodologies to real world data.
I. INTRODUCTION
In recent years, there has been a growing interest in extreme value theory based methods to obtain estimates of dynamical quantities. These methods stem from developments in the analysis of extreme events and their relationship with dynamical systems theory. Same as the sum of random variables has an asymptotic distributional limit described by the central limit theorem, less known theorems describe the distributional limits of extreme events. The asymptotic distributions of extremes allow the quantification of probabilities that relate the magnitude of an event with its frequency. These results have been extended to cover extremes that arise in observables computed along trajectories of chaotic dynamical systems. By defining an observable that depends on the distance between a reference point and the trajectory of the system in state space, one can obtain a representation of the extreme events as close recurrences to the reference point. These recurrences relate the dynamical and ergodic properties of the system to the statistics of the extremes. These methods have been extensively used in atmospheric data,1–14 climate models,15,16 and fractal dimension estimation.17
We focus our attention on two significant extreme value laws. The asymptotic limit of the maximum value in an observed data series partitioned in blocks follows a Generalized Extreme Value (GEV) distribution. Meanwhile, the asymptotic limit of the exceedances over a high threshold follows a Generalized Pareto Distribution (GPD). These laws have also been shown to apply to extremes of observables along orbits of dynamical systems under sufficiently fast decay of correlations.18,19 The theory for estimating the local dimension through extreme value theory has been developed theoretically mostly for the GEV case,20–23 with only a couple of papers dealing with the GPD estimator.24,25 However, as typically the GPD approach requires less data, it is the one that has been mostly used in applications. However, this approach requires several abstract mathematical properties to hold, which are different from the ones required by the GEV approach. On one hand, there are some statistical properties of the dynamical system such as ergodicity and stationarity. On the other hand, there are some geometrical conditions on the invariant measure of the system, such as the existence of the local dimension and regular variation.
The dynamical quantities that are used to obtain information about the system are the local dimension and the extremal index. The setting required is a dynamical system whose evolution is contained in a bounded set of phase space, which possesses an invariant measure that describes the asymptotic frequency of visitation of sets. The local dimension is a geometrical quantity that describes the scaling of the invariant measure of a ball against its radius. This quantity is related naturally to the parameters of extreme value distributions. The extremal index is a statistical parameter that quantifies short-range correlations and appears when the extreme events cluster.
In this paper, we focus on numerically investigating the regular variation property, and the index-based estimation of the extremal index. Regular variation is a necessary property for the convergence of the exceedances to a GPD and the estimation of the local dimension. However, it is rarely mentioned and, to our knowledge, it has never been checked for any system or dataset in the literature dealing with the GPD method. Even some texts that explicitly intend to illustrate the use of this method do so in systems, which are not regularly varying. Index-based estimation is the most used method to estimate the extremal index from data. It is based on the indexed position of the extremes in the data series. It is as such a method that relies on the data series being of discrete time. However, it is routinely applied to continuous time processes discretized by sampling, a procedure that can be done in different ways and may impact the estimation.
II. BACKGROUND
The EBD algorithm.
Inputs: is a high quantile, is an orbit and is an |
independently chosen reference point. |
for , do |
Compute . |
end for |
Sort and set the threshold to be the element which has index |
. |
Compute the set of exceedances |
Compute the inverse of the average of the exceedances |
Outputs: Local dimension at the point . |
Inputs: is a high quantile, is an orbit and is an |
independently chosen reference point. |
for , do |
Compute . |
end for |
Sort and set the threshold to be the element which has index |
. |
Compute the set of exceedances |
Compute the inverse of the average of the exceedances |
Outputs: Local dimension at the point . |
In applications,1–14 the output of the EBD algorithm is usually paired with an estimation of the extremal index . The extremal index arises in the generalized extreme value distribution in the presence of short-range dependencies and can be seen as a number that describes the inverse of the mean size of clusters of extremes.26 Note that the derivation of the exponential law for the exceedances above does not need a particular dependency structure and does not feature the extremal index in the limiting distribution. However, in most papers,1–6,8–14 discussing applications it is included in the distribution function without a mathematical justification. The PoT approach erases the dependencies by considering the points over the threshold without regard for its position in the process. Shuffling the time series randomly will produce the exact same extremes and distribution. That is why, the extremal index should not appear in the distribution function. In contrast, the extremal index appears naturally in the block maxima approach since the clusters typically fall within a block, and thus only one extreme is chosen from each cluster, biasing the distribution.
While the extremal index can be seen as statistical quantity that is intrinsically linked to the process and can be estimated from data, it does not appear in the asymptotic distribution of extremes through the PoT method.
Typically, the set of reference points in a chaotic invariant set that give raise to an extremal index different than 1 is the set of stable and periodic points (see, Ref. 27, Chap. 4), and thus a set of measure zero, unlikely to be observed in most applications.
The index-based estimator for the extremal index is the most used in applications and infers the size of the clusters of exceedances by the index length between them. The following algorithm describes in detail how the computation works.
Index-based estimator for the extremal index.28
Inputs: is a high quantile, is an orbit and is an |
independently chosen reference point. |
for , do |
Compute . |
end for |
Sort and set the threshold to be the element which has |
index . |
Find the indexes of the exceedances in the |
unsorted obs. |
Compute the inter-exceedance times |
Compute and define |
Compute |
Compute |
Outputs: Index-based estimator of the extremal index . |
Inputs: is a high quantile, is an orbit and is an |
independently chosen reference point. |
for , do |
Compute . |
end for |
Sort and set the threshold to be the element which has |
index . |
Find the indexes of the exceedances in the |
unsorted obs. |
Compute the inter-exceedance times |
Compute and define |
Compute |
Compute |
Outputs: Index-based estimator of the extremal index . |
III. LIMITATIONS IN THE ESTIMATION OF THE LOCAL DIMENSION
For measures defined on invariant sets of dynamical systems, it is complicated to obtain an analytical result regarding its regular variation properties. Sometimes, analytic calculations can be done for measures supported on self-similar attractors, but this applies only on a case-by-case basis. Here, we show through a series of selected examples how the regular variation of the measure relates to the geometry of the attractor. We put special emphasis on those which have been previously used in the literature related to the EBD algorithm.
A summary of the dynamical systems used, their dynamical rules, and some of their properties can be found in Table I.
Description of various systems considered in this paper.
System . | Dynamical rule . | Parameters . | Information dimension . | Regularly varying . |
---|---|---|---|---|
Logistic map | xn+1 = axn(1 − xn) | a = 4 | 1 | Yes |
Cantor shift | Left shift (See Appendix B) | N/A | No | |
Fat Cantor set | Random ordering | N/A | 1 | Yes |
Hénon map | a = 1.4, b = 0.3 | 1.26 ± 0.0231 | No | |
yn+1 = bxn | ||||
Arnold’s Cat map | xn+1 = 2xn + yn (mod 1) yn+1 = xn + yn (mod 1) | N/A | 2 | Yes |
Solenoid map | φn+1 = 2φn (mod 2π) (See Appendix D) | a = 0.076 | No | |
Aperiodic circle rotation | N/A | 2 | Yes | |
Lorenz-63 | σ = 10, ρ = 28 β = 8/3 | 2.06 ± 0.0129 | No | |
Lorenz-96 | F = 32 1 ≤ i ≤ 4 | ≈2.834… | No | |
F = 8 1 ≤ i ≤ 8 | ≈4.872… | No | ||
Hénon–Heiles | x0 = (0, − 0.25, 0.42, 0) | 3 | Unclear |
System . | Dynamical rule . | Parameters . | Information dimension . | Regularly varying . |
---|---|---|---|---|
Logistic map | xn+1 = axn(1 − xn) | a = 4 | 1 | Yes |
Cantor shift | Left shift (See Appendix B) | N/A | No | |
Fat Cantor set | Random ordering | N/A | 1 | Yes |
Hénon map | a = 1.4, b = 0.3 | 1.26 ± 0.0231 | No | |
yn+1 = bxn | ||||
Arnold’s Cat map | xn+1 = 2xn + yn (mod 1) yn+1 = xn + yn (mod 1) | N/A | 2 | Yes |
Solenoid map | φn+1 = 2φn (mod 2π) (See Appendix D) | a = 0.076 | No | |
Aperiodic circle rotation | N/A | 2 | Yes | |
Lorenz-63 | σ = 10, ρ = 28 β = 8/3 | 2.06 ± 0.0129 | No | |
Lorenz-96 | F = 32 1 ≤ i ≤ 4 | ≈2.834… | No | |
F = 8 1 ≤ i ≤ 8 | ≈4.872… | No | ||
Hénon–Heiles | x0 = (0, − 0.25, 0.42, 0) | 3 | Unclear |
A. Cantor shift
We start our exposition with the Cantor shift map for two reasons. One is that, it is such a simple system that it can be shown that its invariant measure is not regularly varying. The second one is that it has been used in the literature (see, Ref. 27, Chap. 9) to illustrate the EBD method, recognizing difficulties to apply it but without identifying it as a system with a non-regularly varying invariant measure. The dynamics in the Cantor set are given by a left shift in a symbolic coding of the Cantor set. See Appendix B, for more details and a proof of the statement that the invariant measure on the Cantor set is not regularly varying.
Figure 1 shows the result of taking the exceedances from the Cantor shift map and fitting a GPD distribution to them according to the EBD algorithm. The holes in the measure, in turn, produce holes in the distribution of exceedances as certain values are not possible.
Distribution of exceedances for the Cantor shift map, and a GPD distribution with parameter equal to the inverse of the average of the exceedances.
Distribution of exceedances for the Cantor shift map, and a GPD distribution with parameter equal to the inverse of the average of the exceedances.
Estimates of the regular variation property and local dimension for some selected discrete time systems. (a) If the system is not regularly varying the local dimension estimation is scale dependent. (b) Oscillations take place in every point in some scale due to the fractal structure. (c) The -Bernoulli measure is regularly varying in the fat Cantor set, despite the similarities with the Cantor shift. (d) For a system with a uniform scale-free structure, the oscillations can coordinate.
Estimates of the regular variation property and local dimension for some selected discrete time systems. (a) If the system is not regularly varying the local dimension estimation is scale dependent. (b) Oscillations take place in every point in some scale due to the fractal structure. (c) The -Bernoulli measure is regularly varying in the fat Cantor set, despite the similarities with the Cantor shift. (d) For a system with a uniform scale-free structure, the oscillations can coordinate.
The top panel of Fig. 2(a) shows that the quantity described in (9) does not converge to . Instead, it oscillates strongly for any of the orders of magnitude reached. Since the measure is not regularly varying, we see a non-vanishing factor that prevents the convergence of the asymptotic distribution to an exponential. The middle panel shows that the estimate of the local dimension is highly correlated with the concentration of the measure and oscillates as well. This implies that the estimates of the local dimension are resolution dependent due to the lack of regular variation. The correlation estimate for the dimension shows reasonable values, and the final value is relatively close to the actual analytical solution .
B. Fat Cantor set
An interesting edge case is given by the so-called “fat fractals.” These are sets of positive Lebesgue measure and, thus, of integer dimension but that might have a very intricate fine structure similar to a fractal set. The example that we show here is a Cantor set, a nowhere dense set of isolated points and thus topologically equivalent to the middle third Cantor set, but it has positive Lebesgue measure. The set is produced via a dynamic construction, using a non-autonomous modification of the tent map, for more details see Appendix C.
Despite the holes in the set and the topological equivalence to the Cantor set, the Bernoulli measure here is regularly varying. This can be seen by the convergence of the quantity [Eq. (9)] in the top graph of the bottom panel of Fig. 2(c) to the expected value of . The estimates of the local dimension also seem to converge to the right answer, which is 1. Note that the similarities both in terms of geometry and topology between the two Cantor set examples leave as main difference the Lebesgue size in which the Bernoulli measure is concentrated.
C. Hénon map
The Cantor shift map serves as an example of the consequences of the lack of regular variation. Cantor sets are also models for the sections of attractors such as the Hénon map. In Ref. 30, it is stated that the EBD algorithm is adequate for non-Axiom A chaotic dynamical systems with multifractal properties. The Hénon map is a counterexample of this statement.
Figure 3 shows the same computation as in the previous figure but performed on a randomly selected point in the Hénon attractor. In the bottom panels, it also shows the exceedances and their geometrical disposition on four different stages of the zooming in process around . As the threshold becomes larger, the algorithm zooms more into the local structure of the attractor. A branch that looks like a line from afar, doubles, and becomes separate lines and, eventually, the one not containing disappears and the one containing it starts dividing into more. This process causes the quantity and the estimation of the local dimension to oscillate strongly. Due to the structure of the attractor, we expect this phenomenon to arise at all sufficiently small scales, for all points.
This figure shows the geometrical structure of the points that lay in the outer ball as the radius becomes small. The top graph shows the quantity , describing the ratio of the measure of two concentric balls, and the bottom graph shows the corresponding estimation of the local dimension. Both plots seem to converge for a several orders of magnitude, but then an oscillation shows up. The vertical red lines indicate the zooming process where in we take the snapshots of the attractor shown below and label (a) to (d).
This figure shows the geometrical structure of the points that lay in the outer ball as the radius becomes small. The top graph shows the quantity , describing the ratio of the measure of two concentric balls, and the bottom graph shows the corresponding estimation of the local dimension. Both plots seem to converge for a several orders of magnitude, but then an oscillation shows up. The vertical red lines indicate the zooming process where in we take the snapshots of the attractor shown below and label (a) to (d).
Figure 2(b) shows the result of zooming and averaging over 1000 randomly selected reference points. The quantity oscillates for any given point, at different values of the radius. There is a clear cluster around the value and corresponding . This means that at any small scale, the exceedances around most reference points look like a line at that resolution.
Another use of this algorithm is to compute the information dimension of the attractor (Refs. 17 and 27, Chap. 9) by averaging the local dimension estimates along a trajectory. This typically gives values around for the Hénon attractor, which is only one decimal point away for the value estimated by other authors.31 This is possibly a reason why the this system has been used to illustrate the EBD method in some texts (Refs. 24 and 27, Chap. 9), despite the lack of regular variation.
D. Axiom A system: The solenoid map
The Solenoid map is an example of a uniformly hyperbolic system, also called Axiom A. This class of systems have been presented in the literature as a valid setting to apply this algorithm (Refs. 24 and 27, Chap. 9). In Refs. 32 and 33, it is stated that the exceedances of the process (1) around a point in the attractor of an Axiom A system converge to an exponential distribution with parameters equal to the inverse of the local dimension around that point. Here, we produce a counterexample.
The Solenoid map is a discrete time dynamical system on a torus, which stretches the torus to be twice as long and times thinner and wraps it twice inside the original torus. A detailed description of the map and a estimation procedure for its invariant measure can be found in Appendix D.
Figure 2(d) shows the quantity and the estimated local dimension for 1000 points in the Solenoid attractor, with parameter . The quotient of the measure of the balls becomes independent of the chosen reference point due to the uniformity of the attracting set. The result is an oscillation which synchronizes for any reference point, and the same for the estimated local dimension. As a result, there is no convergence for the algorithm even on average.
Using the approximation procedure described in Appendix D, one can plot the measure of a ball of radius centered around a point in the attractor. This behaves approximately as a power law but with a series of kinks spaced by powers of the parameter . These kinks induce the logarithmic oscillations of period and are consistent with the geometrical phenomenon observed in Fig. 3.
E. Lorenz 63
To compute the true recurrences in continuous time systems, we have implemented an interpolation scheme. This makes the dimension estimates 1 unit lower than in reality, see Appendix E for an explanation of the interpolation procedure.
The Lorenz 63 system has also been used to illustrate the EBD method.2,7,30 Figure 4(a) shows the analysis as performed with the discrete time systems but with the interpolation scheme. As the radius decreases the estimates of the measure ratio cluster around 0.5 with some downward excursions. Similarly, the estimates of the dimension cluster around 1 but have oscillations that push them upward for some range of scales. Similar to the Hénon map, the attractor is locally like a surface around most points at any given sufficiently small scale but not for any point through all the scales. The average of the local dimensions stabilizes around the value 2.054, very close to the value reported in Ref. 29.
Estimates of the regular variation property and local dimension for some selected continuous time systems. (a) Continuous time systems display similar phenomena as the discrete time ones. (b) The oscillations in the local dimension do not preserve relative magnitude or order. (c) The amplitude of the oscillations can be of the same order of magnitude as the ambient space dimension. (d) For conservative systems with dynamics in a submanifold in phase space, our method of study may diverge spuriously.
Estimates of the regular variation property and local dimension for some selected continuous time systems. (a) Continuous time systems display similar phenomena as the discrete time ones. (b) The oscillations in the local dimension do not preserve relative magnitude or order. (c) The amplitude of the oscillations can be of the same order of magnitude as the ambient space dimension. (d) For conservative systems with dynamics in a submanifold in phase space, our method of study may diverge spuriously.
F. Lorenz 96
An example of a less flat attractor is given by the 4D Lorenz 96 model. Figure 4(c) shows the results for this system which results in chaotic time evolution. The quantities plotted do not seem so converge or cluster in any way. A sufficiently small scale seems to be reached around , after that the distribution of estimates appears to be stable. The range of the estimations remains very broad and with a big dispersion. Our estimation of the information dimension of the attractor over a trajectory with points is , which differs slightly from the average answer of of the run.
Figure 4(b) shows a similar plot with 25 points randomly chosen from the attractor of the Lorenz96 model in 8D. The lines corresponding to the lowest local dimension (dotted) and highest (dashed) at the end of the estimation have been highlighted. In the bottom panel of Fig. 4(c), the difference between the dotted and the dashed curves is plotted. These lines show that the oscillations do not preserve relative order or magnitude and thus inferring any property based on the value of the local dimension is an artifact of the data length. This is directly relevant for studies that associate events to particular atmospheric configurations depending on if its local dimension is high or low, such as Refs. 2,3,5–8,15. We show clearly that which event would get which classification depends on the data length if the underlying dynamics that generated the data are not regularly varying. This is because, the same initial conditions can have larger or smaller local dimension based on the data length.
G. Hénon–Heiles system
The Hénon–Heiles system is a Hamiltonian system that was introduced to model the orbit of stars around galactic centers. Its orbits display a fat fractal behavior.34 The initial condition provided to this system determines the total energy available to the orbits. For our simulations we have used far apart initial conditions belonging to the trajectory with initial condition (0.0, 0.25, 0.42, 0.0).
The results for this system are not conclusive. A problem that arises is that the numerical integration produces round up errors that do not conserve the energy of the system on the long run. This means in our case that the reference point is in a manifold from which the dynamics drift away. Based in the rest of examples, we would expect to see convergence in this case. The graphs of Fig. 4(d) could be converging around , but they start diverging significantly for smaller radii.
IV. LIMITATIONS OF THE EXTREMAL INDEX ESTIMATION
A continuous time notion of the extremal index exists.35 However, previous works1–10,14 use the discrete time version of the extremal index under the assumption that it is adequate when the data are a discrete time sampling of a continuous time process. The papers use the index-based estimator given by Ref. 28, which infers the value of through the indexes of the extremes as in a discrete time process , see Algorithm 2. Clustered exceedances correspond to a lower value of , meanwhile longer gaps between them may increase it. Thus, the method used to discretize the continuous time process affects the estimation. An unambiguous method of discretizing is, for example, the return map to a Poincaré surface or a suspension flow. However, if one uses a numerical scheme as temporal discretization, the extremal index estimator will depend on the sampling time, the local speed of the flow, the length of the data series and the threshold choice. Denser sampling will make clusters bigger but also make the gaps between clusters bigger in the index sense. Longer trajectories may decrease the sizes of the clusters, but they will introduce more clusters and more gaps of unknown length. Increasing the threshold will make the clusters smaller and the gaps bigger, therefore increasing the value of . If an adaptive time step is used, additional parameters may affect the estimation.
To study the effects of varying these parameters we have made a simple experiment with the Lorenz 63 model, the result of which can be seen in Figure 5. On the top-left panel the extremal index is computed and averaged over the process (8) using points on a trajectory while varying the time step . The temporal length is fixed at time units. Two curves have been computed, one with a fixed quantile and one with a varying quantile . This quantile corresponds at taking as extremes approximately the biggest values, where is the total amount of data in the process. For both curves the extremal index decreases strongly as the time step decreases, given that . The bottom left panel shows a similar experiment but with a fixed time step and varying the temporal length of the trajectory. The quantiles are the same as in the top-left panel. Now, there is a difference in the behavior of the two curves. While the extremal index estimation does not change with length when the quantile stays fixed, the extremal index grows with length when the quantile also increases.
On the left side, the average extremal index computed over the points in a trajectory using the index estimator. In the top panel, it is computed along trajectories of the same temporal length but with different sampling times. On the bottom panel shows the effect of changing the trajectory temporal length with a fixed time step of the integration. On the right side, average time spent on a cluster computed over the points in a trajectory using the normalized extremal index. On the top panel, it is computed along trajectories of the same temporal length but with different sampling times. The bottom panel shows the effect of changing the trajectory temporal length with a fixed time step of the integration.
On the left side, the average extremal index computed over the points in a trajectory using the index estimator. In the top panel, it is computed along trajectories of the same temporal length but with different sampling times. On the bottom panel shows the effect of changing the trajectory temporal length with a fixed time step of the integration. On the right side, average time spent on a cluster computed over the points in a trajectory using the normalized extremal index. On the top panel, it is computed along trajectories of the same temporal length but with different sampling times. The bottom panel shows the effect of changing the trajectory temporal length with a fixed time step of the integration.
This quantity, averaged over the trajectory, is shown in the right panels of Fig. 5. In the top right panel, we see that the time duration of the clusters decreases with denser sampling, although this time the behavior is more affected by the quantile choice. When keeping the quantile fixed the t0ime spent in the cluster decreases but seems to stabilize as . When the quantile changes with the amount of data, the average time in clusters decreases almost linearly. This shows that in practice we cannot compare results from index-based estimators over time series sampled with different frequencies. The bottom right panel shows how the average time in the cluster changes with the length of the trajectory. If the quantile is fixed, the average time spent in clusters remains approximately constant, but if the quantile increases the average time spent in clusters decreases with the length of the trajectory.
The choice of quantile has a significant impact on the results. Generally, there is a trade-off between the quantile and the number of data points. While higher quantiles deliver a better approximation to the asymptotic limit, they also result in smaller sample sizes from which to perform statistical inference. The changing quantile chosen leaves points as extremes. The reason for this is that increasing simultaneously increases the quantile and the size of the sample, so it should give a better approximation in both fronts. This shows how sensitive the parameters are to this estimation procedure. The index-based estimator still quantifies a sort of persistence in phase space, but it is related to the spurious correlations induced by the speed of the flow around a point. This is not what the extremal index is meant to quantify. Instead is meant to quantify correlations between distinct recurrences, such as those experimented by a discrete time system when the reference point is a periodic point.
This raises the question of whether the values of extremal indices calculated using the index-based approach say something meaningful about the underlying continuous time dynamical systems or are simply numerical artifacts. Indeed, an unambiguous discretization method through a Poincaré return map assigns an extremal index of to almost all points for the Lorenz 63 system attractor,36 which is what should be expected given the theory of the extremal index (Ref. 27, Chap. 4). This also becomes obvious from the top-left panel of Fig. 5 as , irrespectively of reference point, in the limit , where is a timescale comparable to the typical variability timescales of the dynamical system.
V. CONCLUSIONS
The regular variation property is necessary for the convergence of the exceedances to an exponential distribution. In the previous literature, this property is assumed to hold for the invariant measure of large classes of dynamical systems. Our study shows that the lack of regular variation makes the local dimension estimates resolution dependent. Thus, the estimation depends strongly on the number of data. The bias introduced by this effect can be small relative to the dimension of the system, like in the Solenoid map case, or large as in the 4D Lorenz 96 case. This bias affects differently each data point, or equivalently, each initial condition. In higher dimensional systems, another important source of bias is the one introduced by the concentration of the Euclidean norm,33 but this affects all points equally.
There is an apparent relationship between the geometry of the invariant set and the regular variation property. It appears that when the invariant measure is singular with respect to Lebesgue measure, then it is not regularly varying. Thus, any measure whose supporting set has a dimension different from an integer would not be regularly varying. If true, this would imply that regularly varying systems might be the exception rather than the rule in high dimensional chaotic dynamics.
Index-based estimation of the extremal index is performed by constructing a method designed for discrete time systems. For continuous time systems sampled at a fixed time intervals, the index-based extremal index estimation depends heavily on parameters irrelevant to the dynamics of the system, such as the sampling frequency and the threshold (quantile). This shows that the estimator measures the speed of the flow around a point more than anything else. Thus, its deviation from the theoretically expected value of 1 for most systems is likely a numerical artifact rather than a true characterization of the dynamics. This problem is not fixed by renormalizing by the time step, as previous literature suggested.
These findings are directly relevant for the application of dynamical indicators (local dimensional and extremal index) in really complex systems, such as the output of climate models or observational data. In some other applications, analog states are used instead of recurrences in a phase space, but the theory of dynamical systems is applied to them nevertheless. It is reasonable to think that the same assumptions that apply to recurrences should apply to the analogs as well. The estimation procedures are hard to falsify.17,30 However, they are routinely used to analyze data of enormous complexity where checking properties such that of regular variation is unfeasible, yet the findings of the literature are heavily reliant on it. We believe that additional efforts are needed to clarify the suitability of these methods for real (i.e., continuous time, high dimensional, and complex) data.
ACKNOWLEDGMENTS
We want to thank the anonymous reviewers for helpful suggestions that greatly improved the quality of the paper. This research is supported by the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie Actions Grant No. 956170 CriticalEarth. For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) license to any Author Accepted Manuscript version arising from this submission.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Ignacio del Amo: Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Writing – original draft (equal). George Datseris: Methodology (equal); Software (supporting); Supervision (equal); Writing – review & editing (equal). Mark Holland: Conceptualization (equal); Funding acquisition (lead); Supervision (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The code and the data used to generate the plots in this paper are fully available in GitHub at https://github.com/PythagoreanCult/FractalDimensionsQ, Ref. 41.
APPENDIX A: CORRELATION-BASED ESTIMATION FOR THE LOCAL DIMENSION
APPENDIX B: CONSTRUCTION OF MIDDLE THIRD CANTOR SET AND SHIFT
Visual aid to understand the computation of the measure of a ball with the (1/2,1/2) Bernoulli measure. The segments represent a subset of set , since one of the segments is completely inside the ball always, further resolution is not needed, however, when the radius is not a power of 1/3, the intersection with a nearby segment makes infinite resolution necessary. This takes the form of the Cantor ternary function , represented on top of the pertinent segments.
Visual aid to understand the computation of the measure of a ball with the (1/2,1/2) Bernoulli measure. The segments represent a subset of set , since one of the segments is completely inside the ball always, further resolution is not needed, however, when the radius is not a power of 1/3, the intersection with a nearby segment makes infinite resolution necessary. This takes the form of the Cantor ternary function , represented on top of the pertinent segments.
The measure of ball a radius and centered in a point can be computed in the following way (see Fig. 6). Note that for some , we have that . Regardless of the position of the center, we have that because the ball will always contain a segment of , with equality holding for all if is exactly . When , the measure of the ball will depend on the position of the point, since there will be a subset of positive measure of which is at distance, but only in one of the sides, and there might be an intersection. Let be the end point of the subset such that which is at distance of the closest subset of . Then, the measure of the ball is , where is the Cantor ternary function (also called Devil’s staircase) whenever is positive, or 0 otherwise. When , the ball will always contain some positive measure subset of the nearest subset. The measure of the ball as before can be expressed again as with when the .
The Bernoulli measure is not regularly varying. Note that because of the gaps in the support of the measure, there are values of for which the measure does not change value at all. Let and meaning is the opposite end point. Note that the end points will always belong to by construction. If , then Eq. (5) can only be true if the index . Values of and such that this holds can be found for any point , so decreasing sequences of such that this holds can be found.
APPENDIX C: CONSTRUCTION OF A FAT CANTOR SET
The measure we effectively use is again the Bernoulli measure. Starting with a uniform distribution and because the maps are symmetrical with respect to , each iteration removes the same amount of measure from each side. Thus, the measure of each -cylinder set (corresponding to a interval) is , same as for the middle third Cantor set described above.
APPENDIX D: ESTIMATING THE INVARIANT MEASURE OF THE SOLENOID MAP
The proof of formula (D3) in Ref. 37, Chap. 13, is made by finding a suitable covering by balls of an iterate . This idea can be modified to estimate the volume of a small ball intersecting the Solenoid attractor. The solenoid attractor embedded in looks like a line coiled infinitely many times within a torus, hence the name, and the intersection of the solenoid attractor with a small ball centered at a point in the attractor has the structure of a series of branches of variable length, slightly curved, one of which goes through the center. We make the following simplifications:
The measure is uniform along the length of the branches.
We assume that the branches within the ball are approximately straight.
All branches are of total length when they go around the torus.
Equation (D5) gives an approximation of where the set intersects , since , we have , and thus its distance to the nearest point of is bounded by , corresponding to the maximal diameter of a connected component of .
One thing that becomes clear in the derivation of the approximation of the measure is that the point chosen to do the approximation does not matter much. This is, in formula (D4), we see that the initial angle at which we take the Poincaré section is exponentially dampened, so the angles will be almost the same for any Poincaré section taken. The initial condition within the disk is also exponentially dampened because , so again it does not contribute much for high . This is consistent with the uniform hyperbolicity of the system, which implies that the invariant set should have locally the same structure everywhere.
Using this approximation, the measure of a ball of radius centered around a point can be computed. It behaves approximately as a power law , but with a series of kinks spaced according to the geometrical series , as Fig. 7 shows. This is consistent with the geometrical phenomenon described in Fig. 3 since clusters of points in the Poincaré section are separated by powers of .
Plot of the approximation of the measure and comparison between the obtained empirically in Sec. III and the one obtained analytically through the approximation. The dashed lines mark elements of the sequence of for .
Plot of the approximation of the measure and comparison between the obtained empirically in Sec. III and the one obtained analytically through the approximation. The dashed lines mark elements of the sequence of for .
The approximation follows closely the numerical average of the quantity (9) computed over all the points and allows us to extend it with much lower computational effort. This approximation can yield good values for very small radii, and as a verification, estimating the local dimension as the slope of the plot vs over 16 orders of magnitude gives , which is very close to the exact value , which is an accuracy hard to reach when estimating dimensions of fractal sets.
APPENDIX E: INTERPOLATION FOR CONTINUOUS TIME SYSTEMS
When computing recurrences, numerical schemes introduce a source of error by producing points in the trajectory, which may be far from the reference point even if the continuous time trajectory goes very close to it. As the radius of the ball decreases, the numerical approximation may even miss to hit the ball and omit a true recurrence. These problems can be avoided by performing an interpolation and solving a minimization problem that produces only the closest recurrence of the trajectory within the ball. This means that for every trajectory that enters the ball, only one point is kept. Thus, the set of points that remain in the ball has a dimension which is the dimension of the intersection between the attractor and ball minus 1. Figure 8 shows a schema of an attractor intersecting a ball centered around a reference point and the points that remain inside it after the optimization procedure takes place.
Schematic representation of the computational method used to study continuous time dynamical systems. The reference point is represented by a diamond shaped point and circular points are the closest recurrences to it of each branch of the attractor that enters the ball represented with a dashed line.
Schematic representation of the computational method used to study continuous time dynamical systems. The reference point is represented by a diamond shaped point and circular points are the closest recurrences to it of each branch of the attractor that enters the ball represented with a dashed line.