We propose a mutual information statistic to quantify the information encoded by a partition of the state space of a dynamical system. We measure the mutual information between each point’s symbolic trajectory history under a coarse partition (one with few unique symbols) and its partition assignment under a fine partition (one with many unique symbols). When applied to a set of test cases, this statistic demonstrates predictable and consistent behavior. Empirical results and the statistic’s formulation suggest that partitions based on trajectory history, such as the ordinal partition, perform best. As an application, we introduce the weighted ordinal partition, an extension of the popular ordinal partition with parameters that can be optimized using the mutual information statistic, and demonstrate improvements over the ordinal partition in time series analysis. We also demonstrate the weighted ordinal partition’s applicability to real experimental datasets.
Symbolic dynamics is a powerful tool in nonlinear time series analysis. Given a multidimensional time series, we partition its state space into disjoint sets, each represented by a unique symbol. Analyzing the symbolic representations of trajectories can offer useful insights. How we choose to partition the state space is critical; the aim is to select partitions that produce symbolic representations reflecting the true dynamics of a system. We introduce a mutual information statistic as a novel method for assessing how well a partition achieves this aim. The statistic measures the mutual information between a point’s symbolic trajectory history and its current location. We demonstrate greater suitability on test cases when compared with existing statistics and use the statistic to propose an improvement on the popular ordinal partition.
I. BACKGROUND AND MOTIVATION
Symbolic dynamics is the modeling of a dynamical system by discretizing its state space into a set of partitions, with applications in nonlinear time series analysis.1 Symbolic dynamics analysis requires the application of a partitioning scheme, which is the process of assigning a symbol to each point in the data based on which partition it falls into. A partitioning scheme can be considered a map from the state space of a dynamical system to the set of symbols assigned uniquely to each partition. Good partitions are those that produce a symbol sequence retaining a large amount of information about the original system, termed “high-information”; finding high-information partitions is, therefore, a problem of interest.
A generating partition is a particular high-information partition where there is a one-to-one correspondence between each trajectory and its infinite symbolic sequence.2 Finding generating partitions is difficult, and a general method does not exist. An existing body of work attempts to estimate generating partitions. While these methods are theoretically applicable to higher-dimensional continuous systems, implementation has proven challenging; hence, applications have focused on two-dimensional systems such as the Hénon and Ikeda maps.2–6
Some works that estimate generating partitions directly from time series have introduced statistics to assess how close to generating a partition is and attempt to find partitions optimizing their values. These statistics follow the principle that under a generating partition, observations that are neighbors in symbol space should also be neighbors in state space; see, for example, Kennel and Buhl’s symbolic false nearest neighbors3 or symbolic shadowing by Hirata et al.2,5 These statistics encourage partitions that are contiguous or otherwise localized in the state space. However, we suggest that partitions that are considered close to generating in this manner are not necessarily well suited for time series analysis tasks. The ordinal partition, introduced by Bandt and Pompe,7 does not necessarily generate contiguous partitions but has been widely used with success in time series analysis tasks.
The ordinal partition is particularly popular for its computational ease.1,8 However, superior partitioning methods allowing for greater information retention may exist. Furthermore, apart from these generating partition statistics, there is currently no method for assessing and comparing the performance of different partitioning methods. Thus, in this paper, we have two aims: first, to develop a statistic that can be used to quantitatively assess the performance of a partition on a particular system and second, to use this statistic to identify superior alternatives to ordinal partitioning. The statistic we propose allows for the quantitative comparison of partitioning schemes on various systems, and we demonstrate its potential to identify better partitions for time series analysis.
In Sec. II, we introduce a mutual information statistic as a method for assessing partitions. In Sec. III, the statistic is applied to a set of test partitions and systems to highlight differences with generating partition statistics and to ensure that it correctly ranks the partitions’ quality in accordance with their performance in time series analysis. In Sec. IV, we introduce the weighted ordinal partition as an application of the statistic and demonstrate improvements over the ordinal partition in time series analysis tasks. In Sec. V, we apply the weighted ordinal partition to two real experimental datasets.
II. MUTUAL INFORMATION STATISTIC
Suppose that we take a single low-precision observation of an orbit on a hyperbolic set. There are likely several trajectories passing through this point, and we cannot be sure which our observation comes from. Now, let us take a sequence in time of similarly low-precision observations from the same orbit. The shadowing lemma guarantees that there will be at least one trajectory that passes through these points.9 Knowledge that the trajectory we have observed passes through not only our initial observation but also all of these observations provides us with more information about the trajectory and allows us to identify it with greater precision. In effect, multiple low-precision observations have provided us with the same information as a single high-precision observation would have; this principle is illustrated in Fig. 1. This demonstrates a general property of chaotic dynamical systems; a single high-precision observation provides the same information as a sequence of low-precision ones.
Formally, consider a partitioning scheme that can be applied to produce a variable number of partitions, so that each choice of the number of partitions yields a corresponding partition. The process for assessing the scheme on a trajectory is as follows:
Algorithm 1
1. Apply a coarse partition to the trajectory.
2. Assign to each point its coarse history symbol sequence: a sequence of the coarse symbols of points along its trajectory history, governed by two parameters called the history delay and the history length.
3. For some number of fine partitions, apply a fine partition to the trajectory and assign to each point its corresponding fine partition symbol.
4. Produce the joint probability distribution of two random variables: the fine partition symbol and the coarse symbol history sequence.
5. Calculate the mutual information between these two random variables, as given in Eq. (1).
6. Repeat steps 3 to 5 for various numbers of fine partitions.
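The steps of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it uses the standard plug-in estimate of mutual information, I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x) p(y))], with a base-2 logarithm, and it assumes the history of a point consists of the coarse symbols at the preceding `length` time steps spaced by `delay`. The function names and the exact indexing of the history are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from two paired symbol lists."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def coarse_history_mi(coarse, fine, delay=1, length=3):
    """Mutual information between coarse symbol histories and fine symbols.

    Assumption: the history at time t is the tuple of coarse symbols at
    t - delay, t - 2*delay, ..., t - length*delay.
    """
    start = delay * length  # first index with a complete history
    histories = [tuple(coarse[t - j * delay] for j in range(1, length + 1))
                 for t in range(start, len(coarse))]
    return mutual_information(histories, list(fine[start:]))
```

Calling `coarse_history_mi` once per fine partition size, with the coarse sequence held fixed, gives the curve of mutual information against the number of fine partitions described in step 6.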
We expect good partitioning schemes to exhibit high values of mutual information and increasing mutual information with the number of fine partitions, and bad partitioning schemes to exhibit low mutual information and no strong increase in mutual information with the number of fine partitions. We thus expect that graphs of mutual information against the number of fine partitions should be able to differentiate between good and bad partitioning schemes; in Sec. III, we verify that this is indeed the case.
III. RESULTS ON TEST CASES
In this section, the mutual information statistic is used to assess four partitioning schemes on three test systems (Duffing oscillator, Lorenz system, and i.i.d. noise). Two good partitioning methods (ordinal partition and k-means clustering) as well as two poor methods (slice and random partitions) are selected to test that the statistic is able to differentiate between them. The ordinal partition is well established in the literature; see Bandt and Pompe for details.7 The ordinal partition has two parameters, the embedding dimension and the time delay. The time delay is chosen to be one quarter of the period of the orbit,10,11 while the embedding dimension is varied to produce different numbers of symbols; note that the number of symbols is the number of unique ordinal sequences present in the time series, with length equal to the embedding dimension, and cannot be controlled explicitly. Clustering is implemented using k-means clustering, an unsupervised machine learning method that separates points into k clusters and aims to minimize the sum of the squared distances between all points and their cluster means.8 The parameter k defines the number of clusters, and therefore the number of partitions produced. The slice partition is constructed by selecting one dimension of a trajectory to produce a scalar time series, then defining evenly sized bins between its minimum and maximum values. The random partition is constructed by randomly assigning each point in a trajectory one of the available symbols, drawn from a discrete uniform distribution over the symbols. If the mutual information statistic is able to produce assessments of each partition that accurately reflect their performance in time series analysis, this will suggest that it is a suitable statistic for assessing the quality of a partitioning scheme.
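As a rough illustration of three of the test partitions described above, the following NumPy sketch builds slice, random, and ordinal symbol sequences from a scalar series. The function names, the choice of bin edges at the series minimum and maximum, and the integer relabeling of ordinal patterns are assumptions made for this example; k-means is omitted to keep the sketch dependency-free.

```python
import numpy as np


def slice_partition(x, n_bins):
    """Evenly sized bins between min(x) and max(x); symbols 0..n_bins-1."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Only interior edges are used, so the maximum lands in the last bin.
    return np.digitize(x, edges[1:-1])


def random_partition(n_points, n_symbols, seed=0):
    """Assign each point a symbol drawn uniformly at random."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_symbols, size=n_points)


def ordinal_partition(x, dim, delay):
    """Ordinal symbols from a time-delay embedding of a scalar series."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * delay
    windows = np.stack([x[i * delay: i * delay + n] for i in range(dim)],
                       axis=1)
    patterns = np.argsort(windows, axis=1, kind="stable")
    # Relabel each distinct ordering with an integer symbol.
    _, symbols = np.unique(patterns, axis=0, return_inverse=True)
    return symbols.reshape(-1)
```

Note that `ordinal_partition` only produces the orderings that actually occur in the data, which matches the remark that the number of ordinal symbols cannot be set explicitly.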
A. Duffing oscillator
The statistic correctly identifies the random and slice partitions as performing worse than the k-means and ordinal partitions, as shown in Fig. 2. The statistic detects that increasing the noise parameter results in decreasing the partition quality, also aligning with expectations. The exception is the random partition, where adding noise to the system does not affect results as partitions are assigned randomly anyway. Results also suggest that the ordinal partition is more resilient to noise than the other partitioning methods; this observation agrees with previous results.7
B. Lorenz system
We choose parameter values . The statistic correctly identifies the random and slice partitions as performing worse than the k-means and ordinal partitions, as shown in Fig. 3. The z-slice and z-embedding result in significantly higher mutual information than other dimensions. This is because the Lorenz system is symmetric under inversion through the z-axis, so the z dimension does not offer full observability of the system. The z-embedded orbit is, therefore, less complex than the original orbit, with the attractor only exhibiting a single lobe, resulting in greater predictability and, therefore, higher mutual information.
The exception to this is the ordinal partition, where the z-embedding only slightly outperforms the other dimensions. This is because the partitions in the coarse z-embedded ordinal partition are not contiguous, as shown in Fig. 4. This means that transitions between symbols are more difficult to associate with a specific location on the trajectory, resulting in lower mutual information between fine partitions and coarse partition history.
For the Lorenz system, the statistic produces reliable results reflecting both partition quality and the way the partitions interact with the geometry of the system. These examples demonstrate that besides the quality of the partition itself, choices around its implementation, such as embedding dimension, are also crucial for obtaining a good partition.
C. Independent and identically distributed noise
We generate i.i.d. noise from time series of the Lorenz system by randomly selecting points with replacement, preserving the distribution of the data but removing temporal correlation between points. We anticipate that partitions should perform poorly. The k-means partition in the state space, shown in Fig. 5(a), does indeed perform poorly. However, the k-means partition on embedded orbits and the ordinal partition in Figs. 5(b) and 5(c), respectively, show significant mutual information between fine partitions and coarse partition history. This is because in both cases, a time-delay embedding step is carried out. The time-delay embedding process introduces correlation between successive points in the embedded orbit, so each point contains information about trajectory history, resulting in a higher mutual information statistic value being measured. The ordinal partition is explicitly dependent upon the relationship between successive points, while the k-means partition only geometrically partitions the embedded orbit; this accounts for the ordinal partition’s better performance, with higher mutual information and a consistent increase with the number of fine partitions. Note that unlike on the original Lorenz system, partitions on the z-embedding of the i.i.d. noise do not result in higher mutual information. This reflects the fact that here, mutual information results only from the time-delay embedding, not from system dynamics. These results suggest that in general, partitioning methods that utilize trajectory history, including the ordinal partition, will perform better under the mutual information statistic.
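The effect described above can be seen directly in a small sketch: resampling with replacement destroys temporal order, yet a time-delay embedding of the resampled series still shares coordinates between successive embedded points. The function names and the forward-in-time embedding convention are assumptions made for illustration.

```python
import numpy as np


def iid_surrogate(points, seed=0):
    """Resample points with replacement: same distribution, no time order."""
    points = np.asarray(points)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(points), size=len(points))
    return points[idx]


def delay_embed(x, dim, delay):
    """Rows are (x[t], x[t + delay], ..., x[t + (dim - 1) * delay])."""
    x = np.asarray(x)
    n = len(x) - (dim - 1) * delay
    return np.stack([x[i * delay: i * delay + n] for i in range(dim)], axis=1)


noise = iid_surrogate(np.sin(np.linspace(0.0, 60.0, 500)))
embedded = delay_embed(noise, dim=3, delay=1)
# With delay 1, the second coordinate of one embedded point is the first
# coordinate of the next, so successive embedded points are not independent.
shared = np.array_equal(embedded[1:, 0], embedded[:-1, 1])
```

Here `shared` is true by construction, regardless of the dynamics that generated the data, which is the source of the nonzero mutual information measured on embedded i.i.d. noise.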
D. Comparison of mutual information and generating partition statistics
In this section, we highlight differences between our proposed mutual information statistic and two existing statistics used to assess candidate generating partitions: symbolic false nearest neighbors and symbolic shadowing.
Kennel and Buhl introduce symbolic false nearest neighbors3 to measure how close a partition is to generating. The statistic is built upon the principle that neighbors in symbol space should also be neighbors in the state space. Their algorithm is outlined below for a time series with symbol sequence :
Algorithm 2
Embed in the unit square, where is the number of unique symbols in the symbol sequence.
For each , find its nearest Euclidean neighbor in the unit square and call the nearest neighbor index .
Define .
Define as the percentile rank of among all pairs of points in .
Calculate the proportion of values below .
We compare how our proposed mutual information statistic, the symbolic false nearest neighbors statistic, and the symbolic shadowing statistic (with n = 1 and n = 5) assess the same system and partitions. Using the Lorenz system with partitions, we apply the random partition, slice partition ( slice), k-means partition, and ordinal partition ( , -embedded). Results are shown in Table I.
Statistic | Random | Slice | K-means | Ordinal |
---|---|---|---|---|
MI | 0.05 | 1.08 | 1.26 | 2.52 |
SFNN | 0.009 | 0.988 | 0.934 | 0.526 |
SS (n = 1) | 50.7 | 21.2 | 7.2 | 118.4 |
SS (n = 5) | 0.0 | 2.7 | 2.4 | 60.0 |
The mutual information statistic offers an assessment of each partition that accurately reflects its performance in time series analysis. It scores the ordinal partition higher than the k-means partition, which itself scores higher than the slice partition and then the random partition. Symbol sequences from the slice and k-means partitions offer strong state space localization and are rated well using symbolic false nearest neighbors and symbolic shadowing. However, disagreement between the three statistics demonstrates that state space localization does not necessarily result in high-information partitions as defined by the mutual information statistic. Given these results, we suggest that the mutual information statistic’s definition of “high-information” partitions is more appropriate for determining good partitions for time series analysis; we demonstrate such an application in Sec. IV. We also note that computation times for the proposed mutual information statistic are significantly lower than those for the existing generating partition statistics.
IV. WEIGHTED ORDINAL PARTITION
Empirical results and the mutual information statistic’s formulation suggest that the ordinal partition is a good partition as it utilizes trajectory history. Attempting to improve upon the ordinal partition, we introduce the weighted ordinal partition as an application of the mutual information statistic.
Consider a scalar time series from a one-dimensional observation of a system, and a point of this series together with its time-delay embedding vector. In an ordinal partition, we assign symbols based on the rank order of the amplitudes of the components of the embedding vector (in the literature, time ordering is sometimes used rather than rank ordering; both generate equivalent partitions). In a weighted ordinal partition, we first apply an element-wise weighting by taking the element-wise product of a weight vector with the embedding vector. We then assign a symbol based on the rank order of the amplitudes of the components of the weighted vector. The conventional/unweighted ordinal partition is the case in which every weight is equal. Because the ordinal partition is scale invariant, one component of the weight vector can be fixed.
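A single weighted ordinal symbol can be illustrated as follows. This is a minimal sketch, assuming the symbol is represented by the tuple of indices that sorts the weighted embedding vector (the time-ordering convention, which the text notes is equivalent to rank ordering); the function name and the example numbers are hypothetical.

```python
import numpy as np


def weighted_ordinal_pattern(v, w):
    """Ordering of the components of the element-wise product w * v."""
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(w * v, kind="stable")
    return tuple(int(i) for i in order)


v = (0.2, 0.5, 0.4)
unweighted = weighted_ordinal_pattern(v, (1.0, 1.0, 1.0))
weighted = weighted_ordinal_pattern(v, (1.0, 0.5, 1.0))
```

In this example the weighting halves the middle component, which moves it below the last component and so changes the symbol assigned to the same embedding vector. Multiplying all weights by a common positive factor leaves the ordering unchanged, which is why one weight can be held fixed.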
Each selection of the weighting will result in a different partition with a different value under the mutual information statistic. We propose that a partition can be optimized by selecting the weighting to maximize the value of the mutual information statistic, and that the resulting optimized weighted ordinal partition should outperform the conventional ordinal partition (also referred to as just the “ordinal partition”). An algorithm for selecting the optimal weighting for scalar time series is as follows:
Algorithm 3
1. Select the embedding dimension and time delay to produce the embedded orbit. Note that selecting the embedding dimension also determines the number of symbols.
2. Select a set of candidate weightings to be tested. In this paper, each component of the weighting (besides the fixed component) is varied over a range with a fixed step size.
3. For each candidate weighting, generate the fine symbol sequence by finding the ordinal sequence of the weighted embedding vector at every point.
4. For each candidate weighting, generate the coarse symbol sequence in a similar manner, using a truncated weighting and embedding.
5. For each candidate weighting, follow steps 4 and 5 of Algorithm 1 to calculate the mutual information statistic between the coarse and fine partitions.
6. Select the weighting that results in the partition with the highest mutual information value.
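The grid search of Algorithm 3 might be sketched as below. Several details are assumptions made for this example rather than the authors' choices: the first weight is the one held fixed (at 1), the "truncated" coarse partition uses the first `coarse_dim` components of the weighting and embedding, the coarse history uses the preceding `hist_len` symbols, and mutual information is estimated by the plug-in formula in bits. All function and parameter names are hypothetical.

```python
import itertools
import math
from collections import Counter

import numpy as np


def delay_embed(x, dim, delay):
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * delay
    return np.stack([x[i * delay: i * delay + n] for i in range(dim)], axis=1)


def ordinal_symbols(vectors):
    """One hashable ordinal pattern (tuple of indices) per row."""
    return [tuple(r) for r in np.argsort(vectors, axis=1, kind="stable").tolist()]


def mutual_information(xs, ys):
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def score_weighting(x, w, dim, delay, coarse_dim, hist_delay, hist_len):
    emb = delay_embed(x, dim, delay)
    fine = ordinal_symbols(emb * w)                               # step 3
    coarse = ordinal_symbols(emb[:, :coarse_dim] * w[:coarse_dim])  # step 4
    start = hist_delay * hist_len
    hist = [tuple(coarse[t - j * hist_delay] for j in range(1, hist_len + 1))
            for t in range(start, len(coarse))]
    return mutual_information(hist, fine[start:])                 # step 5


def best_weighting(x, dim, delay, grid, **kw):
    # Step 2: the first weight is fixed at 1, the rest range over the grid.
    candidates = [np.array((1.0,) + rest)
                  for rest in itertools.product(grid, repeat=dim - 1)]
    # Step 6: keep the candidate with the highest mutual information.
    return max(candidates, key=lambda w: score_weighting(x, w, dim, delay, **kw))
```

An exhaustive grid is the simplest choice and its cost grows exponentially with the embedding dimension, so it is only practical for small dimensions and coarse grids.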
A. Application to the Lorenz system
We apply a weighted ordinal partition with parameters to an -observation of the Lorenz system as defined in Eq. (2). We apply Algorithm 3 with two modifications: in step 2, is varied in the range and in the range as these are the values of producing interesting behavior; and we remove step 6, as in this section, we are interested in observing behavior for a variety of values of rather than selecting the optimal one. Results are shown in Fig. 6. A region of high mutual information is seen along the parabola . Partitions along the parabola appear to meet at the center of each lobe of the attractor, while partitions adjacent to the parabola meet outside the center of each lobe. Partitions meeting at the center of a lobe form “wedges,” resulting in regular and predictable symbol sequences as a trajectory traverses each lobe and thus higher mutual information.
The standard ordinal partition with has the highest mutual information in the parabolic region with . There are two regions where higher mutual information is observed: the line , with giving , and a region below the parabola, with giving .
Along the line , ordinal sequences are generated from the points . This means that ordinal sequences and only occur when the Lorenz system crosses the plane . This plane represents an important switching between the two lobes of the system; applying this weighting, therefore, captures an important system behavior and results in a good symbolic representation of system dynamics with high mutual information.
When compared to the standard , the weighting parameters result in a more even distribution of points among the partitions, as Fig. 7 illustrates. Although many factors determine partition quality, given similar partitioning schemes, evenly distributing points between partitions should provide more information per symbol by maximizing the entropy of the symbol distribution; this can account for the higher mutual information observed for the weighting . It is also noted that the parameter choices and both result in contiguous partitions; this may also increase mutual information.
These results show that when optimized using the mutual information statistic, the weighted ordinal partition can offer improvements over the ordinal partition by identifying partitions that more effectively exploit system dynamics.
B. Weighted ordinal vs conventional ordinal partition
Under a good partitioning scheme, changes in the regime of a system’s behavior should be tracked by symbol sequence complexity measures. We measure the largest Lyapunov exponent (LLE)12 of each system as a proxy for chaos to characterize system behavior, and permutation entropy (PE)7 and Lempel–Ziv complexity (LZC)13 as measures of symbol sequence complexity. On both systems, an ordinal partition and a weighted ordinal partition are each used to generate symbol sequences from which we calculate PE and LZC. To assess partition quality, we compare how well PE and LZC from both partitions track LLE as bifurcation parameters are varied.
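The two symbol sequence complexity measures can be sketched as follows, under stated assumptions. Permutation entropy is taken here as the Shannon entropy (in bits) of the distribution of ordinal symbols, so the function below accepts any symbol sequence. Lempel–Ziv complexity is implemented as the 1976 phrase-counting definition using a direct but slow search, which may differ in normalization or variant from the one used in the paper. Function names are hypothetical.

```python
import math
from collections import Counter


def symbol_entropy(symbols):
    """Shannon entropy in bits of the empirical symbol distribution."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def lempel_ziv_complexity(symbols):
    """Number of phrases in the Lempel-Ziv (1976) parsing of the sequence."""
    s = list(symbols)
    n = len(s)
    i, phrases = 0, 0
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from an earlier start
        # position (overlap with the current phrase is allowed).
        while i + length <= n and any(
                s[j:j + length] == s[i:i + length] for j in range(i)):
            length += 1
        phrases += 1
        i += length
    return phrases
```

A periodic symbol sequence yields few phrases however long it runs, whereas a sequence without recurring motifs keeps producing new phrases, which is the contrast exploited when tracking transitions between periodicity and chaos.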
1. Logistic map
Both the weighted ordinal and ordinal partitions use embedding parameters , . Points of interest for the logistic map are the onset of chaos at , and periodic windows at . As Fig. 8 shows, LZC from the weighted ordinal partition is able to detect the onset of chaos slightly more accurately than LZC from the ordinal partition, while PE from both partitions is inaccurate, with the weighted ordinal partition increasing prematurely at and the ordinal partition delayed at . Additionally, while both LZC statistics detect all three periodic windows, LZC for the weighted ordinal partition is higher in the chaotic regime, meaning the periodic windows are more distinct from the surrounding chaos. Similarly, PE from the weighted ordinal partition detects all three periodic windows, while the ordinal partition demonstrates smaller decreases in PE and fails to detect entirely.
Table II contains short sections of the symbol sequences generated by both partitions at different values of and offers some insight into the reason for the weighted ordinal partition better distinguishing chaos from periodicity. During chaos at , the ordinal partition’s symbol sequence can be split into recurring “motifs” of 2403 and 13. This is because at this parameter value, despite being chaotic, the logistic map produces short sequences of points with recurring ordinal patterns; see Fig. 9(a). Although the order of the ordinal partition’s motifs is chaotic, their presence means that the symbol sequence can effectively be reduced to two symbols, lowering sequence complexity measures. On the other hand, no such motifs are identifiable in the weighted ordinal partition’s sequence. At , the optimal weighting is . Figure 9(b) shows length segments at various points along the time series after is applied. Segments that had the same ordinal sequence under the ordinal partition can be distinguished under the weighted ordinal partition. This eliminates the presence of recurring motifs in the symbol sequence, making periodic windows more distinguishable from chaos than in the conventional ordinal partition. At the periodic window, both the ordinal and weighted ordinal sequences fall into a six-periodic cycle, before returning to their respective chaotic behaviors at .
r | Ordinal symbol sequence | Weighted ordinal symbol sequence |
---|---|---|
3.62 | 1,3,2,4,0,3,2,4,0,3,2,4,0,3,2,4,0,3,2,4,0,3,2,4,0,3,1,3 | 10,9,6,2,4,7,10,9,6,2,5,7,3,4,0,2,10,9,6,2,5,7,3,4,0,2 |
3.63 | 2,4,0,3,1,3,2,4,0,3,1,3,2,4,0,3,1,3,2,4,0,3,1,3 | 5,4,2,3,0,1,5,4,2,3,0,1,5,4,2,3,0,1,5,4,2,3,0,1 |
3.64 | 2,4,0,3,2,4,0,3,1,3,2,4,0,3,2,4,0,3,2,4,0,3,2,4,0,3,1,3 | 0,6,5,7,0,6,5,7,0,6,5,7,0,1,3,6,5,7,0,6,5,10,0,8,5,7 |
2. Lorenz system
Results also demonstrate a slight advantage for the weighted ordinal partition for the Lorenz system with , shown in Fig. 10. PE for the weighted ordinal partition accurately detects the onset of chaos at , whereas PE for the ordinal partition decreases, while in both PE and LZC, periodic windows at are more distinct for the weighted ordinal partition. The exception to this is PE for both partitions failing to detect the periodic windows at . The advantage that the weighted ordinal partition holds largely disappears when the embedding dimension is increased to , shown in Fig. 11; at a high enough dimension, both symbol sequences contain enough information to track LLE very accurately. This result suggests that using the mutual information statistic to optimize partition assignment for a lower number of symbols has a similar effect to increasing the number of symbols; that is, increasing the information conveyed per symbol.
These results demonstrate that the weighted ordinal partition can offer improvements over the ordinal partition. The ability of complexity measures from both partitions to track LLE is largely similar, as the two partitions are conceptually identical. However, the mutual information statistic is successful in maximizing the amount of information per symbol given a partitioning scheme and limited number of symbols; this is the cause of the greater visibility of periodic windows and the more accurate detection of the initial transitions from periodicity to chaos.
V. APPLICATIONS TO EXPERIMENTAL DATA
In this section, we apply the mutual information statistic and the weighted ordinal partition to two sets of experimental data: a laser time series from the Santa Fe time series competition,14 and a set of data derived from ECG measurements known as the Fantasia dataset, originally recorded and studied by Iyengar et al.15 and made publicly available on PhysioBank.16
The laser time series consists of 9093 observations representing the intensity of a far-infrared laser in a chaotic state. In windows of length 100 observations, overlapping and separated by step size 10, we apply Algorithm 3 to find an optimal weighted partition, measure LZC from the partition’s symbol sequence, and track how it changes over time. Results are shown on a segment of the time series in Fig. 12. The time series exhibits oscillations, which increase in magnitude until they “collapse,” returning to small magnitude oscillations, which steadily increase in magnitude again. LZC values appear to increase as the magnitude of the oscillations increases, then suddenly spike downward as the collapse occurs, suggesting the system exhibits more chaotic behavior immediately prior to these transitions. LZC from the weighted ordinal partition captures these patterns better than LZC from the ordinal partition; note in the former the correct detection of the upward spike at index 6400 and a downward spike at index 6850. In LZC from the ordinal partition, the small upward spike at index 6400 has the same magnitude as another spike about 100 observations earlier, while the trough at index 6850 extends past index 7000.
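The windowing scheme described above (length 100, step 10) can be sketched as a small generator. The `measure` argument is a placeholder for whatever is computed per window, for example the LZC of the optimized weighted ordinal symbol sequence; it and the function names are assumptions for illustration.

```python
def sliding_windows(x, width=100, step=10):
    """Yield (start_index, window) for overlapping windows of fixed width."""
    for start in range(0, len(x) - width + 1, step):
        yield start, x[start:start + width]


def track(x, measure, width=100, step=10):
    """Apply a per-window measure and return (start_index, value) pairs."""
    return [(start, measure(window))
            for start, window in sliding_windows(x, width, step)]
```

For a series of 9093 observations this produces 900 complete windows, the last one starting at index 8990; any trailing observations that do not fill a window are ignored in this sketch.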
Additionally, we analyze a subset of the Fantasia dataset. The data consist of ten time series containing between 4936 and 8708 observations, representing interbeat time intervals from ten subjects recorded over two hours. Five of the time series are from elderly subjects aged 68–85, and five are from younger subjects aged 21–43. The authors of the original paper15 attempted to isolate age as the only experimental variable, with the aim being to discriminate between the two age groups based on the time series alone. In previous work, other authors have shown that measurements of permutation entropy are unable to do so.15,17 Applying Algorithm 3 to the time series, we generate the optimal weighted ordinal partition with the aim of identifying improvements over the standard ordinal partition in the ability to discriminate based on permutation entropy. However, we find that for embedding parameters and observations, the optimal weighting is always . This means that the standard ordinal partition is already optimal, so the weighted ordinal partition is unable to offer any improvements in this case.
VI. CONCLUSIONS
We have introduced a mutual information statistic to assess partitions of the state space of chaotic dynamical systems. Compared with existing generating partition statistics, this mutual information statistic produces assessments of partitions that better reflect their performance in time series analysis. We, therefore, suggest that the statistic’s mechanism, measuring mutual information between the trajectory history and current location, defines a more accurate notion of “high-information” partitions in the context of time series analysis.
The statistic’s formulation, as well as empirical results from its application, indicate that partitions utilizing trajectory history, such as the ordinal partition, perform well. We can, therefore, offer an account for the already popular ordinal partition’s usefulness and provide evidence supporting its continued use.
As an extension to the ordinal partition, we introduce the weighted ordinal partition. Optimizing the weighted ordinal partition’s parameters using the mutual information statistic produces partitions that exploit system dynamics more effectively than the conventional ordinal partition. The weighted ordinal partition demonstrates improvements over the ordinal partition in time series analysis, particularly in distinguishing chaos from periodicity when using a small number of unique symbols. The weighted ordinal partition can also be applied to real-world datasets, as demonstrated on two experimental time series.
ACKNOWLEDGMENTS
J.L. was supported by the Australian Mathematical Sciences Institute through a 2023–2024 Vacation Research Scholarship.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Jason Lu: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Visualization (equal); Writing – original draft (equal). Michael Small: Conceptualization (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal).
DATA AVAILABILITY
Example code implementing an algorithm to calculate the mutual information statistic is openly available in the mutual-information-statistic repository at https://github.com/jason-luuuuu/mutual-information-statistic, Ref. 18.