We propose a mutual information statistic to quantify the information encoded by a partition of the state space of a dynamical system. We measure the mutual information between each point’s symbolic trajectory history under a coarse partition (one with few unique symbols) and its partition assignment under a fine partition (one with many unique symbols). When applied to a set of test cases, this statistic demonstrates predictable and consistent behavior. Empirical results and the statistic’s formulation suggest that partitions based on trajectory history, such as the ordinal partition, perform best. As an application, we introduce the weighted ordinal partition, an extension of the popular ordinal partition with parameters that can be optimized using the mutual information statistic, and demonstrate improvements over the ordinal partition in time series analysis. We also demonstrate the weighted ordinal partition’s applicability to real experimental datasets.

Symbolic dynamics is a powerful tool in nonlinear time series analysis. Given a multidimensional time series, we partition its state space into disjoint sets, each represented by a unique symbol. Analyzing the symbolic representations of trajectories can offer useful insights. How we choose to partition the state space is critical; the aim is to select partitions that produce symbolic representations reflecting the true dynamics of a system. We introduce a mutual information statistic as a novel method for assessing how well a partition achieves this aim. The statistic measures the mutual information between a point’s symbolic trajectory history and its current location. We demonstrate greater suitability on test cases when compared with existing statistics and use the statistic to propose an improvement on the popular ordinal partition.

Symbolic dynamics is the modeling of a dynamical system by discretizing its state space into a set of partitions, with applications in nonlinear time series analysis.1 Symbolic dynamics analysis requires the application of a partitioning scheme, which is the process of assigning a symbol to each point in the data based on which partition it falls into. A partitioning scheme can be considered a map $P : M \to S$, $|S| = k < \infty$, where $M$ represents the state space of a dynamical system and $S$ represents the set of symbols assigned uniquely to each partition. Good partitions are those that produce a symbol sequence retaining a large amount of information about the original system, termed “high-information”; finding high-information partitions is, therefore, a problem of interest.

A generating partition is a particular high-information partition where there is a one-to-one correspondence between each trajectory and its infinite symbolic sequence.2 Finding generating partitions is difficult, and a general method does not exist. An existing body of work attempts to estimate generating partitions. While these methods are theoretically applicable to higher-dimensional continuous systems, implementation has proven challenging; hence, applications have focused on two-dimensional systems such as the Hénon and Ikeda maps.2–6

Some works that estimate generating partitions directly from time series have introduced statistics to assess how close a partition is to generating and attempt to find partitions optimizing their values. These statistics follow the principle that under a generating partition, observations that are neighbors in symbol space should also be neighbors in state space; see, for example, Kennel and Buhl’s symbolic false nearest neighbors3 or symbolic shadowing by Hirata et al.2,5 These statistics encourage partitions that are contiguous or otherwise localized in the state space. However, we suggest that partitions that are considered close to generating in this manner are not necessarily well suited for time series analysis tasks. The ordinal partition, introduced by Bandt and Pompe,7 does not necessarily generate contiguous partitions but has been widely used with success in time series analysis tasks.

The ordinal partition is particularly popular for its computational ease.1,8 However, superior partitioning methods allowing for greater information retention may exist. Furthermore, aside from these generating partition statistics, there is currently no method for assessing and comparing the performance of different partitioning methods. Thus, in this paper, we have two aims: first, to develop a statistic that can be used to quantitatively assess the performance of a partition on a particular system and second, to use this statistic to identify superior alternatives to ordinal partitioning. The statistic we propose allows for the quantitative comparison of partitioning schemes on various systems, and we demonstrate its potential to identify better partitions for time-series analysis.

In Sec. II, we introduce a mutual information statistic as a method for assessing partitions. In Sec. III, the statistic is applied to a set of test partitions and systems to highlight differences with generating partition statistics and to ensure that it correctly ranks the partitions’ quality in accordance with their performance in time series analysis. In Sec. IV, we introduce the weighted ordinal partition as an application of the statistic and demonstrate improvements over the ordinal partition in time series analysis tasks. In Sec. V, we apply the weighted ordinal partition to two real experimental datasets.

Suppose that we take a single low-precision observation of an orbit on a hyperbolic set. There are likely several trajectories passing through this point, and we cannot be sure which our observation comes from. Now, let us take a sequence in time of similarly low-precision observations from the same orbit. The shadowing lemma guarantees that there will be at least one trajectory that passes through these points.9 Knowledge that the trajectory we have observed passes through not only our initial observation but also all of these observations provides us with more information about the trajectory and allows us to identify it with greater precision. In effect, multiple low-precision observations have provided us with the same information as a single high-precision observation would have; this principle is illustrated in Fig. 1. This demonstrates a general property of chaotic dynamical systems; a single high-precision observation provides the same information as a sequence of low-precision ones.

FIG. 1.

In (a), we take a single low-precision observation of a chaotic dynamical system and observe three trajectories passing through it; we cannot be sure which we have observed. In (b), we take a sequence in time of low-precision observations from our system, allowing us to identify the red trajectory as the one we have observed. In (c), we identify this trajectory by taking a single high-precision observation. This simple example illustrates a general property of dynamical systems, and a consequence of the shadowing lemma: a single high-precision observation provides the same information as a sequence of low-precision ones. Mutual information measures how well symbol sequences resulting from a partition preserve this property.

A good partitioning scheme should then also preserve this property in the symbol sequence it generates. This means that a point’s symbol under a fine partition (equivalent to a high-precision observation) should provide the same information as its symbol history under a coarse partition (equivalent to a series of low-precision observations). To quantify this property for a given partitioning scheme on a dynamical system, we measure the mutual information between the random variables representing the fine partitions of each point and the symbol history sequences of each point under a coarse partition. As usual, the mutual information ($MI$) between discrete random variables $X$ and $Y$, in units of bits, is given by

$$MI(X, Y) = \sum_{x \in S_X} \sum_{y \in S_Y} p_{X,Y}(x, y) \log_2 \frac{p_{X,Y}(x, y)}{p_X(x)\, p_Y(y)}, \tag{1}$$

where $S_X$ and $S_Y$ are the sample spaces of random variables $X$ and $Y$, respectively, and $p_X$ is the probability mass function of $X$, $p_Y$ of $Y$, and $p_{X,Y}$ of the joint distribution of $X$ and $Y$.
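As a minimal sketch of Eq. (1), the mutual information between two symbol sequences can be estimated by replacing each probability with a relative frequency. The plug-in estimator and the function name `mutual_information` are assumptions made for illustration; the text does not specify how the probabilities are estimated.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of MI(X, Y) in bits from two equal-length symbol sequences.

    Assumption: probabilities are estimated by relative frequencies.
    """
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p_xy / (p_x * p_y) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi
```

Pairs that never occur contribute nothing to the sum, so only observed pairs are visited. Symbols can be any hashable values, which lets tuples stand in for symbol histories.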

Formally, suppose $P$ is a partitioning scheme that can be applied to produce a variable number of partitions. Let $P_k$ be the partition resulting from applying $P$ to produce $k$ partitions. The process for assessing $P$ on a trajectory $X = \{x_n\}_{n=1}^{N}$ is as follows:

Algorithm 1

  1. Apply coarse partition $P_2$ to $X$.

  2. Assign to each point $x_n \in X$, $n > d(\ell - 1)$, the coarse history symbol sequence $\pi_{n-d(\ell-1)}, \pi_{n-d(\ell-2)}, \ldots, \pi_{n-d}, \pi_n$, where $\pi_m$ is the symbol for $x_m$ under $P_2$, and $d$ and $\ell$ are parameters called history delay and history length, respectively.

  3. For some $k > 2$, apply fine partition $P_k$ to $X$ and assign to each point $x_n \in X$, $n > d(\ell - 1)$, its corresponding fine partition symbol.

  4. Produce the joint probability distribution $p_{X,Y}$, where $X$ is the random variable representing fine partition symbols and $Y$ is the random variable representing coarse symbol history sequences.

  5. Calculate $MI(X, Y)$ as given in Eq. (1).

  6. Repeat steps 3 to 5 for various values of $k$.
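The steps of Algorithm 1 can be sketched as follows for symbol sequences that have already been computed. This is an illustration under stated assumptions rather than the authors' implementation: the history delay `d` is taken in samples, the history length is called `ell`, indices are zero-based, and the helper names `coarse_histories` and `mi_statistic` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of Eq. (1) in bits (relative-frequency probabilities)."""
    n = len(xs)
    joint, cx, cy = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (cx[x] * cy[y]))
               for (x, y), c in joint.items())


def coarse_histories(coarse, d, ell):
    """Step 2: history (pi_{n-d(ell-1)}, ..., pi_{n-d}, pi_n) for each admissible n."""
    start = d * (ell - 1)  # first index with a full history (zero-based)
    return [tuple(coarse[n - d * j] for j in range(ell - 1, -1, -1))
            for n in range(start, len(coarse))]


def mi_statistic(coarse, fine, d, ell):
    """Steps 3 to 5: MI between fine symbols and coarse symbol histories."""
    start = d * (ell - 1)
    histories = coarse_histories(coarse, d, ell)
    # Align each fine symbol with the history ending at the same time index.
    return mutual_information(fine[start:], histories)
```

Step 6 then amounts to calling `mi_statistic` once per fine partition, keeping the coarse sequence, `d`, and `ell` fixed while the number of fine symbols varies.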

The parameter history delay, $d$, serves a similar purpose to an embedding delay, so it is similarly chosen to be approximately one quarter of the period of the orbit for continuous systems.10,11 Briefly, this rule of thumb for the embedding delay length comes from the fact that for a sinusoidal time series, an embedding delay of one quarter of the period best separates trajectories in the embedded attractor for accurate phase identification. History length $\ell$ determines the number of possible coarse history symbol sequences, $|S_Y| \le 2^{\ell}$. The relationship is an inequality as not all $2^{\ell}$ combinations of coarse history symbol sequences will necessarily appear in the time series. Note also that $k = |S_X|$. In order for differences in $MI(X, Y)$ to be observed as $k$ increases, $|S_Y|$ should be a fixed value lying between the minimum and maximum values of $k$ tested. We typically vary $k$ between 0 and 100; we set $\ell = 5$, giving $|S_Y| \le 2^5 = 32$.

We expect good partitioning schemes to exhibit high values of mutual information and increasing mutual information with $k$, and bad partitioning schemes to exhibit low mutual information and no strong increase in mutual information with $k$. We thus expect that $MI$ vs $k$ graphs should be able to differentiate between good and bad partitioning schemes; in Sec. III, we verify that this is indeed the case.

In this section, the mutual information statistic is used to assess four partitioning schemes on three test systems (Duffing oscillator, Lorenz system, and i.i.d. noise). Two good partitioning methods (ordinal partition and K-means clustering) as well as two poor methods (slice and random partitions) are selected to test that the statistic is able to differentiate between them. The ordinal partition is well established in the literature; see Bandt and Pompe for details.7 In this paper, $m$ and $\tau$ refer to the embedding dimension and time-delay parameters, respectively, for the ordinal partition. The time delay $\tau$ is chosen to be one quarter of the period of the orbit,10,11 while $m$ is varied to produce different $k$; note that $k$ is the number of unique ordinal sequences of length $m$ present in the time series and cannot be controlled explicitly. Clustering is implemented using K-means clustering, an unsupervised machine learning method that separates points into clusters and aims to minimize the sum of the squared distances between all points and their cluster means.8 Parameter $K$ defines the number of clusters; we set $K = k$. The slice partition is constructed by selecting one dimension of a trajectory to produce a scalar time series $\{x_n\}_{n=1}^{N}$, then defining $k$ evenly sized bins between $\min(\{x_n\}_{n=1}^{N})$ and $\max(\{x_n\}_{n=1}^{N})$. The random partition is constructed by randomly assigning each point in a trajectory one of $k$ symbols from the discrete uniform distribution $\mathcal{U}\{0, k-1\}$. If the mutual information statistic is able to produce assessments of each partition that accurately reflect their performance in time series analysis, this will suggest that it is a suitable statistic for assessing the quality of a partitioning scheme.
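The ordinal, slice, and random partitions described above can be sketched for a scalar series as follows. The function names are hypothetical, the delay `tau` is assumed to be given in samples rather than time units, ties in the ordinal pattern are broken by position, and each ordinal symbol is represented by the tuple of indices that sorts the window (an assumed encoding). K-means is omitted because it would need a clustering routine beyond the standard library.

```python
import random


def ordinal_symbols(x, m, tau):
    """Ordinal pattern of (x[i-(m-1)tau], ..., x[i-tau], x[i]) for each admissible i."""
    symbols = []
    for i in range((m - 1) * tau, len(x)):
        window = [x[i - (m - 1 - j) * tau] for j in range(m)]
        # Indices that sort the window; sorted() is stable, so ties keep time order.
        symbols.append(tuple(sorted(range(m), key=lambda j: window[j])))
    return symbols


def slice_symbols(x, k):
    """k evenly sized bins between min(x) and max(x); the maximum joins the last bin."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(x)  # constant series: everything falls in one bin
    return [min(int((v - lo) / width), k - 1) for v in x]


def random_symbols(n, k, seed=None):
    """One symbol per point, drawn uniformly from {0, ..., k-1}."""
    rng = random.Random(seed)
    return [rng.randrange(k) for _ in range(n)]
```

The number of distinct ordinal symbols that actually occur, `len(set(ordinal_symbols(x, m, tau)))`, plays the role of $k$ for the ordinal partition, which is why it cannot be set directly.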

The Duffing oscillator is defined by the equation

$$\ddot{x} + \delta \dot{x} + \alpha x + \beta x^3 = \gamma \cos(\omega t).$$

We set $\delta = 0.3$, $\alpha = 1$, $\beta = 1$, $\gamma = 0.65$, $\omega = 0.65$. For these parameter values, taking observations of $x$ and $\dot{x}$ to create a two-dimensional system generates a periodic orbit following an approximately circular trajectory. Dynamical Gaussian noise with distribution $\mathcal{N}(0, \sigma)$, where $\sigma$ is a noise parameter, is added to each point during simulation.

The statistic correctly identifies the random and slice partitions as performing worse than the K-means and ordinal partitions, as shown in Fig. 2. The statistic detects that increasing the noise parameter σ results in decreasing the partition quality, also aligning with expectations. The exception is the random partition, where adding noise to the system does not affect results as partitions are assigned randomly anyway. Results also suggest that the ordinal partition is more resilient to noise than the other partitioning methods; this observation agrees with previous results.7 

FIG. 2.

Mutual information vs $k$ on the Duffing oscillator for each partition with various noise parameters $\sigma$. As percentages of the standard deviation of the oscillator, $\sigma = 0, 0.004, 0.008$ correspond to 0%, 2%, and 4%, respectively. The (a) random and (b) slice partitions perform worse than the (c) K-means and (d) ordinal ($\tau = 1.25$) partitions, correctly reflecting their respective performances in time series analysis. Additionally, the statistic is able to detect that increasing noise reduces partition quality, and that the ordinal partition is more resilient to noise than the other partitions. In all cases, the statistic displays the predicted behavior. The random and K-means partitions are non-deterministic; central lines are a mean of 100 trials, with standard deviation shown above and below in the shaded regions. Note that for the ordinal partition, we use $3 \le m \le 7$ for $\sigma = 0$, and $3 \le m \le 6$ for $\sigma = 0.004, 0.008$; the different ordinal patterns present in each case result in different $k$ for each $m$, accounting for the different $k$ domain in Fig. 2(d).

The Lorenz system is defined by the equations

$$\dot{x} = \sigma (y - x), \qquad \dot{y} = x(\rho - z) - y, \qquad \dot{z} = xy - \beta z. \tag{2}$$

We choose parameter values $\sigma = 10$, $\rho = 28$, $\beta = 8/3$. The statistic correctly identifies the random and slice partitions as performing worse than the K-means and ordinal partitions, as shown in Fig. 3. The $z$-slice and $z$-embedding result in significantly higher mutual information than other dimensions. This is because the Lorenz system is symmetric under inversion through the $z$-axis, so the $z$ dimension does not offer full observability of the system. The $z$-embedded orbit is, therefore, less complex than the original orbit, with the attractor only exhibiting a single lobe, resulting in greater predictability and, therefore, higher mutual information.

FIG. 3.

Mutual information vs $k$ for the Lorenz system. The (a) random partition performs the worst, followed by the (b) slice partition, then (c) K-means, then the (d) ordinal partition ($\tau = 0.1$), correctly reflecting their respective performances in time series analysis. The symmetry of the Lorenz system under $z$-inversion means that the $z$-embedded orbit is simpler, generally resulting in higher mutual information. In all cases, the statistic displays the predicted behavior. The random and K-means partitions are non-deterministic; central lines are a mean of 100 trials, with standard deviation shown above and below in the shaded regions. Note that for the ordinal partition, we use $3 \le m \le 6$ for the $x$ and $z$ embeddings and $3 \le m \le 5$ for the $y$ embedding; the different ordinal patterns present in each case result in different $k$ for each $m$, accounting for the different $k$ domain in Fig. 3(d).

The exception to this is the ordinal partition, where the z-embedding only slightly outperforms the other dimensions. This is because the partitions in the coarse z-embedded ordinal partition are not contiguous, as shown in Fig. 4. This means that transitions between symbols are more difficult to associate with a specific location on the trajectory, resulting in lower mutual information between fine partitions and coarse partition history.

FIG. 4.

Coarse $k = 2$ ordinal partitions ($m = 2$, $\tau = 0.1$) on the Lorenz system. Partitions are contiguous on the (a) $x$-embedded and (b) $y$-embedded partitions but not the (c) $z$-embedded partition. This means transitions between symbols for the $z$-embedded ordinal partition are more difficult to attribute to a specific location on the attractor, resulting in lower mutual information for a given partition.

For the Lorenz system, the statistic produces reliable results reflecting both partition quality and the way the partitions interact with the geometry of the system. These examples demonstrate that besides the quality of the partition itself, choices around its implementation, such as embedding dimension, are also crucial for obtaining a good partition.

We generate i.i.d. noise from time series of the Lorenz system by randomly selecting points with replacement, preserving the distribution of the data but removing temporal correlation between points. We anticipate that partitions should perform poorly. The K-means partition in the state space, shown in Fig. 5(a), does indeed perform poorly. However, the K-means partition on embedded orbits and the ordinal partition in Figs. 5(b) and 5(c), respectively, show significant mutual information between fine partitions and coarse partition history. This is because in both cases, a time-delay embedding step is carried out. The time-delay embedding process introduces correlation between successive points in the embedded orbit, so each point contains information about trajectory history, resulting in a higher mutual information statistic value being measured. The ordinal partition is explicitly dependent upon the relationship between successive points, while the K-means partition only geometrically partitions the embedded orbit; this accounts for the ordinal partition’s better performance, with higher mutual information and a consistent increase with k. Note that unlike on the original Lorenz system, partitions on the z-embedding of the i.i.d. noise do not result in higher mutual information. This reflects the fact that here, mutual information results only from the time-delay embedding, not from system dynamics. These results suggest that in general, partitioning methods that utilize trajectory history, including the ordinal partition, will perform better under the mutual information statistic.
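The resampling step described above can be sketched in a few lines. The function name `iid_surrogate` is hypothetical, and the points are assumed to be stored as a list (for example, a list of coordinate tuples).

```python
import random


def iid_surrogate(points, seed=None):
    """Draw len(points) samples with replacement.

    This keeps the empirical distribution of the data but destroys
    the temporal ordering between successive points.
    """
    rng = random.Random(seed)
    return rng.choices(points, k=len(points))
```

Applying a time-delay embedding to the surrogate afterwards is what reintroduces dependence between successive embedded points, since neighboring embedding vectors share components.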

FIG. 5.

Mutual information vs $k$ for i.i.d. noise generated from the Lorenz system. (b) K-means on the embedded system demonstrates meaningful mutual information; (a) the same partition in state space does not; (c) the ordinal partition ($3 \le m \le 5$, $\tau = 0.1$) demonstrates even higher mutual information as well as an increase in mutual information with increasing $k$. Even though the noise retains no temporal correlation between observations, the process of time-delay embedding introduces correlation between successive observations. These results suggest that, in general, partitioning methods that utilize trajectory history will perform better under the statistic. The K-means partitions are non-deterministic; central lines are a mean of 100 trials, with standard deviation shown above and below in the shaded regions. Note that for the ordinal partition, the different ordinal patterns present in each embedding dimension result in different $k$ for each $m$, accounting for the different $k$ domain in Fig. 5(c).

In this section, we highlight differences between our proposed mutual information statistic and two existing statistics used to assess candidate generating partitions: symbolic false nearest neighbors and symbolic shadowing.

Kennel and Buhl introduce symbolic false nearest neighbors3 to measure how close a partition is to generating. The statistic is built upon the principle that neighbors in symbol space should also be neighbors in the state space. Their algorithm is outlined below for a time series $\{x_n\}_{n=1}^{N}$ with symbol sequence $\{s_n\}_{n=1}^{N}$:

Algorithm 2

  1. Embed $\{s_n\}_{n=1}^{N}$ in the unit square
    where $A$ is the number of unique symbols in the symbol sequence.
  2. For each $y_i$, find its nearest Euclidean neighbor in the unit square and call the nearest neighbor index $d$.

  3. Define $D_i = \lVert x_i - x_d \rVert$.

  4. Define $R_i$ as the percentile rank of $D_i$ among all pairs of points in $\{x_n\}_{n=1}^{N}$.

  5. Calculate the proportion of $R$ values below 1%.

Time delay $\tau$ is selected to be approximately one quarter of the period of the orbit for continuous systems.10,11 Precision $j_{\max}$ is selected to be as large as computational precision allows; we use $j_{\max} = 30$. Low values of $R$ mean that neighbors in symbol space are also neighbors in state space. Therefore, a higher proportion of $R$ values below 1% suggests $\{s_n\}_{n=1}^{N}$ was produced by a better partition.

Hirata et al. introduce symbolic shadowing2,5 and use a similar statistic to symbolic false nearest neighbors, based on the principle that symbol sequences should localize points as much as possible. Consider a time series $\{x_i\}_{i=1}^{N}$ with symbol sequence $\{s_i\}_{i=1}^{N}$. Point $x_i$ has surrounding symbol sequence
where $n_+$ and $n_-$ are the number of symbols considered in the forward and backward directions, respectively. Let
the set of all $x_j$ with the same surrounding symbol sequence as $x_i$, and let
the mean position of all the $x_j$. Hirata et al. define the statistic
A lower value means that each $x_i$ is closer to the mean of all $x_j$ sharing the same surrounding symbol sequence.

We compare how our proposed mutual information statistic, the symbolic false nearest neighbors statistic, and the symbolic shadowing statistic (with $n = n_+ = n_- = 1, 5$) assess the same system and partitions. Using the Lorenz system with $k = 27$ partitions, we apply the random partition, slice partition ($x$ slice), K-means partition, and ordinal partition ($m = 5$, $\tau = 0.1$, $x$-embedded). Results are shown in Table I.

TABLE I.

Comparing assessments of four k = 27 partitions on the Lorenz system offered by the mutual information statistic (MI), symbolic false nearest neighbors (SFNN), and symbolic shadowing (SS). The mutual information statistic offers an assessment of each partition that accurately reflects their performance in time series analysis. Note that for SS (n = 5), under the random partition, every observation has a unique surrounding symbol sequence, so mean distances are 0.

| Statistic | Random | Slice | K-means | Ordinal |
| --- | --- | --- | --- | --- |
| MI | 0.05 | 1.08 | 1.26 | 2.52 |
| SFNN | 0.009 | 0.988 | 0.934 | 0.526 |
| SS (n = 1) | 50.7 | 21.2 | 7.2 | 118.4 |
| SS (n = 5) | 0.0 | 2.7 | 2.4 | 60.0 |

The mutual information statistic offers an assessment of each partition that accurately reflects their performance in time series analysis. It scores the ordinal partition higher than the K-means partition, which itself scores higher than the slice partition and then the random partition. Symbol sequences from the slice and K-means partitions offer strong state space localization and are rated well using symbolic false nearest neighbors and symbolic shadowing. However, disagreement between the three statistics demonstrates that state space localization does not necessarily result in high-information partitions as defined by the mutual information statistic. Given these results, we suggest that the mutual information statistic’s definition of “high-information” partitions is more appropriate for determining good partitions for time series analysis; we demonstrate such an application in Sec. IV. We also note that computation times for the proposed mutual information statistic are significantly lower than those for the existing generating partition statistics.

Empirical results and the mutual information statistic’s formulation suggest that the ordinal partition is a good partition as it utilizes trajectory history. Attempting to improve upon the ordinal partition, we introduce the weighted ordinal partition as an application of the mutual information statistic.

Let $\{x_n\}_{n=1}^{N}$ be a scalar time series from a one-dimensional observation of a system. Consider a point $x_i \in \{x_n\}_{n=1}^{N}$ with $m$-dimensional embedding $\mathbf{x}_i = (x_{i-(m-1)\tau}, x_{i-(m-2)\tau}, \ldots, x_{i-\tau}, x_i) \in \mathbb{R}^m$. In an ordinal partition, we assign symbols based on the rank order of the amplitudes of components of $\mathbf{x}_i$ (in the literature, time ordering is sometimes used rather than rank ordering; both generate equivalent partitions). In a weighted ordinal partition, we first apply an element-wise weighting, $\mathbf{a} = (a_1, a_2, \ldots, a_m) \in \mathbb{R}^m$, by calculating $\mathbf{a} \circ \mathbf{x}_i$, where $\circ$ denotes the element-wise product. We then assign a symbol based on the rank order of the amplitudes of components of $\mathbf{a} \circ \mathbf{x}_i$. The conventional/unweighted ordinal partition is the case $\mathbf{a} = \mathbf{1}$. Because the ordinal partition is scale invariant, we can fix $a_m = 1$.
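The weighted ordinal symbol assignment described above can be sketched as follows. The function name `weighted_ordinal_symbols` is hypothetical, the delay `tau` is assumed to be in samples, and each symbol is encoded as the tuple of indices that sorts the weighted window (an assumed encoding; any one-to-one labeling of orderings would do).

```python
def weighted_ordinal_symbols(x, a, tau):
    """Ordinal pattern of the element-wise product of a with each embedded point.

    x   : scalar time series (list of floats)
    a   : weights (a_1, ..., a_m); the embedding dimension m is len(a)
    tau : embedding delay in samples (assumption)
    """
    m = len(a)
    symbols = []
    for i in range((m - 1) * tau, len(x)):
        # Weighted embedded point: (a_1 x[i-(m-1)tau], ..., a_m x[i])
        window = [a[j] * x[i - (m - 1 - j) * tau] for j in range(m)]
        # Indices that sort the window; ties are broken by position.
        symbols.append(tuple(sorted(range(m), key=lambda j: window[j])))
    return symbols
```

With all weights equal to 1 this reduces to the conventional ordinal partition, for example `weighted_ordinal_symbols([1, 3, 2, 4], (1, 1, 1), 1)` gives the same symbols as an unweighted ordinal encoding of the same windows.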

Each selection of $\mathbf{a}$ will result in a different partition with a different value under the mutual information statistic. We propose that a partition can be optimized by selecting $\mathbf{a}$ to maximize the value of the mutual information statistic, and that the resulting optimized weighted ordinal partition should outperform the conventional ordinal partition (also referred to as just the “ordinal partition”). An algorithm for selecting the optimal $\mathbf{a}$ for scalar time series $\{x_n\}_{n=1}^{N}$ is as follows:

Algorithm 3

  1. Select $m$ and $\tau$ to produce the embedded orbit $\{\mathbf{x}_n\}_{n=1}^{N}$. Note that by selecting $m$ we also select the number of symbols, $k$.

  2. Select a range of values for $\mathbf{a}$ to be tested, contained in the set $A$. In this paper, each component of $\mathbf{a}$ (besides $a_m = 1$) is varied in the range $[0, 3]$ with step size 0.05.

  3. For each $\mathbf{a} \in A$, generate the $m$-dimensional fine symbol sequence by finding the ordinal sequence of $\mathbf{a} \circ \mathbf{x}_i$ for all $\mathbf{x}_i \in \{\mathbf{x}_n\}_{n=1}^{N}$.

  4. For each $\mathbf{a} \in A$, generate the $k = 2$ coarse symbol sequence in a similar manner, using truncated weighting $\hat{\mathbf{a}} = (a_{m-1}, a_m)$ and embedding $\hat{\mathbf{x}}_i = (x_{i-\tau}, x_i)$.

  5. For each $\mathbf{a} \in A$, follow steps 4 and 5 of Algorithm 1 to calculate the mutual information statistic between the coarse and fine partitions.

  6. Select the weighting $\mathbf{a}$ that results in the partition with the highest mutual information value.
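The grid search in Algorithm 3 can be sketched as below. This is an illustrative sketch under several assumptions, not the authors' code: delays `tau` and `d` are in samples, the history length is called `ell`, probabilities are plug-in relative frequencies, all helper names are hypothetical, and the toy series and three-value grid are placeholders (the paper varies each weight over $[0, 3]$ in steps of 0.05).

```python
from collections import Counter
from itertools import product
from math import log2, sin


def mutual_information(xs, ys):
    """Plug-in estimate of Eq. (1) in bits."""
    n = len(xs)
    joint, cx, cy = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (cx[x] * cy[y]))
               for (x, y), c in joint.items())


def weighted_ordinal_symbols(x, a, tau):
    """Ordinal pattern of a * (x[i-(m-1)tau], ..., x[i]); entry 0 is time (m-1)*tau."""
    m = len(a)
    out = []
    for i in range((m - 1) * tau, len(x)):
        w = [a[j] * x[i - (m - 1 - j) * tau] for j in range(m)]
        out.append(tuple(sorted(range(m), key=lambda j: w[j])))
    return out


def weighted_mi(x, a, tau, d, ell):
    """Steps 3 to 5: MI between fine symbols and coarse symbol histories."""
    m = len(a)
    fine = weighted_ordinal_symbols(x, a, tau)         # fine[0] is time (m-1)*tau
    coarse = weighted_ordinal_symbols(x, a[-2:], tau)  # coarse[0] is time tau
    first = max((m - 1) * tau, tau + d * (ell - 1))    # first time with both defined
    xs, ys = [], []
    for i in range(first, len(x)):
        xs.append(fine[i - (m - 1) * tau])
        ys.append(tuple(coarse[i - tau - d * j] for j in range(ell - 1, -1, -1)))
    return mutual_information(xs, ys)


def best_weighting(x, grid, m, tau, d, ell):
    """Steps 2 and 6: try every (a_1, ..., a_{m-1}) on the grid with a_m fixed to 1."""
    best = None
    for head in product(grid, repeat=m - 1):
        a = head + (1.0,)
        score = weighted_mi(x, a, tau, d, ell)
        if best is None or score > best[1]:
            best = (a, score)
    return best


# Toy usage on a synthetic quasi-periodic series (not data from the paper).
x = [sin(0.3 * n) + 0.5 * sin(0.71 * n) for n in range(400)]
a_best, mi_best = best_weighting(x, grid=[0.5, 1.0, 1.5], m=3, tau=2, d=2, ell=3)
```

An exhaustive search costs `len(grid) ** (m - 1)` evaluations, so it is practical only for small $m$; a coarser grid or a local optimizer would be a natural substitute for larger embeddings.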

We apply a weighted ordinal partition with parameters $m = 3$, $\tau = 0.1$ to an $x$-observation of the Lorenz system as defined in Eq. (2). We apply Algorithm 3 with two modifications: in step 2, $a_1$ is varied in the range $[0.2, 3]$ and $a_2$ in the range $[0.2, 2]$ as these are the values of $\mathbf{a}$ producing interesting behavior; and we remove step 6, as in this section, we are interested in observing behavior for a variety of values of $\mathbf{a}$ rather than selecting the optimal one. Results are shown in Fig. 6. A region of high mutual information is seen along the parabola $a_2 = \sqrt{a_1}$. Partitions along the parabola appear to meet at the center of each lobe of the attractor, while partitions adjacent to the parabola meet outside the center of each lobe. Partitions meeting at the center of a lobe form “wedges,” resulting in regular and predictable symbol sequences as a trajectory traverses each lobe and thus higher mutual information.

FIG. 6.

A range of values of $\mathbf{a}$ are tested for the weighted ordinal partition ($m = 3$, $\tau = 0.1$) on an $x$-observation of the Lorenz system. (a) Mutual information values for each $a_1$ and $a_2$. Points on the parabola $a_2 = \sqrt{a_1}$ have high mutual information due to partitions that meet at the center of each lobe, forming “wedges” that the symbol sequence predictably cycles through; compare (b) $(a_1, a_2) = (1, 1)$, $MI = 1.25$ on the parabola to (c) $(a_1, a_2) = (1.5, 1)$, $MI = 0.84$ adjacent to the parabola. Two regions in (a) show higher mutual information values than the conventional $(a_1, a_2) = (1, 1)$ ordinal partition: (d) $(a_1, a_2) = (2, 0)$, $MI = 1.38$, which separates the lobes along the $x = 0$ plane, and (e) $(a_1, a_2) = (2, 0.5)$, $MI = 1.42$, which more evenly distributes points between symbols. Contiguous partitions in (d) and (e) may also contribute to higher mutual information.

The standard ordinal partition with $(a_1, a_2) = (1, 1)$ has the highest mutual information in the parabolic region, with $MI = 1.25$. There are two regions where higher mutual information is observed: the line $a_2 = 0$, with $(a_1, a_2) = (2, 0)$ giving $MI = 1.38$, and a region below the parabola, with $(a_1, a_2) = (2, 0.5)$ giving $MI = 1.42$.

Along the line $a_2 = 0$, ordinal sequences are generated from the points $(a_1 x_{n-2\tau}, 0, x_n)$. This means that ordinal sequences 123 and 321 occur only when the Lorenz system crosses the plane $x = 0$. This plane marks an important switching between the two lobes of the system; applying this weighting therefore captures an important system behavior, resulting in a good symbolic representation of system dynamics with high mutual information.
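
The effect of setting the middle weight to zero can be checked on a toy series. The following Python sketch assigns a weighted ordinal pattern for m = 3 by ranking the weighted delay vector. The function name, the integer `lag` measured in samples, and the rank-string labels (for example "123" for a strictly increasing vector) are illustrative assumptions rather than the authors' implementation.

```python
def weighted_ordinal_patterns(x, a1, a2, lag):
    """Rank pattern of (a1*x[n-2*lag], a2*x[n-lag], x[n]) for each valid n.
    Each pattern is a string of ranks, e.g. '123' for strictly increasing."""
    patterns = []
    for n in range(2 * lag, len(x)):
        v = (a1 * x[n - 2 * lag], a2 * x[n - lag], x[n])
        order = sorted(range(3), key=lambda i: v[i])  # indices, smallest first
        ranks = [0, 0, 0]
        for rank, idx in enumerate(order, start=1):
            ranks[idx] = rank
        patterns.append("".join(map(str, ranks)))
    return patterns

# Toy series (not Lorenz data) that changes sign once, after the third sample.
x = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(weighted_ordinal_patterns(x, a1=2.0, a2=0.0, lag=1))
```

With the middle weight set to zero, the pattern "321" appears in this toy series only for windows whose first and last samples lie on opposite sides of zero, which mirrors the argument above.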

When compared to the standard $(a_1, a_2) = (1, 1)$, the weighting parameters $(a_1, a_2) = (2, 0.5)$ result in a more even distribution of points among the partitions, as Fig. 7 illustrates. Although many factors determine partition quality, given similar partitioning schemes, evenly distributing points between partitions should provide more information per symbol by maximizing the entropy of the symbol distribution; this can account for the higher mutual information observed for the weighting $(a_1, a_2) = (2, 0.5)$. It is also noted that the parameter choices $a_2 = 0$ and $(a_1, a_2) = (2, 0.5)$ both result in contiguous partitions; this may also increase mutual information.
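
The entropy argument can be illustrated numerically. The sketch below computes the Shannon entropy of a symbol distribution for two made-up sequences over six symbols, one evenly distributed and one skewed. The data are invented for illustration and are not the histograms of Fig. 7; `symbol_entropy` is a hypothetical helper name.

```python
from collections import Counter
from math import log2

def symbol_entropy(symbols):
    """Shannon entropy (bits) of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Two invented sequences of 60 symbols drawn from the same six-symbol alphabet.
even = [0, 1, 2, 3, 4, 5] * 10            # every symbol equally frequent
skewed = [0] * 50 + [1, 2, 3, 4, 5] * 2   # one symbol dominates

print(symbol_entropy(even), symbol_entropy(skewed))
```

The even sequence attains the maximum possible entropy for six symbols, log2(6) bits, while the skewed one falls well below it.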

FIG. 7.

Weighted ordinal partitions ($m = 3$, $\tau = 0.1$) on the Lorenz system for weightings (a) $(a_1, a_2) = (2, 0.5)$ and (b) $(a_1, a_2) = (1, 1)$. According to the mutual information statistic, $(a_1, a_2) = (2, 0.5)$ offers an improvement over the conventional ordinal partition. This is due to a more even distribution of points between symbols, resulting in higher entropy and therefore higher mutual information; see the histograms of symbol distributions for (c) $(a_1, a_2) = (2, 0.5)$ and (d) $(a_1, a_2) = (1, 1)$.

These results show that when optimized using the mutual information statistic, the weighted ordinal partition can offer improvements over the ordinal partition by identifying partitions that more effectively exploit system dynamics.

In this section, we compare the weighted ordinal and conventional ordinal partitions on two test systems: the logistic map and an x-observation of the Lorenz system. On these test systems, behavior can be modified by varying bifurcation parameters. The logistic map is defined by the equation
$$x_{n+1} = r x_n (1 - x_n),$$
with $r$ varied as a bifurcation parameter. The Lorenz system is defined in Eq. (2); we set $\sigma = 10$, $\beta = 8/3$ and vary $\rho$ as a bifurcation parameter.
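
As a minimal sketch of the first test system, the Python code below iterates the logistic map and estimates its largest Lyapunov exponent by averaging ln|f'(x_n)| along an orbit, with f'(x) = r(1 - 2x). This is the standard formula for one-dimensional maps and is not necessarily the estimator used for the results in this section; the initial condition, orbit length, and burn-in are arbitrary assumptions.

```python
from math import log

def logistic_orbit(r, x0=0.4, n=5000, burn_in=1000):
    """Iterate x -> r*x*(1-x), discard burn_in transient steps, return n points."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(n):
        orbit.append(x)
        x = r * x * (1.0 - x)
    return orbit

def logistic_lle(r, **kwargs):
    """Orbit average of ln|f'(x_n)|, where f'(x) = r*(1 - 2*x)."""
    orbit = logistic_orbit(r, **kwargs)
    return sum(log(abs(r * (1.0 - 2.0 * x))) for x in orbit) / len(orbit)

for r in (3.2, 3.9):
    print(r, logistic_lle(r))
```

A negative estimate indicates a periodic regime and a positive estimate indicates chaos, which is the sense in which LLE is used as a reference for the symbol sequence measures.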

Under a good partitioning scheme, changes in the regime of a system’s behavior should be tracked by symbol sequence complexity measures. We measure the largest Lyapunov exponent (LLE)12 of each system as a proxy for chaos to characterize system behavior, and permutation entropy (PE)7 and Lempel–Ziv complexity (LZC)13 as measures of symbol sequence complexity. On both systems, an ordinal partition and a weighted ordinal partition are each used to generate symbol sequences from which we calculate PE and LZC. To assess partition quality, we compare how well PE and LZC from both partitions track LLE as bifurcation parameters are varied.
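
For concreteness, the two symbol sequence measures can be sketched as follows. The code assumes that PE is the Shannon entropy of ordinal pattern frequencies and that LZC is the phrase count of the Lempel–Ziv (1976) exhaustive parsing; normalization conventions, tie handling, and the function names are assumptions and may differ from those used for the reported results.

```python
from collections import Counter
from math import log2

def ordinal_symbols(x, m, lag=1):
    """Ordinal pattern (tuple of argsort indices) of each length-m delay vector."""
    span = (m - 1) * lag
    return [
        tuple(sorted(range(m), key=lambda i: x[n - span + i * lag]))
        for n in range(span, len(x))
    ]

def permutation_entropy(symbols):
    """Shannon entropy (bits) of the symbol frequencies, unnormalized."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def lempel_ziv_complexity(symbols):
    """Number of phrases in the Lempel-Ziv (1976) exhaustive parsing."""
    # Map each distinct symbol to one character so substring search can be used.
    alphabet = {s: chr(65 + k) for k, s in enumerate(dict.fromkeys(symbols))}
    text = "".join(alphabet[s] for s in symbols)
    i, phrases = 0, 0
    while i < len(text):
        length = 1
        # Extend the phrase while it already occurs in the preceding text.
        while i + length <= len(text) and text[i:i + length] in text[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases
```

Both functions accept any sequence of hashable symbols, so the same code applies to ordinal and weighted ordinal symbol sequences.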

1. Logistic map

Both the weighted ordinal and ordinal partitions use embedding parameters $m = 4$, $\tau = 1$. Points of interest for the logistic map are the onset of chaos at $r \approx 3.57$ and periodic windows at $r = 3.63, 3.74, 3.85$. As Fig. 8 shows, LZC from the weighted ordinal partition detects the onset of chaos slightly more accurately than LZC from the ordinal partition, while PE from both partitions is inaccurate: PE from the weighted ordinal partition increases prematurely at $r = 3.55$, and PE from the ordinal partition is delayed until $r = 3.60$. Additionally, while both LZC statistics detect all three periodic windows, LZC for the weighted ordinal partition is higher in the chaotic regime, meaning the periodic windows are more distinct from the surrounding chaos. Similarly, PE from the weighted ordinal partition detects all three periodic windows, while PE from the ordinal partition shows smaller decreases and fails to detect the window at $r = 3.63$ entirely.

FIG. 8.

LLE tracking results for the logistic map ($m = 4$, $\tau = 1$). Both (a) LZC and (b) PE for the weighted ordinal partition outperform their ordinal partition counterparts in their ability to track LLE. Note, in particular: (1) the larger decreases in LZC and PE for the weighted ordinal partition at periodic windows $r = 3.63, 3.74, 3.85$; (2) the failure of PE from the ordinal partition to detect the $r = 3.63$ periodic window; and (3) the accurate detection of the onset of chaos at $r \approx 3.57$ in LZC from the weighted ordinal partition.

Table II contains short sections of the symbol sequences generated by both partitions at different values of $r$ and offers some insight into why the weighted ordinal partition better distinguishes chaos from periodicity. During chaos at $r = 3.62$, the ordinal partition’s symbol sequence can be split into recurring “motifs” of 2403 and 13. This is because at this parameter value, despite being chaotic, the logistic map produces short sequences of points with recurring ordinal patterns; see Fig. 9(a). Although the order of the ordinal partition’s motifs is chaotic, their presence means that the symbol sequence can effectively be reduced to two symbols, lowering sequence complexity measures. On the other hand, no such motifs are identifiable in the weighted ordinal partition’s sequence. At $r = 3.62$, the optimal weighting is $\mathbf{a} = (0.4, 1, 0.5, 1)$. Figure 9(b) shows segments of length $m = 4$ at various points along the time series after $\mathbf{a}$ is applied. Segments that had the same ordinal sequence under the ordinal partition can be distinguished under the weighted ordinal partition. This eliminates the presence of recurring motifs in the symbol sequence, making periodic windows more distinguishable from chaos than in the conventional ordinal partition. At the $r = 3.63$ periodic window, both the ordinal and weighted ordinal sequences fall into a six-periodic cycle, before returning to their respective chaotic behaviors at $r = 3.64$.
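
The motif structure can be verified directly from the sequences listed in Table II. The sketch below greedily splits a symbol sequence into the motifs 2403 and 13; the helper `split_into_motifs` is hypothetical and written only for this check.

```python
def split_into_motifs(symbols, motifs):
    """Greedy left-to-right split of a symbol list into the given motifs.
    Returns the list of motifs used, or None if the split fails."""
    out, i = [], 0
    while i < len(symbols):
        for motif in motifs:
            if symbols[i:i + len(motif)] == motif:
                out.append(motif)
                i += len(motif)
                break
        else:
            return None  # no motif matches at position i
    return out

# Sequences at r = 3.62, copied from Table II.
ordinal_362 = [1,3,2,4,0,3,2,4,0,3,2,4,0,3,2,4,0,3,2,4,0,3,2,4,0,3,1,3]
weighted_362 = [10,9,6,2,4,7,10,9,6,2,5,7,3,4,0,2,10,9,6,2,5,7,3,4,0,2]
motifs = [[2, 4, 0, 3], [1, 3]]

print(split_into_motifs(ordinal_362, motifs))
print(split_into_motifs(weighted_362, motifs))
```

The ordinal sequence splits completely into the two motifs, whereas the weighted ordinal sequence does not, so the split returns `None` for it.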

FIG. 9.

(a) Segment of the logistic map at $r = 3.62$. Points alternate above and below the horizontal line at $x_n = 0.7$. The vertical lines separate segments with recurring ordinal patterns and symbol assignments, under an $m = 4$ ordinal partition. Segments of length 4 (grey background) correspond to the symbol sequence motif 2403, and segments of length 6 correspond to the motif 240313. The order in which these motifs appear is chaotic, but their presence reduces complexity measures. (b) Segments of length $m = 4$ of the time series at $n = 16, 20, 24$ after applying the optimal weighting $\mathbf{a} = (0.4, 1, 0.5, 1)$. These segments have identical ordinal patterns (3241) under the conventional ordinal partition but can be distinguished after the weighting is applied. This leads to higher complexity measures for the weighted ordinal partition, making chaos more distinct from periodicity.

TABLE II.

Symbol sequences for weighted ordinal and ordinal partitions (m = 4) on the logistic map in the chaotic regime (r = 3.62, 3.64) and six-periodic window (r = 3.63). Even in the chaotic regime, the ordinal partition’s symbol sequence can be split into recurring motifs, resulting in periodic windows being less distinguishable in complexity measures. Where present, motifs and recurring symbol segments are highlighted by alternating between bold and regular text.

r | Ordinal symbol sequence | Weighted ordinal symbol sequence
3.62 | 1,3,2,4,0,3,2,4,0,3,2,4,0,3,2,4,0,3,2,4,0,3,2,4,0,3,1,3 | 10,9,6,2,4,7,10,9,6,2,5,7,3,4,0,2,10,9,6,2,5,7,3,4,0,2
3.63 | 2,4,0,3,1,3,2,4,0,3,1,3,2,4,0,3,1,3,2,4,0,3,1,3 | 5,4,2,3,0,1,5,4,2,3,0,1,5,4,2,3,0,1,5,4,2,3,0,1
3.64 | 2,4,0,3,2,4,0,3,1,3,2,4,0,3,2,4,0,3,2,4,0,3,2,4,0,3,1,3 | 0,6,5,7,0,6,5,7,0,6,5,7,0,1,3,6,5,7,0,6,5,10,0,8,5,7

2. Lorenz system

Results also demonstrate a slight advantage for the weighted ordinal partition on the Lorenz system with $m = 4$, $\tau = 0.1$, as shown in Fig. 10. PE for the weighted ordinal partition accurately detects the onset of chaos at $\rho \approx 24$, whereas PE for the ordinal partition decreases there. In both PE and LZC, the periodic windows at $\rho = 92, 100, 114, 132, 150$ are more distinct for the weighted ordinal partition, with one exception: PE for both partitions fails to detect the periodic windows at $\rho = 92, 150$. The advantage that the weighted ordinal partition holds largely disappears when the embedding dimension is increased to $m = 6$, as shown in Fig. 11; at a high enough dimension, both symbol sequences contain enough information to track LLE very accurately. This result suggests that using the mutual information statistic to optimize partition assignment for a smaller number of symbols has a similar effect to increasing the number of symbols; that is, it increases the information conveyed per symbol.

FIG. 10.

LLE tracking results for the Lorenz system with embedding parameters $m = 4$, $\tau = 0.1$. Both (a) LZC and (b) PE for the weighted ordinal partition slightly outperform their ordinal partition counterparts in their ability to track LLE. Note in particular: (1) the larger decreases in LZC and PE for the weighted ordinal partition at periodic windows $\rho = 100, 114, 132$, and (2) the accurate detection of the onset of chaos at $\rho \approx 24$ in PE from the weighted ordinal partition.

FIG. 11.

Weighted ordinal and conventional ordinal partitions with $m = 6$, $\tau = 0.1$ demonstrate a similar ability to track LLE on the Lorenz system for both (a) LZC and (b) PE. At this sufficiently high dimension, both symbol sequences contain enough information to track LLE very accurately, so the advantage of the weighted ordinal partition is no longer present.

These results demonstrate that the weighted ordinal partition can offer improvements over the ordinal partition. The ability of complexity measures from both partitions to track LLE is largely similar, as they are conceptually identical. However, the mutual information statistic succeeds in maximizing the amount of information per symbol given a partitioning scheme and a limited number of symbols; this accounts for the greater visibility of periodic windows and the more accurate detection of the initial transitions from periodicity to chaos.

In this section, we apply the mutual information statistic and the weighted ordinal partition to two sets of experimental data: a laser time series from the Santa Fe time series competition,14 and a set of data derived from ECG measurements known as the Fantasia dataset, originally recorded and studied by Iyengar et al.15 and made publicly available on PhysioBank.16 

The laser time series consists of 9093 observations representing the intensity of a far-infrared laser in a chaotic state. In overlapping windows of length 100 observations, separated by a step size of 10, we apply Algorithm 3 to find an optimal weighted partition, measure LZC from the partition’s symbol sequence, and track how it changes over time. Results are shown on a segment of the time series in Fig. 12. The time series exhibits oscillations that increase in magnitude until they “collapse,” returning to small-magnitude oscillations that steadily increase in magnitude again. LZC values appear to increase as the magnitude of the oscillations increases, then suddenly spike downward as the collapse occurs, suggesting the system exhibits more chaotic behavior immediately prior to these transitions. LZC from the weighted ordinal partition captures these patterns better than LZC from the ordinal partition; note in the former the correct detection of the upward spike at index 6400 and the downward spike at index 6850. In LZC from the ordinal partition, the small upward spike at index 6400 has the same magnitude as another spike about 100 observations earlier, while the trough at index 6850 extends past index 7000.
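
The windowing scheme can be sketched as below. The code assumes that only full windows are kept and that each window is processed independently; `measure` is a placeholder for the per-window procedure (optimizing the weighting, symbolizing, and computing LZC), and the function names are hypothetical.

```python
def sliding_windows(series, width=100, step=10):
    """Yield (start_index, window) for every full window of the series."""
    for start in range(0, len(series) - width + 1, step):
        yield start, series[start:start + width]

def windowed_measure(series, measure, width=100, step=10):
    """Apply `measure` to each window and return (start_index, value) pairs."""
    return [(start, measure(w)) for start, w in sliding_windows(series, width, step)]

# Placeholder data with the same length as the laser series (not the real data).
placeholder = list(range(9093))
starts = [start for start, _ in sliding_windows(placeholder)]
print(len(starts), starts[-1])
```

Under these assumptions, a series of 9093 observations yields 900 windows, the last of which starts at index 8990.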

FIG. 12.

LZC from $m = 3$, $\tau = 2$ weighted ordinal and ordinal partitions on a segment of the laser dataset, with a window size of 100 observations. LZC values increase prior to collapses from large to small magnitude oscillations, suggesting the system exhibits more chaotic behavior immediately prior to these transitions. LZC from the weighted ordinal partition outperforms LZC from the ordinal partition; note in the former the detection of the upward spike at index 6400 and the downward spike at index 6850.

Additionally, we analyze a subset of the Fantasia dataset. The data consist of ten time series containing between 4936 and 8708 observations, representing interbeat time intervals from ten subjects recorded over two hours. Five of the time series are from elderly subjects aged 68–85, and five are from younger subjects aged 21–43. The authors of the original paper15 attempted to isolate age as the only experimental variable, with the aim being to discriminate between the two age groups based on the time series alone. In previous work, other authors have shown that measurements of permutation entropy are unable to do so.15,17 Applying Algorithm 3 to the time series, we generate the optimal weighted ordinal partition with the aim of identifying improvements over the standard ordinal partition in the ability to discriminate based on permutation entropy. However, we find that for embedding parameters $3 \le m \le 5$ and $5 \le \tau \le 50$ observations, the optimal weighting is always $\mathbf{a} = \mathbf{1}$, the vector of ones. This means that the standard ordinal partition is already optimal, so the weighted ordinal partition is unable to offer any improvements in this case.

We have introduced a mutual information statistic to assess partitions of the state space of chaotic dynamical systems. Compared with existing generating partition statistics, this mutual information statistic produces assessments of partitions that better reflect their performance in time series analysis. We, therefore, suggest that the statistic’s mechanism, measuring mutual information between the trajectory history and current location, defines a more accurate notion of “high-information” partitions in the context of time series analysis.

The statistic’s formulation, as well as empirical results from its application, indicate that partitions utilizing trajectory history, such as the ordinal partition, perform well. We can, therefore, offer an account for the already popular ordinal partition’s usefulness and provide evidence supporting its continued use.

As an extension to the ordinal partition, we introduce the weighted ordinal partition. Optimizing the weighted ordinal partition’s parameters using the mutual information statistic produces partitions that exploit system dynamics more effectively than the conventional ordinal partition. The weighted ordinal partition demonstrates improvements over the ordinal partition in time series analysis, particularly in distinguishing chaos from periodicity when using a small number of unique symbols. The weighted ordinal partition can also be applied to real-world datasets, as demonstrated on two experimental time series.

J.L. was supported by the Australian Mathematical Sciences Institute through a 2023–2024 Vacation Research Scholarship.

The authors have no conflicts to disclose.

Jason Lu: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Visualization (equal); Writing – original draft (equal). Michael Small: Conceptualization (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal).

Example code implementing an algorithm to calculate the mutual information statistic is openly available in the mutual-information-statistic repository at https://github.com/jason-luuuuu/mutual-information-statistic, Ref. 18.

1. Y. Hirata and J. Amigó, “A review of symbolic dynamics and symbolic reconstruction of dynamical systems,” Chaos 33, 052101 (2023).
2. Y. Hirata and K. Aihara, “Estimating optimal partitions for stochastic complex systems,” Eur. Phys. J. Spec. Top. 222(2), 303–315 (2013).
3. M. B. Kennel and M. Buhl, “Estimating good discrete partitions from observed data: Symbolic false nearest neighbors,” Phys. Rev. Lett. 91(8), 084102 (2003).
4. D. J. Miller, N. F. Ghalyan, and A. Ray, “A locally optimal algorithm for estimating a generating partition from an observed time series,” in 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP) (Neural Comput., 2017), pp. 1–6.
5. Y. Hirata, K. Judd, and D. Kilminster, “Estimating a generating partition from observed time series: Symbolic shadowing,” Phys. Rev. E 70, 016215 (2004).
6. R. L. Davidchack, Y. C. Lai, E. M. Bollt, and M. Dhamala, “Estimating generating partitions of chaotic systems by unstable periodic orbits,” Phys. Rev. E 61(2), 1353–1356 (2000).
7. C. Bandt and B. Pompe, “Permutation entropy: A natural complexity measure for time series,” Phys. Rev. Lett. 88, 174102 (2002).
8. A. Hadriche, N. Jmail, and R. Elleuch, “Different methods of partitioning the phase space of a dynamic system,” Int. J. Comput. Appl. 93, 1–5 (2014).
9. S. M. Hammel, J. A. Yorke, and C. Grebogi, “Do numerical orbits of chaotic dynamical processes represent true orbits?,” J. Complex. 3(2), 136–145 (1987).
10. S. Strogatz, Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering (Perseus, Reading, MA, 1994).
11. E. Tan, S. Algar, D. Corrêa, M. Small, T. Stemler, and D. Walker, “Selecting embedding delays: An overview of embedding techniques and a new method using persistent homology,” Chaos 33(3), 032101 (2023).
12. K. Geist, U. Parlitz, and W. Lauterborn, “Comparison of different methods for computing Lyapunov exponents,” Prog. Theor. Phys. 83(5), 875–893 (1990).
13. A. Lempel and J. Ziv, “On the complexity of finite sequences,” IEEE Trans. Inf. Theory 22(1), 75–81 (1976).
14. A. S. Weigend, Time Series Prediction (Routledge, London, 2019).
15. C. Bian, C. Qin, Q. D. Y. Ma, and Q. Shen, “Modified permutation-entropy analysis of heartbeat dynamics,” Phys. Rev. E 85(2 Pt 1), 021906 (2012).
16. A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, and H. E. Stanley, “PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals,” Circulation 101(23), E215–E220 (2000).
17. M. McCullough, M. Small, H. H. C. Iu, and T. Stemler, “Multiscale ordinal network analysis of human cardiac dynamics,” Philos. Trans. R. Soc. A 375(2096), 20160292 (2017).
18. J. Lu (2024). “Mutual information statistic algorithm,” https://github.com/jason-luuuuu/mutual-information-statistic.