We introduce a generalization of the celebrated ordinal pattern approach for the analysis of time series, in which sub-windows are evaluated in terms of their distance to ordinal patterns defined in a continuous way. This allows us to naturally incorporate information about the local amplitude of the data and to optimize the ordinal pattern(s) to the problem under study. This last element represents a novel bridge between standard ordinal analysis and deep learning, allowing us to achieve results comparable to the latter in real-world classification problems while retaining the conceptual simplicity, computational efficiency, and easy interpretability of the former. We test this approach on synthetic time series generated by standard chaotic maps and dynamical models; on data sets representing brain activity in health and schizophrenia; and on the dynamics of delays in the European air transport system. We further show how continuous ordinal patterns can be used to assess other aspects of the dynamics, like time irreversibility.

Ordinal patterns are a way of analyzing time series in which values in sub-windows are studied in terms of their relative amplitude or, in other words, of the permutation required to sort them. Such permutations are then represented as symbols, and their frequency is used to characterize the dynamics generating the time series. This represents a conceptually simple way of synthesizing a whole time series into a discrete distribution and, not surprisingly, has been applied to a plethora of real-world problems. At the same time, these patterns are fixed by construction; in this contribution, we therefore ask: is it possible to create a continuous version of them, which could be optimized to tackle a specific problem? The result is an approach that inherits the well-known advantages of classical ordinal pattern analysis while further creating a bridge with deep learning models for time series classification. We evaluate the performance of continuous ordinal patterns in synthetic and real-world time series and further show how existing approaches based on classical ordinal patterns (specifically, the analysis of time irreversibility) can easily be adapted to this new continuous version.

## I. INTRODUCTION

In the last few decades, ordinal patterns have emerged as a powerful framework for the analysis of time series, both synthetic and representing real-world problems. Introduced in 2002 by Bandt and Pompe,^{1} their main advantage is their conceptual simplicity: values in the time series are represented in terms of their relative amplitude, or, in other words, in terms of their relative ranking. Ordinal patterns, thus, encode the temporal and causal structures in the time series, and their frequency (or their absence, known as forbidden patterns^{2}) can be used to characterize the dynamics generating the time series. Not surprisingly, this approach has proven valuable in a variety of real-world problems,^{3,4} also due to its computational simplicity, weak requirements on the analyzed time series, and robustness against observational noise. At the same time, the simplicity of the approach also results in some limitations, which have been addressed in the literature through different kinds of extensions. The most important one is the fact that amplitudes of the time series are not taken into account, beyond the relationship between neighboring values; a time series $(0,1,2)$ is then seen as identical to the time series $(0,1,20)$ from an ordinal pattern perspective, unless modified algorithms are considered.^{5,6} Additionally, equal values in the time series may lead to biased results;^{7} and very large data sets are required to describe high-order (i.e., slow evolving) dynamics.^{8}

An aspect that has hitherto been neglected is the fact that ordinal patterns are fixed by construction. They are obtained by evaluating the relative amplitude of neighboring values; hence, and disregarding the possibility of ties, one value can only be either larger or smaller than the preceding or following one. When evaluating sub-windows of size $D$ (also known as the embedding dimension), only $D!$ ordinal patterns can be obtained. This is, of course, the origin of the simplicity of the method: any $D$-length sub-window can be mapped into one of the $D!$ ordinal symbols, thus yielding an easily analyzable output. Yet, what would happen if such patterns were defined in a continuous way, such that a sub-window could be represented by an arbitrary pattern $\pi$ not constrained to those $D!$ possibilities?

We here introduce the concept of continuous patterns, i.e., sequences of $D$ values $\pi_t \in \mathbb{R}$ (with $t=0,\ldots,D-1$). A sub-window of size $D$ of the analyzed time series is then compared to $\pi$ to quantify how well the latter represents the evolution of values of the former. While conceptually similar to the traditional ordinal pattern approach, continuous patterns are not constrained by construction; on the contrary, given a problem, one can find the best pattern to describe the underlying dynamics.

This represents a bridge between the seminal work of Bandt and Pompe^{1} and recent machine learning approaches. Continuous patterns can be seen as a way of pre-processing ordinal data (features) and transforming them into numerical values, which can then be analyzed using standard machine learning or statistics models.^{9–11} As will be shown below, this approach is also connected to deep learning.^{12–14} Specifically, continuous ordinal patterns can be used to discriminate the dynamics coming from different dynamical systems with an efficiency at times comparable to that of state-of-the-art deep learning models. At the same time, retaining part of the original concept allows inheriting advantages like the minimal computational cost, low sensitivity to observational noise, and a high explainability of results.

In the remainder of the contribution, we first present the method in Sec. II, starting with a brief overview of the classical ordinal approach and then moving to the continuous version. The concept is then evaluated in synthetic (Sec. III) and real-world (Sec. IV) time series, the latter ones representing biological and technological systems. In order to show how the continuous patterns can be adapted to solutions based on classical ordinal patterns, Sec. V presents an application to the identification of the temporal irreversibility of time series. Finally, Sec. VI draws some conclusions and proposes future lines of work.

## II. METHODS

### A. Ordinal patterns

The analysis of a time series using classical ordinal patterns is a simple process, implemented through a set of three straightforward steps. First, the time series is divided into sub-windows (either overlapping or not) of size $D>2$, where the latter is known as the embedding dimension. Note that non-consecutive values can also be used, through an embedding lag $\tau>1$, although this case is not considered here for the sake of simplicity. Second, elements inside each sub-window are sorted in increasing order. Last, the permutation performed in the previous step is encoded as a symbol, which is called the ordinal pattern of that sub-window.

Since its introduction 20 years ago,^{1} both the concept of ordinal patterns and modifications thereof have been used in uncountable applications. To illustrate, one may assess the absence of some patterns^{2} or the absence of transitions between them,^{15} which have been shown to be fingerprints of chaotic dynamics; or metrics like entropy or complexity can be calculated on the probability distributions of these patterns^{16} to characterize the time series dynamics. While a complete enumeration of these applications is beyond the scope of the present contribution, the interested reader may refer to the many review works available in the literature.^{3,4,6,17}

### B. Continuous ordinal patterns

Let us now generalize the ordinal pattern concept. For the sake of clarity, the following description will be illustrated through examples with embedding dimension of $D=3$; higher order generalizations are straightforward.

By a continuous ordinal pattern $\pi$, we here refer to a set of $D$ values $\pi=(\pi_0,\pi_1,\ldots,\pi_{D-1})$ normalized in the range $[-1,1]$. For $D=3$, this implies that $\pi=(\pi_0,\pi_1,\pi_2)$, with $\pi_i=-1$ and $\pi_j=1$ for some $i,j\in\{0,1,2\}$. Let us further consider a time series $X$ composed of $n$ values, $X=(x_1,x_2,\ldots,x_n)$; and a sub-window $s_t$ of it, of size $D$, starting at time $t$, i.e., $s_t=(x_t,x_{t+1},\ldots,x_{t+D-1})$.

The first step of the analysis involves normalizing the sub-window in the range $[-1,1]$, to obtain a new $s^*=(x^*_t,x^*_{t+1},\ldots,x^*_{t+D-1})$. Second, the distance $d_i$ between each value of the sub-window and the corresponding value of $\pi$ is calculated, and the distances are summed to obtain $\varphi_\pi(t)$, i.e.,

$$\varphi_\pi(t)=\frac{1}{2D}\sum_{i=0}^{D-1}d_i,\qquad d_i=\left|x^*_{t+i}-\pi_i\right|.$$

The factor $1/2D$ is used to normalize the result between $0$ and $1$. Note that $\varphi_\pi(t)$ represents the distance between the normalized sub-window $s^*$ and $\pi$, such that the two are perfectly equal for $\varphi_\pi(t)=0$. A graphical representation of the process is presented in Fig. 1, and a synthesis of the notation is reported in Table I.

| Symbol | Meaning |
|---|---|
| $D$ | Embedding dimension, or size of the ordinal pattern. |
| $s$, $s^*$ | Sub-window of the original time series, and its (amplitude) normalized version. |
| $\pi$, $\pi_i$ | Continuous ordinal pattern, and its $i$th element. |
| $d_i$ | Distance between the $i$th element of $s^*$ and $\pi_i$. |
| $\varphi_\pi(t)$ | Normalized distance between the sub-window starting at time $t$, and the pattern $\pi$. |
| $1-\varphi_\pi(t)$ | Similarity between the sub-window and the pattern $\pi$. |


In order to clarify this process, let us consider a time series $x=(1,3,2)$, i.e., with $n=D=3$ and $s=x$ for the sake of simplicity; and $\pi=(-1,0,1)$. The normalization of the sub-window under analysis yields $s^*=(-1,1,0)$; consequently, $\varphi_\pi=\sum d_i/2D=(0+1+1)/6=1/3$.
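The computation of $\varphi_\pi$ can be sketched in a few lines. The following minimal Python version (function and variable names are our own, not from the paper) reproduces the worked example above:

```python
import numpy as np

def phi(window, pattern):
    """Distance between a sub-window and a continuous ordinal pattern.

    Both live in [-1, 1]; 0 means a perfect match, 1 the maximal distance.
    Assumes the window is not constant (so the normalization is defined).
    """
    w = np.asarray(window, dtype=float)
    # Normalize the sub-window to the range [-1, 1].
    w = 2 * (w - w.min()) / (w.max() - w.min()) - 1
    d = np.abs(w - np.asarray(pattern, dtype=float))
    return d.sum() / (2 * len(w))

# Worked example from the text: x = (1, 3, 2) and pi = (-1, 0, 1).
print(phi([1, 3, 2], [-1, 0, 1]))  # prints 0.3333333333333333
```

Sliding this function over all sub-windows of a series yields the sequence $\varphi_\pi(t)$ discussed below.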

In synthesis, $\varphi_\pi(t)$ represents the distance between the pattern $\pi$ and the normalized sub-window starting at $t$; and $1-\varphi_\pi(t)$ can be understood as a metric of how well the pattern $\pi$ represents the dynamics of the time series at time $t$. When the average (or median) of $1-\varphi_\pi$ is calculated over all possible sub-windows, the result quantifies how important the pattern $\pi$ is for understanding the dynamics of the system generating the time series.

As a last point, it is worth noting that $\pi$ has prima facie $D$ degrees of freedom. To illustrate, for $D=3$, one has three parameters $\pi_0$, $\pi_1$, and $\pi_2$. Yet, due to the normalization, it is more correct to say that six versions of the ordinal pattern exist, each one with a single degree of freedom. In a more explicit form, one has $(-1,*,1)$, $(-1,1,*)$, $(*,-1,1)$, $(*,1,-1)$, $(1,-1,*)$, and $(1,*,-1)$—with the asterisk representing any real number in the range $[-1,1]$. Naturally, these correspond to the same six classical ordinal patterns for $D=3$, the main difference being that the intermediate value is a continuous variable—hence the name *continuous ordinal pattern*. Using a complementary interpretation, given two patterns of the Bandt and Pompe methodology, we here introduce a continuous set of intermediate patterns.

### C. Between ordinal patterns and deep learning

Before moving to evaluation tasks and applications, it is worth taking a moment to ponder the similarity between the proposed method and other ones available in the literature.

On one hand, it is clear that the continuous ordinal patterns are a generalization of the well-known ordinal patterns of Bandt and Pompe.^{1} Aspects like the embedding dimension $D$ or the embedding delay $\tau $ can seamlessly be applied to the proposed approach. At the same time, it naturally solves some problems that have been identified in the literature, including the presence of ties,^{7,18} which are here not relevant as no ranking is calculated. Also, standard ordinal patterns are amplitude agnostic, i.e., two time series $(0.1,0.11,0.5)$ and $(0.1,0.49,0.5)$ are codified by the same pattern, and this has been shown to lead to the loss of relevant information.^{5,6,19} On the contrary, the proposed approach incorporates information about the amplitude of values inside each window, albeit not about the amplitude of the whole window.

While the result of an ordinal pattern analysis is composed of a discrete set of elements (e.g., the probability of six patterns for $D=3$), the result of the proposed approach is more complex. Specifically, the application of one continuous pattern $\pi$ over a sub-window yields a value $\varphi_\pi$; hence, a complete time series is represented by another time series $\varphi_\pi(t)$ or, alternatively, by a probability distribution $p_\pi$. While the equivalent of a permutation entropy can be calculated, this should be estimated on the continuous distribution $p_\pi$ and then integrated over all possible $\pi$s. On the positive side, this brings three advantages. First of all, patterns are not given but can instead be optimized to characterize specific aspects of the time series—this point will be used in the following examples. Second, these continuous patterns can directly be used to assess the evolution through time of the characteristics of the studied time series, as by definition $\varphi_\pi(t)$ is a real value associated with each window; in the case of the classical ordinal patterns, the result would just be a binary value. Third, the requirement of a very long time series for large values of $D$ no longer holds, as the number of measurements $\varphi_\pi(t)$ that can be obtained from a time series of length $n$ is $n-D$.

On the other side of the spectrum, one may ask what the relation is between the continuous patterns here proposed and other techniques for data analysis. The attentive reader will have noted that the process of calculating $\varphi_\pi$ is conceptually similar to a convolution, where the multiplication of elements is substituted by the assessment of a normalized distance. The calculation of one or more convolutions is the basic element behind modern artificial neural networks and deep learning designs,^{12–14} where the output is usually merged (or pooled) into a single numerical output.^{20,21} An analysis based on continuous ordinal patterns can, thus, be seen as the first step of a deep learning approach, yet with the important advantage of not being a black box—i.e., the practitioner can see exactly which pattern is used and why, and the main characteristic of the time series is thus explicit.

In synthesis, continuous ordinal patterns can be seen as a bridge between classical ordinal patterns and deep learning, in which patterns can be tuned and optimized to a specific problem, thus yielding tailored information about the underlying dynamics while retaining the conceptual simplicity of the classical ordinal analysis.

## III. ANALYSIS OF SYNTHETIC TIME SERIES

We start the analysis of the proposed methodology by resorting to synthetic time series generated using the well-known logistic map $x_{t+1}=r x_t(1-x_t)$, in the chaotic regime with $r=4$ and Lyapunov exponent $\lambda=\ln(2)$. As previously discussed, there is an infinite number of continuous patterns $\pi$ that could be applied to the time series, as opposed to the $D!$ of the classical ordinal approach. It is, thus, necessary to define both a criterion and an algorithm to retrieve the optimal $\pi$.

Let us recall that $\varphi_\pi$ quantifies the distance between the pattern $\pi$ and a sub-window of the time series; the mean of $1-\varphi_\pi$, thus, quantifies how well the complete time series is represented by the pattern $\pi$. As we aim at synthesizing the unique dynamics of the system (in this case, of the logistic map), it stands to reason to seek a pattern $\pi$ whose average $1-\varphi_\pi$ is much higher for the target system than what is observed in randomly shuffled time series, i.e., in similar time series whose temporal structure has been destroyed. In other words, we want to define a pattern $\pi$ that describes well what is observed in the real system (i.e., yielding a large average $1-\varphi_\pi$ when applied to the real data), but that is at the same time as far as possible from the same system when the temporal information is deleted (i.e., a small average $1-\varphi_\pi$ when applied to surrogate data). We here resort to a Kolmogorov–Smirnov two-sample test, quantifying the difference between two probability distributions of $\varphi_\pi$: one corresponding to the time series generated by the logistic map, and another to the shuffled version of the same time series. Given a continuous pattern $\pi$, the larger the statistic yielded by the test, the more different the two distributions of $\varphi_\pi$ are, and hence the more representative $\pi$ is of the dynamics under analysis.

As a final step, we use a dual annealing optimization algorithm^{22} to find the pattern $\pi$ that maximizes the test statistic. Such a pattern will here be called “optimal,” in the sense that it is the best one (according to the optimization procedure) for differentiating the $\varphi_\pi$ of the original time series from those of randomly shuffled versions of the same. While this is reasonable for synthetic time series, the situation may be more complex in the case of real-world data. Specifically, the optimization algorithm may fail to converge to a good solution; and it is also possible for multiple “optimal” patterns to exist, for instance, if the system is switching between different dynamical regimes. Neither of these situations has been observed in the synthetic examples here presented.
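The optimization loop described above can be sketched as follows, assuming a logistic-map time series and restricting the search, for brevity, to the single degree of freedom of the $(-1,1,*)$ family discussed in Sec. II B; series lengths, iteration counts, and all names are illustrative choices, not those of the paper.

```python
import numpy as np
from scipy.stats import ks_2samp
from scipy.optimize import dual_annealing

def phi_series(x, pattern):
    """phi_pi(t) for every sub-window of length D = len(pattern)."""
    D = len(pattern)
    W = np.lib.stride_tricks.sliding_window_view(np.asarray(x, float), D)
    lo = W.min(axis=1, keepdims=True)
    hi = W.max(axis=1, keepdims=True)
    Wn = 2 * (W - lo) / (hi - lo) - 1  # normalize each window to [-1, 1]
    return np.abs(Wn - np.asarray(pattern)).sum(axis=1) / (2 * D)

def logistic(n, x0=0.4, r=4.0):
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1 - x[i - 1])
    return x

rng = np.random.default_rng(0)
x = logistic(1000)
x_shuf = rng.permutation(x)  # surrogate: temporal structure destroyed

def neg_ks(p):
    # One pattern of the (-1, 1, *) family; the KS statistic between the
    # phi-distributions of original and shuffled data is to be maximized.
    pat = np.array([-1.0, 1.0, p[0]])
    return -ks_2samp(phi_series(x, pat), phi_series(x_shuf, pat)).statistic

res = dual_annealing(neg_ks, bounds=[(-1.0, 1.0)], maxiter=50,
                     no_local_search=True, seed=0)
print("best third value:", res.x[0], "| KS statistic:", -res.fun)
```

Searching over all six one-parameter families, or over the full $D$-dimensional pattern, follows the same template with wider bounds.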

Panel (a) of Fig. 2 reports the continuous pattern $\pi $ maximizing the differences between the $\varphi $s of logistic time series and of randomly shuffled versions of the same. Note that it resembles the pattern $(1,3,2)$ of the classical approach. The inset in the same panel also reports the two probability distributions of the average $\varphi \pi $, respectively, for the logistic map’s (black line) and the shuffled (brown line) time series. It can be appreciated that, as expected, the average $\varphi \pi $ is in general much lower in the case of the logistic map—i.e., the pattern represents well the dynamics of the map, and the average distance is lower compared to the shuffled case.

In order to evaluate the discrimination capacity of this continuous pattern, the average $\varphi_\pi$ obtained for each time series has been used as an input feature in a classification task, in which a Decision Tree model^{23} has been trained to distinguish between original and shuffled versions of these time series. The maximum depth of the model has been set to one; in other words, the model only tries to infer the threshold best discriminating the two groups. Additionally, the classification score has been estimated through the resulting accuracy, i.e., the fraction of instances correctly classified. The obtained classification score, as a function of the length of the time series, is depicted in panel (b) of Fig. 2 (blue line). As a reference, the same classification task has also been executed using the Permutation Entropy of each time series, i.e., the Shannon entropy calculated over the probability distribution created by traditional ordinal patterns (green line), and by resorting to a ResNet deep learning model (magenta line), as proposed in Ref. 24. It can be appreciated that the continuous ordinal pattern is much more effective in discriminating between the two dynamics than the classical Permutation Entropy, especially for very short time series. This is due to the fact that a continuous pattern of size $D$ can be evaluated on a time series as short as $D$ points; on the other hand, reliable statistics on the traditional ordinal patterns require longer time series, of at least $(D+1)!$ points.^{2}
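This classification task can be sketched as follows, with a depth-one Decision Tree fed the average $\varphi_\pi$ of each series; the fixed pattern used here is a plausible stand-in for the optimized one, and all series counts and lengths are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
PATTERN = np.array([-1.0, 1.0, 0.0])  # stand-in resembling the classical (1, 3, 2)

def phi_mean(x, pattern):
    # Average phi_pi over all sub-windows of the series.
    D = len(pattern)
    W = np.lib.stride_tricks.sliding_window_view(np.asarray(x, float), D)
    lo, hi = W.min(axis=1, keepdims=True), W.max(axis=1, keepdims=True)
    Wn = 2 * (W - lo) / (hi - lo) - 1
    return np.abs(Wn - pattern).sum(axis=1).mean() / (2 * D)

def logistic(n, x0):
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = 4 * x[i - 1] * (1 - x[i - 1])
    return x

series = [logistic(100, x0) for x0 in rng.uniform(0.1, 0.9, 200)]
X = [[phi_mean(s, PATTERN)] for s in series] + \
    [[phi_mean(rng.permutation(s), PATTERN)] for s in series]
y = [0] * 200 + [1] * 200

# Depth-one tree: a single threshold on the mean phi_pi.
clf = DecisionTreeClassifier(max_depth=1).fit(X[::2], y[::2])
score = clf.score(X[1::2], y[1::2])  # fraction of held-out instances correctly classified
print(f"classification score: {score:.2f}")
```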

As a final issue, panel (c) of Fig. 2 reports an analysis of the sensitivity of the results to the shape of the continuous pattern and, specifically, the evolution of the classification score when the last value of the pattern [i.e., $\pi_2$ in panel (a)] is changed. As is to be expected, results get worse the more the pattern departs from the optimal one. Still, classification scores remain above $0.9$, as the global shape of the pattern is not substantially changed. It can be concluded that continuous patterns are quite robust and that the optimization process is not a major bottleneck.

Moving to other dynamical models, Fig. 3 reports the best continuous patterns $\pi $, obtained with the same optimization strategy, for:

- *Arnold cat map*: $x_{t+1}=(x_t+y_t)\bmod 1$, $y_{t+1}=(x_t+k y_t)\bmod 1$, with $k=2$.
- *Cubic map*: $x_{t+1}=A x_t(1-x_t^2)$, with $A=3$.
- *Gauss map*: $x_{t+1}=1/x_t \bmod 1$.
- *Henon map*: $x_{n+1}=1+y_n-a x_n^2$, $y_{n+1}=b x_n$, with $a=1.4$ and $b=0.3$.
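For reference, the maps listed above can be iterated with a few lines of code; the initial conditions are illustrative choices, not taken from the text.

```python
import numpy as np

def cat_map(n, x0=0.2, y0=0.3, k=2):
    # Arnold cat map; returns the x series (values in [0, 1)).
    x, y = np.empty(n), np.empty(n)
    x[0], y[0] = x0, y0
    for i in range(1, n):
        x[i] = (x[i - 1] + y[i - 1]) % 1.0
        y[i] = (x[i - 1] + k * y[i - 1]) % 1.0
    return x

def cubic_map(n, x0=0.3, A=3.0):
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = A * x[i - 1] * (1 - x[i - 1] ** 2)
    return x

def gauss_map(n, x0=0.37):
    # 1/x mod 1 (numerically fragile for values very close to 0).
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = (1.0 / x[i - 1]) % 1.0
    return x

def henon_map(n, x0=0.1, y0=0.1, a=1.4, b=0.3):
    # Henon map; returns the x series.
    x, y = np.empty(n), np.empty(n)
    x[0], y[0] = x0, y0
    for i in range(1, n):
        x[i] = 1 + y[i - 1] - a * x[i - 1] ** 2
        y[i] = b * x[i - 1]
    return x
```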

It can be appreciated that the optimal pattern is map-dependent, i.e., the dynamics of each map is best characterized by a different continuous pattern. We further analyze the results obtained through this methodology for a Lorenz chaotic system, i.e., a continuous-time system defined as $x'=\sigma(y-x)$, $y'=x(\rho-z)-y$, and $z'=xy-\beta z$, with $\rho=28$, $\sigma=10$, and $\beta=8/3$. The system has been solved with integration steps of $dt=0.01$, with the three variables $x$, $y$, and $z$ here analyzed independently. Panel (a) of Fig. 4 reports the three best continuous patterns $\pi$; notably, the variables $x$ and $z$ are associated with similar continuous patterns, which are substantially different from the one for $y$. Panel (b) of the same figure further reports the classification score obtained through $\varphi_\pi$ when the time series are downsampled by a given factor; note that this is the same as considering a lag $\tau>1$ when reconstructing the patterns $\pi$. In this case, variables $x$ and $y$ have a similar behavior, while results for variable $z$ are characterized by a strong oscillation; the latter is caused by the autocorrelation function of the $z$ variable of the Lorenz system, see, for instance, Ref. 25.
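A sketch of how the Lorenz time series can be generated with the parameters given above; the initial condition and integration span are illustrative choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

# Sample every dt = 0.01, as in the text.
t_eval = np.arange(0, 20, 0.01)
sol = solve_ivp(lorenz, (0, 20), [1.0, 1.0, 1.0], t_eval=t_eval, rtol=1e-6)
x, y, z = sol.y  # the three variables, analyzed independently in the text
```

Each of the three series can then be fed to the same pattern-optimization procedure used for the discrete maps.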

One of the major advantages that ordinal patterns offer when analyzing real-world data is the robustness of results to observational noise. Figure 5 delves into this topic by showing the evolution of the classification score for four chaotic maps and time series of length $24$, when an observational noise drawn from a uniform distribution is added to the data. Note that the X axis represents the amplitude of such noise, normalized according to the standard deviation of the original time series. It can be appreciated that the continuous pattern solution is quite resilient to noise. Still, it is important to note that such robustness is obtained only if the best continuous ordinal patterns are calculated on the noisy data (continuous lines), as opposed to the original (noise-free) ones (dashed lines); in other words, the patterns have to represent the real dynamics to be characterized, and not the ideal (noise-free) ones.

Finally, a further advantage associated with ordinal pattern analyses is the reduced computational cost. While evaluating $\varphi_\pi(t)$ is computationally trivial, the estimation of the best pattern $\pi$ given a set of data is a complex problem, which is here, for instance, solved through the dual annealing optimization algorithm. As an example, Fig. 6 analyzes the computational cost of obtaining such an optimal pattern $\pi$, using $5000$ time series created with the logistic map and $5000$ randomly shuffled versions of the same, as a function of the maximum number of iterations of the dual annealing algorithm. The figure also reports the computational cost of classifying the same time series using a standard deep learning algorithm, and the classification score obtained by a Decision Tree model trained using $\varphi_\pi$. In both cases, the computational cost has been measured on a $3.8$ GHz $8$-Core Intel Core i7 with $32$ GBs of memory. It can be appreciated that the proposed approach is one order of magnitude faster than a deep learning model and, additionally, that few iterations of the dual annealing algorithm are enough to reach a stable result, as was initially suggested in Fig. 2(c).

## IV. ANALYSIS OF REAL DATA

### A. Analysis of brain time series

As an example of the use of the proposed approach in a real-world problem, we here consider the analysis of brain dynamics data, and specifically the task of characterizing the differences between healthy subjects, and patients suffering from a neurological condition. The electroencephalographic (EEG) recordings here used correspond to a set of $14$ schizophrenia patients and the same number of matched control subjects,^{26} available at http://dx.doi.org/10.18150/repod.0107441. The patients ($7$ males, $27.9\pm3.3$ years, and $7$ females, $28.3\pm4.1$ years) met International Classification of Diseases ICD-10 criteria for paranoid schizophrenia (category F20.0). The corresponding healthy controls were $7$ males (age of $26.8\pm2.9$ years) and $7$ females (age of $28.7\pm3.4$). EEG data correspond to an eyes-closed resting state condition and were recorded at $250$ Hz using the standard 10–20 EEG montage with 19 EEG channels: Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, O2. The reference electrode was placed at FCz. No additional preprocessing steps (e.g., artifact removal) have been performed, beyond what was initially included in the original study.^{26}

Using the optimization strategy previously described, based on a dual annealing algorithm,^{22} we calculated the best continuous patterns separating the control subjects from the patients, one pattern for each sensor. Note that, in order to avoid overfitting in the next steps, only the first ten subjects of each group are used at this stage. It is interesting to note that, as opposed to what was previously observed, the average value of $\varphi_\pi$ is quite similar in both groups—see panel (a) of Fig. 7. On the contrary, what differentiates the two groups are the tails of the distribution of $\varphi_\pi$; in other words, in the example of Fig. 7(a), control subjects (green line) are characterized by a larger variability in $\varphi_\pi(t)$ than patients (red line). With this in mind, the optimization is performed not by maximizing the difference of the average between both groups, but instead by maximizing the difference of the standard deviation of $\varphi_\pi(t)$—denoted, in what follows, by $\sigma[\varphi_\pi(t)]$. The resulting best patterns for differentiating both groups, for sensor F7, are shown in panels (c)–(e), for $D=3$, $4$, and $5$; additionally, the bottom panels report the probability distributions of $\sigma[\varphi_\pi(t)]$, both for control subjects (green lines) and patients (red lines).

Once the best patterns have been obtained, these can be used to classify subjects according to their condition. Note that the objective here is not to yield a clinical tool, but rather to use the score of the classification task as a quantifier of the effectiveness (or usefulness) of the presented approach. The time series of all subjects have been divided into non-overlapping windows of $1000$ data points, and the corresponding $\sigma[\varphi_\pi(t)]$ has been calculated. Afterward, a classification has been performed resorting to a Decision Tree algorithm,^{23} trained with data from the first ten subjects, and validated using the four remaining ones. Note that the classification score here reported, thus, corresponds to data that have not been used in the training phase, nor in the definition of the best patterns. Figure 7(b) presents a bar plot of the classification score, as a function of the embedding dimension $D$ (depicted by different shades of green) and of the EEG channel. Results are generally good, with most of the classification scores being above $0.8$. It can also be noted that results are generally better for larger $D$s, suggesting that a short pattern is not enough to capture the complexity of brain dynamics.

In order to add a reference, the horizontal magenta lines of Fig. 7(b) report the classification score obtained, over the same time series, using a ResNet deep learning algorithm. ResNet (or Residual Network) is a deep learning architecture based on the pooling of a large number of convolutional layers, which has proven to be one of the best models for the classification of time series.^{27,28} It can be appreciated that the results of the proposed method are qualitatively similar to those of ResNet, beyond some natural variability due to the limited size of the data set, suggesting that the former is able to extract the most useful information from the time series. The only notable exception is sensor T5, for which the deep learning approach strongly underperforms.

### B. Analysis of air transport delay patterns

As a second example of the usefulness of the proposed method, we are here going to analyze time series representing the evolution of delays in air transport. For this, we leverage the results of Ref. 29, which showed that airport delay profiles, i.e., their relative evolution through time, are highly identifiable; in other words, given a delay profile, a deep learning model can easily identify which airport it corresponds to. This has major implications for air traffic management, as it suggests that delays are not random events, but are instead the result of systematic inefficiencies in the system, which can in principle be tackled. Still, the use of deep learning models implies that little information can be extracted about the patterns supporting such uniqueness. We here bridge this gap by calculating the continuous ordinal patterns that are maximally different between pairs of airports.

Data about air transport operations have been obtained from the EUROCONTROL’s R&D Data Archive, a public repository of historical flights made available for research purposes, and freely accessible at https://www.eurocontrol.int/dashboard/rnd-data-archive. It includes information about all commercial flights operating in and over Europe, complete with flight plans, radar data, and the associated airspace structure, and it spans four months (i.e., March, June, September, and December) of four years (2015–2018). For the sake of simplicity, only the top-five European airports have here been considered: London Heathrow, Paris Charles de Gaulle, Amsterdam Schiphol, Frankfurt am Main, and Adolfo Suárez Madrid-Barajas. For each flight landing at these airports, the corresponding delay has been calculated as the difference between the actual (as obtained from radar reports) and the planned (according to the last filed flight plan) landing times. Afterward, for each airport, flights have been grouped according to the actual landing hour, and the average delay per hour and airport has been calculated. Finally, time series have been normalized through a Z-Score transformation, as the focus is on the evolution of delays, as opposed to their magnitude.

For each pair of airports, the best continuous ordinal pattern $\pi $ for discriminating the corresponding time series has been obtained using the same dual annealing algorithm^{22} as before; results are then presented in Fig. 8. While results are quite heterogeneous, two main groups of patterns can be identified: those having a constant increase or decrease, as, e.g., Paris Charles de Gaulle vs London Heathrow, and those presenting a turning point, as, e.g., Amsterdam Schiphol vs London Heathrow. Note that these are two sides of the same coin, and suggest that airports differentiate according to their capability for recovering from (or limiting) sudden bursts of delays: some of them are able to go back to the previous level in less than 1 h, hence the pattern with a turning point, while others keep accumulating additional delays in a snowball effect. This dynamic of delays amplifying over time is a well-known aspect of airport operations, called “delay multiplication” (or “delay multiplier”);^{30–32} its quantitative assessment is nevertheless not always an easy task. Continuous ordinal patterns seem to be a viable alternative, able to extract a simplified version of patterns in delay evolution, which will have to be tested in the future with larger data sets.
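The optimization described above can be sketched as follows, using `scipy`'s dual annealing and the two-sample Kolmogorov–Smirnov statistic as the (negated) objective. The window normalization inside `phi`, the pattern length `D`, and the function names are assumptions made for this illustration, not the paper's exact definitions.

```python
import numpy as np
from scipy.optimize import dual_annealing
from scipy.stats import ks_2samp

D = 4  # pattern length; a hypothetical choice

def phi(pattern, ts):
    """Distance of every length-D sub-window of ts to the pattern.

    Both the pattern and each window are rescaled to [-1, 1]; this
    normalization is an assumption about the method's details."""
    W = np.lib.stride_tricks.sliding_window_view(np.asarray(ts, float), len(pattern))
    lo = W.min(axis=1, keepdims=True)
    hi = W.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)       # avoid division by zero
    Wn = np.where(hi > lo, 2 * (W - lo) / span - 1, 0.0)
    return np.abs(Wn - np.asarray(pattern)).mean(axis=1) / 2  # values in [0, 1]

def best_pattern(ts_a, ts_b, maxiter=50):
    """Search for the pattern whose phi distributions on the two
    series are maximally different, as measured by the KS statistic."""
    loss = lambda p: -ks_2samp(phi(p, ts_a), phi(p, ts_b)).statistic
    res = dual_annealing(loss, bounds=[(-1.0, 1.0)] * D, maxiter=maxiter)
    return res.x, -res.fun  # optimal pattern and its KS statistic
```

The returned KS statistic directly quantifies how well the optimal pattern separates the two airports' delay dynamics.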

## V. ASSESSMENT OF TIME IRREVERSIBILITY

As a final topic, we here present some initial thoughts and results on the relation between continuous ordinal patterns and irreversibility. A system, or a time series representing its evolution, is said to be irreversible when it is distinguishable (and hence significantly different) from its time-reversed version; or, mathematically, when the result of a function applied to its evolution is different, in a statistically significant way, from the result of the same function applied to the time-reversed version. In a more graphical way, one can imagine recording a dynamical process in a tape recorder: the process is irreversible if one can distinguish between the two senses of reading back the tape.^{33} Statistical physics has long been interested in the irreversibility of processes, as it measures how much entropy is produced as time passes,^{34} or, in other words, how far from equilibrium a system is, and further quantifies the degree of nonlinear dependencies (memory).^{35} Not surprisingly, many alternative approaches have been proposed in the last few years to quantify it,^{36} based on the assessment of different functions on the time series under study. Among these, several research groups have independently proposed metrics based on ordinal patterns, with the common idea that irreversibility must appear as an imbalance between the frequencies of ordinal patterns that are pairwise time symmetrical.^{25,37–40}

The assessment of the irreversibility of a time series with continuous ordinal patterns is straightforward. Specifically, given a pattern $\pi $, the degree of irreversibility can be estimated as the distance between two probability distributions of $\varphi $, obtained by evaluating $\pi $, respectively, on the original and time-reversed time series. The main difference is that the pattern is not given, as in the case of standard ordinal patterns, but must instead be extracted from the data themselves. In other words, the result is not only the level of irreversibility of the time series but also the best continuous ordinal pattern in which such irreversibility manifests itself.

As in previous cases, given a set of time series, we execute a dual annealing algorithm^{22} aimed at finding the pattern $\pi $ maximizing the distance between the distributions of $\varphi $ for the original and time-reversed time series. This distance is estimated through the statistic yielded by a two-sample Kolmogorov–Smirnov test, which takes values between zero (identical distributions) and one (completely non-overlapping distributions). The test statistic is, thus, a direct estimate of the degree of irreversibility of the time series under analysis.
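For a fixed pattern, the irreversibility estimate reduces to a single KS comparison, as in the minimal sketch below; the full procedure would additionally optimize the pattern via dual annealing, analogously to the classification case. The window normalization inside `phi` and the function names are again assumptions for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

def phi(pattern, ts):
    # Distance of each sub-window to the pattern, both rescaled to
    # [-1, 1]; the exact normalization is an assumption of this sketch.
    W = np.lib.stride_tricks.sliding_window_view(np.asarray(ts, float), len(pattern))
    lo = W.min(axis=1, keepdims=True)
    hi = W.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)
    Wn = np.where(hi > lo, 2 * (W - lo) / span - 1, 0.0)
    return np.abs(Wn - np.asarray(pattern)).mean(axis=1) / 2

def irreversibility(pattern, ts):
    """KS statistic between phi evaluated on the original and on the
    time-reversed series: 0 for indistinguishable distributions,
    1 for completely non-overlapping ones."""
    ts = np.asarray(ts, dtype=float)
    return ks_2samp(phi(pattern, ts), phi(pattern, ts[::-1])).statistic
```

For a symmetric pattern, the two distributions coincide by construction; only time-asymmetric patterns can reveal irreversibility, which is why the pattern itself must be optimized.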

The left panels of Fig. 9 (green lines) report the evolution of such irreversibility for sets of $10^3$ time series obtained through the logistic (top panel) and Hénon (bottom panel) maps, i.e., two maps that are known to be irreversible. As a reference, the black lines in the same panels report the evolution of the same metric when the values in the time series are randomly shuffled, thus destroying any temporal structure. In both cases, the detected irreversibility is well above what is seen in the shuffled time series, as is to be expected. As a note of caution, it has to be highlighted that the optimization procedure requires large quantities of data, here provided through sets of short time series; the result is thus not a statistical test on individual time series but is instead more akin to a deep learning approach—see Refs. 41 and 24.

As an additional test, the right panel of Fig. 9 reports the irreversibility obtained when analyzing the brain activity time series for control subjects (green bars) and Schizophrenia patients (red bars) of the same data set previously considered.^{26} As a reference, the gray bands depict the 10–90 percentile interval obtained when the corresponding time series are randomly shuffled. It can be appreciated that a weak but significant irreversibility is detected for all EEG channels. This is to be expected, as brain activity is intrinsically non-linear and depends on its own past; at the same time, due to the coarse-grained and noisy nature of these recordings, irreversibility can also be challenging to detect.^{42–45} The irreversibility significantly increases in Schizophrenia patients in the parietal sensors (i.e., P3, P4, and Pz), while it is reduced in the temporal region (see T3 through T7). This is aligned with the known alteration of these two regions’ dynamics in the pathology. Specifically, hyperactivity is usually observed in the inferior parietal lobe,^{46} which may be the result of the increased non-linearity suggested by the higher irreversibility; and altered dynamics of the temporal lobe have been shown to be associated with psychotic behavior.^{47}

## VI. DISCUSSION AND CONCLUSIONS

In this contribution, we proposed an extension of the classical ordinal pattern approach for time series analysis, based on considering continuous patterns and on assessing their distance from sub-windows of the original time series. In spite of the similarity between the two approaches, this is not an incremental advancement, but rather a qualitative change in the way time series are analyzed. Specifically, it can be seen as a way of connecting ordinal patterns analysis and deep learning.

First, one can start by considering the original ordinal pattern idea; and, instead of evaluating whether the values of a sub-window follow the same trend, calculate a normalized distance between the sub-window values and the pattern. The result thus turns from a binary value (i.e., does the sub-window conform to the ordinal pattern?) into a real one. This naturally solves some problems of the original approach, such as the disregard of the time series' amplitude and the handling of equal values. Second, there is no longer a need to keep the values inside the ordinal pattern fixed; specifically, the intermediate one can freely be moved between the top and bottom limits. The drawback is that, for each original ordinal pattern, an infinite number of continuous ordinal patterns can be defined—alternatively, each pattern can be parameterized. On the other hand, this implies that ordinal patterns are no longer “given,” and an optimization process can be performed to find the best continuous pattern for a given problem. Merging both steps, it is easy to see how the proposed continuous ordinal patterns create the bridge between standard ordinal analysis and deep learning: they incorporate a learning component typical of the latter but retain the conceptual simplicity, computational efficiency, and easy interpretability of the former.
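Under one plausible formalization (the exact normalization used in the method is not reproduced here and is thus an assumption), the real-valued evaluation described above can be written as

$$\hat{w}_i = 2\,\frac{w_i - \min_j w_j}{\max_j w_j - \min_j w_j} - 1, \qquad \varphi(\pi, w) = \frac{1}{2D}\sum_{i=1}^{D} \left| \pi_i - \hat{w}_i \right| \in [0, 1],$$

where $w = (w_1, \ldots, w_D)$ is a sub-window of length $D$, rescaled to $[-1, 1]$, and $\pi $ is the continuous pattern with components in the same interval. A value $\varphi = 0$ indicates perfect conformity of the sub-window to the pattern, while the classical approach would only report whether the sorting permutation matches.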

The analysis of real-world data, and specifically of time series representing brain activity in Schizophrenia patients and control subjects, illustrates that the proposed approach is able to detect differences with a sensitivity at times approaching that of deep learning models. This is achieved with a substantially lower computational cost and, most importantly, yields explicit information on the patterns behind these differences—something difficult to obtain from black-box approaches. This advantage is further shown in the second example, focusing on aeronautical data; results presented in a previous work are recovered, but they are further clarified in terms of the magnitude of delay multiplication.

In addition, the proposed approach can easily be adapted to tackle problems that have usually been studied through classical ordinal patterns. As an example, we have here presented an application to the detection of time irreversibility in brain activity data. Once again, instead of evaluating the irreversibility on a fixed set of ordinal patterns, the proposed approach allows finding the optimal continuous pattern to detect such a characteristic.

Beyond what is here presented, the flexibility of this approach opens up many avenues for future research. For instance, a continuous ordinal pattern can be used to transform a time series into a new one—by transforming overlapping sub-windows into individual real values. These new time series could then be analyzed through other approaches, not least the original ordinal approach or deep learning models. The advantage is that the new time series would represent a specific tunable aspect of the original one and can, thus, be understood as a filtering of the dynamics. At the same time, some challenges will have to be tackled, including understanding the best strategies for the learning phase, beyond the annealing optimization here considered; whether multiple “optimal” patterns can coexist; and what this would tell us about the underlying dynamics.

## ACKNOWLEDGMENTS

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 851255). This work has been partially supported by the María de Maeztu project CEX2021-001164-M funded by the MCIN/AEI/10.13039/501100011033.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Massimiliano Zanin:** Conceptualization (equal); Formal analysis (equal); Methodology (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study, respectively, for electroencephalography and air transport, are openly available in Ref. 48 at http://dx.doi.org/10.18150/repod.0107441 and https://www.eurocontrol.int/dashboard/rnd-data-archive, Ref. 49.

## REFERENCES

*2016 International Joint Conference on Neural Networks (IJCNN)* (IEEE, 2016), pp. 2174–2181.

*2017 International Conference on Engineering and Technology (ICET)* (IEEE, 2017), pp. 1–6.

*2017 International Joint Conference on Neural Networks (IJCNN)* (IEEE, 2017), pp. 1578–1585.

*European Air Traffic Management* (Routledge, 2016), pp. 116–141.