Order patterns apply well in many fields because they require only minimal stationarity assumptions. Here, we fix the methodology of patterns of length 3 by introducing an orthogonal system of four pattern contrasts, that is, weighted differences of pattern frequencies. These contrasts are statistically independent and arise as eigenvectors of a covariance matrix both in the independence model and in the random walk model. The most important contrast is the turning rate. It can be used to evaluate sleep depth directly from EEG (electroencephalographic brain data). The paper discusses fluctuations of permutation entropy, statistical tests, and the need for new models of noises such as EEG.

Order patterns have been applied successfully to various biomedical, geophysical, climate, financial, and other data.^{1,2} Their importance grows with the ever-increasing size of available data. On the theoretical side, connections with dynamical systems have been thoroughly studied,^{3–5} and there are links to stochastic processes,^{6–8} also in the multivariate case.^{9–11} Various statistical methods have been worked out.^{12–17} Nevertheless, basic problems remain to be clarified and new models must be built to establish order patterns firmly as a mathematical subject. Our paper deals with the statistics of noisy data in the case of a single time series. Only patterns of length up to 5 are of interest, but the delay between time points can vary. For length 3, we define four contrasts, that is, differences of pattern frequencies. This turns out to be the unique system of orthogonal and statistically independent contrasts. For EEG (electroencephalographic brain data), a case study shows that the turning rate contrast is most relevant. As a function of time and delay, it has the power to classify sleep stages and to uncover an infra-slow rhythm of the brain. As concerns serial dependence tests, permutation entropy seems a good choice because of its slim distribution under the null hypothesis. EEG shows serial dependence at all times and scales and so requires new models of noise.

## I. INTRODUCTION

### A. A personal remark

Karsten Keller asked me to give the opening talk of the Orpatt22 workshop in Dresden. This was a great honor, which encouraged me to tie together various loose ends of my research in recent years. Karsten and I collaborated on the topology of fractals in the 1990s, work which culminated in Karsten’s habilitation,^{18} and after 2000 with Bernd Pompe and others on ordinal time series analysis. I went to Dresden only for two nights, because of my wife’s poor health. We had fruitful discussions on both evenings which I shall never forget. A few weeks later, just four days after my wife, Karsten died—unexpectedly, a shock for all of us. I write this paper with great sadness and gratitude, remembering Karsten’s very original, kind, constructive, and optimistic way to teach and do research.

### B. Contents of the paper

Figure 1 shows the six possible order patterns of three consecutive values in a time series, as shown in many papers and defined in Sec. II. Most authors study the frequencies $p_{123}, p_{132},\dots$ of these patterns. This paper shows that it is better to study contrasts, that is, differences of frequencies. The four lines in Fig. 1 indicate a canonical system of four contrasts. The line marked $\beta,$ for instance, denotes the parameter $\beta = p_{123} - p_{321},$ which measures the asymmetry of up and down in a time series. The parameter $\alpha$ measures the amount of turning points (right) compared to monotonically increasing or decreasing parts (left) of the time series. We call $\alpha$ the turning rate since it is the relative number of time points where we have a local maximum or minimum. Small $\alpha$ indicates smoothness, large $\alpha$ means fast oscillation. The parameters $\gamma$ and $\delta$ measure deviations from spatially symmetric and time reversible processes,^{7} similar to $\beta$.

The parameters were introduced^{6,19–21} because they can be interpreted better than frequencies of single patterns. Here, we show that for mathematical reasons, these four parameters form the canonical framework for the study of patterns of length 3. The results in Sec. IV say that the estimates of these four parameters are uncorrelated, while the estimates of single pattern frequencies are known to be heavily correlated.^{13,15,16} The four contrasts appear as eigenvectors of a covariance matrix in the random walk model as well as in the independence model. Thus, this system is unique, and the methodology for patterns of length 3 is more or less fixed.

On the flip side of the coin, this shows that patterns of length 3 do not contain much information. Suppose the process under study is symmetric with respect to time and spatial reversal—which all Gaussian processes and many observed processes like EEG are. Then, the three asymmetry parameters are zero, and only the turning rate will describe the order structure of length 3. In order to better distinguish symmetric processes, we have to study patterns of length 4, as well as permutation entropy for length $m=5$ or 6. Recent papers indicate that there are other ways to evaluate long order patterns than to determine all $m!$ pattern frequencies.^{17,22,23}

Statistical aspects of permutation entropy are studied by simulation in Sec. V. In the applied part of the paper, we discuss the straightforward application of $\alpha$ to sleep stage classification from two EEG channels. Section VI C also discusses an apparent infra-slow biorhythm uncovered by the turning rate. We study the effects of low-pass and high-pass filtering on the order structure. The statistics of entropy fluctuations is used to test serial dependence of the EEG data in Sec. VII C. EEG has dependencies on every scale and therefore requires new models.

### C. A historical note

In 1876, Bienaymé^{24} studied the “turning rate,” the relative number of local maxima and minima, in a time series of $T$ observations. He claimed that the turning rate is approximately normally distributed, with mean $\frac{2}{3}$ and variance $\frac{8}{45T}.$ His proof was not written down but was presented in the seminar of the Society by J. Bertrand.^{24} However, the correct value of the variance could only be obtained by studying the 120 order patterns of length 5, in a similar way as in Refs. 13, 15, 16, and 33. Moreover, the paper contained numerous calculations with astronomical and other data which supported the normal approximation.

Today, we know that these statistics of the turning rate are exact only for white noise. We can use Bienaymé’s discovery for a test of serial dependence, see Sec. VII C. The difference between independent and merely random (“non classées arbitrairement”) data was not yet known, and Student’s $t$ test was published 30 years later.
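Bienaymé’s normal approximation already yields such a test. Below is a minimal sketch in Python (the function name and setup are ours, for illustration; this is not the paper’s exact procedure):

```python
import numpy as np

def turning_rate_test(x):
    """Two-sided z-test of serial independence based on the turning rate.

    Under the null hypothesis of i.i.d. data, the turning rate is
    approximately normal with mean 2/3 and variance 8/(45*T) [Bienayme 1876].
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    a, b, c = x[:-2], x[1:-1], x[2:]
    # fraction of interior points that are a local maximum or minimum
    alpha = np.mean(((b > a) & (b > c)) | ((b < a) & (b < c)))
    z = (alpha - 2 / 3) / np.sqrt(8 / (45 * T))
    return alpha, z

rng = np.random.default_rng(0)
alpha, z = turning_rate_test(rng.standard_normal(10_000))   # |z| small
_, z_rw = turning_rate_test(np.cumsum(rng.standard_normal(10_000)))  # |z| huge
```

For white noise, $z$ stays near zero; for a random walk, the turning rate drops to about $1/2$ and the test rejects independence decisively.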

### D. Why do patterns work well today?

In the century of computers and big data, the boring job of counting patterns is done by machines. Huge amounts of data require simple methods for screening, classification, and evaluation. For time series data, order patterns can simply be applied as a machine learning tool, with the additional advantage that we understand them.

Nonlinear distortion of the data, low-frequency perturbations, and outliers have little influence on order pattern methods. Intervals of missing data can be overcome: already Bienaymé collected his turning rate from several distinct short time series. No preprocessing of the data is required. Order methods are fast and easy to implement.

Modern sensors measure in tricky indirect ways. Fluctuations of voltage indicate brain activity, carbon dioxide concentration, or sea level. Changes in the environment are adjusted automatically, as a camera adapts to changing light. As a consequence, comparison of observed values is often more meaningful than the actual values themselves.

The most important advantage, however, is that *minimal stationarity assumptions* are required. Statistics is based on sampling, for instance repeated measurements. Many time series are observed only once, so we have to sample over time points. This requires that the rules of the game do not change with time. Assuming a stationary Gaussian process for instance means that all multidimensional dependencies among values reduce to correlations and do not change in time. This is never fulfilled in practice, so that perfect theoretical methods need not give nice results. Assumptions for order patterns are less restrictive and more likely true for data.

### E. Work on statistics of patterns

Having stressed practical merits of order patterns, we now regret the lack of a theory. There are some beginnings. Bandt and Shiha studied order pattern frequencies in Gaussian processes. Sinn and Keller^{7} were the first to study estimators for pattern frequencies. They proved strong consistency and asymptotic normality for Gaussian processes. Their work was extended by Betken *et al.*^{8} to long-range dependent processes and by Weiß^{16} for independent variables. Tests for serial dependence, and related tests for $m$-dependence and structural breaks, were suggested by many authors, see Refs. 12–17,25 and the literature quoted there. On the whole, however, classical stochastic processes and order patterns do not fit well. Probably, we must develop a combinatorial time series analysis, as indicated by Keller and Sinn.^{26,27}

## II. PATTERN FREQUENCIES

### A. Basic definitions

A permutation $\pi$ is an ordering of $m$ objects. In our case, the objects are consecutive time points. It is natural to start with the mathematical definition that a permutation is a one-to-one mapping from the set $\{1,2,\dots,m\}$ onto itself. A time series is also a mapping, assigning to time points $t=1,\dots,T$ values $x_t = x(t).$ We write $\pi_k$ for $\pi(k)$ and $\pi = \pi_1\pi_2\cdots\pi_m.$ Thus, $\pi = 231$ means $\pi(1)=2, \pi(2)=3, \pi(3)=1.$ The time series 2, 3, 1 shall represent the permutation 231.

The graph of a permutation $\pi$ represents an order pattern, see Fig. 2. Any pattern that shows the same comparisons between a series of $m$ numbers or objects is considered equivalent to $\pi.$ Thus, $(0, 6, -2)$ and $(0.199, 0.2, -4)$ are equivalent to 231. We consider $\pi$ as the name and standard form of an equivalence class of order patterns.

Many authors define the permutation 231 by the relation $\pi_2 < \pi_3 < \pi_1.$ They read the values on the $x$ axis from bottom to top, while we read them on the $y$ axis from left to right. Note that this notation is counterintuitive since the time series $(3,1,2)$ then represents the pattern 231, and the time reversed series $(2,1,3)$ does not correspond to the time-reversed pattern 132. However, Fig. 2 shows the only case for $m=3$ where the two notations differ.

In this paper, we study only equally spaced time points $t < t+d < t+2d < \cdots < t+(m-1)d.$ We call $m$ the length and $d$ the delay (or lag) of the pattern. We always take $m < 7$ and mainly consider $m=3$ and $m=4.$ However, we vary $d$ as much as possible.

The time series $x=1,2,0,1.5,\u22121,3$ in Fig. 2 shows patterns 231, 312, 231, 213 for delay 1, and 321, 213 (dotted line) for delay 2.
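These pattern assignments can be checked mechanically. A small sketch (the helper names are ours, not from the paper); the pattern of a window is its rank sequence, so that the time series 2, 3, 1 yields the pattern 231:

```python
import numpy as np

def pattern(window):
    # rank sequence: entry k is the rank of x_k within the window (1 = smallest)
    return tuple(np.argsort(np.argsort(window)) + 1)

def all_patterns(x, m=3, d=1):
    # order patterns of length m with delay d at every admissible starting time
    x = np.asarray(x, dtype=float)
    return [pattern(x[t:t + m * d:d]) for t in range(len(x) - (m - 1) * d)]

x = [1, 2, 0, 1.5, -1, 3]
p1 = all_patterns(x, m=3, d=1)   # 231, 312, 231, 213 as in Fig. 2
p2 = all_patterns(x, m=3, d=2)   # 321, 213
```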


### B. Assumptions and estimates

We are not interested in the time series for its own sake. We rather want to know the laws of an underlying mechanism which produced this time series and which could also produce many similar time series by random variation. It is customary to represent this mechanism by a sequence $X = X_1, X_2,\dots$ of random variables on a probability space $(\Omega, P).$ We call $X$ a stochastic process and imagine that the time series $x$ is a random choice from the ensemble $X.$

Since we want to find the laws of $X$ from one single time series, we must assume that the statistics of values and dependencies of the $X_k$ do not change in time. Usually, $X$ is required to be stationary. That is, the $m$-dimensional distribution of $(X_t, X_{t+1},\dots,X_{t+m-1})$ does not depend on $t,$ for all $m.$ This is a fine starting point for theory but never fulfilled in reality. Weak stationarity says that the mean $M(X_t)$ and the covariance $\mathrm{Cov}(X_t, X_{t+k})$ do not depend on $t$.^{28} Even this need not be true in practice, for instance in EEG measurements. We make a less stringent assumption.

**Order stationarity.** For a given pattern $\pi$ with length $m$ and delay $d,$ let $P^t_\pi(d)$ denote the probability that $\pi$ appears at time $t,$ that is, in the vector $(X_t, X_{t+d},\dots,X_{t+(m-1)d}).$ The process is called *order stationary for patterns of length $m$* if $P^t_\pi(d) = P_\pi(d)$ is the same value for all $t$. This should hold for all patterns $\pi$ of length $m$ and all delays $d = 1,\dots,d_{\max}$. We take small $m,$ usually 3 or 4, and choose $d_{\max}$ much smaller than the length $T$ of our time series. Then, there is hope that such order stationarity is fulfilled in reality, at least approximately. Note that stationary processes and processes with stationary increments (cf. next section) are order stationary for all $\pi, m, d$.

The $P_\pi(d)$ are the parameters of the process which we would like to determine. The numbers $p_\pi(d)$ are the concrete frequencies obtained from our data series $x$. It is natural to take $p_\pi(d)$ as an estimate of the unknown probability $P_\pi(d).$ This naive estimator is unbiased, and for Gaussian^{7,8} and $m$-dependent processes^{15,16} strongly consistent and asymptotically normal. The estimation error, that is, the variance of the estimator, can only be determined for more specific models.

**Spatial and time reversal.** Time reversal transforms the series $x_1 x_2\cdots x_T$ and the permutation $\pi_1\pi_2\cdots\pi_m$ into the series $x_T\cdots x_2 x_1$ and the permutation $\pi_m\cdots\pi_2\pi_1,$ respectively. An order stationary process is called symmetric under time reversal if the frequency of any permutation $\pi$ agrees with the frequency of its time-reversed permutation. This concept,^{7} tightly related to the well-known concept of a reversible process,^{28} was recently studied by Martínez *et al.*^{29} and by Zanin *et al.*^{30} Similarly, the time series $x_1 x_2\cdots x_T$ can be flipped at the horizontal line $y = C/2,$ resulting in $(C-x_1)(C-x_2)\cdots(C-x_T),$ for any constant $C.$ For the permutation $\pi_1\pi_2\cdots\pi_m,$ the spatial reversal as introduced by Sinn and Keller^{7} is $(m+1-\pi_1)(m+1-\pi_2)\cdots(m+1-\pi_m).$ For instance, 1342 has spatial reversal 4213 and time reversal 2431. An order stationary process is called symmetric under spatial reversal if the frequency of any permutation $\pi$ agrees with the frequency of its spatially reversed permutation.
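Both reversals are one-liners on the rank-sequence representation; a quick sketch with our own helper names, reproducing the example from the text:

```python
def time_reversal(pi):
    # read the pattern backwards in time
    return tuple(reversed(pi))

def spatial_reversal(pi):
    # flip the pattern upside down: pi_k -> m + 1 - pi_k
    m = len(pi)
    return tuple(m + 1 - p for p in pi)

# example from the text: 1342 has spatial reversal 4213 and time reversal 2431
s = spatial_reversal((1, 3, 4, 2))
t = time_reversal((1, 3, 4, 2))
```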

### C. Independence model and random walk model

There are essentially only two random processes for which we know the probabilities $P_\pi(d)$ rigorously. The first one is a sequence of independent and identically distributed random variables $X_1, X_2,\dots$ which is called an i.i.d. process, and white noise in the case of normal distribution. This process induces the equidistribution on patterns: $P_\pi(d) = 1/m!$ for all patterns $\pi$ of length $m$ and for all delays $d.$ The reason is that an i.i.d. process is exchangeable: $(X_1, X_2,\dots,X_m)$ has the same distribution as $(X_{\pi_1}, X_{\pi_2},\dots,X_{\pi_m})$ for any permutation $\pi$.

**Note.** The probability distribution of the values, which has nothing to do with order patterns, need not be specified. For simulations, we take the standard normal distribution. The class of exchangeable processes theoretically also contains mixtures of several i.i.d. processes. However, for an ergodic stationary process, Bandt and Shiha^{6} proved that the distribution of the values $x_t$ plus all pattern frequencies completely describes all distributions of vectors $(x_t, x_{t+1},\dots,x_{t+m})$ of the process. Thus, order structure and one-dimensional distribution are two complementary parts which together characterize a stationary process.

An i.i.d. process is self-similar in the sense that $(X_1, X_2,\dots,X_m)$ and $(X_d, X_{2d},\dots,X_{md})$ have the same distribution. Thus, $P_\pi(d)$ does not depend on $d.$ Actually, this seems to be the only stationary process for which $P_\pi(d)$ is known for arbitrarily large $d$.

White noise represents the null hypothesis for testing independence of the variables $X_k,$ see Sec. VII C. It should be mentioned that although the $X_k$ are independent, the events $X_k < X_{k+1}$ are dependent. Under the condition $X_1 < X_2 < \cdots < X_k$, we have $P\{X_k < X_{k+1}\} = 1/(k+1)$, while on average, this probability is $1/2$.
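This dependence of ordering events is easy to confirm by Monte Carlo simulation; a sketch for $k = 3$ (sample size and seed are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3
x = rng.standard_normal((1_000_000, k + 1))

# condition on X_1 < X_2 < ... < X_k (this event has probability 1/k! = 1/6)
ascending = np.all(np.diff(x[:, :k], axis=1) > 0, axis=1)

# P(X_k < X_{k+1} | ascending) should be 1/(k+1) = 1/4, not 1/2
p = np.mean(x[ascending, k - 1] < x[ascending, k])
```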

The second process is *symmetric random walk*, where we set $X_0 = 0$ and let the increments $I_k = X_{k+1} - X_k$ be independent random variables with identical distribution for $k = 0, 1,\dots$ with density function fulfilling $\phi(x) = \phi(-x).$ This is a non-stationary process with stationary and independent increments $I_k.$ In the case of Gaussian distribution, the process is self-similar, so that $P_\pi(d)$ does not depend on $d.$ This case will be called Brownian motion.

From an ordinal viewpoint, it is not appropriate to consider random walk just as a cumulative sum of an i.i.d. process. For example, geometric Brownian motion $Y_k = \exp(X_k)$ has exactly the same order structure and is not a cumulative sum. It is a standard model for data from financial markets. Gaussian random walk also represents a null hypothesis.^{21}

For symmetric random walk, the pattern probabilities of length 3 were determined in Ref. 15. They do not depend on the increment distribution: $P_{123} = P_{321} = \frac{1}{4}$ and $P_{132} = P_{213} = P_{231} = P_{312} = \frac{1}{8}.$

For length 4, pattern probabilities depend on the distribution of the increments. For Gaussian random walk, they were calculated in Ref. 6. For length $\ge 5,$ they can be expressed as multidimensional integrals which have no analytic solution. Elizalde and Martinez^{31} studied classes of permutations which have the same pattern frequencies for random walks with arbitrary distribution.

Pattern probabilities have also been determined for moving average and autoregressive processes, mostly by simulation. See Refs. 6 and 15 and Table I.

| Noise | β | τ | γ | δ |
|---|---|---|---|---|
| Normal | 0 | 0.086 | 0 | 0 |
| Uniform | 0 | 0.083 | 0.042 | 0 |
| Bernoulli | 0 | 1/6 | 1/6 | 0 |
| Triangular | −0.074 | 0.088 | 0.026 | 0.048 |
| Exponential | 0.161 | 0.107 | 0.002 | −0.095 |


**Open problems.** Both stationary processes and processes with stationary increments are order stationary. Do pattern probabilities indicate whether a process is stationary or has stationary increments? Find examples of order stationary processes which are neither stationary nor have stationary increments. Does order structure plus (one-dimensional) distribution of increments completely describe a process with stationary increments?

The condition $p_{12} = 1$ characterizes an increasing series. This condition can be realized by processes with stationary increments where the distribution of increments is concentrated on positive numbers. The conditions $p_{231} = p_{312} = p_{321} = 0$ describe a similar but slightly more complicated case.

## III. STATISTICS OF PATTERNS OF LENGTH 3

When we have the frequencies $p_\pi$ for a time series $x,$ we can combine them to calculate the permutation entropy $H$ which says something about the nature of $x.$ Some comments on $H$ are given in Sec. V below. It is also possible to evaluate the frequencies $p_\pi$ separately. This may reveal more specific information. However, the information content of $p_\pi$ for different $\pi$ may overlap. Our intention is to obtain parameters which provide disjoint and interpretable information. We discuss here just the simplest case $m=3.$

### A. Why short patterns?

There are three reasons for considering only patterns of length up to 5.

Estimation: statistical estimates of pattern frequencies would not be accurate for patterns of length $\ge 6.$ Already for the 120 patterns of length 5, one needs time series with several thousand values for a reasonable estimate.

Interpretation: it is difficult to give a meaning to longer patterns.

Assumptions of stationarity are more restrictive for longer patterns. We have no experience and intuition for such multidimensional dependencies in random processes.

### B. Amazingly large correlation of pattern frequencies

It is important to understand that, in contrast to the values of a time series, pattern frequencies are “always” asymptotically normally distributed. We only have to assume that the dependence between the occurrence of patterns at two time points $s, t$ decreases fast when the time difference $|s-t|$ increases. A special case treated in Refs. 13 and 15 is ordinal $\ell$-dependence, which says that the patterns become independent if $|s-t| > \ell.$ This assumption is true for white noise, random walks, and MA processes, and can be expected to hold in reality for a large $\ell.$ Under such an assumption, even the combination of all six pattern frequencies will approximately follow a multivariate normal distribution. This follows from central limit theorems, cf. Refs. 15 and 16. We shall not rely on asymptotic statements which contain no quantitative estimates, but they are good for orientation. Since correlations describe all dependencies between variables of a multivariate normal distribution, we must study covariances of pattern estimators.

Computations by Elsinger,^{13} recently confirmed by de Sousa and Hlinka^{15} and Weiß,^{16} show that asymptotic variances and covariances scale with $1/T,$ and are different for different stochastic processes. The exact form of the covariances is $a/T + b/T^2,$ but the second term will not be discussed; it is negligible if $T \ge 200$. Here are the asymptotic covariance matrices of the $p_\pi$ for an i.i.d. process^{13} and for symmetric random walk,^{15} which we need later. The six rows and columns correspond to the six permutations 123, 132, 213, 231, 312, and 321.

### C. Brief account of Elsinger’s theorem

Although there are three excellent presentations^{13,15,16} to which we refer, we add a few remarks on the matrices $\Sigma$. The covariance of random variables $U, V$ is defined as $\mathrm{Cov}(U,V) = M(UV) - M(U)M(V)$, where $M$ denotes the mean. Covariance measures linear dependence and is zero for independent variables. For two patterns $\pi$ and $\kappa,$ we let $U = T\cdot p_\pi$ and $V = T\cdot p_\kappa$ be the absolute frequencies of the patterns in time series of length $T$. The values of $U$ and $V$ are determined for all time series of length $T$ which are generated by a certain model; for white noise, we would take all permutations of length $T$ with equal probability. Then, the covariance is determined as a mean.

The required mean values can be evaluated by combinatorial counting.^{13,15,16} For white noise, we only need all permutations of length 4 and 5, for random walk just length 4. If we replace the factors $T-2-k$ by $T$ (this is the only reason to call the resulting formula asymptotic), we get $\mathrm{Cov}(U,V) = T\cdot C$ for some rational number $C.$ Since $p_\pi = U/T$ and $p_\kappa = V/T,$ this implies $\mathrm{Cov}(p_\pi, p_\kappa) = C/T$. This explains the scaling factor $1/T.$ Performing the calculation for all pairs of patterns, we obtain the above matrices. Our sketch of proof shows that the correlations are only due to the effects of neighboring patterns.

### D. Four degrees of freedom

### E. The pattern contrasts for length 3

Usually, certain differences of pattern frequencies, which we call contrasts, are more informative than the $p_\pi$ themselves. They were introduced in Refs. 6, 19, 21, and 32 as a kind of autocorrelation function of $d = 1, 2,\dots$ which describes the behavior of data, for instance periodicities, short-time dependence, or self-similarity. This practically important aspect is not studied here. We fix $d=1$ and focus on showing that the pattern contrasts form an ideal orthogonalized system of patterns, both from an algebraic and a statistical viewpoint. Let us now introduce the pattern contrasts which were illustrated in Fig. 1. “Contrast” is a basic concept of the analysis of variance, contained in every advanced textbook of statistics.^{36} It means “weighted sum of frequencies” and is used here in exactly the same spirit. It is actually a difference, since the sum of the weights is usually zero.


Either $\tau$ or $\alpha$ is the most important pattern contrast, and both versions are useful.^{23} The turning rate has the advantage of a clear interpretation.

To give an idea of the size of the parameters, we simulated AR(1) processes $X_t = \frac{1}{2} X_{t-1} + \epsilon_t$ with different distributions of the i.i.d. noise $\epsilon_t$. The length $T$ does not matter when it is larger than 1000. For symmetric distributions, we have $\beta = \delta = 0$. However, $\gamma$ is positive even for uniform noise, although such a minor effect is not visible in small sample series. (To distinguish it from normal noise with error probability $5\%,$ let $T = 3000$. Then, 98% of the samples with uniform noise will be recognized correctly.) Bernoulli noise means $\epsilon_t = \pm 1$ with probability $\frac{1}{2}$. This model can be treated rigorously. Triangular noise was simulated as the minimum of two uniform random numbers $u_1, u_2,$ and exponential noise as $1 + \log u.$
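A sketch of such a simulation for normal noise, estimating the up-down contrast $\beta = p_{123} - p_{321}$ and the turning rate $\alpha = 1 - p_{123} - p_{321}$ (we restrict the code to these two contrasts since their definitions are given explicitly in the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def ar1(T, a=0.5):
    # X_t = a * X_{t-1} + eps_t with standard normal noise eps_t
    eps = rng.standard_normal(T)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = a * x[t - 1] + eps[t]
    return x

def alpha_beta(x):
    a, b, c = x[:-2], x[1:-1], x[2:]
    p123 = np.mean((a < b) & (b < c))   # monotone up
    p321 = np.mean((a > b) & (b > c))   # monotone down
    beta = p123 - p321                  # up-down asymmetry
    alpha = 1 - p123 - p321             # turning rate (local maxima and minima)
    return alpha, beta

alpha, beta = alpha_beta(ar1(100_000))
# symmetric noise gives beta near 0; positive autocorrelation
# pushes the turning rate alpha below the white-noise value 2/3
```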

## IV. ORTHOGONAL AND INDEPENDENT CONTRASTS

### A. Pythagoras’ theorem

The quadratic deviation from equidistribution, $\Delta^2 = \sum_{\pi\in S_m}\left(p_\pi - \frac{1}{m!}\right)^2,$^{20} can be defined for arbitrary $m,$ summing over the set $S_m$ of patterns of length $m.$ Here, it is used for $m=3.$ It is a version of permutation entropy, as explained in Sec. V.

#### (Pythagoras’ theorem for pattern contrasts)

The quadratic deviation $\Delta^2$ [defined in (5)] fulfils a Pythagoras-type identity: it decomposes into a weighted sum of the squares of the four pattern contrasts $\beta, \tau, \gamma,$ and $\delta.$^{32}

### B. Independence of estimators

For *symmetric random walk,* it turns out that all our four pattern contrasts are uncorrelated! Their variances are given in (19).

#### (Independence of pattern contrasts for symmetric random walk)

For the covariance matrix $T\cdot\Sigma_B$ of pattern frequencies of symmetric random walk given in (7), the vectors of the four pattern contrasts $\beta, \tau, \gamma, \delta$ are exactly the eigenvectors of the nonzero eigenvalues $\frac{1}{2}, \frac{3}{16}, \frac{1}{6},$ and $\frac{1}{12}.$ Their variances are given in (19).

The proof is a simple calculation, for instance, $\bar\beta\,\Sigma_B = \frac{1}{2}\bar\beta.$ Note that eigenvectors are determined only up to a real constant, and eigenvector programs yield unit vectors. Our pattern contrasts are not unit vectors. Their variance is determined by $x\Sigma_B x',$ or $\lambda|x|^2$ when we use (16).

The case of an *i.i.d. process* is a bit more complicated. By matrix multiplication with $\Sigma_W$, we get the (asymptotic) variances and covariances of the contrasts.

Weiß^{16} determined the nonzero eigenvalues of $T\cdot\Sigma_W$ as $\lambda_1 = \frac{1}{12}(2+\sqrt{2}),\ \lambda_2 = \frac{2}{15},\ \lambda_3 = \frac{1}{10},$ and $\lambda_4 = \frac{1}{12}(2-\sqrt{2}).$ Note that $\lambda_1$ and $\lambda_4$ belong to one quadratic factor of the characteristic polynomial. Matlab yields just the numeric eigenvalues $0, 0, 0.049, 0.1, 0.133, 0.285$ and the corresponding eigenvector columns.

The first two columns are linear combinations of the constraint vectors $c_1$ and $c_2.$ The columns 4 and 5 are the unit vectors of $\bar\gamma$ and $\bar\tau,$ respectively. Thus, $\gamma$ has variance $\frac{1}{10}\cdot 4 = \frac{2}{5},$ and $\tau$ has variance $\frac{2}{15}\cdot\frac{4}{3} = \frac{8}{45},$ as Bienaymé knew. The columns 3 and 6, corresponding to $\lambda_4$ and $\lambda_1,$ are connected with $\beta$ and $\delta.$ Their sum is $-\bar\beta,$ and their difference is $\bar\delta/\sqrt{2}.$ So the eigenvectors are $-\frac{1}{2}\bar\beta \pm \frac{1}{2}\bar\delta/\sqrt{2}.$ These are the orthogonal and uncorrelated unit vectors of $T\Sigma_W,$ and they are unique since the nonzero eigenvalues are all different.

It would be difficult to interpret these linear combinations, however. We stick to $\beta$ and $\delta,$ which are represented by orthogonal vectors but have negative correlation (21). (Another option would be to exchange $\delta$ for $\beta_2 = \delta + \beta,$ which is uncorrelated with $\beta$ and easy to interpret. But the vectors for $\beta$ and $\beta_2$ are not orthogonal, and Pythagoras’ theorem then includes a mixed term.)

#### (Independence of pattern contrasts of an i.i.d. process)

For the covariance matrix $T\cdot\Sigma_W$ of pattern frequencies of white noise given in (6), the vectors of the pattern contrasts $\tau$ and $\gamma$ are eigenvectors of the eigenvalues $\frac{2}{15}$ and $\frac{1}{10}.$ The other two eigenvectors of nonzero eigenvalues generate the same plane as $\bar\beta$ and $\bar\delta$. We choose $\beta$ and $\delta$ instead of eigenvectors, for better interpretation, and must take into account the strong negative correlation (21). All other pairs of the four estimators are uncorrelated.

**Open problem.** We have shown that for patterns of length 3, there is a canonical choice of four pattern contrasts. What about patterns of length 4?

## V. PERMUTATION ENTROPY

Pattern frequencies and contrasts are linear functions. The normal approximation applies to their statistical estimates.^{26} In this section, we discuss statistical estimation of the permutation entropy $H,$ which is a nonlinear function. This results in biased estimates. At its maximum, however, $H$ has special statistics which can be applied to serial independence tests,^{12,14} with great success in the EEG study in Sec. VII, even though they are not based on the $\chi^2$-distribution.^{13}

### A. Permutation entropy at its maximum

Note that the scale for $H$ is nonuniform. For a period-2 sequence like $x_t = (-1)^t/t = -1, \frac{1}{2}, -\frac{1}{3},\dots$ with $P_{132} = P_{312} = \frac{1}{2}$, we have $H_3 = \log 2 = 0.69,$ and for period 3 with $P_{123} = P_{231} = P_{312} = \frac{1}{3}$, we get $H_3 = \log 3 = 1.10.$ Thus, $H$ works very well in detecting periodicities and distinguishing small periods and other dynamical phenomena.

On the other hand, noisy time series, coming from sources like EEG or geophysical measurements, all have entropy very near to the maximum. For symmetric random walk, (5) yields $H_3 = \frac{5}{2}\log 2 = 1.7329,$ while $H_3^{\max} = \log 6 = 1.7918$. Nevertheless, random walk is fundamentally different from white noise.

For a check of (23), take symmetric random walk with $H_3(p) = 1.7329.$ The Taylor approximation, calculated as $H_3(p) \approx \log 6 - 3\Delta^2 = 1.7918 - 1/16 = 1.7293,$ coincides to two-decimal accuracy.
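This check is a three-line computation, using the length-3 random walk probabilities $P_{123} = P_{321} = \frac{1}{4}$ and $P_{132} = P_{213} = P_{231} = P_{312} = \frac{1}{8}$:

```python
import math

p = [1/4, 1/8, 1/8, 1/8, 1/8, 1/4]        # length-3 probabilities, random walk
H3 = -sum(q * math.log(q) for q in p)     # exact value: (5/2) log 2
delta2 = sum((q - 1/6) ** 2 for q in p)   # quadratic deviation, equals 1/48
taylor = math.log(6) - 3 * delta2         # Taylor approximation of H3
```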

### B. Simulated fluctuations of entropy estimates

We proved that near the i.i.d. process, $H^{\max} - H$ is the quadratic quantity $\frac{m!}{2}\Delta^2$. For all other processes, the Taylor approximation is first order and the deviations are linear. This has consequences for the statistical fluctuations of $H$ in sample series of different lengths $T$.

For Fig. 3, we simulated 100 000 time series of different lengths $T$ and determined the distribution of $H_4^{\max} - H_4.$ The true value is 0 for white noise and 0.194 for Brownian motion. This does not agree with the mean of the distributions. There is considerable bias in the statistical estimates. The bias is always to the right, to larger deviations from $H^{\max}$. It scales like $1/T$ with the sample size.

For white noise, the bias is easy to explain. The sample frequencies $ p \pi $ are never all equal, so the entropy of the sample series cannot reach the value $ H max$. Due to the convexity of the function $ H max\u2212H,$ we have such bias also for other processes. Table II shows that the bias is divided by two when we double the sample length $T.$ This observation helps to extrapolate the true value and correct the bias.

| $T$ | 100 | 200 | 400 | 800 | Limit |
|---|---|---|---|---|---|
| Mean | 0.320 | 0.254 | 0.224 | 0.209 | 0.194 |
| Bias | 0.126 | 0.060 | 0.030 | 0.015 | 0 |
| Std. dev. $\sigma_T$ | 0.0974 | 0.0648 | 0.0437 | 0.0304 | 0 |
| $\sigma_T/\sigma_{2T}$ | 1.50 | 1.48 | 1.44 | … | $\sqrt{2}$ |


The estimates for white noise are much more accurate than for any other process, due to its special Taylor expansion. For $T=800$, for instance, Fig. 3 shows estimates between 0 and 0.04 for white noise and between 0.1 and 0.32 for Brownian motion. In fact, the standard deviation of the distribution scales with $1/T$ for white noise and with $1/\sqrt{T}$ for other processes.

The scaled deviations $Z=T\,(H^{\max}-H)$ in Fig. 4 illustrate the $1/T$-scaling of the statistical fluctuations for white noise. There is good agreement for different $T$, and there is a limit density for $T\to\infty$. It is calculated in Corollary 3.2 of Weiss,^{16} which extends to $m>3$ once the corresponding eigenvalues are determined.

For $m=3$, and in the case $m=4$ for $T=100$, the density functions look irregular. This is not statistical inaccuracy: the density did not change when the number of simulations was increased or decreased. It did change, however, when $T=100$ was replaced by 99 or 101. The irregularities are caused by the fact that absolute frequencies of patterns are relatively small integers, so there are not many possible frequency combinations. To avoid such effects, work with $m=4$ and $T\ge200$.

### C. Tail probabilities and quantiles

When we know the distribution of $Z$ for an i.i.d. process, in particular the $p$-values or tail probabilities $p(z)=P(Z\ge z)$, we can perform a statistical test of serial dependence. Such tests are very important in time series analysis. Whenever a model for a time series fits the data, the residues, theoretical minus observed values, should be indistinguishable from i.i.d. random numbers. An overview of the multitude of serial dependence tests can be found in Matilla-Garcia and Ruiz Marin.^{12}

These authors suggested taking permutation entropy, more precisely the values $2Z$, as the criterion of a serial dependence test. Unfortunately, they wrongly claimed that under the null hypothesis of an i.i.d. process, $2Z$ follows a $\chi^2$ distribution with $m!-1$ degrees of freedom. Actually, $T\,m!\,\Delta^2$ fulfils the well-known formula of the $\chi^2$-test. Elsinger^{13} pointed out that besides the degrees of freedom, other strong corrections are necessary, due to correlation and different variances of pattern frequencies, see Sec. III E. In Refs. 14 and 25, it was pointed out that $\chi^2$ holds when we appropriately restrict the index set of the data. Following Elsinger,^{13} two different tests were worked out by Sousa and Hlinka^{15} and Weiss.^{16} The main tool in both cases is principal axes analysis, see Sec. IV. Weiss^{16} determines the limit distribution as a convolution of multiples of $\chi^2_1$ for $m=3$; his Corollary 3.2 extends to $m>3$ when the corresponding eigenvalues are determined. Elsinger^{13,15} uses a transformation to the standard multivariate normal distribution and a classical $\chi^2$ statistic.

Such asymptotic results are important for our understanding. However, they do not tell us what happens for a particular $T$. For the tests of serial dependence in Sec. VII C, we include Fig. 5 and the critical quantiles ($z$-values with $P(Z\le z)=0.95$, say) in Table III, obtained by direct simulation (100 000 samples for each $T$). For $H_3$, the empirical formula $p=e^{-2z/3}$ works well even for rather large $z$, as Fig. 5 shows. The table for $H_3$ agrees with the table for $2Z$ on p. 21 of Ref. 13.
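The quantiles of the empirical formula can be reproduced directly: solving $e^{-2z/3}=1-q$ for $z$ gives the $e^{-2z/3}$ row of Table III. A short sketch:

```python
import math

def critical_z(confidence):
    """Critical quantile of Z from the empirical tail formula p(z) = exp(-2z/3):
    solve exp(-2z/3) = 1 - confidence for z."""
    return -1.5 * math.log(1 - confidence)

# Reproduces the e^{-2z/3} row of Table III for 95%, 99%, and 99.9%.
quantiles = [round(critical_z(q), 2) for q in (0.95, 0.99, 0.999)]
```

These values, 4.49, 6.91, and 10.36, match the simulated quantiles for $T=200$ closely.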

Quantiles for $m=3$:

| Sample size | 95% | 99% | 99.9% |
|---|---|---|---|
| T = 800 | 4.46 | 6.81 | 10.36 |
| T = 400 | 4.47 | 6.82 | 10.37 |
| T = 200 | 4.49 | 6.91 | 10.39 |
| $e^{-2z/3}$ | 4.49 | 6.91 | 10.36 |
| $\chi^2_4(2z)$ | 4.74 | 6.64 | 9.24 |

Quantiles for $m=4$:

| Sample size | 95% | 99% | 99.9% |
|---|---|---|---|
| T = ∞ | 17.22 | 21.55 | 27.48 |
| T = 800 | 17.29 | 21.63 | 27.56 |
| T = 400 | 17.39 | 21.75 | 27.69 |
| T = 200 | 17.64 | 22.06 | 28.08 |
| T = 100 | 18.40 | 22.89 | 28.88 |


## VI. THE DESCRIPTIVE POWER OF TURNING RATE

### A. Order patterns and EEG

EEG brain signals are probably the widest field of application of order patterns. There is a well-developed methodology for EEG signals based on Fourier spectra and graphic elements, which is now combined with machine learning because of the size of the data sets; see the comprehensive survey.^{37} On the other hand, the data contain many different artefacts which make spectral methods difficult. Order patterns apply most successfully to such really big and “dirty” data sets.

Karsten Keller applied symbolic analysis as early as 2003 to the EEG of epileptic children,^{38} to assess the results of a therapy performed by Lauffer. Larger permutation entropy indicated higher vigilance of the brain and thus a therapeutic success. He came back to such applications several times.^{40,41} Epilepsy is the most striking application of entropy methods in EEG.^{15,39,42} Other studies concern the effects of anesthetic^{43} and psychotropic^{44} drugs, or Alzheimer’s disease.^{45}

We consider data from sleep research^{46} which we studied before^{20} with permutation entropy and $\Delta^2$. It turns out that sleep analysis is even simpler with the turning rate, where we count only maxima and minima. This is a new application of pattern contrasts as a function of time $t$, not of the delay $d$, in the setting of long data series. The turning rate will be used to compress the data, to classify sleep stages, and to find slow oscillations which are hardly accessible by conventional methods.

### B. The data set of Terzano *et al.*

We take the EEG sleep data of Terzano *et al.*^{46} available on Physionet.^{47} Although 20 years old, they are of excellent quality. Among the subjects were several healthy volunteers, whom we shall study. The EEG was sampled at 500 Hz with a decent hardware low-pass filter at 30 Hz. We shall take delay $d=4$, that is, 8 ms. In this range, most EEG recordings are totally disturbed by electrical noise from the power grid, which is unavoidable in a clinical environment. It is common practice to filter away the “mains hum,” which works well for conventional methods but can destroy the fine order structure. The Terzano data did not require any preprocessing.

With 15 million values in each EEG channel (eight hours sampled at 500 Hz), these are big time series, as nowadays observed in many fields. They must be screened automatically, and order patterns are an appropriate tool.

### C. Turning rate recognizes sleep stage

Figure 6 shows a sleep EEG of a healthy person. In the upper part, you see the annotation of sleep stages by a medical expert, with one value for every epoch of 30 s. The experts used various channels, among others eye trackers for the REM phases. Their main goal was to find and analyze “cyclic alternating patterns.” Annotation was an art, with a comprehensive catalog of rules. Nowadays, it is done by machine,^{37} more cheaply than by experts.

In the lower part, two special EEG channel differences are shown. Both functions run almost parallel to the annotation. However, the two functions do not show the EEG values; they show the turning rate for $d=4$. Thus a simple count of maxima and minima on a time scale of milliseconds classifies sleep stages almost as efficiently as the study of various waves, graphical elements, and phenomena in non-EEG channels, in a frequency range between 20 and 0.2 Hz, by qualified experts or expert systems. In Fig. 7, three very short parts of the time series demonstrate how the number of maxima and minima decreases in sleep.

The 15 million values of the very irregular original time series were compressed to a series of 1000 turning rates by analyzing epochs of 30 s. With $T=15\,000$ values per epoch, the estimates of the turning rate are fairly reliable even in the presence of small artefacts. As a result, we obtain a new time series $\alpha_1,\dots,\alpha_{1000}$ of turning rates, drawn in Fig. 6. No preprocessing, no filtering, no artefact removal.
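The compression step can be sketched as follows (function names are hypothetical; epoch length 15 000 and delay $d=4$ mirror the values in the text). For white noise, each epoch's turning rate should come out near $2/3$:

```python
import random

def turning_rate(x, d=4):
    """Fraction of positions t where x[t] is a local maximum or minimum
    among x[t-d], x[t], x[t+d]."""
    n = len(x)
    turns = sum(
        1 for t in range(d, n - d)
        if (x[t] > x[t - d] and x[t] > x[t + d])
        or (x[t] < x[t - d] and x[t] < x[t + d])
    )
    return turns / (n - 2 * d)

def epoch_turning_rates(x, epoch_len=15000, d=4):
    """Compress a long series into one turning rate per epoch (30 s at 500 Hz)."""
    return [turning_rate(x[i:i + epoch_len], d)
            for i in range(0, len(x) - epoch_len + 1, epoch_len)]

# Sanity check on simulated white noise, where the turning rate is 2/3.
random.seed(1)
noise = [random.gauss(0, 1) for _ in range(60000)]
rates = epoch_turning_rates(noise)
```

Applied to an EEG channel, `rates` is the compressed series $\alpha_1,\dots,\alpha_{1000}$; no filtering or artefact removal is needed.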

Figures 6–8 are taken from an unpublished preprint^{48} in which more than 10 persons from Terzano *et al.*^{46} were studied, some with sleep disorders. In spite of big differences between subjects, the turning rate was in all cases tightly connected with the sleep stage. It seems to us that the turning rate could even be used to *define sleep depth*. That would be better than a catalog of diverse rules. Of course, this requires further study, since $\alpha$ depends not only on sleep stage, on $d$, and on the person, but also on the measuring equipment, the filtering options of the hardware, etc. REM dream phases are more difficult to classify than sleep depth. Figure 6 shows that the frontal channel Fp2F4 has higher turning rates in REM than the central channel C4P4, while outside REM it is the other way round. This effect was not seen for all persons.

### D. Infra-slow oscillations detected by turning rate

So far, we took disjoint 30 s epochs to determine turning rates. For more detail, we shift the 30 s windows by only 1 s. Equivalently, we can calculate turning rates for 1 s segments and then apply a moving average of length 30, as shown in Fig. 7. The new time series has 30 000 values $\alpha_t$ instead of 1000 and is fairly smooth.

We found clear oscillations of the turning rate with a period of about 1–2 min, at the onset of sleep, in light sleep (stage S2), and in the morning before waking. Such infra-slow frequencies are totally beyond the scope of conventional Fourier analysis. See Fig. 7 for our person n5 and Fig. 8 for four other healthy persons. Both channels (and further channels not presented here) show almost synchronous oscillations. This may indicate a slow biorhythm from a central source and deserves further study.

## VII. SOME REMARKS ON EEG DATA

### A. Dependence of turning rate on the delay

In the previous section, we fixed $d=4$ and considered $\alpha$ as a function of time. Now we ask how the $\alpha$-functions vary when we change $d$. Figure 9 illustrates the effects, for $d$ between 2 and 32 ms on the left and for 64 ms up to 1 s on the right. Actually, the $\alpha$-functions look very similar for most $d$, and they would agree still better if we took $d=3,\dots,12$, for instance. Thus we have a strong self-similarity in the order structure of EEG data.

For $d=1$ and $d=2$, the number of maxima and minima is very small. This comes from the hardware low-pass filter, which smoothes the measured values and will act similarly in any physical measurement. The turning rate then increases with increasing $d$, at least up to $d=32$. This is quite natural: in Fig. 7, we marked the maxima for $d=1$; for $d=4$, there are many more maxima.

With increasing $d$, the curves become noisier. (This can be improved by taking windows according to “zero-crossings.”) The turning rate for $d=256$ and $d=512$ is near $2/3$, the value for white noise. Here we see an effect of the high-pass filter, which is also necessary for any physical measurement. It guarantees that the measured values do not drift too far from the zero line, as they would for a random walk. The high-pass threshold was set to 0.3 Hz by Terzano *et al*. When values drift away during one second, the filter tends to pull them back in the next second. In this way, the self-similarity present in models like Brownian motion, and apparently also in EEG, is damped by filters on both sides in experimental data.

### B. Pattern contrasts in the EEG data

After studying the turning rate for fixed $d$ and sliding $t$-windows, we now consider functions of the delay, with $d=1,\dots,500$, for fixed time epochs of 30 s. We study the persistence $\tau(d)$, which resembles an autocorrelation function. Due to the low-pass filter, the time series is rather smooth on small scales, and $\tau(1)$ is larger than 0.5, near the maximal value $2/3$. For white noise the value would be zero, for Brownian motion $\tfrac16$. Probably the first values of $\tau$ must be adjusted so that different measuring devices and hardware filters can be compared; there are no studies in this direction.
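The persistence can be estimated by counting monotone patterns. A minimal sketch, assuming the definition $\tau(d)=p_{123}+p_{321}-\tfrac13$, checked against the values quoted above (about 0 for white noise, $\tfrac16$ for the random walk):

```python
import itertools
import random

def persistence(x, d=1):
    """Persistence contrast tau(d) = p_123 + p_321 - 1/3 for delay d:
    frequency of monotone length-3 patterns minus the i.i.d. value 1/3."""
    n = len(x)
    mono = sum(
        1 for t in range(d, n - d)
        if x[t - d] < x[t] < x[t + d] or x[t - d] > x[t] > x[t + d]
    )
    return mono / (n - 2 * d) - 1 / 3

# Check against white noise (tau near 0) and a random walk (tau near 1/6),
# with one 30 s epoch of T = 15 360 values as in the text.
random.seed(2)
increments = [random.gauss(0, 1) for _ in range(15360)]
walk = list(itertools.accumulate(increments))
tau_noise = persistence(increments, d=1)
tau_walk = persistence(walk, d=1)
```

For the EEG data, the same function evaluated over $d=1,\dots,500$ yields the $\tau(d)$ curves of Fig. 10.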

For small $d$, the persistence rapidly declines and reaches a minimum, near $d_{\min}=10$ in the first plot of Fig. 10 and near $d_{\min}=20$ in the other plots. This parameter is locked: the minimum was never at $d=15$, say. The persistence at the minimum depends on the sleep stage and can lie between 0 and 0.3. For $d\le100$, there were very few negative values of $\tau$.

To the right of $d_{\min}$, the curve increases and reaches a maximum at $2d_{\min}$. In many curves, a second minimum near $3d_{\min}$ is visible. This indicates that we see here the so-called $\alpha$-rhythm, with frequency 12.5 Hz, corresponding to a wavelength of 40 ms. This rhythm is always seen in occipital channels, but we had the frontal channel Fp2F4. Here, it is apparent in the morning and disappeared for most of the night, which is not shown in Fig. 10. In deep sleep, $\tau$ assumed large values up to $d=400$; thus we can classify this sleep stage also with large $d$. We had chosen $d=4$, before the first minimum, because the curves do not vary much in this range.

Each of the plots in Fig. 10 refers to 40 successive epochs, corresponding to 20 min of measurement. The $\tau$-curves seem to be rather consistent and interpretable. This was not true for the other contrasts $\beta,\gamma,\delta$. Although they often show significant positive and negative values (see Table IV), the curves did not show consistent structure: they started near zero and looked like random walks. It seems that these contrasts express the numerous artefacts in EEG recordings. They may have some meaning, but not as averages over windows of 15 360 values. For instance, the cyclic alternating patterns which were the subject of the EEG study^{46} consist of an increasing oscillation and a decreasing oscillation. In our average of $\gamma$ over 30 s, the two cancel out.

| Test parameter | $\log 24 - H_4$ | $\log 24 - H_4$ | $\log 6 - H_3$ | $\tau$ | $\beta$ | $\gamma$ | $\delta$ | $\tau_4$ | $\beta_4$ |
|---|---|---|---|---|---|---|---|---|---|
| Confidence level % | 95 | 99.9 | 95 | 95 | 95 | 95 | 95 | 95 | 95 |
| $H_0$ accepted % | 0 | 0.002 | 0.5 | 6.7 | 24 | 36 | 32 | 9.2 | 22 |
| Significantly larger | 100 | 99.998 | 99.5 | 80 | 30 | 40 | 37 | 73 | 30 |
| Significantly smaller | … | … | … | 14 | 46 | 24 | 31 | 17 | 49 |


### C. Testing serial dependence of EEG

It is often necessary to test a time series against the null hypothesis that the values come from an i.i.d. process. In particular, any model of a process aims to explain the data in such a way that the residues, differences between observed and model values, cannot be distinguished from independent random numbers. As mentioned in Sec. V C, such tests on the basis of order patterns were suggested by Matilla-Garcia and Ruiz Marin,^{12,25} improved by Elsinger,^{13} and in this issue by Sousa and Hlinka^{15} and Weiss.^{16} There is a great variety of other tests.^{12,14}

We shall not enter into this matter but discuss related questions for our EEG data: could they represent white noise when sampled at some distance $d$? And will $H_3$, $H_4$, and the pattern contrasts from Sec. III distinguish them? We have fairly long time series with $T=15\,360$ for 1009 epochs of 30 s, and we can run the test for each $d=1,\dots,500$, between 2 ms and 1 s. Thus we can perform $500\cdot1009=504\,500$ tests for this single data set n5 from one night. Interdependence of tests and correction for multiple testing are irrelevant for our questions. We are not looking for the few tests which reject the i.i.d. null hypothesis; with sample size $T=15\,360$, rejection should be the normal case. Instead, we would like to find the few tests which accept the null hypothesis. We use the 95% confidence level to give all parameters a chance. Table IV shows that they all reject the i.i.d. process in the majority of cases, but there are big differences.

For permutation entropy, we took the critical quantiles for 95% and 99.9% from Table III. For pattern contrasts, we took standard normal quantiles, multiplied by $\sigma/\sqrt{T}$, where $\sigma^2$ is taken from (20). Compare Weiss,^{16} where simulations justified the normal approximation already for much smaller $T$. We also included the new tests of Weiss, Ruiz Marin, Keller, and Matilla-Garcia^{17} with two pattern contrasts of length 4, as follows. Let $\beta_4=p_{1234}-p_{4321}$ and $\tau_4=p_{1234}+p_{4321}-\tfrac{1}{12}$. In Ref. 17, it is shown that for an i.i.d. process, $(p_{1234},p_{4321})$ has an asymptotic normal distribution with covariance matrix $\frac{1}{4032}\begin{pmatrix}199&-17\\-17&199\end{pmatrix}$. Thus, under the null hypothesis, $\beta_4$ and $\tau_4$ have mean zero and variances $\tfrac{3}{28}$ and $\tfrac{181}{2016}$, respectively.
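The critical values of these contrast tests are normal quantiles scaled by the standard deviation of the estimate. A sketch with the variances of $\beta_4$ and $\tau_4$ quoted above (the function name is hypothetical):

```python
import math
from statistics import NormalDist

# Asymptotic variances of the length-4 contrasts, as quoted from Ref. 17.
VAR = {"beta4": 3 / 28, "tau4": 181 / 2016}

def critical_value(contrast, T, level=0.95):
    """Two-sided threshold c: reject the i.i.d. null when the estimated
    contrast exceeds c in absolute value."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided normal quantile
    return z * math.sqrt(VAR[contrast] / T)

c_beta = critical_value("beta4", T=15360)  # threshold for one 30 s epoch
```

Since $\tau_4$ has the smaller variance, its threshold is slightly tighter than that of $\beta_4$ for the same $T$.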

Table IV shows that permutation entropy has much better power than the contrasts in these serial dependence tests. Even $H_3$ is much better than $\tau$, although simulations of Weiss^{16} indicate that $\tau$ can perform better for small $T$. It seems that for large $T$, the $1/T$-scaling of $H_m$ becomes superior to the $1/\sqrt{T}$-scaling in the normal distribution. For our EEG data, all pattern contrasts show significant deviations from an i.i.d. process to both sides. Of course, this can change when we consider special conditions, for instance deep sleep, where $\tau$ and $\tau_4$ are large (small turning rate, see Sec. VI C). In this connection, we note that for $d=1,\dots,50$, all 50 450 tests with $\tau_4$ gave a significant positive deviation from the null hypothesis. This may be due to the influence of the hardware low-pass filter, however. Also, $\tau$ accepted $H_0$ in fewer than 0.1% of the tests with $d\le50$.

These results support the original suggestion of Matilla-Garcia and Ruiz Marin^{12} to use $H_4$ as the test quantity. By definition, it can detect all differences in pattern frequencies for $m\le4$. On the confidence level of 99.9%, there were only 9 out of 504 500 tests where $H_0$ was accepted.

Thus, there is not the slightest chance of finding an i.i.d. process in the EEG. Such data represent another type of noise, and new models are needed to understand them. For short time series with $T\le300$, for example monthly economic data, we can only estimate patterns of length 3, and the use of an i.i.d. model is justified in such a context. For high-resolution measurements, however, we should work with patterns of length 4 and develop better models of noise, so that we can dismiss the i.i.d. process.

## VIII. CONCLUSION

The six patterns of length 3 are naturally transformed into four independent contrasts. The turning rate $\alpha$ quantifies roughness, and the parameters $\beta,\gamma,\delta$ describe deviations from symmetry and reversibility. For symmetric processes, to which some EEG series belong, $\alpha$ controls the whole order structure of length 3. For an EEG, $\alpha$ is taken as a function of time and delay, and its oscillations in time seem to indicate an infra-slow rhythm of the brain.

The next step is to study patterns of length 4. Theory and careful study of large data sets must go hand in hand. Two contrasts $ \tau 4$ and $ \beta 4$ were suggested in a new paper by Weiß *et al*.,^{17} which continues the recent development of statistical tests with order patterns.^{13,15,16,25} The field is active and driven by new applications. It will form its own theory in the coming years.

## ACKNOWLEDGMENTS

I am grateful to Karsten and also to Jose Amigo and Osvaldo Rosso for the wonderful workshop in Dresden. My thanks go to Bernd Pompe for fruitful discussions and to both referees of the paper for their competent comments.

## AUTHOR DECLARATIONS

### Conflict of Interest

The author has no conflicts to disclose.

### Author Contributions

**Christoph Bandt:** Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

## REFERENCES

*Permutation Complexity in Dynamical Systems*, Springer Series in Synergetics (Springer, Berlin, 2010).

*Invariant Factors, Julia Equivalences and the (Abstract) Mandelbrot Set*, Lecture Notes in Mathematics, Vol. 1732 (Springer, Berlin, 2000).

*Time Series Analysis and Forecasting*, Contributions to Statistics, edited by I. Rojas and H. Pomares (Springer, 2015).

*Probability and Random Processes*