Developing reliable methodologies to decode brain state information from electroencephalogram (EEG) signals is an open challenge, crucial to implementing EEG-based brain–computer interfaces (BCIs). For example, signal processing methods that identify brain states could allow motor-impaired patients to communicate via non-invasive, EEG-based BCIs. In this work, we focus on the problem of distinguishing between the states of eyes closed (EC) and eyes open (EO), employing quantities based on permutation entropy (PE). An advantage of PE analysis is that it uses symbols (ordinal patterns) defined by the ordering of the data points (disregarding the actual values), hence providing robustness to noise and outliers due to motion artifacts. However, we show that for the analysis of multichannel EEG recordings, the performance of PE in discriminating the EO and EC states depends on the symbols’ definition and how their probabilities are estimated. Here, we study the performance of PE-based features for EC/EO state classification in a dataset of $N=107$ subjects with one-minute 64-channel EEG recordings in each state. We analyze features obtained from patterns encoding temporal or spatial information, and we compare different approaches to estimate their probabilities (by averaging over time, over channels, or by “pooling”). We find that some PE-based features provide about 75% classification accuracy, comparable to the performance of features extracted with other statistical analysis techniques. Our work highlights the limitations of PE methods in distinguishing the eyes’ state, but, at the same time, it points to the possibility that subject-specific training could overcome these limitations.

Ordinal analysis is a symbolic data analysis method based on calculating the probabilities of symbols (known as ordinal patterns) that are defined in terms of the relative magnitude of data points. It has been extensively used to analyze complex signals in fields including finance, social systems, climate, physics, physiology, and neuroscience. The permutation entropy (PE) is a complexity measure computed from the probabilities of the ordinal patterns, and it has been proposed to distinguish, from the analysis of EEG recordings, the states of eyes open (EO) and eyes closed (EC). Using ordinal patterns that encode information about the temporal or spatial order of data points, it has been reported that the EO state is characterized by higher entropy than the EC state; however, the large variability between subjects prevents the differentiation of the EC and EO states of individual subjects. Here, we evaluate different ways to calculate PE-based features and compare their classification ability in terms of how the symbols are constructed and how their probabilities are estimated. We find that, for spatial symbols, the results depend significantly on the symbols' orientation and on the pre-processing of the data. We also find that PE-based features can provide a classification performance comparable to that of features extracted with other statistical analysis techniques.

## I. INTRODUCTION

Electroencephalograms (EEGs) are recordings of the brain's electrical activity generated by large neuronal populations, registered by electrodes placed on the subject's scalp.^{1} EEG signals, generally categorized into delta, theta, alpha, beta, and gamma bands spanning frequencies from 0.1 Hz to more than 100 Hz,^{2} can be used as a diagnostic tool for mental disorders such as epilepsy,^{3} Alzheimer's disease,^{4} depression,^{5} etc. But the non-invasive and portable nature of the technology also makes EEG signals suitable for distinguishing sleep stages^{6,7} or decoding movement information^{8} that can be used by brain–computer interfaces (BCIs).^{9,10}

The state of the eyes (open, EO, or closed, EC) has been reported to play an important role in the performance of EEG-based BCIs in controlling external devices, a task known as "motor imagery."^{11,12} Differences in the EEG recordings of subjects with their eyes open or closed have been reported as far back as the 1930s (see Ref. 13 and references therein), mainly in the alpha band.^{14} More recently, efforts have focused on capturing these differences with data analysis and machine learning techniques. By feeding $13$ statistical measures (such as the mean absolute deviation, the kurtosis, etc.) to a k-Nearest Neighbor classifier, Sinha *et al.*^{15} obtained a classification accuracy of about $78%$. Furman *et al.*,^{16} instead, extracted $320$ features through short-time Fourier transform and recurrence quantification analysis and achieved a classification accuracy of more than $95%$. These results were slightly improved by considering $13$ features per channel from recurrence plots and a genetic algorithm as feature selector.^{17} A deep convolutional neural network has been shown to achieve a similar accuracy.^{18} However, it is important to note that these high-performing methods also require high-dimensional computations on sufficiently long recordings with high spatial and temporal resolution, which is demanding in terms of time, energy, and memory. The information flow between the brain hemispheres has also been analyzed. Using the transfer entropy, Restrepo and co-workers^{19} analyzed recordings of 24 EEG channels and found an increase in information transfer in the EC state for the alpha and beta frequency bands, but no preferred direction of inter-hemispheric information transfer in either state.

Alongside these efforts, ordinal analysis^{20} has been used to characterize the differences between the EO and EC states. Ordinal analysis provides a value of entropy, known as permutation entropy (PE), based on the probabilities of symbols, known as ordinal patterns, that encode relative differences between values of data points. Ordinal analysis has been shown to be simple, computationally efficient, robust to noise, able to characterize complex dynamics^{21} and to distinguish an underlying dynamic from stochastic behavior.^{22,23} Hussain *et al.*^{24} reported that the Multiscale Permutation Entropy^{25} of the electrodes located in particular regions of the scalp can distinguish between both states, especially when compared to Multiscale Sample Entropy.^{26} Quintero-Quiroz *et al.*^{27} calculated the PE using patterns defined by data points recorded on the same electrode at different times (we shall refer to this approach as “temporal coding”), and reported, when averaging over subjects, a lower PE value for the EC state than for the EO state. More recently, Boaretto and co-workers^{28} used patterns defined by data points recorded on different electrodes at the same time, an approach that we refer to as “spatial coding,” and that has been used to quantify the complexity of two or higher dimensional data, such as images and videos.^{29–32} They also found that the PE is, on average, lower for the EC state than for the EO state, but they reported a larger separation of EC/EO PE values, as compared to Ref. 27. However, in both cases,^{27,28} the large variability between subjects prevents using the PE for differentiating the EC and EO states of individual subjects.

The goal of this work is to analyze how different approaches to compute PE affect the classification performance of PE-based features. We analyze features obtained from patterns that encode temporal or spatial information, and we also compare different ways to estimate their probabilities: averaging over time, over EEG channels, or by “pooling”—estimating the probabilities from the frequency of occurrence of the patterns, regardless of the time or EEG channel in which they occur.^{33} We also compare PE-based features obtained from raw and filtered data.

Our results show that, first, the orientation of spatial symbols plays an important role, which is conditioned by the pre-processing of the data: symbols defined along the lateral–medial (“horizontal”) direction distinguish better the two states on the raw data, while symbols defined along the anterior–posterior (“vertical”) direction perform better on filtered data. Second, we show that the classification performance of PE-based features is comparable to that previously reported in the literature.^{15} Finally, we show that using several PE-based features obtained from raw data increases the performance, but not above the best values reached by single features obtained from filtered data.

This paper is organized as follows. Section II describes the EEG dataset analyzed. Section III describes the ordinal analysis method and discusses the definition of the patterns (temporal or spatial coding) and the calculation of their probabilities (by averaging over time, over channels, or by pooling). It also presents the machine learning algorithm used to classify, from PE-based features, the EO and EC states of individual subjects. Section IV presents the results obtained and Sec. V presents a summary and an outline of possible future research directions.

## II. DATA

The data consist of a set of freely available EEG recordings with 64 channels,^{34–36} sampled at 160 Hz for 60 s in each EO/EC state. Each channel has 9600 data points recorded in the EC state followed by 9600 data points recorded in the EO state. The position of the electrodes on the scalp is provided with the dataset and displayed in Fig. 3(a).

The dataset contains recordings of 109 subjects. As in Ref. 28, we removed subjects 97 and 109 due to several null values in their time series. Therefore, we analyze time series recorded from $N=107$ subjects, $x_i^s(t)$, where $s$ represents the subject, $i$ the channel, and $t$ the time.

In addition to the analysis of the raw data, we filtered the time series to analyze only the $\alpha $ band, 8–12 Hz, where a significant reduction in activity has been reported in the EO state compared to the EC state.^{13} We filtered the raw data using an FIR filter provided by the MNE *Python* package.^{37} Examples of the raw and filtered time series are shown in Fig. 1.
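The paper uses MNE's FIR filter for the alpha band; the same kind of band-pass can be sketched with SciPy (our illustration of an equivalent zero-phase FIR design, not the authors' exact filter parameters):

```python
import numpy as np
from scipy.signal import filtfilt, firwin

FS = 160.0  # sampling rate of the dataset (Hz)

def alpha_band(x, fs=FS, numtaps=257):
    """Zero-phase FIR band-pass keeping the 8-12 Hz alpha band."""
    taps = firwin(numtaps, [8.0, 12.0], pass_zero=False, fs=fs)
    return filtfilt(taps, 1.0, x)  # forward-backward filtering: no phase shift

# Sanity check on a synthetic signal: a 10 Hz tone survives, a 30 Hz tone is removed.
t = np.arange(0, 10, 1 / FS)
x = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 30 * t)
y = alpha_band(x)
```

Zero-phase filtering matters here because ordinal patterns are sensitive to the relative order of samples, which a phase-shifting filter would distort.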

## III. METHODS

### A. Univariate ordinal analysis

The permutation entropy^{20} is calculated by first translating the time series into a sequence of symbols, $\mathbf{s}=\{s_1, s_2,\dots,s_n\}$, where $n=N-(d-1)\tau$, $d$ is the symbol length, and $\tau$ is a time lag. Each symbol is constructed in the following way: for $t=1,\dots,n$, the vector $[x_t, x_{t+\tau}, x_{t+2\tau},\dots, x_{t+(d-1)\tau}]$ is reordered such that $x_{\pi(t)}\le x_{\pi(t+\tau)}\le\dots\le x_{\pi(t+(d-1)\tau)}$, where $\pi(\cdot)$ is the permutation of time steps that brings the original vector into this ordering. The constructed symbol is then $s_t=[\pi(t),\pi(t+\tau),\dots,\pi(t+(d-1)\tau)]$, which corresponds to one of $d!$ possibilities. Such symbols are known as "ordinal patterns." Once the time series is translated into a sequence of ordinal patterns, the probability $p_j$ of each symbol is estimated by its frequency of occurrence. From these probabilities, the normalized permutation entropy, $H$, is

$$H = -\frac{1}{\ln d!}\sum_{j=1}^{d!} p_j \ln p_j.$$
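The symbolization and entropy computation described above can be sketched in a few lines of Python (a minimal sketch; the function names are ours, not from the paper's code):

```python
import math
from collections import Counter

import numpy as np

def ordinal_patterns(x, d=3, tau=1):
    """Translate a 1-D series into its sequence of ordinal patterns (length d, lag tau)."""
    x = np.asarray(x)
    n = len(x) - (d - 1) * tau
    # argsort of each delay vector gives the permutation that sorts it
    return [tuple(np.argsort(x[t:t + (d - 1) * tau + 1:tau])) for t in range(n)]

def permutation_entropy(x, d=3, tau=1):
    """Normalized permutation entropy H in [0, 1]."""
    counts = Counter(ordinal_patterns(x, d, tau))
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values()) \
        / math.log(math.factorial(d))
```

A monotonic series yields a single pattern and $H=0$, while white noise populates all $d!$ patterns almost uniformly and gives $H\approx 1$.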

In this approach, each data point (except for the first and last $d-1$ points of the time series) is used to construct $d$ different symbols. Alternatively, the data could be partitioned into non-overlapping windows, so that each data point is used only once; in this case, $n=N/d$ symbols are obtained. Both approaches have been reported to produce similar results, except for the variance of the entropy estimator.^{38,39} In particular, in our analysis, we did not observe any significant difference between the two approaches when sufficient statistics are available to estimate the entropy using non-overlapping patterns.
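The two window strategies differ only in the stride of the symbolization: sliding by one sample (overlapping) versus stepping in blocks of $d$ (non-overlapping). A sketch under the $d=3$, $\tau=1$ setting (function names are ours):

```python
import math
from collections import Counter

import numpy as np

def pe(patterns, d=3):
    """Normalized PE from a list of ordinal patterns."""
    counts = Counter(patterns)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values()) \
        / math.log(math.factorial(d))

def patterns_overlapping(x, d=3):
    # stride 1: each point enters up to d symbols
    return [tuple(np.argsort(x[t:t + d])) for t in range(len(x) - d + 1)]

def patterns_nonoverlapping(x, d=3):
    # stride d: each point enters exactly one symbol
    return [tuple(np.argsort(x[t:t + d])) for t in range(0, len(x) - d + 1, d)]

rng = np.random.default_rng(1)
x = rng.standard_normal(9600)
h_over = pe(patterns_overlapping(x))
h_non = pe(patterns_nonoverlapping(x))
```

For a series of this length the two estimates essentially coincide; the non-overlapping estimator simply has fewer symbols and hence a larger variance.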

### B. Multivariate ordinal analysis

We can extend the ordinal analysis methodology to multivariate time series using four different procedures. For a subject $s$ ($s=1,\dots,107$), the PE of the set of 64 EEG channels, $\{x_i^s(t),\ i=1,\dots,64,\ t=1,\dots,9600\}$, is computed in one of the following ways:

- (A) *Temporal coding*: For each channel $i$, we calculate $H_i^s$ as explained in Sec. III A. Then, we average the entropy over all channels, $\langle H_i^s\rangle_i$, and also calculate the associated standard deviation, $\sigma(H_i^s)$. This procedure is represented by the green arrows in Fig. 2.
- (B) *Spatial coding*: At each time $t$, we define a vector containing the values of the channels of subject $s$ at time $t$,
  $$x_t^s=\{x_1^s(t), x_2^s(t),\dots, x_{64}^s(t)\}. \tag{2}$$
  We then transform this vector into a sequence of symbols and calculate the symbols' probabilities and the corresponding permutation entropy, $H_t^s$, at time $t$, as explained in Sec. III A. Then, we average the entropy over all times, $\langle H_t^s\rangle_t$, and also calculate the associated standard deviation, $\sigma(H_t^s)$. This procedure is represented by the blue arrows in Fig. 2.
- (C) *Pooling in space*: Encoding the data as in (B), we calculate the probabilities of the different patterns from their frequency of occurrence, regardless of the time when they occur, as represented by the orange dashed box in Fig. 2. The associated PE is referred to as $H_{pi}^s$.
- (D) *Pooling in time*: Encoding the data as in (A), we calculate the probabilities of the different patterns from their frequency of occurrence, regardless of the channel where they occur, as represented by the red dashed box in Fig. 2. The associated PE is referred to as $H_{pt}^s$.
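The four procedures can be sketched compactly for a channels-by-time array (a minimal illustration under the $d=3$, $\tau=1$ setting; function names are ours):

```python
import math
from collections import Counter

import numpy as np

def pe(patterns, d=3):
    """Normalized PE from a list of ordinal patterns."""
    counts = Counter(patterns)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values()) \
        / math.log(math.factorial(d))

def patterns_1d(x, d=3, tau=1):
    return [tuple(np.argsort(x[t:t + (d - 1) * tau + 1:tau]))
            for t in range(len(x) - (d - 1) * tau)]

# X is a channels-by-time array (64 x 9600 per subject and state in the dataset)
def temporal_avg(X):  # (A): PE per channel, then mean and std over channels
    H = [pe(patterns_1d(row)) for row in X]
    return np.mean(H), np.std(H)

def spatial_avg(X):   # (B): PE of the channel vector at each time, then mean and std
    H = [pe(patterns_1d(col)) for col in X.T]
    return np.mean(H), np.std(H)

def pool_space(X):    # (C): one histogram of spatial patterns, pooled over time
    return pe([p for col in X.T for p in patterns_1d(col)])

def pool_time(X):     # (D): one histogram of temporal patterns, pooled over channels
    return pe([p for row in X for p in patterns_1d(row)])
```

Note that averaging (A, B) computes many entropies from few patterns each, whereas pooling (C, D) computes a single entropy from one large histogram of patterns.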

For procedures (B) and (C), the order of the channels in the vector $x_t^s$ [Eq. (2)] can be arbitrary, but it has a direct impact on the result, as shown in Ref. 28, where the EO/EC states were better distinguished if the spatial order of the electrodes follows the numbering displayed in Fig. 3(a) (from now on referred to as the "alternative arrangement") rather than the ordering provided with the dataset^{34} (from now on referred to as the "linear arrangement"). Such an ordering constructs symbols with a preferred lateral–medial direction. However, this coding also builds symbols from electrodes that are far apart, such as electrodes $17$ and $18$, or $26$ and $28$. To solve these issues, we embedded the spatial arrangement of the electrodes in a grid, as shown in Fig. 3(b), and decoupled the lateral–medial direction (from now on referred to as *horizontal*) from the anterior–posterior direction (from now on referred to as *vertical*) by constructing ordinal patterns with electrodes belonging to the same row or to the same column.

### C. Implementation

To compare the four procedures to calculate PE, we construct spatial or temporal patterns of length $d=3$ formed by neighboring data points in time or in space ( $\tau =1$).

When using temporal symbols, to calculate the PE of channel $i$ of subject $s$, $H_i^s$ (procedure A), we have 9598 patterns with which to estimate the probabilities of the six $d=3$ patterns. Then, we calculate the PE of subject $s$ by averaging $H_i^s$ over the 64 channels. In procedure D (temporal pooling), to calculate $H_{pt}^s$, we have $64\times 9598$ patterns to estimate the six probabilities.

When using spatial symbols (procedures B and C), the spatial arrangement in Ref. 28 gives 62 patterns to calculate the PE of subject $s$ at time $t$. Then, we calculate the PE of subject $s$ by averaging $H_t^s$ over the 9600 times. In contrast, when limiting the analysis to only horizontal or only vertical patterns, we have, at each time, 45 or 44 patterns, respectively. In procedure C (spatial pooling), to calculate $H_{pi}^s$, we have $9600\times 62$, $9600\times 45$, or $9600\times 44$ patterns (when using all, only horizontal, or only vertical symbols) to estimate the six probabilities.

### D. Classification of EC/EO states

Random forest (RF) is a technique of ensemble machine learning^{40} that is typically employed in regression and classification tasks.^{41} It consists of a large set of decision trees, with a random component imparted by the bootstrap sampling of the training data and by the selection of the predictor variables.^{42} For classification problems, each tree casts a vote for the preferred class, and the most voted one is the class selected by the RF.^{43}

We trained a RF algorithm considering EC as the “positive” class and EO as the “negative” class. A repeated $10$-fold cross-validation (CV) technique was used to measure the algorithm performance over $214$ time series (EO and EC states of $107$ subjects).^{44}

The ability of the algorithm to discriminate the two states was evaluated using standard measures: accuracy (fraction of correct classifications), precision (fraction of true positives over predicted positives), recall (fraction of correctly predicted positives over all positives), specificity (fraction of correctly predicted negatives over all negatives), and F1 score (harmonic mean of precision and recall).
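The evaluation pipeline described above can be sketched with scikit-learn on a hypothetical feature matrix (the data, the separation between classes, and the variable names here are illustrative assumptions, not the paper's actual features):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

# Hypothetical feature matrix: one row per recording (214 = 107 subjects x 2 states),
# columns = PE-based features; labels: 1 = EC ("positive"), 0 = EO ("negative").
rng = np.random.default_rng(0)
X = rng.normal(size=(214, 3))
y = np.repeat([1, 0], 107)
X[y == 1] += 0.8  # synthetic class separation, for illustration only

# Specificity is not a built-in scorer: it is the recall of the negative class.
scoring = {"accuracy": "accuracy", "f1": "f1", "precision": "precision",
           "recall": "recall", "specificity": make_scorer(recall_score, pos_label=0)}
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
res = cross_validate(RandomForestClassifier(random_state=0), X, y, cv=cv, scoring=scoring)
acc = res["test_accuracy"].mean()
```

Averaging each `test_*` array over the repeated folds yields the mean ± standard deviation figures reported in the tables.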

## IV. RESULTS

### A. Analysis of the four procedures to calculate PE

Figure 4 displays the PE calculated using procedure B, $\langle H_t^s\rangle_t$, for the raw data [panel (a)] and for the filtered data [panel (b)]. The entropy is averaged over all subjects and over all times, and the error bars indicate the standard deviation over time of the subject-averaged entropy, as in Ref. 28. We show the effect of the arrangement of the channels in the vector $x_t^s$ [Eq. (2)] on the ability of PE to distinguish the EO and EC states. In the raw data we see that, when considering only "horizontal" symbols, the distinction between the two states improves compared to the other spatial arrangements. When considering "vertical" symbols, almost no distinction is observed in the raw data, while a clear differentiation is seen in the filtered data.

As explained above (Sec. III C), considering only horizontal or only vertical patterns reduces the number of patterns available to estimate the patterns' probabilities: from 62 patterns at time $t$ using the arrangement in Ref. 28, to only 45 (44) in the horizontal (vertical) direction. This may account for the lower entropy values obtained with the horizontal or vertical arrangements (in both EC and EO states) compared to the arrangements used in Refs. 28 and 45.

The averaging performed in Fig. 4 (over subjects and over time) gives global information about the EC and EO states, but it does not allow discerning the two states for a single subject. With this aim, we analyze in Fig. 5(a) the $\langle H_t^s\rangle_t$ values. Here, the symbols and the error bars represent the mean and the standard deviation of the distribution of $\langle H_t^s\rangle_t$ values of the $N=107$ subjects.

As in Fig. 4, we observe that EC states usually display lower entropy than EO states, at least for the horizontal direction in the raw and filtered data, and in the vertical direction in the filtered data. However, this difference is not enough to discriminate the two states for individual subjects, as the variability of these quantities across subjects is relatively large.

Figures 5(b) and 5(c) display $\sigma(H_t^s)$ and $H_{pi}^s$, respectively (procedures B and C). $\sigma(H_t^s)$, in particular, quantifies how much $H_t^s$ varies in time. We observe that, usually, the EC state has higher $\sigma(H_t^s)$ than the EO state; as in the previous case, the behavior is the opposite when analyzing the raw data with vertical patterns. As for $\langle H_t^s\rangle_t$, these differences are not enough to separate the two states. $H_{pi}^s$ behaves very similarly to $\langle H_t^s\rangle_t$, but with slightly higher values, likely because pooling increases the number of patterns available to estimate the probabilities. This increase also allows using longer ordinal patterns ($d>3$) and larger $\tau$ values; however, we did not observe any significant difference from the case $d=3$, $\tau=1$ when testing the combinations $(d=3,4,5,6;\ \tau=1)$ and $(d=3;\ \tau=1,2,3,4)$.

We note that there is a clear difference in the size of the error bars in Figs. 4 and 5. As explained above, Fig. 4 uses the same procedure as Ref. 28, in which the first average, over subjects, removes the inter-subject variability, the primary source of variance; thus, the error bars only capture the small temporal variability of the subject-averaged entropy. In contrast, the time average $\langle H_t^s\rangle_t$ shown in Fig. 5(a) washes out the temporal variability, and the error bars show the high variability across subjects.

In Fig. 6 we report the results obtained with temporal coding (procedures A and D). The analysis is similar to that reported in Ref. 27, but here we use $d=3$. We observe a behavior similar to that of spatial coding: EC states have lower entropy and higher standard deviation than EO states, and pooling provides values very similar to averaging. In fact, in the raw data, the correlation coefficient between the 107 values of $\langle H_i^s\rangle_i$ and $H_{pt}^s$ is 0.997 for both EC and EO states. Therefore, "temporal pooling" (computing the entropy from the probabilities of temporal symbols regardless of the channel where they occur) is, at least for this dataset, equivalent to computing the entropy of each channel and then averaging over channels. We have also evaluated all possible combinations of $d=3,4,5,6$ and $\tau=1,2,4,8$, verifying that they yield similar results.

In Fig. 7 we summarize the results of this section, reporting the PE values obtained using the different approaches considered; in this figure, the symbols and error bars indicate the mean value and the standard deviation calculated over the $N=107$ subjects. We see that no approach can fully separate the two states, neither in the raw data [Fig. 7(a)] nor in the filtered data [Fig. 7(b)].

Finally, in Fig. 8 we present the relative entropy difference between the two states for each subject, defined as $2[PE(EO)\u2212PE(EC)]/[PE(EO)+PE(EC)]$. Also in this case, constructing spatial symbols with either horizontal or vertical arrangements produces greater relative entropy differences with respect to the alternative arrangement studied in Ref. 28.
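This per-subject quantity is straightforward to compute (a minimal sketch; the function name is ours):

```python
import numpy as np

def relative_difference(pe_eo, pe_ec):
    """Per-subject relative entropy difference 2(PE_EO - PE_EC)/(PE_EO + PE_EC).

    Accepts scalars or arrays of per-subject PE values; positive values mean
    higher entropy in the EO state.
    """
    pe_eo, pe_ec = np.asarray(pe_eo, float), np.asarray(pe_ec, float)
    return 2.0 * (pe_eo - pe_ec) / (pe_eo + pe_ec)
```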

In the next section, we present the results of processing PE-based features with a machine-learning algorithm to achieve EC/EO classification without any previous knowledge of the subject.

### B. EC/EO classification

To test the ability of the entropy-based features analyzed in the previous section (with spatial coding, $\langle H_t^s\rangle_t$, $\sigma(H_t^s)$, and $H_{pi}^s$, shown in Fig. 5, or with temporal coding, $\langle H_i^s\rangle_i$, $\sigma(H_i^s)$, and $H_{pt}^s$, shown in Fig. 6) to distinguish the EO and EC states of single subjects, we feed these features to a RF algorithm (see Sec. III D) and train it to classify the time series into EO and EC states. We report the results in Table I for the raw data and in Table II for the filtered data. When using temporal-coding features ($\langle H_i^s\rangle_i$, $\sigma(H_i^s)$, $H_{pt}^s$), we varied the ordinal patterns' parameters (length, $d=3,4,5,6$, and time lag, $\tau=1,2,4,8$), and in the tables we report the results obtained with the best performing parameters.

**Table I.** Classification performance (%) of PE-based features computed from the raw data.

| Coding | Feature | Accuracy | F1 score | Precision | Recall | Specificity |
|---|---|---|---|---|---|---|
| Horizontal | $\langle H_t^s\rangle_t$ | 61 ± 7 | 59 ± 10 | 63 ± 10 | 57 ± 16 | 65 ± 16 |
| Horizontal | $\sigma(H_t^s)$ | 66 ± 7 | 65 ± 9 | 66 ± 9 | 67 ± 15 | 65 ± 15 |
| Horizontal | $H_{pi}^s$ | 58 ± 8 | 54 ± 12 | 61 ± 11 | 50 ± 16 | 66 ± 15 |
| Vertical | $\langle H_t^s\rangle_t$ | 54 ± 9 | 55 ± 12 | 54 ± 10 | 59 ± 17 | 50 ± 15 |
| Vertical | $\sigma(H_t^s)$ | 56 ± 9 | 59 ± 10 | 56 ± 9 | 64 ± 15 | 48 ± 16 |
| Vertical | $H_{pi}^s$ | 55 ± 9 | 56 ± 11 | 55 ± 10 | 59 ± 17 | 51 ± 16 |
| Temporal | $\langle H_i^s\rangle_i$ | 63 ± 8 | 56 ± 13 | 70 ± 15 | 49 ± 16 | 77 ± 15 |
| Temporal | $\sigma(H_i^s)$ | 69 ± 8 | 66 ± 10 | 73 ± 12 | 62 ± 14 | 76 ± 13 |
| Temporal | $H_{pt}^s$ | 64 ± 8 | 58 ± 13 | 72 ± 14 | 51 ± 16 | 78 ± 14 |


**Table II.** Classification performance (%) of PE-based features computed from the filtered data.

| Coding | Feature | Accuracy | F1 score | Precision | Recall | Specificity |
|---|---|---|---|---|---|---|
| Horizontal | $\langle H_t^s\rangle_t$ | 66 ± 7 | 62 ± 10 | 71 ± 11 | 57 ± 15 | 74 ± 14 |
| Horizontal | $\sigma(H_t^s)$ | 64 ± 8 | 66 ± 8 | 63 ± 8 | 71 ± 14 | 56 ± 16 |
| Horizontal | $H_{pi}^s$ | 65 ± 8 | 65 ± 9 | 67 ± 10 | 65 ± 15 | 66 ± 15 |
| Vertical | $\langle H_t^s\rangle_t$ | 74 ± 8 | 70 ± 11 | 84 ± 12 | 61 ± 15 | 87 ± 12 |
| Vertical | $\sigma(H_t^s)$ | 69 ± 8 | 63 ± 12 | 81 ± 14 | 54 ± 15 | 85 ± 13 |
| Vertical | $H_{pi}^s$ | 71 ± 8 | 66 ± 11 | 80 ± 12 | 58 ± 15 | 83 ± 13 |
| Temporal | $\langle H_i^s\rangle_i$ | 76 ± 9 | 72 ± 12 | 86 ± 12 | 63 ± 15 | 88 ± 11 |
| Temporal | $\sigma(H_i^s)$ | 69 ± 8 | 63 ± 12 | 77 ± 13 | 56 ± 16 | 81 ± 12 |
| Temporal | $H_{pt}^s$ | 74 ± 8 | 70 ± 12 | 84 ± 12 | 61 ± 15 | 87 ± 11 |


Using filtered data tends to improve the classification performance, except for the accuracy, precision, and specificity of $\sigma(H_t^s)$ with horizontal symbols, and the F1 score and recall of $\sigma(H_i^s)$ with temporal coding. With this pre-processing, $\langle H_i^s\rangle_i$ with $d=5$ and $\tau=4$ has the best performance across all measures except recall, and $\langle H_t^s\rangle_t$ using vertical symbols performs only $\sim 2%$ worse. Such performance is as good as the reported performance of other statistical measures.^{15}

Interestingly, the standard deviation features perform comparably to, and on the raw data even better than, the averages. The standard deviation of the temporal coding with $d=3$ and $\tau=2$ performs best on the raw data in terms of accuracy, F1 score, and precision, and second best in specificity (slightly below $H_{pt}^s$). Also, the best recall is obtained for $\sigma(H_t^s)$ of horizontal symbols, which also performs second best in terms of accuracy and F1 score on the raw data.

We also note that "pooling" (in space or in time) does not give a significant advantage over averaging, i.e., the performances of $\langle H_t^s\rangle_t$ and $H_{pi}^s$, and of $\langle H_i^s\rangle_i$ and $H_{pt}^s$, are similar.

We attempted to improve the classification performance in the raw data by using a larger number of features, drawing freely from temporal and spatial features. However, a larger number of features did not increase the classification performance significantly: the best performance (in terms of accuracy and F1 score, both close to $78%$) was obtained with $8$ features, and it was comparable to the performance of classification with a single feature in the filtered data. In the filtered data, no improvement was obtained for any of the combinations of features analyzed, due to the strong correlation between the best performing features.

On a final note, the relative difference computed using procedure B with horizontal symbols can discriminate the two states with an accuracy of 96% for the raw data and 97% for the filtered data. Nevertheless, this result assumes that we are comparing two different states from the same subject.

## V. CONCLUSIONS

We have compared different methodologies to calculate the permutation entropy (PE) and analyzed their performance in discriminating the eyes open (EO) and eyes closed (EC) states, using a freely available dataset of 64-channel EEG recordings of $N=107$ subjects. We analyzed the distributions of average PE values and standard deviations, considering different strategies for defining the ordinal patterns ("horizontal" or "vertical") and different approaches for calculating the entropy (averaging or "pooling," over time or over channels).

First, we have shown that, to differentiate the two states using spatial PE, defining the symbols along the horizontal direction enhances the difference between the EO and EC states, both on the raw and on the filtered data, when compared to the spatial arrangements considered in Ref. 28. We have also shown that, when defined on the filtered data, the vertical patterns capture information that differentiates the two states. However, coding with only horizontal or only vertical symbols comes at the cost of smaller statistics when estimating the symbols' probabilities of occurrence, which hampers the implementation in setups with a low number of electrodes.

Secondly, we have calculated, for individual subjects, time averages of different PE-based quantities and analyzed their distributions for the EO and EC states (Fig. 5). We have found that, although EC states present, on average, lower entropy than EO states, their distributions overlap, preventing a full distinction between the two states for every subject. We also analyzed the distributions of the standard deviation of the spatial entropy, $\sigma(H_t^s)$, and of the pooled spatial entropy, $H_{pi}^s$, for the EC and EO states, using different coding strategies and pre-processing of the data. Interestingly, $\sigma(H_t^s)$ behaves oppositely to $\langle H_t^s\rangle_t$ (EC states correspond to high standard deviation and EO states to low standard deviation), and qualitatively it performs as well as $\langle H_t^s\rangle_t$ in distinguishing the two states. Likewise, the "pooling" technique performs similarly to averaging, which opens the possibility of using it in setups with a small number of electrodes, provided that sufficiently long records are available. We also found that the distributions obtained with temporal coding (Fig. 6) give similar performance.

Finally, we have quantitatively assessed the performance of these features as classifiers by feeding them to a random forest algorithm. As expected from previous reports,^{27,28} we found that filtering improves performance. Moreover, the average entropy, the pooled entropy, and the entropy standard deviation perform similarly, and the proposed approach of vertical symbols on filtered data performs better than the other spatial arrangements of the electrodes studied, providing about $75%$ accuracy, which is comparable, for a single feature, to the performance of the statistical measures analyzed in the literature.^{15} Such performance falls short of that of other, more powerful methods; however, these involve a large number of features, complex feature selection, or advanced machine learning techniques.

For future work, it will be interesting to analyze spatial ordinal patterns that have a cross-like shape, and thus, take into account correlations among horizontally and vertically neighboring channels. It will also be interesting to analyze ordinal patterns that incorporate spatial and temporal information (that are defined in terms of data points recorded in different electrodes at different times). In addition, it will be interesting to use horizontal or vertical patterns in other EEG datasets, such as EEGs recorded during motion tasks or during sleep, to test whether they increase the performance of PE-based features for identifying the intention of movement, the type of motion, the sleep stage, etc.

## ACKNOWLEDGMENTS

We acknowledge the support of ICREA ACADEMIA, AGAUR (2021 SGR 00606 and FI scholarship), and the Ministerio de Ciencia e Innovación (Project No. PID2021-123994NB-C21).

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Juan Gancio:** Conceptualization (equal); Formal analysis (lead); Methodology (lead); Software (lead); Writing – original draft (equal); Writing – review & editing (equal). **Cristina Masoller:** Conceptualization (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal). **Giulio Tirabassi:** Conceptualization (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study are openly available in Physionet at https://physionet.org/content/eegmmidb/1.0.0/.

## REFERENCES

*EEG Technology*

*2016 IEEE Region 10 Conference (TENCON)*(IEEE, 2016), pp. 2466–2469.

*2005 Pakistan Section Multitopic Conference*(IEEE, 2005), pp. 1–6.

*Ensemble Machine Learning: Methods and Applications*(Springer, 2012), pp. 1–34.

*Ensemble Machine Learning: Methods and Applications*(Springer, 2012), pp. 157–175.