Developing reliable methodologies to decode brain state information from electroencephalogram (EEG) signals is an open challenge, crucial to implementing EEG-based brain–computer interfaces (BCIs). For example, signal processing methods that identify brain states could allow motor-impaired patients to communicate via non-invasive, EEG-based BCIs. In this work, we focus on the problem of distinguishing between the states of eyes closed (EC) and eyes open (EO), employing quantities based on permutation entropy (PE). An advantage of PE analysis is that it uses symbols (ordinal patterns) defined by the ordering of the data points (disregarding the actual values), hence providing robustness to noise and outliers due to motion artifacts. However, we show that for the analysis of multichannel EEG recordings, the performance of PE in discriminating the EO and EC states depends on the symbols’ definition and how their probabilities are estimated. Here, we study the performance of PE-based features for EC/EO state classification in a dataset of N = 107 subjects with one-minute 64-channel EEG recordings in each state. We analyze features obtained from patterns encoding temporal or spatial information, and we compare different approaches to estimate their probabilities (by averaging over time, over channels, or by “pooling”). We find that some PE-based features provide about 75% classification accuracy, comparable to the performance of features extracted with other statistical analysis techniques. Our work highlights the limitations of PE methods in distinguishing the eyes’ state, but, at the same time, it points to the possibility that subject-specific training could overcome these limitations.

Ordinal analysis is a symbolic data analysis method based on calculating the probabilities of symbols (known as ordinal patterns) that are defined in terms of the relative magnitude of data points. It has been extensively used to analyze complex signals in various fields, including finance, social systems, climate, physics, physiology, and neuroscience. The permutation entropy (PE) is a complexity measure that is computed from the probabilities of the ordinal patterns, and it has been proposed to distinguish, from the analysis of EEG recordings, the states of eyes open (EO) and eyes closed (EC). Using ordinal patterns that encode information about the temporal or spatial order of data points, it has been reported that the EO state is characterized by higher entropy than the EC state; however, the large variability between subjects prevents the differentiation of the EC and EO states of individual subjects. Here, we evaluate different ways to calculate PE-based features and compare their classification ability in terms of how the symbols are constructed and how their probabilities are estimated. We find that, for spatial symbols, the results depend significantly on the symbols’ orientation and on the pre-processing of the data. We also find that PE-based features can provide a classification performance comparable to that of features extracted with other statistical analysis techniques.

Electroencephalograms (EEGs) are recordings of the brain’s electrical activity generated by large neuronal populations, registered by electrodes placed on the subject’s scalp.1 EEG signals, generally categorized as delta, theta, alpha, beta, and gamma based on frequency ranges from 0.1 Hz to more than 100 Hz,2 can be used as a diagnostic tool for mental disorders such as epilepsy,3 Alzheimer’s disease,4 depression,5 etc. Moreover, the non-invasive and portable nature of the technology also makes EEG signals suitable for distinguishing sleep stages6,7 or decoding movement information8 that can be used by brain–computer interfaces (BCIs).9,10

The state of the eyes (open, EO, or closed, EC) has been reported to play an important role in the performance of EEG-based BCIs in controlling external devices, a task known as “motor imagery.”11,12 Differences in the EEG recordings between subjects with their eyes open or closed have been reported as far back as the 1930s (see Ref. 13 and references therein), mainly in the alpha band.14 More recently, efforts have focused on capturing these differences by using data analysis and machine learning techniques. By feeding 13 statistical measures (such as the mean absolute deviation, the kurtosis, etc.) to a k-Nearest Neighbor classifier, Sinha et al.15 obtained a classification accuracy of about 78%. Furman et al.,16 instead, extracted 320 features through short-time Fourier transform and recurrence quantification analysis and achieved a classification accuracy of more than 95%. These results were slightly improved by considering 13 features per channel from recurrence plots and a genetic algorithm as feature selector.17 A deep convolutional neural network has been shown to achieve a similar accuracy.18 However, it is important to notice that these high-performing methods also require high-dimensional calculations on sufficiently long recordings with high spatial and temporal resolution, which is demanding in terms of time, energy, and memory. The information flow between the brain hemispheres has also been analyzed: using the transfer entropy, Restrepo and co-workers19 analyzed recordings of 24 EEG channels and found an increase in information transfer in the EC state for the alpha and beta frequency bands, but no preferred direction of inter-hemispheric information transfer in either state.

Alongside these efforts, ordinal analysis20 has been used to characterize the differences between the EO and EC states. Ordinal analysis provides a value of entropy, known as permutation entropy (PE), based on the probabilities of symbols, known as ordinal patterns, that encode the relative differences between values of data points. Ordinal analysis has been shown to be simple, computationally efficient, robust to noise, able to characterize complex dynamics21 and to distinguish underlying deterministic dynamics from stochastic behavior.22,23 Hussain et al.24 reported that the Multiscale Permutation Entropy25 of the electrodes located in particular regions of the scalp can distinguish between both states, especially when compared to Multiscale Sample Entropy.26 Quintero-Quiroz et al.27 calculated the PE using patterns defined by data points recorded on the same electrode at different times (we shall refer to this approach as “temporal coding”) and reported, when averaging over subjects, a lower PE value for the EC state than for the EO state. More recently, Boaretto and co-workers28 used patterns defined by data points recorded on different electrodes at the same time, an approach that we refer to as “spatial coding” and that has been used to quantify the complexity of two- or higher-dimensional data, such as images and videos.29–32 They also found that the PE is, on average, lower for the EC state than for the EO state, but they reported a larger separation of EC/EO PE values, as compared to Ref. 27. However, in both cases,27,28 the large variability between subjects prevents using the PE for differentiating the EC and EO states of individual subjects.

The goal of this work is to analyze how different approaches to compute PE affect the classification performance of PE-based features. We analyze features obtained from patterns that encode temporal or spatial information, and we also compare different ways to estimate their probabilities: averaging over time, over EEG channels, or by “pooling”—estimating the probabilities from the frequency of occurrence of the patterns, regardless of the time or EEG channel in which they occur.33 We also compare PE-based features obtained from raw and filtered data.

Our results show, first, that the orientation of spatial symbols plays an important role, which depends on the pre-processing of the data: symbols defined along the lateral–medial (“horizontal”) direction better distinguish the two states in the raw data, while symbols defined along the anterior–posterior (“vertical”) direction perform better on filtered data. Second, we show that the classification performance of PE-based features is comparable to that previously reported in the literature.15 Finally, we show that using several PE-based features obtained from raw data increases the performance, but not above the best values reached by single features obtained from filtered data.

This paper is organized as follows. Section II describes the EEG dataset analyzed. Section III describes the ordinal analysis method and discusses the definition of the patterns (temporal or spatial coding) and the calculation of their probabilities (by averaging over time, over channels, or by pooling). It also presents the machine learning algorithm used to classify, from PE-based features, the EO and EC states of individual subjects. Section IV presents the results obtained and Sec. V presents a summary and an outline of possible future research directions.

The data consist of a set of freely available EEG recordings with 64 channels,34–36 sampled at 160 Hz for 60 s in each EO/EC state. Each channel has 9600 data points recorded in the EC state followed by 9600 data points recorded in the EO state. The positions of the electrodes on the scalp are provided with the dataset and displayed in Fig. 3(a).

The data set contains recordings of 109 subjects. As in Ref. 28, we removed subjects 97 and 109, due to several null values in their time series. Therefore, we analyze time series recorded from $N = 107$ subjects, $x_i^s(t)$, where $s$ represents the subject, $i$ the channel, and $t$ the time.

In addition to the analysis of the raw data, we filtered the time series to analyze only the α band, 8–12 Hz, where a significant reduction in activity has been reported in the EO state compared to the EC state.13 We filtered the raw data using an FIR filter provided by the MNE Python package.37 Examples of the raw and filtered time series are shown in Fig. 1.
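As a hedged illustration (not necessarily the exact pipeline used here), the recordings can be loaded and band-pass filtered with MNE; the run indices (1: baseline with eyes open, 2: baseline with eyes closed) follow the dataset description, and the cutoff frequencies are those of the alpha band quoted above.

# Sketch: load one subject's baseline runs of the EEG Motor Movement/Imagery
# dataset and filter them to the alpha band (8-12 Hz) with MNE's FIR filter.
import mne
from mne.datasets import eegbci

subject = 1
paths = eegbci.load_data(subject, runs=[1, 2])   # run 1: eyes open, run 2: eyes closed
raw_eo, raw_ec = (mne.io.read_raw_edf(p, preload=True) for p in paths)

# Alpha-band FIR filtering; the raw recording is kept for comparison.
alpha_eo = raw_eo.copy().filter(l_freq=8.0, h_freq=12.0, fir_design="firwin")

x_raw = raw_eo.get_data()      # shape: (64 channels, ~9600 samples at 160 Hz)
x_alpha = alpha_eo.get_data()  # alpha-band filtered version of the same data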

FIG. 1.

Examples of time series from the data set. Single channel extracted from a single subject, eyes open [EO, (a)] and eyes closed [EC, (b)]. The raw (filtered) data are displayed in blue (red).

Given a time series of length $N$, $\mathbf{x} = \{x_1, x_2, \ldots, x_N\}$, the permutation entropy20 is calculated by first translating the time series into a sequence of symbols, $\mathbf{s} = \{s_1, s_2, \ldots, s_n\}$, where $n = N - (d-1)\tau$, $d$ is the symbol length, and $\tau$ is a time lag. Each symbol is constructed in the following way: for $t = 1, \ldots, n$, the vector $[x_t, x_{t+\tau}, x_{t+2\tau}, \ldots, x_{t+(d-1)\tau}]$ is reordered such that $x_{\pi(t)} \le x_{\pi(t+\tau)} \le \cdots \le x_{\pi(t+(d-1)\tau)}$, where $\pi(\cdot)$ is the permutation of time indices that brings the original vector into this ordering. The constructed symbol is then $s_t = [\pi(t), \pi(t+\tau), \ldots, \pi(t+(d-1)\tau)]$, which corresponds to one of the $d!$ possibilities. Such symbols are known as “ordinal patterns.” Once the time series is translated into a sequence of ordinal patterns, the probability of each symbol is estimated by its frequency of occurrence. From these probabilities, the normalized permutation entropy, $H$, is
$$H = -\frac{1}{\log(d!)} \sum_{j=1}^{d!} p_j \log(p_j),$$
(1)
where the normalization $\log(d!)$ ensures that $H \in [0, 1]$.

In this approach, each data point (except for the first and last $d-1$ points of the time series) is used to construct $d$ different symbols. Alternatively, the data could be partitioned into non-overlapping windows, so that each data point is used only once; in this case, $n = N/d$ symbols are obtained. Both approaches have been reported to produce similar results, except for the variance of the entropy estimator.38,39 In particular, in our analysis, we did not observe any significant difference between the two approaches when sufficient statistics are available to estimate the entropy using non-overlapping patterns.
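As a minimal illustration (our own sketch, not the authors' code), the normalized PE of Eq. (1) can be computed from overlapping windows as follows; white noise is used as a test signal, for which $H$ is expected to be close to 1.

# Normalized permutation entropy of Eq. (1) from overlapping ordinal patterns.
from itertools import permutations
from math import factorial
import numpy as np

def permutation_entropy(x, d=3, tau=1):
    """Normalized permutation entropy H in [0, 1] of a 1D array x."""
    x = np.asarray(x)
    n = len(x) - (d - 1) * tau                       # number of ordinal patterns
    symbols = [tuple(np.argsort(x[t:t + (d - 1) * tau + 1:tau]))
               for t in range(n)]                    # ordinal pattern of each window
    counts = np.array([symbols.count(p) for p in permutations(range(d))], dtype=float)
    probs = counts[counts > 0] / n                   # frequencies of occurrence
    return -np.sum(probs * np.log(probs)) / np.log(factorial(d))

# Example: white noise should give H close to 1.
rng = np.random.default_rng(0)
print(permutation_entropy(rng.normal(size=9600), d=3, tau=1))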

We can extend the ordinal analysis methodology to deal with multivariate time series using four different procedures (a code sketch illustrating them is given after the list). For a subject $s$ ($s = 1, \ldots, 107$), the PE of the set of 64 EEG channels, $\{x_i^s(t),\ i = 1, \ldots, 64,\ t = 1, \ldots, 9600\}$, is computed as

  1. Temporal coding: For each channel $i$, we calculate $H_i^s$ as explained in Sec. III A. Then, we average the entropy over all channels, $\langle H_i^s \rangle_i$, and also calculate the associated standard deviation, $\sigma(H_i^s)$. This procedure is represented by the green arrows in Fig. 2.

  2. Spatial coding: At each time $t$, we define a vector containing the values of the channels of subject $s$ at time $t$,
    $$\mathbf{x}_t^s = \{x_1^s(t), x_2^s(t), \ldots, x_{64}^s(t)\}.$$
    (2)
    Then, we transform this vector into a sequence of symbols and calculate the symbols’ probabilities and the corresponding permutation entropy, $H_t^s$, at time $t$, as explained in Sec. III A. Then, we average the entropy over all times, $\langle H_t^s \rangle_t$, and also calculate the associated standard deviation, $\sigma(H_t^s)$. This procedure is represented by the blue arrows in Fig. 2.
  3. Pooling in space: Encoding the data as in (B), we calculate the probabilities of the different patterns from their frequency of occurrence, regardless of the time when they occur, as represented by the orange dashed box in Fig. 2. The associated PE is referred to as $H_{pi}^s$.

  4. Pooling in time: Encoding the data as in (A), we calculate the probabilities of the different patterns from their frequency of occurrence, regardless of the channel where they occur, as represented by the red dashed box in Fig. 2. The associated PE is referred to as $H_{pt}^s$.
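As a concrete illustration of the four procedures, the following sketch (our own, not the authors' code) computes the corresponding PE-based quantities for one subject, assuming data is a (64 channels) x (9600 samples) NumPy array; for spatial coding it simply takes the channels in the order in which they appear in data, while the role of the channel ordering is discussed below.

# The four procedures (A-D) for one subject; `data` has shape (channels, time).
from itertools import permutations
from math import factorial
import numpy as np

def ordinal_symbols(x, d=3, tau=1):
    """Sequence of ordinal patterns (tuples) of a 1D array x."""
    n = len(x) - (d - 1) * tau
    return [tuple(np.argsort(x[t:t + (d - 1) * tau + 1:tau])) for t in range(n)]

def entropy_of(symbols, d=3):
    """Normalized PE from a list of ordinal patterns."""
    counts = np.array([symbols.count(p) for p in permutations(range(d))], dtype=float)
    probs = counts[counts > 0] / counts.sum()
    return -np.sum(probs * np.log(probs)) / np.log(factorial(d))

def pe_features(data, d=3):
    # (A) Temporal coding: PE of each channel, then mean/std over channels.
    H_i = np.array([entropy_of(ordinal_symbols(ch, d), d) for ch in data])
    # (B) Spatial coding: PE of each time step (symbols across channels),
    #     then mean/std over time.
    H_t = np.array([entropy_of(ordinal_symbols(col, d), d) for col in data.T])
    # (C) Pooling in space: all spatial symbols go into a single histogram.
    H_pool_space = entropy_of([s for col in data.T for s in ordinal_symbols(col, d)], d)
    # (D) Pooling in time: all temporal symbols go into a single histogram.
    H_pool_time = entropy_of([s for ch in data for s in ordinal_symbols(ch, d)], d)
    return {"A_mean": H_i.mean(), "A_std": H_i.std(),
            "B_mean": H_t.mean(), "B_std": H_t.std(),
            "C_pool_space": H_pool_space, "D_pool_time": H_pool_time}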

FIG. 2.

Illustration of the procedure to calculate the permutation entropy (PE) using temporal coding (procedures A and D described in the text, indicated by the green arrows and the red dashed box, respectively) and using spatial coding (procedures B and C described in the text, indicated by the blue arrows and the orange dashed box). The procedure is displayed for a small portion of the data.

FIG. 3.

Arrangement of the electrodes used for spatial coding (procedures B and C described in the text). (a) Location of the electrodes on the scalp, with numbering corresponding to the spatial order used in Ref. 28 and Fig. 6. (b) Grid arrangement of the electrodes, where examples of the channels used to define a d = 3 horizontal symbol and a d = 3 vertical symbol are indicated.


For procedures (B) and (C), the order of the channels in the vector $\mathbf{x}_t^s$ [Eq. (2)] can be arbitrary, but it has a direct impact on the result, as shown in Ref. 28, where the EO/EC states were better distinguished if the spatial order of the electrodes follows the numbering displayed in Fig. 3(a) (from now on referred to as the “alternative arrangement”) rather than the ordering provided with the dataset34 (from now on referred to as the “linear arrangement”). Such an ordering constructs symbols with a preferred lateral–medial direction. However, this coding also includes symbols built from electrodes that are far apart, such as electrodes 17 and 18, or 26 and 28. To address these issues, we embedded the spatial arrangement of the electrodes in a grid, as shown in Fig. 3(b), and decoupled the lateral–medial direction (from now on referred to as horizontal) from the anterior–posterior direction (from now on referred to as vertical) by constructing ordinal patterns with electrodes belonging to the same row or to the same column.

To compare the four procedures to calculate PE, we construct spatial or temporal patterns of length $d = 3$ formed by neighboring data points in time or in space ($\tau = 1$).

When using temporal symbols, to calculate the PE of channel $i$ of subject $s$, $H_i^s$ (procedure A), we have 9598 patterns to estimate the probabilities of the six $d = 3$ patterns. Then, we calculate the PE of subject $s$ by averaging $H_i^s$ over the 64 channels. In procedure D (temporal pooling), to calculate $H_{pt}^s$ we have 64 channels $\times$ 9598 patterns to estimate the six probabilities.

When using spatial symbols (procedures B and C), considering the spatial arrangement in Ref. 28 gives 62 patterns to calculate the PE of subject $s$ at time $t$. Then, we calculate the PE of subject $s$ by averaging $H_t^s$ over the 9600 times. In contrast, when limiting the analysis to only horizontal or vertical patterns, we have, at each time, 45 or 44 patterns, respectively. In procedure C (spatial pooling), to calculate $H_{pi}^s$ we have $9600 \times 62$, $9600 \times 45$, or $9600 \times 44$ patterns (when using all, only horizontal, or only vertical symbols, respectively) to estimate the six probabilities.
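To make the horizontal/vertical coding concrete, the following sketch extracts $d = 3$ symbols along the rows or columns of a grid embedding; the variable grid is a hypothetical integer array mapping grid positions to channel indices (with -1 marking empty positions), since the actual 64-electrode grid of Fig. 3(b) is not reproduced here.

# Spatial symbols of d adjacent electrodes along grid rows ("horizontal")
# or columns ("vertical"). `grid` is a hypothetical 2D int array of channel
# indices, with -1 where the grid has no electrode.
import numpy as np

def spatial_symbols(values, grid, d=3, axis="horizontal"):
    """Ordinal patterns of the vector of channel values (x_t^s of Eq. (2))."""
    values = np.asarray(values)
    lines = grid if axis == "horizontal" else grid.T
    symbols = []
    for line in lines:
        chans = line[line >= 0]                  # electrodes present along this line
        for k in range(len(chans) - d + 1):      # sliding window of d neighbors
            symbols.append(tuple(np.argsort(values[chans[k:k + d]])))
    return symbols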

Random forest (RF) is an ensemble machine learning technique40 that is typically employed in regression and classification tasks.41 It consists of a large set of decision trees, with a random component imparted by the bootstrap sampling of the training data and by the selection of the predictor variables.42 For classification problems, each tree casts a vote for its preferred class, and the class with the most votes is the one selected by the RF.43

We trained an RF algorithm considering EC as the “positive” class and EO as the “negative” class. A repeated 10-fold cross-validation (CV) scheme was used to measure the algorithm’s performance over the 214 time series (EO and EC states of the 107 subjects).44

The ability of the algorithm to discriminate the two states was evaluated using standard measures: accuracy (fraction of correct classifications), precision (fraction of true positives over predicted positives), recall (fraction of correctly predicted positives over all positives), specificity (fraction of correctly predicted negatives over all negatives), and F1 score (harmonic average of precision and recall).
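A hedged sketch of this evaluation with scikit-learn (our own setup, not necessarily the authors' implementation; the number of trees, the number of CV repeats, and the placeholder feature matrix X are illustrative choices):

# Random forest with repeated 10-fold CV on PE-based features.
# X: one row per recording (214 rows), one column per feature; y: 1 = EC, 0 = EO.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

X = np.random.rand(214, 3)                 # placeholder for the PE-based features
y = np.array([1] * 107 + [0] * 107)        # EC = positive class, EO = negative class

clf = RandomForestClassifier(n_estimators=500, random_state=0)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_validate(clf, X, y, cv=cv,
                        scoring=["accuracy", "f1", "precision", "recall"])
print({k: f"{v.mean():.2f} +/- {v.std():.2f}"
       for k, v in scores.items() if k.startswith("test_")})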

Figure 4 displays the PE calculated using procedure B, $\langle H_t^s \rangle_t$, for the raw data [panel (a)] and for the filtered data [panel (b)]. The entropy is averaged over all subjects and over all times, and the error bars indicate the standard deviation over time of the subject-averaged entropy, as in Ref. 28. We show the effect of the arrangement of the channels in the vector $\mathbf{x}_t^s$ [Eq. (2)] on the ability of PE to distinguish the EO and EC states. In the raw data we see that, when considering only “horizontal” symbols, the distinction between the two states improves as compared to the other spatial arrangements. When considering “vertical” symbols, in the raw data almost no distinction is observed, while in the filtered data, a clear differentiation is seen.

FIG. 4.

Permutation entropy computed using procedure B from (a) raw data, (b) filtered data. The symbols indicate the PE value averaged over time and over subjects, and the error bars indicate the standard deviation calculated over time. From left to right: original and alternative arrangements studied in Ref. 28; coding with only horizontal or vertical symbols.


As explained before (Sec. III C), considering only horizontal or vertical patterns reduces the number of patterns that are available to calculate the patterns’ probabilities: from 62 patterns at time $t$ using the arrangement in Ref. 28, to only 45 (44) in the horizontal (vertical) direction. This may account for the lower entropy values obtained with the horizontal or vertical arrangements (in both EC and EO states) when compared to the arrangements used in Refs. 28 and 45.

The averaging procedure performed in Fig. 4 (over subjects and over time) gives global information about the EC–EO states, but it does not provide information to discern the two states for a single subject. With this aim, we analyze in Fig. 5(a) the $\langle H_t^s \rangle_t$ values. Here, the symbols and the error bars represent the mean and the standard deviation of the distribution of $\langle H_t^s \rangle_t$ values of the $N = 107$ subjects.

FIG. 5.

Permutation entropy and standard deviation obtained with spatial coding, for horizontal and vertical symbols, for raw and filtered data, for EO and EC states. The squares represent averages over subjects and the error bars indicate the standard deviation calculated over subjects. (a) $\langle H_t^s \rangle_t$ and (b) $\sigma(H_t^s)$ calculated with procedure B described in the text; (c) $H_{pi}^s$ calculated with procedure C (spatial pooling).


As in Fig. 4, we observe that EC states usually display lower entropy than EO states, at least for the horizontal direction in the raw and filtered data, and in the vertical direction in the filtered data. However, this difference is not enough to discriminate the two states for individual subjects, as the variability of these quantities across subjects is relatively large.

Figures 5(b) and 5(c) display $\sigma(H_t^s)$ and $H_{pi}^s$, respectively (procedures B and C). $\sigma(H_t^s)$, in particular, quantifies how much $H_t^s$ varies in time. We observe that, usually, the EC state has higher $\sigma(H_t^s)$ than the EO state; as in the previous case, the behavior is the opposite when analyzing the raw data with vertical patterns. As for $\langle H_t^s \rangle_t$, these differences are not enough to separate both states. For $H_{pi}^s$, a behavior very similar to $\langle H_t^s \rangle_t$ is observed, but with slightly higher values, likely because pooling increases the number of patterns available to calculate the probabilities. This increase in the available patterns allows us to use this method with longer ordinal patterns ($d > 3$) and larger $\tau$ values; however, we did not observe any significant difference from the case $d = 3$, $\tau = 1$ when testing the combinations ($d = 3, 4, 5, 6$; $\tau = 1$) and ($d = 3$; $\tau = 1, 2, 3, 4$).

We note that there is a clear difference in the size of the error bars in Figs. 4 and 5. As explained before, Fig. 4 uses the same procedure as Ref. 28, in which the first average over subjects removes the inter-subject variability, which is the primary source of variance; thus, the error bars only capture the small temporal variability of the subject-averaged entropy. In contrast, the time average $\langle H_t^s \rangle_t$, shown in Fig. 5(a), washes out the temporal variability, and the error bars show the high variability over subjects.

In Fig. 6 we report the results obtained with temporal coding (procedures A and D). The analysis is similar to that reported in Ref. 27, but here we use $d = 3$. We observe a behavior similar to that of spatial coding: EC states have lower entropy and higher standard deviation than EO states, and pooling provides values very similar to averaging. In fact, in the raw data, the correlation coefficient of the 107 values of $\langle H_i^s \rangle_i$ and $H_{pt}^s$ is 0.997 for both the EC and EO states. Therefore, “temporal pooling”—computing the entropy from the probabilities of temporal symbols regardless of the channel where they occur—is, at least for this dataset, equivalent to computing the entropy of each channel and then averaging over channels. We have also evaluated all possible combinations of $d = 3, 4, 5, 6$ and $\tau = 1, 2, 4, 8$, verifying that they yield similar results.

FIG. 6.

Permutation entropy and standard deviation obtained with temporal coding, for raw and filtered data, for EO and EC states. The squares represent averages over subjects and the error bars indicate the standard deviation calculated over subjects. (a) $\langle H_i^s \rangle_i$ and (b) $\sigma(H_i^s)$ calculated with procedure A described in the text; (c) $H_{pt}^s$ calculated with procedure D (temporal pooling).


In Fig. 7 we summarize the results of this section, reporting the PE values obtained using the different approaches considered; in this figure, the symbols and error bars indicate the mean value and the standard deviation calculated over the $N = 107$ subjects. We see that no approach can fully separate the two states, neither in the raw data [Fig. 7(a)] nor in the filtered data [Fig. 7(b)].

FIG. 7.

Comparison of the different approaches to calculate PE. A: temporal coding; B: spatial coding with horizontal symbols, vertical symbols, and symbols defined with the alternative arrangement studied in Ref. 28; C: spatial pooling with horizontal and vertical symbols; and D: temporal pooling. The PE is calculated from (a) raw data and (b) filtered data. The symbols and error bars represent the mean and the standard deviation calculated over the $N = 107$ subjects.


Finally, in Fig. 8 we present the relative entropy difference between the two states for each subject, defined as $2\,[\mathrm{PE}(\mathrm{EO}) - \mathrm{PE}(\mathrm{EC})]/[\mathrm{PE}(\mathrm{EO}) + \mathrm{PE}(\mathrm{EC})]$. Also in this case, constructing spatial symbols with either horizontal or vertical arrangements produces greater relative entropy differences with respect to the alternative arrangement studied in Ref. 28.
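For concreteness, this quantity can be computed as follows (the PE values in the example are purely illustrative):

# Relative entropy difference between the EO and EC states of one subject.
def relative_difference(pe_eo, pe_ec):
    return 2.0 * (pe_eo - pe_ec) / (pe_eo + pe_ec)

print(relative_difference(0.95, 0.90))   # ~0.054, i.e., higher entropy in EO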

FIG. 8.

Same as Fig. 7, but for the distributions of the relative difference between EO and EC values of permutation entropy.


In the next section, we present the results of processing PE-based features with a machine-learning algorithm to achieve EC/EO classification without any previous knowledge of the subject.

To test the performance of the entropy-based features analyzed in the previous section (with spatial coding, $\langle H_t^s \rangle_t$, $\sigma(H_t^s)$, and $H_{pi}^s$, shown in Fig. 5, or with temporal coding, $\langle H_i^s \rangle_i$, $\sigma(H_i^s)$, and $H_{pt}^s$, shown in Fig. 6) in distinguishing the EO and EC states of single subjects, we feed these features to an RF algorithm (see Sec. III D) and train it to classify the different time series into EO and EC states. We report the results obtained in Table I (Table II) for raw (filtered) data. When we used temporal-coding features ($\langle H_i^s \rangle_i$, $\sigma(H_i^s)$, $H_{pt}^s$), we varied the ordinal patterns' parameters (length $d = 3, 4, 5, 6$ and time lag $\tau = 1, 2, 4, 8$), and in the tables we report the results obtained with the best-performing parameters.

TABLE I.

Performance measures (expressed as percentages) obtained for the different features to distinguish the EC and EO states, for raw data. The best value is highlighted in bold. The last three rows correspond to temporal coding with $d = 3$ and $\tau = 2$, the best-performing parameters (8% increase in accuracy compared to $d = 3$ and $\tau = 1$). Results are provided as the mean and one standard deviation of the performance measures computed over the 10 folds of the CV scheme.

Coding       Feature                     Accuracy   F1 score   Precision   Recall    Specificity
Horizontal   $\langle H_t^s\rangle_t$    61 ± 7     59 ± 10    63 ± 10     57 ± 16   65 ± 16
             $\sigma(H_t^s)$             66 ± 7     65 ± 9     66 ± 9      67 ± 15   65 ± 15
             $H_{pi}^s$                  58 ± 8     54 ± 12    61 ± 11     50 ± 16   66 ± 15
Vertical     $\langle H_t^s\rangle_t$    54 ± 9     55 ± 12    54 ± 10     59 ± 17   50 ± 15
             $\sigma(H_t^s)$             56 ± 9     59 ± 10    56 ± 9      64 ± 15   48 ± 16
             $H_{pi}^s$                  55 ± 9     56 ± 11    55 ± 10     59 ± 17   51 ± 16
Temporal     $\langle H_i^s\rangle_i$    63 ± 8     56 ± 13    70 ± 15     49 ± 16   77 ± 15
             $\sigma(H_i^s)$             69 ± 8     66 ± 10    73 ± 12     62 ± 14   76 ± 13
             $H_{pt}^s$                  64 ± 8     58 ± 13    72 ± 14     51 ± 16   78 ± 14
TABLE II.

Performance measures (expressed as percentages) obtained for the different features to distinguish the EC and EO states, from filtered data. The best value is highlighted in bold. The last three rows correspond to temporal coding with $d = 5$ and $\tau = 4$, the best-performing parameters (16% increase in accuracy compared to $d = 3$ and $\tau = 1$). Results are provided as the mean and one standard deviation of the performance measures computed over the 10 folds of the CV scheme.

Coding       Feature                     Accuracy   F1 score   Precision   Recall    Specificity
Horizontal   $\langle H_t^s\rangle_t$    66 ± 7     62 ± 10    71 ± 11     57 ± 15   74 ± 14
             $\sigma(H_t^s)$             64 ± 8     66 ± 8     63 ± 8      71 ± 14   56 ± 16
             $H_{pi}^s$                  65 ± 8     65 ± 9     67 ± 10     65 ± 15   66 ± 15
Vertical     $\langle H_t^s\rangle_t$    74 ± 8     70 ± 11    84 ± 12     61 ± 15   87 ± 12
             $\sigma(H_t^s)$             69 ± 8     63 ± 12    81 ± 14     54 ± 15   85 ± 13
             $H_{pi}^s$                  71 ± 8     66 ± 11    80 ± 12     58 ± 15   83 ± 13
Temporal     $\langle H_i^s\rangle_i$    76 ± 9     72 ± 12    86 ± 12     63 ± 15   88 ± 11
             $\sigma(H_i^s)$             69 ± 8     63 ± 12    77 ± 13     56 ± 16   81 ± 12
             $H_{pt}^s$                  74 ± 8     70 ± 12    84 ± 12     61 ± 15   87 ± 11

Using filtered data tends to improve the classification performance, except for the accuracy, precision, and specificity of $\sigma(H_t^s)$ with horizontal symbols, and the F1 score and recall of $\sigma(H_i^s)$ with temporal coding. With this pre-processing, $\langle H_i^s \rangle_i$ with $d = 5$ and $\tau = 4$ has the best performance across all measures except recall, but $\langle H_t^s \rangle_t$ with vertical symbols performs only 2% worse. Such performance is as good as the reported performance of other statistical measures.15

Interestingly, the standard deviation performs comparably to the averages or, on raw data, even better. The standard deviation of the temporal coding with $d = 3$ and $\tau = 2$ performs best on raw data in terms of accuracy, F1 score, and precision, and second best in specificity (slightly below $H_{pt}^s$). Also, the best recall is obtained for $\sigma(H_t^s)$ with horizontal symbols; this measure also performs second best in terms of accuracy and F1 score on raw data.

We also note that “pooling” (in space or in time) does not give a significant advantage over averaging; i.e., the performances of $\langle H_t^s \rangle_t$ and $H_{pi}^s$, and of $\langle H_i^s \rangle_i$ and $H_{pt}^s$, are similar.

We attempted to improve the classification performance on the raw data by using a larger number of features, freely combining temporal and spatial features. However, we found that a larger number of features did not increase the classification performance significantly. In fact, the best performance (in terms of accuracy and F1 score, both close to 78%) was obtained with eight features, and it was comparable to the performance of classification with only one feature on filtered data. On the filtered data, no improvement was obtained for any of the combinations of features analyzed, which is due to a strong correlation between the best-performing features.

On a final note, the relative difference computed using procedure B with horizontal symbols can discriminate the two states with an accuracy of 96% for the raw data and 97% for the filtered data. Nevertheless, this result assumes that we are comparing two different states from the same subject.

We have compared different methodologies to calculate the permutation entropy (PE) and analyzed their performance in discriminating the eyes open (EO) and eyes closed (EC) states, using a freely available dataset of 64-channel EEG recordings from $N = 107$ subjects. We analyzed the distributions of average PE values and standard deviations, considering different strategies for defining ordinal patterns (“horizontal” or “vertical”) and different approaches for calculating the entropy (averaging or “pooling,” over time or over channels).

First, we have shown that, to differentiate the two states using spatial PE, defining the symbols along the horizontal direction enhances the difference between the EO and EC states, both on the raw and on the filtered data, when compared to the spatial arrangements considered in Ref. 28. We have also shown that, when defined on the filtered data, the vertical patterns capture information that differentiates the two states. However, coding only with horizontal or vertical symbols comes at the cost of smaller statistics when computing the symbols’ probabilities of occurrence, hampering the implementation in setups with a low number of electrodes.

Secondly, we have calculated, for individual subjects, time averages of different PE-based quantities and analyzed their distributions for the EO and EC states (Fig. 5). We have found that, although EC states present, on average, lower entropy than EO states, their distributions overlap, thus preventing a full distinction between the two states for every subject. We also analyzed the distributions of the standard deviation of the spatial entropy, $\sigma(H_t^s)$, and of the pooled spatial entropy, $H_{pi}^s$, for the EC and EO states, using different coding strategies and pre-processing of the data. Interestingly, $\sigma(H_t^s)$ behaves oppositely to $\langle H_t^s \rangle_t$ (EC states correspond to high standard deviation, and EO to low standard deviation), and it qualitatively performs as well as $\langle H_t^s \rangle_t$ in distinguishing the two states. Likewise, the “pooling” technique performs similarly to averaging, which opens the possibility that this technique can be used in setups with a small number of electrodes, provided that sufficiently long records are available. We also found that the distributions obtained by using temporal coding (Fig. 6) give similar performance.

Finally, we have quantitatively assessed the performance of these features as classifiers by feeding them to a random forest algorithm. As expected from previous reports,27,28 we found that filtering improves performance. Moreover, the average entropy, the pooled entropy, and the entropy standard deviation perform similarly, and the proposed approach of vertical symbols on filtered data performs better than the other spatial arrangements of the electrodes studied, providing about 75% accuracy, which is comparable, when using a single feature, to the performance of the statistical measures that have been analyzed in the literature.15 Such performance falls short of that of other, more powerful methods; however, these also involve a large number of features, complex feature selection, or advanced machine learning techniques.

For future work, it will be interesting to analyze spatial ordinal patterns that have a cross-like shape and thus take into account correlations among horizontally and vertically neighboring channels. It will also be interesting to analyze ordinal patterns that incorporate both spatial and temporal information (i.e., patterns defined in terms of data points recorded at different electrodes and at different times). In addition, it will be interesting to use horizontal or vertical patterns in other EEG datasets, such as EEGs recorded during motion tasks or during sleep, to test whether they increase the performance of PE-based features for identifying the intention of movement, the type of motion, the sleep stage, etc.

We acknowledge the support of ICREA ACADEMIA, AGAUR (2021 SGR 00606 and FI scholarship), and Ministerio de Ciencia e Innovación (Project No. PID2021-123994NB-C21).

The authors have no conflicts to disclose.

Juan Gancio: Conceptualization (equal); Formal analysis (lead); Methodology (lead); Software (lead); Writing – original draft (equal); Writing – review & editing (equal). Cristina Masoller: Conceptualization (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal). Giulio Tirabassi: Conceptualization (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal).

The data that support the findings of this study are openly available in Physionet at https://physionet.org/content/eegmmidb/1.0.0/.

1. R. Cooper, J. W. Osselton, and J. C. Shaw, EEG Technology (Butterworth-Heinemann, 2014).
2. J. S. Kumar and P. Bhuvaneswari, “Analysis of electroencephalography (EEG) signals and its categorization—A study,” Procedia Eng. 38, 2525–2536 (2012).
3. S. J. Smith, “EEG in the diagnosis, classification, and management of patients with epilepsy,” J. Neurol., Neurosurg. Psychiatry 76, ii2–ii7 (2005).
4. J. Dauwels, F. Vialatte, and A. Cichocki, “Diagnosis of Alzheimer’s disease from EEG signals: Where are we standing?,” Curr. Alzheimer Res. 7, 487–505 (2010).
5. U. R. Acharya, V. K. Sudarshan, H. Adeli, J. Santhosh, J. E. Koh, and A. Adeli, “Computer-aided diagnosis of depression using EEG signals,” Eur. Neurol. 73, 329–336 (2015).
6. J. Gonzalez, M. Cavelli, A. Mondino, C. Pascovich, S. Castro-Zaballa, P. Torterolo, and N. Rubido, “Decreased electrocortical temporal complexity distinguishes sleep from wakefulness,” Sci. Rep. 9, 18457 (2019).
7. D. M. Mateos, J. Gomez-Ramírez, and O. A. Rosso, “Using time causal quantifiers to characterize sleep stages,” Chaos, Solitons Fractals 146, 110798 (2021).
8. A. Schwarz, C. Escolano, L. Montesano, and G. R. Müller-Putz, “Analyzing and decoding natural reach-and-grasp actions using gel, water and dry EEG systems,” Front. Neurosci. 14, 849 (2020).
9. H. Banville and T. H. Falk, “Recent advances and open challenges in hybrid brain–computer interfacing: A technological review of non-invasive human research,” Brain-Comput. Interfaces 3, 9–46 (2016).
10. M. Rashid, N. Sulaiman, A. P. P. Abdul Majeed, R. M. Musa, A. F. Ab Nasir, B. S. Bari, and S. Khatun, “Current status, challenges, and possible solutions of EEG-based brain–computer interface: A comprehensive review,” Front. Neurorobot. 14, 25 (2020).
11. M. Kwon, H. Cho, K. Won, M. Ahn, and S. C. Jun, “Use of both eyes-open and eyes-closed resting states may yield a more robust predictor of motor imagery BCI performance,” Electronics 9, 690 (2020).
12. K. Wang, F. Tian, M. Xu, S. Zhang, L. Xu, and D. Ming, “Resting-state EEG in alpha rhythm may be indicative of the performance of motor imagery-based brain–computer interface,” Entropy 24, 1556 (2022).
13. R. J. Barry, A. R. Clarke, S. J. Johnstone, C. A. Magee, and J. A. Rushby, “EEG differences between eyes-closed and eyes-open resting conditions,” Clin. Neurophysiol. 118, 2765–2773 (2007).
14. R. J. Barry and F. M. De Blasio, “EEG differences between eyes-closed and eyes-open resting remain in healthy ageing,” Biol. Psychol. 129, 293–304 (2017).
15. N. Sinha and D. Babu, “Statistical feature analysis for EEG baseline classification: Eyes open vs eyes closed,” in 2016 IEEE Region 10 Conference (TENCON) (IEEE, 2016), pp. 2466–2469.
16. Ł. Furman, W. Duch, L. Minati, and K. Tołpa, “Short-time Fourier transform and embedding method for recurrence quantification analysis of EEG time series,” Eur. Phys. J. Spec. Top. 232, 135–149 (2023).
17. A. Khosla, P. Khandnor, and T. Chand, “A novel method for EEG based automated eyes state classification using recurrence plots and machine learning approach,” Concurr. Comput.: Pract. Exp. 34, e6912 (2022).
18. C. Q. Lai, H. Ibrahim, S. A. Suandi, and M. Z. Abdullah, “Convolutional neural network for closed-set identification from resting state electroencephalography,” Mathematics 10, 3442 (2022).
19. J. F. Restrepo, D. M. Mateos, and J. M. Lopez, “A transfer entropy-based methodology to analyze information flow under eyes-open and eyes-closed conditions with a clinical perspective,” Biomed. Signal Process. Control 86, 105181 (2023).
20. C. Bandt and B. Pompe, “Permutation entropy: A natural complexity measure for time series,” Phys. Rev. Lett. 88, 174102 (2002).
21. K. Lehnertz, “Ordinal methods for a characterization of evolving functional brain networks,” Chaos 33, 022101 (2023).
22. M. Zanin and F. Olivares, “Ordinal patterns-based methodologies for distinguishing chaos from noise in discrete time series,” Commun. Phys. 4, 190 (2021).
23. I. Kottlarz and U. Parlitz, “Ordinal pattern-based complexity analysis of high-dimensional chaotic time series,” Chaos 33, 053105 (2023).
24. L. Hussain, W. Aziz, S. Saeed, S. A. Shah, M. S. A. Nadeem, I. A. Awan, A. Abbas, A. Majid, and S. Z. H. Kazmi, “Complexity analysis of EEG motor movement with eye open and close subjects using multiscale permutation entropy (MPE) technique,” Biomed. Res. 28, 7104–7111 (2017).
25. W. Aziz and M. Arif, “Multiscale permutation entropy of physiological time series,” in 2005 Pakistan Section Multitopic Conference (IEEE, 2005), pp. 1–6.
26. M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, “Permutation entropy and its main biomedical and econophysics applications: A review,” Entropy 14, 1553–1577 (2012).
27. C. Quintero-Quiroz, L. Montesano, A. J. Pons, M. C. Torrent, J. García-Ojalvo, and C. Masoller, “Differentiating resting brain states using ordinal symbolic analysis,” Chaos 28, 106307 (2018).
28. B. R. Boaretto, R. C. Budzinski, K. L. Rossi, C. Masoller, and E. E. Macau, “Spatial permutation entropy distinguishes resting brain states,” Chaos, Solitons Fractals 171, 113453 (2023).
29. H. V. Ribeiro, L. Zunino, E. K. Lenzi, P. A. Santoro, and R. S. Mendes, “Complexity-entropy causality plane as a complexity measure for two-dimensional patterns,” PLoS One 7, e40689 (2012).
30. H. Y. D. Sigaki, R. F. de Souza, R. T. de Souza, R. S. Zola, and H. V. Ribeiro, “Estimating physical properties from liquid crystal textures via machine learning and complexity-entropy methods,” Phys. Rev. E 99, 013311 (2019).
31. G. Tirabassi and C. Masoller, “Entropy-based early detection of critical transitions in spatial vegetation fields,” Proc. Natl. Acad. Sci. U.S.A. 120, e2215667120 (2023).
32. G. Tirabassi, M. Duque-Gijon, J. Tiana-Alsina, and C. Masoller, “Permutation entropy-based characterization of speckle patterns generated by semiconductor laser light,” APL Photonics 8, 126112 (2023).
33. K. Keller and H. Lauffer, “Symbolic analysis of high-dimensional time series,” Int. J. Bifurcat. Chaos 13, 2657–2668 (2003).
34. See https://physionet.org/content/eegmmidb/1.0.0/ for “EEG Motor Movement/Imagery Dataset.”
35. G. Schalk, D. J. McFarland, T. Hinterberger, N. Birbaumer, and J. R. Wolpaw, “BCI2000: A general-purpose brain–computer interface (BCI) system,” IEEE Trans. Biomed. Eng. 51, 1034–1043 (2004).
36. A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals,” Circulation 101, e215–e220 (2000).
37. A. Gramfort, M. Luessi, E. Larson, D. A. Engemann, D. Strohmeier, C. Brodbeck, R. Goj, M. Jas, T. Brooks, L. Parkkonen, and M. Hämäläinen, “MEG and EEG data analysis with MNE—Python,” Front. Neurosci. 7, 267 (2013).
38. D. J. Little and D. M. Kane, “Variance of permutation entropy and the influence of ordinal pattern selection,” Phys. Rev. E 95, 052126 (2017).
39. A. A. Rey, A. Frery, J. Gambini, and M. M. Lucini, “The asymptotic distribution of the permutation entropy,” Chaos 33, 113108 (2023).
40. R. Polikar, “Ensemble learning,” in Ensemble Machine Learning: Methods and Applications (Springer, 2012), pp. 1–34.
41. L. Breiman, “Random forests,” Mach. Learn. 45, 5–32 (2001).
42. A. Cutler, D. R. Cutler, and J. R. Stevens, “Random forests,” in Ensemble Machine Learning: Methods and Applications (Springer, 2012), pp. 157–175.
43. R.-C. Chen, C. Dewi, S.-W. Huang, and R. E. Caraka, “Selecting critical features for data classification based on machine learning methods,” J. Big Data 7, 52 (2020).
44. M. Ojala and G. C. Garriga, “Permutation tests for studying classifier performance,” J. Mach. Learn. Res. 11, 1833–1863 (2010).
45. T. Schürmann, “Bias analysis in entropy estimation,” J. Phys. A: Math. Gen. 37, L295 (2004).
Published open access through an agreement with Universitat Politècnica de Catalunya