During the last few years, statistical physics has received increasing attention as a framework for the analysis of real complex systems; yet, its usefulness is less clear in the case of international political events, partly due to the difficulty of securing relevant quantitative data on them. Here, we consider a detailed dataset of violent events that have taken place in Ukraine since January 2021 and analyze their temporal and spatial correlations through entropy and complexity metrics and functional networks. Results depict a complex scenario, with events appearing in a non-random fashion but with the easternmost regions functionally disconnected from the remainder of the country, something that opposes the widespread “two Ukraines” view. We further draw some lessons and avenues for future analyses.
Statistical physics is becoming a reference framework for studying real-world complex systems, thanks to the possibilities it offers to connect micro-scale dynamical properties with macro-scale observations. In spite of this, political events have mostly been neglected, partly due to the difficulty of securing relevant quantitative data on them, the usual solution being to rely on indirect data such as Twitter activity. Here, we leverage a detailed dataset of violent events that preceded the current military crisis in Ukraine and use two concepts (entropy and complexity, and functional networks) to unveil the underlying relationship structure.
I. INTRODUCTION
During the last few decades, statistical physics concepts and tools have ceased to be exclusive to this scientific field and have become standard approaches in the analysis of numerous and heterogeneous real-world problems. To illustrate with but a few examples, complex networks have become an essential asset in epidemic spreading models,1 neuroscience,2 and climate;3,4 entropy and irreversibility have been used to characterize biomedical systems, from brain5–7 to heart dynamics.8 The reason for such success is possibly rooted in statistical physics’ ability to decouple the dynamical and observational scales; that is, a system may only be observable at the macro-scale, but conclusions about the underlying micro-scale dynamics can still be drawn.
Among the real-world problems that have still received little to no benefit from statistical physics concepts, the analysis of international (violent) events stands out. This may be due to several reasons, from the difficulty of securing quantitative data on those events (which are usually of an indirect nature, e.g., Twitter messages9,10) to the natural barriers to cross-dissemination between the social and physical sciences. The objective of this contribution is to bridge this gap and, specifically, to showcase how statistical physics concepts (namely, entropy, complexity, and functional networks) can be used to improve our understanding of international events. We specifically focus on the Ukrainian crisis and its role in the ongoing conflict between this country and the Russian Federation.
The Ukrainian crisis can partly be seen as an old process, resulting from the juxtaposition of two national identities, i.e., the pro-European west of the country and the pro-Russian east, which have failed to coexist peacefully.11–14 A turning point can be found in November 2013, when protests erupted against Ukrainian President Viktor Yanukovych’s decision to reject a stronger economic integration deal with the European Union. This resulted in Russian military troops taking control of the region of Crimea in March 2014, which was later annexed by the Russian Federation following a local referendum.15 This initial military conflict then escalated into a full war on February 24, 2022, with the Russian Federation launching a full-scale military invasion of Ukraine, still developing at the time when this paper was being prepared. Beyond these large-scale military operations, many local violent events have taken place since 2014, especially during the last year. These involved Ukrainian security forces, pro-Russian anti-government separatist groups, and the general population, and ranged from bombardment and shelling to unrest and protests.
In this contribution, we analyze a public dataset of violent events that occurred in Ukraine from January 2021 to January 2022 by applying two complementary approaches: entropy and complexity on one hand and functional networks on the other. These are respectively aimed at detecting temporal and spatial relationships in the appearance of those events. In other words, we try to answer the questions: Are events independent from each other, in both the temporal and spatial dimensions? Or, on the contrary, do past events in one region affect the appearance of other events? Results indicate that the situation is more similar to the latter case, with events having both temporal and spatial structures not compatible with random dynamics. Most interestingly, events in the regions of Donetsk and Luhansk, i.e., the two regions contested between Ukraine and the Russian Federation, are causally disconnected from the remainder of the country. The implications of these results are discussed, and we finally draw some conclusions and lessons learnt.
II. DATASET AND ANALYSIS TECHNIQUES
Data about unrest and other violent events in Ukraine have been obtained from the Armed Conflict Location & Event Data Project (ACLED) and are freely available at https://acleddata.com. They contain a list of all events by date, including the parties or groups involved in them, the type of the event, and a geolocalization. Regarding the latter, and in order to avoid a too granular division of data, only the first administrative division (regions, or oblasts) has here been considered. A total of events are reported from January 1st, 2021 to January 31st, 2022; the evolution of the number of events and their type is reported in Fig. 1.
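As an illustration of the aggregation step described above, the following minimal sketch turns a list of event records into one daily time series per region. The column names `event_date` and `admin1`, the ISO date format, and the toy records are assumptions made for this example and may not match an actual ACLED export; the helper `daily_counts` is hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import date


def daily_counts(rows, start, end, region_key="admin1", date_key="event_date"):
    """Return {region: [number of events on each day from start to end]}."""
    n_days = (end - start).days + 1
    counts = defaultdict(lambda: [0] * n_days)
    for row in rows:
        day = date.fromisoformat(row[date_key])  # assumes ISO dates (YYYY-MM-DD)
        if start <= day <= end:
            counts[row[region_key]][(day - start).days] += 1
    return dict(counts)


# Toy records standing in for an export of the dataset (made up, not real data).
sample = """event_date,admin1,event_type
2021-01-01,Donetsk,Battles
2021-01-01,Donetsk,Explosions/Remote violence
2021-01-02,Luhansk,Battles
2021-01-03,Kyiv,Protests
"""
rows = list(csv.DictReader(io.StringIO(sample)))
counts = daily_counts(rows, date(2021, 1, 1), date(2021, 1, 3))

# Country-wide series: total number of events per day.
total_per_day = [sum(series[t] for series in counts.values()) for t in range(3)]
```

Days without events are kept as explicit zeros, so that all regional series share the same length and time axis.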
Violent events in Ukraine. The figure depicts the temporal evolution of the number of events by day, according to their categorization into six groups, left panel (see the right panel for color code); and the total number of events by group, right panel.
This dataset has been analyzed through two complementary statistical physics techniques: permutation patterns and functional complex networks. For the sake of completeness, these are briefly described below.
A. Permutation entropy and complexity
Permutation entropy and complexity are named after the symbols, i.e., the so-called permutation patterns, over whose probability distribution function (PDF) the two metrics are computed. The method to retrieve those patterns translates the relative amplitudes of a time series in observational windows of a certain length into their symbolic representation,16 thus accounting for the temporal causality in the data. This allows us to assess dynamical properties like degree of randomness or periodicity, possible degrees of stochasticity, or level of complexity.17,18 This procedure is known to be simple, fully data-driven, computationally efficient, robust against noise, and useful for data with weak stationarity.19
Given a time series $X = \{x_t\}_{t=1}^{N}$, we define a pattern length of $D$ and divide the time series into overlapping windows. The elements of each $D$-window of $X$ are sorted in increasing order to capture their indexes $\pi = (r_0, r_1, \dots, r_{D-1})$, such that $x_{t+r_0} \le x_{t+r_1} \le \dots \le x_{t+r_{D-1}}$. Hence, words $\pi$ are symbols representing all possible permutations of $\{0, 1, \dots, D-1\}$, this way encoding the relative amplitudes of each $D$-segment. The associated PDF $P = \{p(\pi)\}$ collects the occurrence frequency of permutation patterns in the data, satisfying $\sum_{\pi} p(\pi) = 1$.
We describe the temporal dynamics of violent events through both their permutation entropy ($H$) and complexity ($C$). The former is a normalized quantifier that captures correlation structures not potentially retrieved by a simple entropy analysis and is defined as
$$H[P] = \frac{S[P]}{S_{\max}} = \frac{-\sum_{\pi} p(\pi) \ln p(\pi)}{\ln D!}.$$
Here, $H$ measures the order level of the system and is given by the ratio of the entropy of $P$, $S[P]$, to the maximum entropy of the system, $S_{\max} = S[P_e] = \ln D!$, modeled by a uniform probability $P_e = \{1/D!\}$. Therefore, $H = 0$ or $H = 1$ for a totally ordered or uncorrelated random system, respectively.
In a complementary way, $C$ characterizes the system organization accounting for the interplay between its order and disorder and is independent of size effects since it does not increase when the system becomes larger. It is defined as the product of $H$ and a non-euclidean similarity between the observed $P$ and $P_e$: $C[P] = H[P]\, Q_J[P, P_e]$, where $Q_J$ is the Jensen–Shannon divergence. It assesses the emergence of correlation structures, with $C = 0$ in the case of order ($H = 0$) or total randomness ($H = 1$), implying no structure in the time series.
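The two metrics described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: ties inside a window are broken by order of appearance, and the Jensen–Shannon divergence is divided by its maximum value (reached when a single pattern has probability one) so that the complexity is normalized, which is the usual convention but an assumption here. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def ordinal_distribution(series, d):
    """Relative frequency of every ordinal pattern of length d (overlapping windows)."""
    counts = Counter()
    for i in range(len(series) - d + 1):
        window = series[i:i + d]
        # Pattern = indexes that sort the window in increasing order
        # (ties are broken by order of appearance).
        counts[tuple(sorted(range(d), key=lambda k: window[k]))] += 1
    total = sum(counts.values())
    return [counts.get(p, 0) / total for p in permutations(range(d))]


def shannon(p):
    """Shannon entropy (natural logarithm) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def entropy_complexity(series, d):
    """Normalized permutation entropy and statistical complexity."""
    p = ordinal_distribution(series, d)
    n = len(p)  # number of possible patterns, d!
    uniform = [1.0 / n] * n
    h = shannon(p) / math.log(n)
    # Jensen-Shannon divergence between the observed and uniform distributions.
    mix = [(a + b) / 2 for a, b in zip(p, uniform)]
    js = shannon(mix) - shannon(p) / 2 - shannon(uniform) / 2
    # Maximum of the divergence, used as normalization constant.
    js_max = -0.5 * ((n + 1) / n * math.log(n + 1) - 2 * math.log(2 * n) + math.log(n))
    return h, h * js / js_max


# Classic toy series: four increasing and two decreasing pairs for d = 2.
toy = [4, 7, 9, 10, 6, 11, 3]
toy_pdf = ordinal_distribution(toy, 2)
toy_h, toy_c = entropy_complexity(toy, 2)

# A monotonic series contains a single pattern: zero entropy and zero complexity.
ordered_h, ordered_c = entropy_complexity(list(range(50)), 3)
```

Note that daily event counts contain many repeated values, so the way ties are handled can noticeably affect the pattern distribution.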
Statistical significance of results is assessed through surrogate time series. Specifically, we compute (respectively, ), for then getting the () counterpart in a set of of randomly shuffled time series. We first analyze the global structure of the aggregated violent events during a year in the whole country (), for then focusing on the time evolution for windows of days. This window size has been chosen as a compromise between having too large, which would smooth out any fast dynamics, and too short time windows, which would hinder the statistical significance. For reference, results for and days are also reported. Note that the embedding dimension has here been set to while other options could be explored, this value has been chosen to be the largest fulfilling the condition , hence being not only large enough to capture complex relations in the data but also small enough to ensure statistical significance of results in the -days windows.20,21
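The surrogate procedure described above can be illustrated with the following sketch, which compares the value of a metric on the original series with its distribution over randomly shuffled copies and reports a Z-score. The function `surrogate_zscore`, the default number of surrogates, the fixed random seed, and the sawtooth toy series are assumptions made for this example; the compact `perm_entropy` helper only stands in for whichever metric is being tested.

```python
import math
import random
import statistics
from collections import Counter


def perm_entropy(series, d=3):
    """Normalized permutation entropy (ties broken by order of appearance)."""
    counts = Counter(
        tuple(sorted(range(d), key=lambda k: series[i + k]))
        for i in range(len(series) - d + 1)
    )
    total = sum(counts.values())
    s = -sum(c / total * math.log(c / total) for c in counts.values())
    return s / math.log(math.factorial(d))


def surrogate_zscore(series, metric, n_surrogates=1000, seed=0):
    """Observed metric, surrogate mean and standard deviation, and Z-score."""
    rng = random.Random(seed)
    observed = metric(series)
    values = []
    for _ in range(n_surrogates):
        shuffled = list(series)
        rng.shuffle(shuffled)  # destroys the temporal order, keeps the values
        values.append(metric(shuffled))
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return observed, mean, std, (observed - mean) / std


# Strongly structured toy series (a sawtooth), not real data.
sawtooth = [i % 7 for i in range(210)]
obs, rnd_mean, rnd_std, z = surrogate_zscore(sawtooth, perm_entropy, n_surrogates=200)
```

A strongly negative Z-score for the entropy indicates that the original series is more ordered than its shuffled counterparts.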
B. Functional complex networks
We further analyze how events are connected in the spatial dimension by leveraging the concept of functional networks. This approach entails reconstructing complex network representations22–25 of a system, based on detecting relationships between its constituting elements through the analysis of their temporal dynamics. It has received special attention from the neuroscience community, where it has made it possible to unveil the patterns of interactions between brain regions in health and pathologies,2,26–28 but it has also been applied to, e.g., climate modeling,3,4 ecology,29 or air transport management.30 In the context of this work, nodes of the network represent Ukrainian regions, pairwise connected whenever a statistically significant relationship between the corresponding time series of events is detected.
Relationships are here detected through the celebrated Granger causality test,31 developed by the economics Nobel Prize laureate Clive Granger on top of the prediction theory of Norbert Wiener,32 and one of the most well-known statistical tests for evaluating the presence of predictive causality33 between pairs of time series. Note that, while the test name includes the word causality, it does not necessarily measure true causality;34 it instead quantifies the information transfer across multiple time scales. In spite of this, and for the sake of simplicity, the relationships detected by this test will here be called causal. It is additionally worth noting that other causality tests have been proposed in the literature,35–37 although their applicability to size-limited time series is not always straightforward.
A brief description of the test is included here. Let us consider two elements $A$ and $B$, respectively, described by two time series $a_t$ and $b_t$. Two autoregressive-moving-average (ARMA) models are fitted on the data, respectively, called the restricted and unrestricted regression models:
$$b_t = C \cdot b^{m}_{t-1} + \varepsilon_t,$$
$$b_t = C' \cdot \left( b^{m}_{t-1} \oplus a^{m}_{t-1} \right) + \varepsilon'_t,$$
where $b^{m}_{t-1} = (b_{t-1}, \dots, b_{t-m})^{\top}$ collects the $m$ past values of the series (and likewise for $a^{m}_{t-1}$).
Here, $m$ refers to the model order, the symbol $\oplus$ denotes concatenation of column vectors, $C$ and $C'$ contain the model coefficients, and $\varepsilon_t$ and $\varepsilon'_t$ are the residuals of the models. A Granger causality is then detected if $\sigma^2(\varepsilon'_t) \ll \sigma^2(\varepsilon_t)$, i.e., if including past information of the driving time series helps predicting the future of the driven one. In order to assess the statistical significance and obtain a $p$-value, an F-test is performed to check whether the coefficients associated to the time series $a_t$ are different from zero, i.e., whether $A$ is actually having an impact in the prediction.
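The comparison between the restricted and unrestricted regressions can be sketched as follows. This is a simplified illustration under stated assumptions: both models are plain autoregressions with an intercept fitted by ordinary least squares (not a full ARMA fit), the model order is fixed by hand, and the function names and the synthetic data are hypothetical. It relies on NumPy and on `scipy.stats.f` for the F distribution.

```python
import numpy as np
from scipy import stats


def lagged(x, m):
    """Matrix whose rows hold x[t-1], ..., x[t-m] for t = m, ..., len(x) - 1."""
    n = len(x)
    return np.column_stack([x[m - k:n - k] for k in range(1, m + 1)])


def granger_pvalue(driver, driven, m=1):
    """p-value of the F-test for 'driver Granger-causes driven' with order m."""
    driver = np.asarray(driver, dtype=float)
    driven = np.asarray(driven, dtype=float)
    y = driven[m:]
    ones = np.ones((len(y), 1))
    x_r = np.hstack([ones, lagged(driven, m)])  # restricted: own past only
    x_u = np.hstack([x_r, lagged(driver, m)])   # unrestricted: adds driver's past
    rss_r = np.sum((y - x_r @ np.linalg.lstsq(x_r, y, rcond=None)[0]) ** 2)
    rss_u = np.sum((y - x_u @ np.linalg.lstsq(x_u, y, rcond=None)[0]) ** 2)
    dof = len(y) - x_u.shape[1]
    f_stat = ((rss_r - rss_u) / m) / (rss_u / dof)
    return stats.f.sf(f_stat, m, dof)


# Synthetic example (not real data): b follows a with a one-step delay.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = np.zeros(300)
b[1:] = 0.8 * a[:-1] + 0.5 * rng.normal(size=299)

p_ab = granger_pvalue(a, b, m=1)  # expected to be extremely small
p_ba = granger_pvalue(b, a, m=1)  # no coupling in this direction by construction
```

The test is directional, so each ordered pair of series has to be evaluated separately.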
As can be seen from Fig. 1, the time series here considered are not stationary, as a higher number of events are present around May and December 2021. In order to solve this, the Granger test has been applied to normalized time series, representing the fraction of events observed in one day in each Ukrainian region over the total number of events in the same day. The final results are networks composed of nodes, one for each region and are represented by adjacency matrices , of size , where the element has a value of to indicate that there is a directed edge from node to (i.e., the events in region “Granger-cause” events in region ) and otherwise.23,25 In order to avoid the increased probability of type I errors as a consequence of the multiple comparisons required by the reconstruction process, we applied the Bonferroni correction and rejected the null hypothesis of the test for an effective . Additionally, the significance of the degree of the most connected nodes is tested using networks created with randomly shuffled time series—i.e., time series in which the temporal structure is destroyed.
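The normalization and network reconstruction steps described above can be sketched as follows, taking as input a matrix of p-values already obtained from the pairwise tests. The significance level `alpha=0.01`, the function names, and the toy values are assumptions made for illustration only; days without any event are assigned a zero share, which is also a choice of this sketch.

```python
def daily_fractions(counts):
    """counts[region] = daily event counts; returns each region's share per day."""
    regions = list(counts)
    n_days = len(counts[regions[0]])
    totals = [sum(counts[r][t] for r in regions) for t in range(n_days)]
    return {
        r: [counts[r][t] / totals[t] if totals[t] else 0.0 for t in range(n_days)]
        for r in regions
    }


def adjacency_from_pvalues(pvalues, alpha=0.01):
    """pvalues[i][j] = p-value of 'i Granger-causes j'; the diagonal is ignored."""
    n = len(pvalues)
    threshold = alpha / (n * (n - 1))  # Bonferroni: one test per ordered pair
    return [
        [1 if i != j and pvalues[i][j] < threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def link_density(adj):
    """Fraction of the possible directed links that are present."""
    n = len(adj)
    return sum(map(sum, adj)) / (n * (n - 1))


# Toy inputs (made up, not results from the dataset).
shares = daily_fractions({"A": [1, 0, 2], "B": [3, 0, 2]})
toy_pvalues = [
    [1.0, 1e-5, 0.5],
    [0.2, 1.0, 0.001],
    [0.002, 0.9, 1.0],
]
adj = adjacency_from_pvalues(toy_pvalues, alpha=0.01)
density = link_density(adj)
```

With three nodes there are six ordered pairs, so the effective threshold in this toy case is 0.01/6; the same `link_density` helper can be applied to networks built from shuffled series to obtain a random baseline.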
As a final note, and similarly to what is described in Sec. II A, functional networks have also been reconstructed for rolling windows of days to explore the evolution of causal relationships.
III. RESULTS: TEMPORAL RELATIONSHIPS
We start by analyzing the temporal relationships in the data in order to understand whether events have appeared randomly in the considered time window or whether they instead present some kind of internal structure. To this end, the two metrics described in Sec. II A have been calculated over the time series representing the total number of events per day. The results, reported in the top row of Table I, indicate that this time series is highly irregular, yet with some non-random structure (note the Z-Score close to ). The same table also reports the values for the regions of Crimea, Donetsk, and Luhansk, i.e., the three regions that have mostly been contested between Ukraine and the Russian Federation; these results will be used in Sec. IV.
Entropy H and complexity C (Z-score in parentheses) for the time series of the number of total events in Ukraine and for the regions of Crimea, Donetsk, and Luhansk. The third and fifth columns also report the average and standard deviation of the two metrics, calculated on $10^{3}$ randomly shuffled versions of the same time series.
Region | H (Z-score) | Rnd H | C (Z-score) | Rnd C |
---|---|---|---|---|
Ukraine | 0.9839 (−1.94) | 0.9907 ± 0.0035 | 0.0209 (1.87) | 0.0122 ± 0.0046 |
Crimea | 0.3183 (−3.37) | 0.3348 ± 0.0049 | 0.2318 (−1.34) | 0.2384 ± 0.0049 |
Donetsk | 0.9741 (−3.21) | 0.9898 ± 0.0049 | 0.0326 (3.91) | 0.0134 ± 0.0049 |
Luhansk | 0.9740 (−1.39) | 0.9841 ± 0.0072 | 0.0326 (1.62) | 0.0208 ± 0.0072 |
Figure 2 reports the evolution of (middle panel) and (bottom panel) for rolling windows of days, along with the evolution of the corresponding number of events for reference (top panel). Additionally, the gray bands indicate the – percentiles of the same metrics calculated over randomly shuffled versions of the same time series; and the dotted gray lines the percentile. It can be observed that the time series are mostly compatible with a random noise, except for windows starting around February 2021. For reference, the same results are reported for rolling windows of (green lines) and (brown lines) days, showing in the former case a similar behavior.
Statistical properties of the time series of violent events. The three panels, from top to bottom, report for a -day rolling window: (i) the evolution of the total number of events in the dataset; (ii) the entropy ; and (iii) the statistical complexity . Gray bands in the central and bottom panels correspond to the – percentile of values obtained through a random shuffling of the data, and the dashed gray line corresponds to the percentile. Additionally, the green and brown lines correspond to the result using rolling windows of, respectively, and days, aligned according to the central day in each window.
IV. RESULTS: SPATIAL RELATIONSHIPS
Complementary to what is presented in Sec. III, we here analyze the relationships between events that occurred in different regions of Ukraine using the functional network approach. Specifically, Fig. 3 depicts all links between pairs of Ukrainian regions that are statistically significant, as yielded by applying the Granger causality test to the time series of events in each Ukrainian region. Two main facts stand out. First, the network is relatively highly connected (link density of ), much more than what would be expected if events were random (). As observed in Sec. III, events do not appear independently but are instead causally connected also in the spatial dimension, suggesting a kind of action–reaction mechanism. Second, links are spread across the whole country, but interestingly the two easternmost regions (Donetsk and Luhansk, i.e., the two regions contested between Ukraine and the Russian Federation) are not connected to the network. This is not due to a reduced number of violent events, respectively, of and . It is also not due to a lack of temporal structure in the events of those two regions, as illustrated by the results in Table I (note the large Z-Score, especially for the region of Donetsk). It can, therefore, be concluded that violent acts in those two regions are neither the cause nor the result of events happening in the remainder of the country but instead have independent dynamics.
Graphical representation of the Ukrainian functional network; regions not connected to the network are not represented. Link colors, from green to red, represent the strength of the functional connection (inversely proportional to the corresponding $p$-value); node colors, from light to dark blue, the degree of nodes. The bottom panel reports the out- and in-degrees of the most connected nodes. Light lines report the distribution of the degrees of the same node, obtained with random shuffling of the time series.
The bottom part of Fig. 3 reports the degree (both out-, blue lines, and in-, brown lines) of the three most connected nodes, and the corresponding distribution obtained with randomly shuffled time series. While degrees are generally small, also due to the reduced size of the network, they are statistically significant ($p$-value always smaller than ). Most notably, the two most connected regions are Chernivtsi and Vinnytsia, located in the western part of the country and, once again, not geographically connected with the contested regions of Donetsk and Luhansk.
We have finally analyzed how the connectivity has evolved through time. For that, once again a rolling window of days has been considered, and a functional network has been reconstructed for each time interval. The top panel of Fig. 4 reports the evolution of the number of links (blue line); for reference, the – percentiles obtained in networks of randomly shuffled data are also reported—see the gray band. Additionally, as in Fig. 2, the same results are reported for rolling windows of (green lines) and (brown lines) days. It can be appreciated that the number of links is seldom statistically significant, most probably due to the reduced time series length, which makes the estimation of the Granger causality unreliable. The middle and bottom panels of the same figure further report the evolution of the out- and in-degrees of the four regions that have reached the largest connectivity; once again, they seldom are statistically significant (see the gray dashed lines, representing the maximum degree obtained in networks of randomly shuffled data).
Evolution of the connectivity for networks reconstructed with a rolling window of days. The top panel reports the evolution of the number of links (blue line) and of the – percentile (gray band). The green and brown lines correspond to the result using rolling windows of, respectively, and days, aligned according to the central day in each window. The middle and bottom panels depict the evolution of the out- and in-degrees of the four regions that have reached the largest value; the gray dashed lines represent the maximum degree obtained in networks reconstructed by randomly shuffling the time series.
V. DISCUSSION AND CONCLUSIONS
In this contribution, we have showcased how two statistical physics concepts, namely, entropy and complex networks, can be used to extract information from violent international events, with a specific focus on the developing Ukrainian crisis. In spite of the conceptual simplicity of this analysis, and of some major limitations that are discussed below, some interesting conclusions can be drawn. Events are not independent, in either the temporal or the spatial dimension. This is not surprising, as some events are inherently answers to others, e.g., protests can be the response to other violent events; also, battles and explosions are usually part of larger-scale plans and are, therefore, coordinated.
It is, nevertheless, interesting to see some unexpected patterns. Specifically, the regions of Donetsk and Luhansk are not causally connected (in the Granger sense) to any other region, and Crimea only receives a rather weak link. This seems to indicate that violent events in the rest of the country are independent of, and not a consequence of, what happens there, in spite of these being the three regions most contested between Ukraine and the Russian Federation, and actually being the target of the ongoing Russian armed invasion. This contradicts the standard “two Ukraines” view of a country divided between pro-west and pro-Russian regions, a view that has already received some criticisms.11,38 This is supported by the fact that Granger-testing the time series representing the aggregated events in those two sets of regions yields large $p$-values, of (east to west) and (west to east); see Fig. 5. On the contrary, these results suggest a more complex situation with tensions distributed throughout the whole country. This supports a view according to which the main divide in Ukraine is essentially a Ukrainian vs Ukrainian one, with no sharp and unambiguous division along ethnic, geographical, or even linguistic lines.39,40 We tested this by solving an optimization problem, aimed at detecting the two sets of regions minimizing the $p$-value of the Granger causality.41 The result, represented in Fig. 5, suggests that violent events in five provinces (Volyn, Ternopil, Chernivtsi, Vinnytsia, and Kherson) are driven by events in the remainder of the country. Most importantly, this analysis is objective in nature, thus not relying on the interpretation of political events or of other qualitative types of data.
Testing the “two Ukraines” hypothesis. Light red and green provinces respectively correspond to the pro-Russian and pro-west regions, according to the results of the Ukrainian presidential elections of 2010. Provinces outlined in purple correspond to those minimizing the $p$-value of the Granger causality test; values are reported in the bottom left part.
The analysis presented here also highlights some limitations, associated with the considered type of data. The relatively low number of violent events observed in the country precludes the possibility of an analysis on smaller temporal and spatial scales, as this would otherwise require dealing with time series with many equal and/or zero values, a known limitation of the permutation entropy approach.42,43 This is apparent even in the analysis here presented, with results for a -days rolling window seldom being statistically significant. The magnitude of events, which may provide a more complete view of how the situation unfolded, has also been disregarded. Taking this aspect into consideration is, nevertheless, not trivial, as it would require, first, extracting quantitative information from the text describing each event, e.g., through natural language processing44 and, second, designing a way of comparing heterogeneous magnitudes, e.g., number of people in a protest vs number of deaths in a military attack.
ACKNOWLEDGMENTS
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No 851255). J.H.M. acknowledges funding from the project PACSS No. RTI2018-093732-B-C22 of the MCIN/AEI/10.13039/501100011033/ and EU through FEDER funds (a way to make Europe). The authors acknowledge the Spanish State Research Agency through Grant No. MDM-2017-0711 funded by MCIN/AEI/10.13039/501100011033. The authors thank J. J. Ramasco for his assistance.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
DATA AVAILABILITY
The data that support the findings of this study are available from ACLED. Restrictions apply to the availability of these data, which were used under license for this study. Data are available from the authors upon reasonable request and with the permission of ACLED.