The apparent dichotomy between information-processing and dynamical approaches to complexity science forces researchers to choose between two diverging sets of tools and explanations, creating conflict and often hindering scientific progress. Nonetheless, given the shared theoretical goals between both approaches, it is reasonable to conjecture the existence of underlying common signatures that capture interesting behavior in both dynamical and information-processing systems. Here, we argue that a pragmatic use of integrated information theory (IIT), originally conceived in theoretical neuroscience, can provide a potential unifying framework to study complexity in general multivariate systems. By leveraging metrics put forward by the integrated information decomposition framework, our results reveal that integrated information can effectively capture surprisingly heterogeneous signatures of complexity—including metastability and criticality in networks of coupled oscillators as well as distributed computation and emergent stable particles in cellular automata—without relying on idiosyncratic, *ad hoc* criteria. These results show how an agnostic use of IIT can provide important steps toward bridging the gap between informational and dynamical approaches to complex systems.

Originally conceived within theoretical neuroscience, integrated information theory (IIT) has been rarely used in other fields—such as complex systems or non-linear dynamics—despite the great value it has to offer. In this article, we inspect the basics of IIT, dissociating it from its contentious claims about the nature of consciousness. Relieved of this philosophical burden, IIT presents itself as an appealing formal framework to study complexity in biological or artificial systems, applicable in a wide range of domains. To illustrate this, we present an exploration of integrated information in complex systems and relate it to other notions of complexity commonly used in systems such as coupled oscillators and cellular automata. Through these applications, we advocate for IIT as a valuable framework capable of revealing common threads between diverging branches of complexity science.

## I. INTRODUCTION

Most theories about complexity are rooted in either information theory or dynamical systems perspectives—two disciplines with very different aims and toolkits. The former, built after the work of Turing and Shannon, focuses mainly on discrete systems and considers complexity in terms of information processing, universal computation, distributed computation, and coherent emergent structures.^{1} The latter, following the tradition started by Poincaré and Birkhoff, focuses on continuous systems and studies their behavior using attractors, phase transitions, chaos, and metastability.^{2}

This methodological divide has polarized how various communities of researchers think about complex systems, even to the extent of triggering some longstanding disagreements. This is particularly evident in the field of cognitive neuroscience, where proponents of computational approaches claim that the brain works similarly to a Turing machine,^{3–6} while opponents believe that cognitive processes are essentially continuous and rate-dependent.^{7–9} A related debate has taken place in the artificial intelligence community between symbolic and connectionist paradigms for the design of intelligent systems.^{10} Modern stances on these problems have promoted hybrid approaches,^{11} bypassing ontological arguments toward epistemological perspectives where both information and dynamics represent equally valid methods for enquiry.^{12}

Interestingly, bridging the gap between the information-processing and dynamical systems literature has proven scientifically fruitful. Examples of this are Wolfram’s categorization of cellular automata in terms of their attractors, which provided insights into the possible types of distributed computation enabled by these systems according to dynamical properties of their trajectories,^{13} and Langton’s intuition that computation takes place in a phase transition “at the edge of chaos.”^{14} This rich point of contact, in turn, suggests that what informational and dynamical approaches deem as interesting might have a common source, beyond the apparent dissimilarities introduced by heterogeneous tools and disciplinary boundaries.

In this article, we propose *Integrated Information Theory* (IIT),^{15–17} together with its recent extension *Integrated Information Decomposition* ($\Phi $ID),^{18} as a conceptual framework that can help bridge the gap between information-processing and dynamical systems approaches. At its inception, IIT was a theoretical effort to explain the origins and fundamental nature of consciousness.^{19,20} The boldness of IIT’s claims has not gone unnoticed, and they have caused a heated debate in the neuroscience community.^{21–23} Unfortunately, its audacious claims about consciousness have kept many scientists away from IIT, thereby preventing some of its valuable theoretical insights from reaching other areas of knowledge.

We advocate for the adoption of a *pragmatic* IIT, which can be used to analyze and understand complex systems without the philosophical burden of its origins in consciousness science. Consequently, the goal of this paper is to dissociate IIT’s claims as a theory of consciousness from its formal contributions and put the latter to use in the context of complexity science. For this purpose, we demonstrate that integrated information—as calculated in $\Phi $ID—peaks sharply for oscillatory dynamical systems that exhibit criticality and metastability and also characterizes cellular automata that display distributed computation via persistent emergent structures. These findings illustrate the remarkable flexibility of integrated information measures in discovering features of interest in a wide range of scenarios, without relying on domain-specific considerations. Overall, this work reveals how a grounded, demystified interpretation of IIT can allow us to identify features that are transversal across complex information-processing and dynamical systems.

The rest of this article is structured as follows. Section II presents the core formal ideas of IIT and $\Phi $ID in a simple manner and puts forward a sound and demystified way of interpreting its key quantities. Our main results are presented in Secs. III and IV: the former shows that integrated information can capture metastability and criticality in dynamical systems, and the latter that integrated information is a distinctive feature of distributed computation. Finally, Sec. V discusses the significance of these results and summarizes our main conclusions.

## II. A PRAGMATIST’S IIT

IIT constitutes one of the first attempts to formalize what makes a system “more than the sum of its parts,” building on intuitive notions of synergy and emergence that have been at the core of complexity science since its origins.^{24,25} IIT proposes *integrated information* for that role, informally defining it as information that is contained in the interactions between the parts of a system and not within the parts themselves. The core element of IIT is *Φ*, a scalar measure that accounts for the amount of integrated information present in a given system.

While faithful to its original aims, IIT has undergone multiple revisions throughout its life. Of all of them, we will focus on the theory as introduced by Balduzzi and Tononi in 2008.^{16} While more recent accounts of the theory exist,^{17} these place a much stronger emphasis on its goals as a theory of consciousness, at the cost of departing from standard information-theoretic practice and of requiring more convoluted algorithms—which have hindered the theory’s reach and made it applicable only to small discrete systems.

### A. The maths behind Φ

This section provides a succinct description of the mathematical formulas behind IIT 2.0,^{16} following Barrett and Seth’s^{26} concept of *empirical* integrated information. The overall analysis procedure is represented schematically in Fig. 1.

The building block of integrated information is a measure of *effective information* (i.e., excess of predictive information) typically denoted as $\phi $.^{16} Effective information quantifies how much better a system $X$ is at predicting its own future after a time $\tau $ when it is considered a whole compared to when it is considered the sum of two subsystems $M^1$ and $M^2$ [so that $X=(M^1,M^2)$]. In other words, $\phi $ evaluates how much predictive information is generated by the system over and above the predictive information generated by the two subsystems alone. For a given bipartition $B=\{M^1,M^2\}$, the effective information of the system $X$ beyond $B$ is calculated as
$$\phi[X;\tau,B] = I(X_{t-\tau}; X_t) - \sum_{k=1}^{2} I\big(M^k_{t-\tau}; M^k_t\big), \tag{1}$$
where $I$ is Shannon’s mutual information. We refer to $\tau $ as the *integration timescale*.^{27}

The core idea behind the computation of $\Phi $ is to (i) exhaustively search all possible partitions of the system, (ii) calculate $\phi $ for each of them, and (iii) select the partition with lowest $\phi $ (under some considerations, see below), termed the minimum information bipartition (MIB). Then, the integrated information of the system is defined as the effective information beyond its MIB. Concretely, the integrated information $\Phi $ associated with the system $X$ over the integration scale $\tau $ is given by
$$\Phi[X;\tau] = \phi\big[X;\tau,B^{\mathrm{MIB}}\big], \qquad B^{\mathrm{MIB}} = \underset{B}{\arg\min}\; \frac{\phi[X;\tau,B]}{K}, \tag{2}$$
where $K$ is a normalization factor introduced to avoid biasing $\Phi $ to excessively unbalanced bipartitions. Defined this way, $\Phi $ can be understood as the minimum information loss incurred by considering the whole system as two separate subsystems.
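To make the procedure concrete, the following Python sketch implements the full pipeline for small binary systems: plug-in mutual information, effective information for a given bipartition, and an exhaustive search for the MIB. This is a minimal illustration, not the authors' code; in particular, the normalization `K = min(|M1|, |M2|)` (the maximum possible entropy of the smaller binary part, in bits) is one simple choice among those discussed in the literature.

```python
import numpy as np
from collections import Counter
from itertools import combinations

def plugin_mi(x, y):
    """Plug-in estimate (in bits) of I(X;Y) from two aligned sequences
    of hashable symbols (e.g., tuples of binary states)."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(c / n * np.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def effective_info(X, left, right, tau):
    """phi[X; tau, B]: predictive information of the whole system minus
    that of the two parts of the bipartition B = {left, right}."""
    to_symbols = lambda idx: [tuple(r) for r in X[:, idx]]
    whole = [tuple(r) for r in X]
    phi = plugin_mi(whole[:-tau], whole[tau:])
    for part in (to_symbols(list(left)), to_symbols(list(right))):
        phi -= plugin_mi(part[:-tau], part[tau:])
    return phi

def integrated_info(X, tau=1):
    """Phi[X; tau]: effective information beyond the minimum information
    bipartition (MIB), for a (T, D) array of binary observations."""
    D = X.shape[1]
    best = None
    for k in range(1, D // 2 + 1):
        for left in combinations(range(D), k):
            right = tuple(i for i in range(D) if i not in left)
            phi = effective_info(X, left, right, tau)
            K = min(len(left), len(right))  # simple normalization (assumption)
            if best is None or phi / K < best[0]:
                best = (phi / K, phi)
    return best[1]
```

As a sanity check, a two-variable system cycling deterministically through all four joint states has 2 bits of whole-system predictive information, of which only 1 bit is attributable to the parts, giving $\Phi = 1$ bit.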

An important drawback of *Φ* is that it can take negative values, which hinders its interpretation as a measure of system-wide integration.^{28} Recently, Mediano *et al.*^{18} showed that *Φ* quantifies not only information shared across and transferred between the parts, but also includes a negative component measuring *redundancy*—i.e., when the parts contain *the same* predictive information. Therefore, *Φ* measures a balance between information transfer and redundancy such that $\Phi <0$ when the system is redundancy-dominated.

A principled way to address this limitation is to refine $\Phi $ by disentangling the different information phenomena that drive it. This can be achieved via the $\Phi $ID framework,^{18} which provides a taxonomy of “modes” of information dynamics in multivariate dynamical systems. Using $\Phi $ID, one can define a revised version of $\phi $, denoted as $\phi^R$,^{18} which compensates for the negative redundancy component in $\Phi $ by adding the redundancy back in. Work is ongoing on understanding different possible redundancy functions. Here, we compute $\phi^R$ via the minimum mutual information (MMI)^{29} redundancy function, which yields
$$\phi^R[X;\tau,B] = \phi[X;\tau,B] + \min_{i,j} I\big(M^i_{t-\tau}; M^j_t\big). \tag{3}$$
Using $\phi^R$, we can define Φ^{R} analogously through Eq. (2).
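Since the MMI redundancy is just the smallest of the pairwise predictive mutual information terms, $\phi^R$ can be assembled directly from precomputed quantities. The sketch below is a minimal illustration under that assumption; its inputs are mutual information values in bits, not raw data.

```python
def phi_r_mmi(I_whole, I_parts, I_cross):
    """Revised effective information phi^R under the MMI redundancy.

    I_whole : I(X_{t-tau}; X_t) for the whole system.
    I_parts : [I(M^k_{t-tau}; M^k_t)] for the two parts M^1, M^2.
    I_cross : matrix with I_cross[i][j] = I(M^i_{t-tau}; M^j_t) for all
              part pairs (i, j), including i == j.
    The MMI redundancy is the smallest of these cross-part terms."""
    phi = I_whole - sum(I_parts)
    redundancy = min(min(row) for row in I_cross)
    return phi + redundancy
```

For instance, a fully redundant system, in which every cross-part term equals the whole-system information, has a negative $\phi$ that the redundancy term exactly compensates, yielding $\phi^R = 0$.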

### B. Interpretation of $\Phi $

Conceptually, there are several ways to interpret *Φ*, which highlight different aspects of the systems under study. One interpretation—particularly relevant for complexity science—is based on the theory of *information dynamics*, which decomposes information processing in complex systems in terms of storage, transfer, and modification.^{31–34} From this perspective and based on earlier results,^{18} Φ^{R} can be seen as capturing a combination of information modification across multiple parts of the system, information transfer from one part to another, and storage in coherent structures that span across more than one system variable.

Alternatively, a more quantitative and mathematically rigorous way of interpreting Φ^{R} is in terms of *prediction bounds*. The conditional entropy (a complement of the mutual information^{35}) provides an upper bound on the optimal prediction performance^{36,37} such that a system with low conditional entropy can be predicted accurately. Therefore, mutual information acts as a bound too: the higher the mutual information between two variables, the better one can be predicted from the other. Thus, Φ^{R} measures to what extent the full state of the system enables better predictions than the states of the parts separately.
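A standard way to make this bound concrete is Fano's inequality, which relates the error probability $P_e$ of any estimator of a discrete variable $X$ from observations $Y$ to the conditional entropy:

```latex
% Fano's inequality: low conditional entropy forces accurate prediction.
H(X \mid Y) \le H_b(P_e) + P_e \log_2\!\left(|\mathcal{X}| - 1\right),
```

where $H_b$ is the binary entropy function and $|\mathcal{X}|$ the alphabet size. Rearranged, it shows that a small $H(X \mid Y)$, or equivalently a large $I(X;Y)$, forces $P_e$ to be small.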

## III. INTEGRATED INFORMATION, METASTABILITY, AND PHASE TRANSITIONS

This section explores the use of integrated information to study dynamical systems, examining the relationship between *Φ*, metastability, and phase transitions. For this, we focus on systems of coupled oscillators, which are ubiquitous in both natural and engineered environments, making them of considerable scientific interest.^{2} Typical studies of oscillatory systems—such as the classic work of Kuramoto^{40}—examine the conditions under which the system stabilizes on states of either full synchronization or desynchronization, although these two extremes are by no means representative of all real-world synchronization phenomena. Many systems of interest, including the human brain, exhibit synchronous rhythmic activity on multiple spatial and temporal scales but never settle into a stable state, entering so-called chimera-like states^{41} of high partial synchronization only temporarily. A system of coupled oscillators that continually moves between highly heterogeneous states of synchronization is said to be *metastable*.^{42}

In 2010, Shanahan^{42} showed that a modular network of phase-lagged Kuramoto oscillators can exhibit metastable chimera states. Variants of this model have since been used to replicate the statistics of the brain under a variety of conditions, including wakeful rest^{43} and anesthesia.^{44} In the following, we study these metastable oscillatory systems through the lens of integrated information theory.

### A. Model and measures

We examine a community-structured network of coupled Kuramoto oscillators (shown in Fig. 2), building on the work of Shanahan.^{42} The network is composed of $N$ communities of $m$ oscillators each, with every oscillator being coupled to all other oscillators in its community and to each oscillator in the rest of the network with probability $q$. The state of the $i$th oscillator is determined by its phase $\theta_i$, the evolution of which is governed by
$$\dot{\theta}_i = \omega + \frac{1}{\kappa} \sum_{j} K_{ij} \sin(\theta_j - \theta_i - \alpha),$$
where $\omega $ is the natural frequency of the oscillators, $\alpha $ is a global *phase lag*, $K_{ij}$ are the connectivity coefficients, and $\kappa $ is a normalization constant. To reflect the community structure, the coupling between two oscillators $i,j$ is defined as
$$K_{ij} = \begin{cases} a & \text{if } i \text{ and } j \text{ belong to the same community}, \\ b & \text{if they belong to different communities and are connected}, \\ 0 & \text{otherwise}, \end{cases}$$
with $a>b$. The system is tuned by modifying the value of the phase lag, parameterized by $\beta = \pi/2 - \alpha$. We note that the system is fully deterministic; i.e., there is no noise injected in the dynamical equations.
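As an illustration, the model can be sketched in a few lines of Python. This is not the authors' code: the forward Euler integrator (the paper uses a 4th-order Runge–Kutta scheme) and the symmetric random inter-community wiring are simplifying assumptions.

```python
import numpy as np

def simulate(N=8, m=32, a=0.6, b=0.4, q=1/8, omega=1.0, kappa=64,
             beta=0.1, dt=0.05, steps=20000, seed=0):
    """Community-structured Kuramoto model: N communities of m phase
    oscillators with intra-community coupling a, random inter-community
    coupling b (probability q), and phase lag alpha = pi/2 - beta."""
    rng = np.random.default_rng(seed)
    n = N * m
    comm = np.repeat(np.arange(N), m)          # community label per oscillator
    K = np.where(comm[:, None] == comm[None, :], a,
                 b * (rng.random((n, n)) < q))  # coupling matrix K_ij
    np.fill_diagonal(K, 0)
    alpha = np.pi / 2 - beta                    # global phase lag
    theta = rng.uniform(0, 2 * np.pi, n)
    out = np.empty((steps, n))
    for t in range(steps):
        diff = theta[None, :] - theta[:, None]  # theta_j - theta_i
        theta = theta + dt * (omega + (K * np.sin(diff - alpha)).sum(1) / kappa)
        out[t] = theta % (2 * np.pi)
    return out, comm
```

The returned array of phases can then be fed to the synchrony and metastability measures defined next.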

To assess the dynamical properties of the oscillators, we consider their *instantaneous synchronization* $R$ and *metastability index* $\lambda $. The instantaneous synchronization at time $t$ of community $c \in \{1,\ldots,N\}$, comprising oscillators $I_c$, quantifies their dispersion in the $\theta $-space,
$$R_c(t) = \left| \frac{1}{|I_c|} \sum_{j \in I_c} e^{i\theta_j(t)} \right|.$$
Building on this notion, the metastability of each community is defined as the variance of synchrony over time, and the overall metastability is its average across communities,
$$\lambda_c = \operatorname{var}_t\big[R_c(t)\big], \qquad \lambda = \frac{1}{N} \sum_{c=1}^{N} \lambda_c.$$
Communities that are either hypersynchronized or completely desynchronized are both characterized by small values of $\lambda_c$, whereas only communities whose elements fluctuate in and out of synchrony have a high $\lambda_c$. Put simply, a system of oscillators exhibits metastability to the extent that its elements fluctuate between states of high and low synchrony. In addition to metastability, we also consider the *global synchrony* of a network defined as the spatiotemporal average of the instantaneous synchrony,
$$\bar{R} = \frac{1}{NT} \sum_{c=1}^{N} \sum_{t=1}^{T} R_c(t).$$
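These definitions translate directly into code. In the sketch below (our own illustration), `theta` is a (T, n) array of phases and `comm` assigns a community label to each oscillator:

```python
import numpy as np

def community_sync(theta, comm, N):
    """R_c(t): modulus of the mean phase vector of community c, i.e.,
    the Kuramoto order parameter restricted to that community."""
    return np.stack([np.abs(np.exp(1j * theta[:, comm == c]).mean(axis=1))
                     for c in range(N)], axis=1)  # shape (T, N)

def metastability(R):
    """lambda: average over communities of the temporal variance of R_c."""
    return float(R.var(axis=0).mean())

def global_sync(R):
    """Spatiotemporal average of the instantaneous synchrony."""
    return float(R.mean())
```

Note that a community that stays hypersynchronized ($R_c \approx 1$ throughout) and one that stays fully desynchronized ($R_c \approx 0$ throughout) both have vanishing temporal variance, so only communities that wander between the two regimes contribute to $\lambda$.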
For tractability, we calculate $\Phi^R$ with respect to the *coalition configuration* of the system, defined for each community $c$ and time $t$ as
$$X_t^c = \begin{cases} 1 & \text{if } R_c(t) > \gamma, \\ 0 & \text{otherwise}, \end{cases}$$
where $\gamma $ is the coalition threshold. This representation provides $N$ interdependent binary time series $X_t := (X_t^1,\ldots,X_t^N)$, which indicate which communities are highly synchronized internally at each time. The Shannon entropy of $X_t$ is referred to as the *coalition entropy* $H_c$ and quantifies the diversity of synchronization patterns across communities.
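Given the per-community synchronies $R_c(t)$ arranged as a (T, N) array, the coalition series and its entropy can be sketched as follows (the plug-in entropy estimator is an assumption; any consistent estimator would do):

```python
import numpy as np
from collections import Counter

def coalition_series(R, gamma=0.8):
    """Binary coalition configuration: X_t^c = 1 iff R_c(t) > gamma."""
    return (R > gamma).astype(int)

def coalition_entropy(X):
    """Shannon entropy (bits) of the coalition configuration X_t,
    estimated by plug-in from the empirical pattern frequencies."""
    n = len(X)
    p = np.array(list(Counter(map(tuple, X)).values())) / n
    return float(-(p * np.log2(p)).sum())
```

A series alternating between two equiprobable coalition patterns, for example, has a coalition entropy of exactly 1 bit.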

### B. Results

We simulated a network composed of $N=8$ communities of $m=32$ oscillators each. The probability of connections across communities was set to $q=1/8$, with connection strengths of $a=0.6$ within communities and $b=0.4$ across. The natural frequency used was $\omega =1$ and the normalization constant $\kappa =64$. We ran 1500 simulations with values of $\beta $ distributed uniformly at random in the range $[0,2\pi )$, using a 4th-order Runge–Kutta algorithm with a step size of 0.05 for numerical integration. Each simulation was run for $5\times10^6$ time steps, discarding the first $10^4$ to avoid transient effects and applying a thinning factor of 5. For the results presented here, we used $\gamma =0.8$, and we confirmed that results were qualitatively stable for a wide range of threshold values. All information-theoretic measures are reported in bits.

#### 1. Metastability and $\Phi^R$ at the phase transition

We first study the system from a purely dynamical perspective, and, replicating previous findings,^{42} we find two well differentiated dynamical regimes: one of hypersynchronization and one of complete desynchronization, with strong metastability appearing in the narrow transition bands between them (Fig. 3). Interestingly, it is in this transition region where the oscillators operate in a critical regime poised between order and disorder and where complex phenomena appear. As the system moves from desynchronization to full synchronization, there is a sharp increase in metastability, followed by a smoother decrease as the system becomes hypersynchronized.

Importantly, $\Phi^R$ was found to exhibit a similar behavior to $\lambda $: it is zero for desynchronized systems, peaks in the transition region, and shrinks again in the fully ordered regimes (Fig. 4). This shows that networks in metastable regimes are the only ones that exhibit integrated information. When comparing $\Phi^R$ with the coalition entropy, both peak at the same point, although the peak in $\Phi^R$ is much narrower than the peaks in $\lambda $ and $H_c$. Hence, while some values of $\beta $ do give rise to non-trivial dynamics, it is only at the center of the critical region that these dynamics give rise to integrated information.

These results imply that $\Phi^R$ is sensitive to more subtle dynamic patterns than the other measures considered and is in that sense more discriminating. In effect, a certain degree of internal variability is necessary to establish integrated information, but not all configurations with high internal variability lead to a high $\Phi^R$. Also, $\Phi^R$ accounts for spatial *and* temporal patterns in a way that the other metrics do not.^{45}

#### 2. Integrated information at multiple timescales

As a further analysis, we can investigate the behavior of $\Phi^R$ at multiple timescales by varying the integration timescale parameter $\tau $ (see Fig. 2 for a visual guide of different $\tau $ values). Figure 5 shows $\Phi^R$ for several values of $\tau $ and compares it with standard time-delayed mutual information (TDMI) $I(X_{t-\tau}; X_t)$. Note that this analysis cannot be carried out with $H_c$ or other measures of complexity that are not sensitive to temporal order—i.e., that are functions of $p(X_t)$ and not $p(X_t|X_{t-\tau})$.

Results show that $\Phi^R$ and TDMI exhibit opposite trends with respect to changes in $\tau $: TDMI decreases with higher $\tau $, while $\Phi^R$ increases. At short timescales, the system is highly predictable—thus the high TDMI—but this short-term evolution does not imply much system-wide interaction—thus the low $\Phi^R$. Together, the high TDMI and low $\Phi^R$ suggest that at short timescales, the system is redundancy-dominated: the system contains information about its future, but this information can be obtained from the parts separately. Conversely, for prediction at longer timescales, TDMI decreases but $\Phi^R$ increases, indicating that while the system is overall less predictable, this prediction is enabled by the information contained in the interaction between the parts.

#### 3. Robustness of $\Phi^R$ against measurement noise

Finally, we study the impact of measurement noise on $\Phi^R$, wherein the system runs unchanged but our recording of it is imperfect. For this, we run the (deterministic) simulation as before, generate the sequence of coalition configurations, and then emulate the effect of uncorrelated measurement noise by flipping each bit in the time series with probability $p$, yielding a corrupted time series $\hat{X}_t$. Then, $\Phi^R$ is recalculated on the corrupted time series (Fig. 6). To quantify the impact of noise, we studied the ratio between the $\Phi^R$ values of the corrupted and the original time series,
$$\eta = \frac{\Phi^R[\hat{X};\tau]}{\Phi^R[X;\tau]}.$$
To avoid instabilities as $\Phi^R[X;\tau] \approx 0$, we calculate $\eta $ only in the region within 0.05 rad of the center of the peak shown in Fig. 4, where $\Phi^R[X;\tau]$ is large. The inset of Fig. 6 shows the mean and standard deviation of $\eta $ at different noise levels.
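The corruption step is straightforward to emulate. In the sketch below, `phi_r_fn` is a placeholder for any callable that computes $\Phi^R$ of a binary time series; it is not a specific implementation:

```python
import numpy as np

def corrupt(X, p, seed=0):
    """Flip each bit of a binary time series independently with
    probability p, emulating uncorrelated measurement noise."""
    rng = np.random.default_rng(seed)
    return np.where(rng.random(X.shape) < p, 1 - X, X)

def noise_ratio(X, p, phi_r_fn, tau=1):
    """eta: Phi^R of the corrupted series over Phi^R of the original."""
    return phi_r_fn(corrupt(X, p), tau) / phi_r_fn(X, tau)
```

At $p=0$ the series is unchanged and $\eta = 1$; at $p=1$ every bit is flipped deterministically.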

Results show that $\Phi^R$ decays exponentially with $p$ (Fig. 6, upper panel), reflecting a gradual loss of the precise spatiotemporal patterns that are characteristic of the system. In particular, $\Phi^R$ was found to be highly sensitive to noise and to undergo a rapid decline, as a measurement noise of 5% can wipe out 70% of the observed integrated information of the system. While the distortion has a stronger effect on time series with greater $\Phi^R$, it preserves the dominant peak for all values of $p$.

Overall, in this section, we have shown that a network of Kuramoto oscillators presents a sharp, clear peak of integrated information around its phase transition that coincides with a strong increase in metastability. Furthermore, we have found that Φ^{R} is informative, as it can reveal information about timescales of interaction between system components. Finally, Φ^{R} was also found to be sensitive, as it vanishes quickly if the specific spatiotemporal patterns of the system under study are disrupted. This, in turn, suggests that it is highly unlikely to observe significant values of Φ^{R} due to artifacts induced by (uncorrelated observational) noise.

## IV. INTEGRATED INFORMATION AND DISTRIBUTED COMPUTATION

In Sec. III, we related integrated information to dynamical complexity by linking Φ^{R} with criticality and metastability in coupled oscillators. We now move on to cellular automata (CA), a well-known class of systems widely used in the study of distributed computation.^{31,46} Our aim here is to relate IIT to distributed computation in two ways: at a global scale, Φ^{R} is higher for complex, class IV^{13} automata; and at a local scale, Φ^{R} is higher in emergent coherent structures, such as blinkers, gliders, and collisions.

A CA is a multi-agent system in which every agent has a finite set of possible states and evolves in discrete time steps following a set of simple rules based on its own and other agents’ states. CA have often been used in studies of self-organization,^{1,47} and some of them are capable of universal computation.^{13} In a CA, agents (or *cells*) are arranged in a one-dimensional cyclic array (or *tape*). The state of each cell at a given time step is determined by a Boolean function (or *rule*) that takes as arguments the cell’s own state and the states of its immediate neighbors at the previous time step. The same Boolean function dictates the evolution of all agents in the system, inducing a spatial translational symmetry. Each CA, irrespective of its number of agents, is usually denoted by its rule.^{1}

For all the results presented below, we follow the simulation parameters used by Lizier in his study of local information dynamics in CA:^{31} we initialize a tape of length $10^4$ with i.i.d. random variables, discard the first 100 steps of simulation, and run 600 more steps that are used to estimate the probability distributions used in all information-theoretic measures.
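A minimal sketch of such a simulation (the implementation is our own illustration; the parameters match those listed above):

```python
import numpy as np

def eca_step(state, rule):
    """One synchronous update of an elementary CA on a cyclic tape.
    `rule` is Wolfram's rule number (0-255); the lookup table maps each
    3-cell neighborhood, read as a binary number, to the next state."""
    table = np.array([(rule >> k) & 1 for k in range(8)])
    left, right = np.roll(state, 1), np.roll(state, -1)
    return table[4 * left + 2 * state + right]

def run_eca(rule, width=10_000, discard=100, steps=600, seed=0):
    """Evolve from an i.i.d. random tape, dropping an initial transient."""
    state = np.random.default_rng(seed).integers(0, 2, width)
    for _ in range(discard):
        state = eca_step(state, rule)
    history = np.empty((steps, width), dtype=int)
    for t in range(steps):
        history[t] = state
        state = eca_step(state, rule)
    return history
```

The rows of `history` can then be used to estimate the probability distributions required by the information-theoretic measures.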

### A. Integrated information and complexity classes

Our first analysis focuses on elementary cellular automata (ECA), a specific subclass of CA. In ECA, each cell has two possible states (usually denoted as white or black). ECA are traditionally denoted by their rule number, between 0 and 255, and grouped in four complexity classes:^{13} Class I rules have attractors consisting of single absorbing states; Class II rules evolve toward periodic orbits with relatively short cycle lengths; and Class III and IV rules have attractors with length of the order of the size of their phase space, with the latter being characterized by the presence of highly structured patterns and persistent structures.

As a first experiment, we calculate the average integrated information of each ECA, separating each automaton by complexity class (Fig. 7). For this, we followed the classifications defined in Wolfram’s original article^{13} as well as other clear-cut cases, and excluded border cases that did not neatly fit into a single category.

Results show that Φ^{R} correlates strongly with complexity as discussed by Wolfram: automata of higher classes have consistently higher Φ^{R} than automata of lower classes, and the difference between classes I–II and III–IV is stark.

It is worth noting the small difference between classes III and IV. This is likely related to the blurriness of the line separating both classes—visually, it is hard to judge whether structures are “coherent enough” to support distributed computation, and formally, the problem of determining whether a particular rule belongs to class III or IV is considered undecidable.^{48,49} Based on this, we may tentatively suggest that the capacity to integrate information is a necessary, but not sufficient, condition for universal computation.

### B. Integrated information at the edge of chaos

In his seminal 1990 article, Langton^{14} took a step beyond Wolfram’s classification and argued that the complexity and universality observed in ECA may reflect a broader phenomenon called *computation at the edge of chaos*. In this view, computation is made possible by indefinitely long transient states, a manifestation of *critical slowing-down*,^{50} that form the particle-like structures seen in class IV rules.

Langton’s argument starts by defining a parameter $\lambda $, which represents the fraction of neighborhoods in a CA’s rule table that map to a non-quiescent state (i.e., a non-white color). Then, by initializing one automaton with an empty rule table and progressively filling it with non-quiescent states, one can observe a transition point with exponentially long, particle-like transients [Fig. 8(a)]. Here, we repeat Langton’s experiments using a 6-color, range-2 CA, and compute its average *Φ* as its rule table gets populated and $\lambda $ increases.
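Langton's parameter is easy to compute from a rule table. The sketch below (our own illustration) also generates random rule tables with a target $\lambda$ for a $k$-color, range-$r$ CA, mirroring the table-filling procedure described above:

```python
import numpy as np

def langton_lambda(table, quiescent=0):
    """Fraction of rule-table entries mapping to a non-quiescent state."""
    return float((np.asarray(table) != quiescent).mean())

def random_table(k=6, r=2, lam=0.5, seed=0):
    """Random k-color, range-r rule table with expected Langton parameter
    `lam`: each of the k**(2r+1) neighborhoods maps to the quiescent
    state 0 with probability 1 - lam, else to a uniform non-zero color."""
    rng = np.random.default_rng(seed)
    n = k ** (2 * r + 1)
    table = rng.integers(1, k, n)
    table[rng.random(n) >= lam] = 0
    return table
```

For the 6-color, range-2 CA used here, the rule table has $6^5 = 7776$ entries, one per neighborhood.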

In agreement with Langton’s argument, we found that integrated information has largest values for intermediate values of $\lambda $, coinciding with the automata’s transition to a chaotic regime [Fig. 8(b)]. Interestingly, this shows that rules with high Φ^{R} are the ones at the critical region—where computation is possible.

Another notable feature of Fig. 8(b) is that there is a region where complex, high-Φ^{R} automata coexist with simpler ones. This phenomenon was already reported by Langton:^{14} different automata undergo the “transition to chaos” at different values of $\lambda $. This motivated a further analysis of measures of complexity as a function of $\Delta \lambda $, the distance from the transition event for that particular automaton. As expected, when aligned by their critical $\lambda $ value and plotted against $\Delta \lambda $ [Fig. 8(c)], all curves collapse onto a consistent, seemingly universal picture of integrated information across the $\lambda $ range.

For completeness, it is worth mentioning why at the right side of Figs. 8(b) and 8(c), Φ^{R} does not vanish for high $\lambda $ (as one could expect, given that the single-cell autocorrelation does^{14}). This is essentially due to the determinism and locality of the automaton’s rule: given a spatially extended set of cells, it is always possible to predict the middle ones with perfect certainty. At the same time, cutting the system with a bipartition will reduce the spatial extent of this predictable region so that the predictability of the whole is greater than the predictability of the parts, and thus, $\Phi >0$.

### C. Information is integrated by coherent structures

In the experiments above, we have shown that more complex automata integrate more information. However, this is not enough to make a case for Φ^{R} as a marker of distributed computation—it may just be the case that medium-$\lambda $ CA have higher Φ^{R} due to general properties of their rule tables or for some other reasons. In this section, we address this possible counter-argument by showing that the increase in Φ^{R} is due to the emerging particles and, therefore, can be directly associated with distributed computation.

To show this, we run large simulations of ECA rules 54 and 110 and evaluate several local information measures in small fragments of both (Fig. 9). Specifically, we compute Lizier’s measures of storage (excess entropy, $e_k$) and transfer (transfer entropy, $t_k$), as well as local integrated information, using the binary time series of local neighborhoods of ECA cells as input (details in the Appendix). Note that this pointwise *Φ* has not been $\Phi $ID-revised, as the development of pointwise $\Phi $ID metrics is still an open problem.^{51} A dedicated treatment of pointwise $\Phi $ID is part of our ongoing work and will be presented in a separate publication.
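Although pointwise $\Phi $ID remains open, the general recipe behind local information measures, assigning to each time step its log-ratio contribution, can be illustrated with local mutual information, whose time average recovers the ordinary plug-in mutual information:

```python
import numpy as np
from collections import Counter

def local_mi(x, y):
    """Pointwise mutual information log2[p(x,y) / (p(x)p(y))] at each
    time step, using plug-in probability estimates from the sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return np.array([np.log2(pxy[(a, b)] * n / (px[a] * py[b]))
                     for a, b in zip(x, y)])
```

Lizier's local excess entropy and local transfer entropy follow the same pattern, with the appropriate (conditional) probabilities substituted.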

As expected, TE is high in gliders (moving particles), while excess entropy is high in blinkers (static particles), confirming Lizier’s results that these structures perform information transfer and storage in CA.^{31} More interesting for our purposes is that *Φ* is high in *all of them*—gliders, blinkers, and the collisions between them.

When studied at a local scale in space and time, we see that information integration encompasses the three categories—storage, transfer, and modification—and that *Φ* can detect all of them *without having been explicitly designed to do so*. This reinforces our claim that *Φ* (and, in turn, its revised version Φ^{R}) is a generic marker of emergent dynamics and is connected with known measures of information processing. This relationship can be understood in a more mathematically rigorous manner via $\Phi $ID,^{18} which shows that *Φ* is decomposable in terms of information transfer and synergy.

## V. DISCUSSION

Many theoretical efforts to understand complexity have their roots in information-theoretic or dynamical systems perspectives. On the one hand, information-theoretic approaches focus on discrete systems displaying coordinated activity capable of universal computation, e.g., via particles in Wolfram’s class IV automata.^{13} On the other hand, dynamical approaches focus on continuous coupled dynamical systems near a phase transition, displaying a stream of transient states of partial coherence that balance robustness and adaptability.^{52,53} The results presented in this paper reveal how integrated information is effective at characterizing complexity in both information-processing and dynamical systems and hence can be regarded as a common signature of complex behavior across a wide range of systems. Overall, this paper shows how a grounded, demystified understanding of IIT combined with the improved metrics provided by $\Phi $ID can be used as a first step toward a much needed point of contact between these diverging branches of complexity science, helping to bridge the gap between information-processing and dynamical approaches.

### A. On the relationship between integrated information and metastability, criticality, and distributed computation

It is important to remark that the relationship between integrated information and metastability, criticality, and distributed computation is not an identity, and that their agreement is itself an important finding conveyed by $\Phi$.

Metastability, in the case of oscillator networks, is a community-local quantity—that is, it corresponds to an average over quantities ($\lambda_c$) that depend on the temporal diversity seen within each community, independently of the rest. In stark contrast, $\Phi$ relies on the irreducible interactions between communities. Interestingly, our findings reveal a close relationship between the two, insofar as internal variability enables the system to visit a larger repertoire of states in which system-wide interaction can take place.
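To make this distinction concrete, the community-local character of metastability can be sketched as the temporal variance of each community’s Kuramoto order parameter, averaged over communities. The following is an illustrative implementation of that idea only (the function and variable names are ours, not from our analysis code):

```python
import numpy as np

def community_metastability(phases, communities):
    """Average temporal variance of each community's Kuramoto order
    parameter: a simple community-local metastability index, where each
    per-community variance plays the role of a lambda_c term.

    phases: array of shape (T, N), oscillator phases in radians.
    communities: list of index arrays, one per community.
    """
    lambda_c = []
    for idx in communities:
        # Instantaneous order parameter of this community alone
        r_t = np.abs(np.exp(1j * phases[:, idx]).mean(axis=1))
        lambda_c.append(r_t.var())  # temporal diversity within the community
    return float(np.mean(lambda_c))

# Toy example: two communities of drifting, noisy oscillators
rng = np.random.default_rng(0)
phases = np.cumsum(rng.normal(0.1, 0.5, size=(1000, 8)), axis=0)
comms = [np.arange(0, 4), np.arange(4, 8)]
print(community_metastability(phases, comms))
```

Note that each per-community term depends only on that community’s own phases, whereas $\Phi$ depends on the joint statistics across communities.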

Criticality is, in general, enabled by a precise balance of two opposite forces (typically characterized as order and chaos in physics) that enables peculiar and fascinating phenomena, such as scale-freeness, extreme sensitivity to perturbations, and universality classes.^{54,55} In contrast, one of the core ideas in IIT is that integration and differentiation are not opposite forces but can actually coexist together.^{56,57} Therefore, integrated information is not maximized by optimal trade-offs, but by mechanisms that incentivize direct and synergistic information transfer.^{18}

Finally, distributed computation is mainly based on intuitive but ultimately informal notions developed after Wolfram’s and Langton’s work. Our results establish a strong relationship between the capability of information-processing complex systems to integrate information and their ability to perform highly non-trivial information processing via emergent coherent structures, such as blinkers, gliders, and collisions. The fact that higher integrated information was carried by coherent emerging structures is consistent with recent accounts of *causally emergent dynamics*,^{58} which further supports the case of *Φ* being an effective quantitative indicator of distributed emergent computation with numerous potential future applications.

### B. Related work

The conceptual link between complexity and integrated information is by no means a novel idea: in fact, in the early days of IIT, integrated information and complexity were closely intertwined^{59}—as was IIT’s close relation to the theory of complex networks.^{60} Unfortunately, as the theory grew more convoluted and less applicable, this link lost relevance and IIT drifted away from standard information-theoretic methods.

Nonetheless, recent research has again brought complexity science to the fore within the IIT community. In some cases, information-theoretic measures originally conceived as measures of complexity have been re-purposed within IIT.^{61} In others, new measures of complexity are inspired by previous work on IIT.^{39} In contrast with its consciousness-focused sibling, advancements in this more pragmatic IIT (cf. Sec. II) have been enabled by a simplification and unification of the underlying principles of the theory—for example, in terms of information geometry^{62} or information decomposition.^{18}

Given its origins in theoretical neuroscience, it is no surprise that IIT’s relevance to complex systems has been demonstrated mostly on models of neural dynamics. In this context, the combination of integration and segregation captured by *Φ* has been linked to other dynamical features of interest (such as metastability and neural avalanches) in a variety of settings, including, e.g., models of whole-brain activity^{63} and spiking neural networks.^{64}

Relatedly, there has been recent work linking *Φ* with phase transitions in thermodynamic systems.^{65} Together with recent results linking information processing and stochastic thermodynamics,^{66} this exciting avenue of research opens the door for a purely physical interpretation of *Φ* in general physical systems.

Finally, note that for our main analyses, we estimate Φ^{R} using the simple yet effective MMI redundancy function.^{29} However, there are multiple possible redundancy functions,^{67,68,74} and while they tend to yield qualitatively similar results in practice,^{30,69} they have been shown to differ in some specific scenarios,^{68} and therefore, it is important to elucidate their similarities and differences in practical analyses. There is ongoing investigation into the different properties of each redundancy function, and future work will explore the specific advantages of diverse redundancy functions when used to calculate Φ^{R}.
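As a rough illustration of the MMI redundancy function—which takes the minimum of the mutual informations between each source and the target—the following is a plug-in sketch for discrete data (our own illustrative code, not the estimator used in our analyses):

```python
import numpy as np
from collections import Counter

def mutual_information(a, b):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(a)
    pab, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum((c / n) * np.log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())

def mmi_redundancy(sources, target):
    """MMI redundancy: the minimum over sources of I(source; target)."""
    return min(mutual_information(s, target) for s in sources)

# Toy example: the target copies source s1, while s2 is independent noise,
# so the redundancy is bounded by the weaker source and is close to zero
rng = np.random.default_rng(1)
s1 = rng.integers(0, 2, 5000)
s2 = rng.integers(0, 2, 5000)
print(mmi_redundancy([s1, s2], target=s1))
```

This bounding-by-the-weakest-source behavior is what makes MMI simple yet effective, but it is also the source of its known divergences from other redundancy functions.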

### C. Concluding remarks

This paper puts forward a pragmatic argument that metrics of integrated information—such as the ones provided by IIT and $\Phi\mathrm{ID}$—can allow us to investigate unexplored commonalities between informational and dynamical complexity, pointing out a promising avenue to reconcile their divide and benefit subsidiary disciplines, such as cognitive and computational neuroscience. Note that *Φ* is by no means the only quantity that peaks with the system’s complexity—in cellular automata, one could use the autocorrelation of a single cell, and in coupled oscillators, the variance of Kuramoto’s order parameter. However, the feature that makes *Φ* unique is that *it is applicable across the board* and yields the desired results in different kinds of systems without requiring idiosyncratic, *ad hoc* measures. This unification, analogous to the transversal role that Fisher information plays in phase transitions over arbitrary order parameters,^{70} posits integrated information as a key quantity for the study of complex dynamics, whose relevance has only started to be uncovered.

## ACKNOWLEDGMENTS

P.A.M.M. and D.B. are funded by the Wellcome Trust (Grant No. 210920/Z/18/Z). F.E.R. is supported by the Ad Astra Chandaria Foundation. A.B.B. is grateful to the Dr. Mortimer and Theresa Sackler Foundation, which supports the Sackler Centre for Consciousness Science.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors declare no conflict of interest.

### Author Contributions

P.A.M.M. and F.E.R. contributed equally to this work.

## DATA AVAILABILITY

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

### APPENDIX: LOCAL INTEGRATED INFORMATION

A natural way to extend IIT 2.0 for complex systems analysis is to consider local versions of $\Phi$, which can be built via the framework introduced by Lizier.^{32} Local (or pointwise) information measures are able to identify coherent, emergent structures known as *particles*, which have been shown to be the basis of the distributed information processing that takes place in systems such as cellular automata.^{31,33,34}

One of the most basic pointwise information metrics is the local mutual information, which is defined as

$$i(x;y) = \log \frac{p(x,y)}{p(x)\,p(y)},$$

so that $\mathbb{E}[i(X;Y)]=I(X;Y)$ is the usual mutual information. By evaluating $i$ on every $(x,y)$ pair, one can determine which particular combinations of symbols play a predominant role in the observed interdependency between $X$ and $Y$. (More specifically, the local mutual information captures specific deviations between the joint distribution and the product of the marginals.) Building on these ideas, Lizier proposed a taxonomy of distributed information processing as composed of *storage*, *transfer*, and *modification*.^{31} For this, consider a bivariate stochastic process $(X_t,Y_t)$ with $t\in\mathbb{Z}$, and introduce the shorthand notation $X_t^{(k)}=(X_{t-k},\ldots,X_t)$ and $X_t^{(k+)}=(X_t,\ldots,X_{t+k-1})$ for the corresponding past and future embedding vectors of length $k$. In this context, storage within the subprocess $X_t$ is identified with its *excess entropy* $E_k = I\big(X_t^{(k)}; X_{t+1}^{(k+)}\big)$,^{71} and transfer from $Y_t$ to $X_{t+1}$ with the *transfer entropy* $\mathrm{TE}_k = I\big(X_{t+1}; Y_t^{(k)} \,\big|\, X_t^{(k)}\big)$.^{72} Interestingly, both quantities have corresponding local versions,

$$e_k(t) = i\big(x_t^{(k)}; x_{t+1}^{(k+)}\big), \qquad t_k(t) = i\big(x_{t+1}; y_t^{(k)} \,\big|\, x_t^{(k)}\big),$$

such that, as expected, $\mathbb{E}[e_k]=E_k$ and $\mathbb{E}[t_k]=\mathrm{TE}_k$. Note that to measure transfer in either direction for the results in Fig. 9, we compute the local TE from a cell to its left and right neighbors and take the maximum of the two.
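As an illustration of how such local measures can be estimated in practice, the following sketch computes local transfer entropy with plug-in (frequency-count) probabilities for binary series. It is an illustrative implementation of the standard definition, not the code used for our Fig. 9 analyses:

```python
import numpy as np
from collections import Counter

def local_transfer_entropy(x, y, k=1):
    """Local transfer entropy from y to x with history length k,
    t_k(t) = log2[ p(x_{t+1} | x_past, y_past) / p(x_{t+1} | x_past) ],
    using plug-in (frequency-count) probability estimates."""
    x, y = np.asarray(x), np.asarray(y)
    T = len(x)
    # (target future, target past, source past) at each usable time step
    samples = [(x[t + 1], tuple(x[t - k + 1:t + 1]), tuple(y[t - k + 1:t + 1]))
               for t in range(k - 1, T - 1)]
    p_xyz = Counter(samples)                           # (x+, x_past, y_past)
    p_yz = Counter((xp, yp) for _, xp, yp in samples)  # (x_past, y_past)
    p_xz = Counter((xf, xp) for xf, xp, _ in samples)  # (x+, x_past)
    p_z = Counter(xp for _, xp, _ in samples)          # (x_past,)
    return np.array([np.log2((p_xyz[s] / p_yz[(s[1], s[2])])
                             / (p_xz[(s[0], s[1])] / p_z[s[1]]))
                     for s in samples])

# Toy example: x copies y with a one-step lag, so the average local TE
# from y to x approaches 1 bit
rng = np.random.default_rng(2)
y = rng.integers(0, 2, 5000)
x = np.roll(y, 1)
t_local = local_transfer_entropy(x, y, k=1)
print(t_local.mean())
```

Averaging the local values recovers the plug-in estimate of $\mathrm{TE}_k$, while the individual values reveal *when* transfer takes place.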

These ideas can be used to extend the standard formulation of integrated information measures in two ways. First, by using embedding vectors, the IIT metrics become applicable to non-Markovian systems.^{73} Second, by formulating pointwise measures, one can capture spatiotemporal variations in *Φ*. Mathematically, we reformulate Eq. (2) introducing these modifications as

$$\Phi_k[X;\tau,\mathcal{B}] = I\big(X_{t-\tau}^{(k)}; X_t^{(k+)}\big) - \sum_{j=1}^{b} I\big(M_{t-\tau}^{j,(k)}; M_t^{j,(k+)}\big),$$

where $\mathcal{B}=\{M^1,\ldots,M^b\}$ denotes the partition, and apply the same partition scheme described in Sec. II A to obtain an “embedded” integrated information, $\Phi_k$. Then, the equation above can be readily made into a local measure by replacing each mutual information with its local counterpart,

$$\varphi_k[x_t;\tau,\mathcal{B}] = i\big(x_{t-\tau}^{(k)}; x_t^{(k+)}\big) - \sum_{j=1}^{b} i\big(m_{t-\tau}^{j,(k)}; m_t^{j,(k+)}\big),$$

such that, as expected, $\Phi_k[X;\tau,\mathcal{B}] = \mathbb{E}\big[\varphi_k[x_t;\tau,\mathcal{B}]\big]$.
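For concreteness, a minimal sketch of a local whole-minus-sum measure of this kind, for a discrete system with $k=1$, $\tau=1$, and the atomic partition, can be written as follows (illustrative code with plug-in probability estimates; not the authors’ implementation):

```python
import numpy as np
from collections import Counter

def local_phi_wms(X, tau=1):
    """Local whole-minus-sum integrated information for a (T, n) discrete
    series with k = 1 and the atomic partition:
    varphi(t) = i(x_{t-tau}; x_t) - sum_j i(x^j_{t-tau}; x^j_t)."""
    T, n = X.shape
    pairs = [(tuple(X[t - tau]), tuple(X[t])) for t in range(tau, T)]
    N = len(pairs)

    def local_mi(samples):
        # Plug-in local mutual information for (past, present) pairs
        joint = Counter(samples)
        past = Counter(a for a, _ in samples)
        pres = Counter(b for _, b in samples)
        return np.array([np.log2((joint[s] / N)
                                 / ((past[s[0]] / N) * (pres[s[1]] / N)))
                         for s in samples])

    whole = local_mi(pairs)
    parts = sum(local_mi([((a[j],), (b[j],)) for a, b in pairs])
                for j in range(n))
    return whole - parts

# Toy example: two cells of independent noise integrate no information
rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=(2000, 2))
phi_local = local_phi_wms(X)
print(phi_local.mean())
```

In contrast, a system in which each cell is driven by the other’s past (e.g., a noisy copy-swap dynamic) yields high local values, mirroring the behavior of particles in CA.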

## REFERENCES

*Philosophical Explorations of the Legacy of Alan Turing*(Springer, 2017), pp. 279–304.

^{73}A possible measure of integrated information for non-Markovian systems is described in the Appendix and applied in Sec. IV C.

*et al.*, “A synergistic workspace for human consciousness revealed by integrated information decomposition,” bioRxiv (2020).

*Australian Conference on Artificial Life*(Springer, 2007), pp. 49–60.

*Non-Standard Computation*, edited by T. Gramß, S. Bornholdt, M. Groß, M. Mitchell, and T. Pellizzari (Wiley-VCH, 1996), pp. 95–140.

^{74}The relatively much simpler redundancy in Eq. (3) is not naturally translated into a pointwise setting; thus, we restrict the pointwise analyses to standard integrated information only.