There has recently been an explosion of interest in how “higher-order” structures emerge in complex systems composed of many interacting elements (often called “synergistic” information). This “emergent” organization has been found in a variety of natural and artificial systems, although at present, the field lacks a unified understanding of what the consequences of higher-order synergies and redundancies are for systems under study. Typical research treats the presence (or absence) of synergistic information as a dependent variable and reports changes in the level of synergy in response to some change in the system. Here, we attempt to flip the script: rather than treating higher-order information as a dependent variable, we use evolutionary optimization to evolve boolean networks with significant higher-order redundancies, synergies, or statistical complexity. We then analyze these evolved populations of networks using established tools for characterizing discrete dynamics: the number of attractors, the average transient length, and the Derrida coefficient. We also assess the capacity of the systems to integrate information. We find that high-synergy systems are unstable and chaotic, but with a high capacity to integrate information. In contrast, evolved redundant systems are extremely stable, but have negligible capacity to integrate information. Finally, the complex systems that balance integration and segregation (known as Tononi–Sporns–Edelman complexity) show features of both chaosticity and stability, with a greater capacity to integrate information than the redundant systems while being more stable than the random and synergistic systems. We conclude that there may be a fundamental trade-off between the robustness of a system’s dynamics and its capacity to integrate information (which inherently requires flexibility and sensitivity) and that certain kinds of complexity naturally balance this trade-off.

A hallmark of complex systems is the emergence of a coherent “whole” that is “greater than the sum of its parts.” Scientists and mathematicians have used information theory to develop useful tools for recognizing and describing these structures, although it is often unclear how these “emergent” structures relate to other well-explored aspects of dynamics and behavior. In this paper, we use evolutionary computing to evolve small systems with fixed “kinds” of emergent structure (highly redundant, highly synergistic, and highly complex) to see how emergent information constrains dynamics. We find that there is a trade-off between how stable a system is and its capacity to integrate information, analogous to the trade-off between redundancy and just-in-time efficiency in supply-chain networks. We discuss the implications of this trade-off for evolved and artificial systems and how to find compromises that achieve the best of both worlds.

Over the past two decades, information theory has emerged as something of a lingua franca for the study of complex systems, as it provides a natural, mathematical framework for exploring the relationships between “parts” and “wholes” in multivariate systems.1 In particular, information theory can provide insights into “emergent” or “higher-order” interactions, where information is encoded in the interaction between large numbers of different elements2,3 and, crucially, not accessible from a reduced subset (challenging the historic focus on scientific reductionism). Newly developed formal tools, such as the partial information decomposition,4–6 the integrated information decomposition,7 and heuristics, such as the O-information,8 have empowered scientists to start looking for higher-order information structures in a variety of complex systems, often with great success. Higher-order synergies appear to be ubiquitous in natural and artificial systems, having been found in climate data,9 sociological data,10 artificial neural networks,11–13 cortical neuronal networks,14,15 and global brain dynamics.16–18 Furthermore, alterations in redundancy and synergy seem to reflect meaningful differences between systems. For example, synergy in on-going brain dynamics has been found to decrease when consciousness is lost19,20 and change with age.21,22 At the cellular level, the synergies instantiated by individual neurons change depending on task performance15 or drug administration,23 and in artificial neural networks, the distribution of redundancies and synergies changes over the course of the learning process.11,12

Despite these results, it remains unclear exactly what the significance of these alterations implies in the general case. Mediano et al. recently proposed that synergistic information integration can be understood as a general measure of “computational complexity” in complex systems,7 although “complexity” is arguably as slippery a term as “synergy.”24 Broadly, the relationship between higher-order information and other well-understood features of dynamical systems is relatively under-explored.

Generally, when scientists study higher-order information in a system, the presence (or absence) of higher-order information is the dependent variable. How does synergy change when consciousness is lost?19 How does the synergy between social identities differ by demographic category?10 How do neural networks encode redundant and synergistic dependencies across training?12 In all of these studies, the change in redundancy or synergy is thought to be informative about some essential feature of the system’s emergent properties; however, synergy itself remains very abstract.

To try to resolve this ambiguity, we attempt to flip the script. Instead of treating higher-order information as the dependent variable that changes between conditions, here, we “force” particular kinds of higher-order information into simple boolean networks and then characterize how the presence of higher-order redundancies, synergies, and complexities alters their dynamics. We choose boolean networks for several reasons. The first is that, as naturally discrete models, they are very amenable to simple information-theoretic analysis (indeed, elementary cellular automata have been a common test-bed for exploring emergent information dynamics25–27). The second is that there is a long history of using boolean networks as toy models in complex systems. Seminal work by Kauffman showed how the structural properties of random boolean networks can inform on their dynamics,28 and we take direct inspiration from this work, although instead of altering structural properties of the network (degree, density, etc.), we instead alter the computational properties of the individual nodes to inject redundancy or synergy into the logic. Finally, boolean networks are a very popular model in systems biology, where they are frequently used to model genetic and metabolic regulatory networks.29,30

By exploring the link between the global information structure of a system and its dynamics, we hope to address two outstanding questions in the field of complex systems: the first is understanding what it means when we observe redundancy and synergy in natural and artificial systems (“what have we really learned upon discovering that a system is high in synergy?”), and the second is understanding how systems that require particular attributes (canalization, computational capacity, etc.) might self-organize their internal dynamics in a way that supports those particular properties. Resolving both of these questions will help scientists in many fields better understand the structure and function of complex systems, as well as potentially aid in the design of novel systems with desirable computational or dynamic properties.

A boolean network is a directed graph G = {V, E} composed of vertices V and directed edges E that connect pairs of nodes V_i → V_j. Each node V_i ∈ V can be in one of two states, {0, 1}, and comes equipped with a function that maps the states of all the “parent” nodes to an updated state of the target node, f : {0, 1}^k → {0, 1}, where k is the in-degree of the target node. Being naturally discrete models, boolean networks have some useful properties for the purposes of this analysis: for a network with N nodes, there will be only 2^N possible configurations the whole system can adopt (meaning a finite support set), and from any given initial condition, the system will always settle into an attractor after a finite period of time (meaning that the entire state-space can be brute-forced).
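
To make the model concrete, here is a minimal Python sketch of a boolean network update and the brute-force attractor search that the finite state-space permits. This is an illustrative reimplementation under our own naming conventions, not the boolean_networks.pyx package provided with the paper.

```python
from itertools import product

def step(state, parents, tables):
    """Advance one time step: `state` is a tuple of 0/1 values, `parents[i]`
    lists the inputs of node i, and `tables[i]` maps each input-tuple to
    node i's next state."""
    return tuple(tables[i][tuple(state[j] for j in parents[i])]
                 for i in range(len(state)))

def find_attractors(parents, tables, n):
    """Enumerate all 2^n initial conditions and follow each trajectory until
    it revisits a state; the revisited segment is the attractor."""
    attractors = set()
    for state in product((0, 1), repeat=n):
        seen = {}
        while state not in seen:
            seen[state] = len(seen)
            state = step(state, parents, tables)
        cycle_start = seen[state]  # index of the first state in the cycle
        cycle = frozenset(s for s, t in seen.items() if t >= cycle_start)
        attractors.add(cycle)
    return attractors
```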

We used 12-node boolean networks arranged into a ring lattice topology. Each node received four directed inputs from its four nearest neighbors, as well as a self-loop ensuring that a given node’s immediate past also factored into the computation of the next step (as is the case in the elementary cellular automata). A size of 12 was selected as it optimized the trade-off between the runtime required to compute the O-information (which grows with system size), and the richness of the state-space that could be plausibly explored. Unlike the standard elementary cellular automata (which have been frequently used in the past to explore higher-order statistics in discrete systems27,31), in our systems, each node is allowed to implement its own unique function, vastly increasing the number of possible networks that can be evolved.
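
Under these choices, the wiring and the per-node rule tables can be initialized as below (a hypothetical sketch; the variable names are our own). Each node has in-degree K = 5, so each lookup table has 2^5 = 32 entries, and the step() function from the sketch above applies unchanged.

```python
import random
from itertools import product

N = 12  # network size used in the paper
K = 5   # four nearest neighbors plus a self-loop

# Ring-lattice wiring: node i listens to i-2, i-1, itself, i+1, and i+2 (mod N).
parents = [[(i + d) % N for d in (-2, -1, 0, 1, 2)] for i in range(N)]

# Each node implements its own random boolean function over the 2^K input patterns.
tables = [{inp: random.randint(0, 1) for inp in product((0, 1), repeat=K)}
          for _ in range(N)]
```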

We focused on three typical forms of higher-order information-sharing: redundancy (information duplicated over multiple elements simultaneously), synergy (information that is only present in the joint state of all the variables and no simpler combination of sources), and “complexity” (the balance between independence and integration32). To estimate the redundancy and synergy, we used the O-information.8 First introduced by James and Crutchfield as the “enigmatic information”33 and then further refined by Rosas et al., the O-information provides a heuristic measure of whether a given multidimensional probability distribution P(X) is dominated by redundancy [in which case Ω(X) > 0 bit] or by synergy [in which case Ω(X) < 0 bit]. Although the O-information can be written in several equivalent ways, here, we prefer the form derived by Varley et al.,16 as it only requires defining one additional function,
$$\Omega(X) = (2 - N)\,TC(X) + \sum_{i=1}^{N} TC(X_{-i}),$$
(1)
where N is the number of elements, X_{−i} is the set of all X_j ∈ X excluding X_i, and TC(X) is the total correlation of X,
$$TC(X) = \left( \sum_{i=1}^{N} H(X_i) \right) - H(X).$$
(2)

H(X) is the Shannon entropy of X. The total correlation (and, by extension, the O-information) is zero if X is composed of independent elements [i.e., H(X) is maximal]. If we understand the total correlation in terms of deviation from independence,34 the O-information can be interpreted as a measure of whether X’s deviation from independence is in the “whole” or the lower-order “parts.” If there is structure in the total correlation of the whole that is not accessible when considering the parts, then (N − 2)TC(X) > Σ_{i=1}^{N} TC(X_{−i}). The reverse is true if most of the deviation from independence is in lower-order collections of elements.
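
A direct, if unoptimized, way to compute these quantities from a discrete joint distribution is sketched below. These are our own illustrative helpers, assuming the distribution is stored as a dictionary mapping state-tuples to probabilities.

```python
import numpy as np
from collections import defaultdict

def entropy(dist):
    """Shannon entropy (in bits) of a probability dictionary."""
    return -sum(p * np.log2(p) for p in dist.values() if p > 0)

def marginalize(joint, idx):
    """Marginal distribution over the variables indexed by `idx`."""
    marg = defaultdict(float)
    for state, p in joint.items():
        marg[tuple(state[i] for i in idx)] += p
    return marg

def total_correlation(joint, idx):
    """Eq. (2) on a subset: sum of marginal entropies minus the joint entropy."""
    return (sum(entropy(marginalize(joint, (i,))) for i in idx)
            - entropy(marginalize(joint, tuple(idx))))

def o_information(joint, n):
    """Eq. (1): Omega(X) = (2 - N) TC(X) + sum_i TC(X_{-i})."""
    whole = total_correlation(joint, range(n))
    residuals = sum(total_correlation(joint, [j for j in range(n) if j != i])
                    for i in range(n))
    return (2 - n) * whole + residuals
```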

To quantify the complexity, we used the Tononi–Sporns–Edelman (TSE) complexity.32 The TSE complexity was introduced by Tononi, Sporns, and Edelman in the context of theoretical neuroanatomy and attempts to quantify the degree to which a system balances integration and segregation. A system is said to have a high complexity if, on average, each element is largely independent of every other element, but the whole system strongly deviates from independence. Like the O-information, the TSE complexity can be written out in terms of the total correlation,
$$TSE(X) = \sum_{i=1}^{N-1} \left[ \left( \frac{i}{N} \right) TC(X) - \mathbb{E}\big[ TC(X^{\gamma}) \big]_{|\gamma| = i} \right],$$
(3)

where E[TC(X^γ)]_{|γ|=i} is the expected value of the total correlation over all subsets X^γ of X with |γ| = i elements. Since the number of subsets of size γ grows with the binomial coefficient (12 choose γ), it is not practical to sample all possible subsets at every scale i. Consequently, following Ref. 16, we took a subsampling approach: at each scale, if the number of possible subsets of X exceeded 75, we sampled 75 subsets uniformly at random; otherwise, we used all of them.
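
A sketch of the corresponding estimator, reusing total_correlation() from above and capping each scale at 75 randomly drawn subsets, might look like the following (again our own illustrative code, not the authors’ implementation):

```python
import random
from math import comb
from itertools import combinations

def tse_complexity(joint, n, max_samples=75):
    """Eq. (3) with subsampling: at each scale i, average TC over at most
    `max_samples` subsets of size i."""
    whole = total_correlation(joint, range(n))
    tse = 0.0
    for i in range(1, n):
        if comb(n, i) <= max_samples:
            subsets = list(combinations(range(n), i))
        else:
            subsets = [random.sample(range(n), i) for _ in range(max_samples)]
        avg_tc = sum(total_correlation(joint, s) for s in subsets) / len(subsets)
        tse += (i / n) * whole - avg_tc
    return tse
```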

Computing the information theoretic measures requires defining a probability distribution on X, which in this case is a distribution on the 2^12 possible states a given boolean network could adopt. Here, we used a modified intervention distribution: initially, all possible states X = x were set to be equally likely (a maximum-entropy distribution). Then, the system was allowed to update one time step, producing a new distribution. Intuitively, this can be understood using a random walker model: if we imagine the state-transition structure of the whole system as a directed network with 2^12 nodes, we can “place” a single random walker on each node. Updating the system is like allowing the walkers to take a single step. The new distribution of walkers on nodes defines our intervention distribution: some nodes will now have more than one walker, some will have none, etc.
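
In code, this push-forward of the maximum-entropy distribution through one deterministic update is a simple counting exercise (reusing step() from the earlier sketch):

```python
from itertools import product
from collections import defaultdict

def intervention_distribution(parents, tables, n):
    """Place one 'walker' on each of the 2^n states, advance everything one
    time step, and normalize the resulting occupancy counts."""
    dist = defaultdict(float)
    for state in product((0, 1), repeat=n):
        dist[step(state, parents, tables)] += 1.0 / 2 ** n
    return dist
```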

The use of the maximum-entropy distribution can be understood in causal terms:35,36 the updated distribution reflects the expected states of the system if it was intervened upon in a manner analogous to the do-calculus.35 This causal flavor separates the intervention distribution from other plausible choices, such as the stationary distribution,2 which, in deterministic boolean networks, incorporates a very small subset of states. In the context of two-element systems, the use of this interventional distribution with the mutual information is known as the “effective information,”36 and so, we can think about the application to the O-information or the TSE complexity as the “effective” O-information or “effective” complexity as well. For a visual explanation of the intervention distribution, see Fig. 1.

FIG. 1.

Intervention distribution. The state-transition structure of a boolean network defines a transition probability matrix (TPM), which gives the probability of a given output (the columns), conditional on a given input (the rows). Here, we see the TPM for a three-element system X, and how the intervention distribution is computed. At time t, all global states are equally likely, corresponding to a maximum-entropy distribution. After one time step, to time t + 1, the distribution of possible states is no longer uniform: this is the distribution of states after performing an intervention on X,35 and it is this distribution that is fed into the calculations for the O-information and Tononi–Sporns–Edelman complexity.


To evolve networks for redundancy, synergy, and complexity, we implemented a naïve evolutionary optimization. A population of 500 (when optimizing the effective O-information) or 200 (when optimizing the effective TSE complexity) random boolean networks was initialized. For each generation, the effective O-information or effective TSE complexity of each member of the population was computed, and the bottom half of the networks were removed. The remaining networks were then “mated” to produce offspring, with pairs selected randomly, with the probability of being chosen proportional to the rank of that network within the population.

The process of producing a “child” network from two “parent” networks involves combining the functions associated with each node. Since all networks have the same topology, we matched the nodes in each parent and constructed a new function with 50% of the input–output mappings coming from the first parent and the remaining input–output mappings coming from the second. We also built in an inherent mutation rate of 0.001: each input–output mapping had a 0.001 probability of being flipped. Each population was allowed to evolve for 750 generations, although the vast majority converged before 500 generations (for visualization, see Fig. 2).
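
A sketch of this recombination step is below. Note one interpretive assumption: we implement “50% of the mappings from each parent” as an independent coin flip per table entry, which yields an even split only in expectation.

```python
import random

def mate(tables_a, tables_b, mutation_rate=0.001):
    """Produce a child's rule tables by uniform crossover of two parents,
    flipping each inherited output bit with probability `mutation_rate`."""
    child = []
    for ta, tb in zip(tables_a, tables_b):
        table = {}
        for inputs in ta:
            bit = ta[inputs] if random.random() < 0.5 else tb[inputs]
            if random.random() < mutation_rate:
                bit = 1 - bit  # point mutation
            table[inputs] = bit
        child.append(table)
    return child
```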

FIG. 2.

Evolutionary optimization of redundancy, synergy, and complexity. Top row: Presented above are the evolutionary trajectories for all populations evolving for redundancy (left), synergy (middle), and TSE complexity (right). It is clear that the evolutionary optimization is able to discover many configurations that significantly deviate from the random initial conditions. This figure establishes the color schema used throughout the paper: red for redundancy, blue for synergy, and gold for TSE complexity. Bottom row: For each of the three classes of system (high redundancy, high synergy, and high complexity), we selected the fittest boolean networks and ran them to get a visual sense of their different properties. Note how the highly redundant system almost immediately falls into a stable attractor, while the high-synergy system has a long transient time and overall visually noisier patterns. While each of these trajectories is only one of many, they are representative of broader trends. For the redundant system, the average transient time required to hit an attractor is merely 2.75 steps. In contrast, for the synergistic system, the average transient time is 26.91 steps, and for the TSE-maximizing system, the average transient time is 7.7 steps.


To explore how controlling higher-order information influences dynamics, we analyzed the final, evolved systems with a suite of well-established tools used for exploring automata models. The first is the number of attractors that the system can settle into. Since deterministic boolean networks will always fall into either a point-attractor or a limit cycle, the number of attractors is an efficient heuristic for how canalized the network is (for more on canalization in boolean networks, see Ref. 37). We also computed the transient time from every state to its inevitable attractor. Transient times are typically used to distinguish between boolean networks in an ordered, sub-critical phase (where transients are short) and a chaotic, super-critical phase (where transients are long).38 The final measure was the Derrida coefficient,39 which quantifies how robust the system is to perturbation. We followed the method detailed by Manicka et al.40 Briefly, we selected 2000 possible states of the network and randomly perturbed each state by flipping m bits, producing an original state and a near-copy. Each state (original and copy) was then allowed to update one step, and the Hamming distance between them was computed. We selected m ∈ {1, 2} and computed the Derrida coefficient as the slope of the linear regression of the average Hamming distance against m. A Derrida coefficient greater than one indicates a super-critical, chaotic system where small perturbations lead to large differences in future trajectories. A value less than one indicates a stable, sub-critical system where small perturbations have small effects. A Derrida coefficient of one indicates a critical system “on the edge of chaos.”41 These three measures provide a set of tools by which the relative stability or chaosticity of a given boolean network can be assessed and correlated with the presence, or absence, of higher-order information structures.
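
A minimal sketch of the Derrida-coefficient estimate as described above (sample states, flip m bits, update once, regress the mean Hamming distance on m), again reusing step() from the earlier sketch:

```python
import random
import numpy as np

def derrida_coefficient(parents, tables, n, n_samples=2000):
    """Slope of the mean one-step Hamming distance against the perturbation
    size m, for m in {1, 2}."""
    mean_dists = []
    for m in (1, 2):
        dists = []
        for _ in range(n_samples):
            state = tuple(random.randint(0, 1) for _ in range(n))
            flipped = set(random.sample(range(n), m))
            copy = tuple(1 - b if i in flipped else b
                         for i, b in enumerate(state))
            nxt_a = step(state, parents, tables)
            nxt_b = step(copy, parents, tables)
            dists.append(sum(x != y for x, y in zip(nxt_a, nxt_b)))
        mean_dists.append(np.mean(dists))
    # With two points the regression slope is just rise over run, but
    # polyfit generalizes to more perturbation sizes.
    return np.polyfit([1, 2], mean_dists, 1)[0]
```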

To quantify the extent to which these systems are capable of integrating information (sometimes used as a proxy measure for non-trivial “computation”7,14), we used a variation of the Φ value from integrated information theory,42 following the outline detailed by Mediano et al. in their recent paper.7 Briefly, for a given boolean network X, we computed the difference between the effective information in the entire system and the sum of the effective informations after bi-partitioning X into non-overlapping subsets. This measure is typically referred to as Φ_WMS,
$$\Phi_{WMS}(X) = I(X(t); X(t+1)) - \left[ I(X^{\alpha}(t); X^{\alpha}(t+1)) + I(X^{\beta}(t); X^{\beta}(t+1)) \right],$$
(4)

where α and β define the two non-overlapping partitions of X. Ideally, the partition (α, β) is constructed to minimize the loss of effective information when partitioning the system (the so-called minimum information bipartition36); however, this is known to be an intractable problem for even modestly sized systems, as it requires brute-forcing all possible bipartitions of X. Here, we use a heuristic estimator from spectral graph theory: the algebraic connectivity and the Fiedler vector.43,44

We began by constructing an effective connectivity graph of the system, defining a matrix M such that M_ij = I(X_i(t); X_j(t+1)) + I(X_j(t); X_i(t+1)). The result is a symmetric, low-dimensional representation of the dynamics of X, where each edge represents the total predictive information flowing from X_i to X_j and vice versa. For a dyadic, undirected graph, the Fiedler vector is the eigenvector of the graph Laplacian associated with the smallest non-zero eigenvalue. From the Fiedler vector, it is possible to extract a bipartition that bisects the graph while approximately minimizing the total mass of the severed edges. This bisection defines the partition (α, β) used in computing Φ_WMS. Spectral methods have been previously used to great effect in approximating the integrated information in large systems,45 and this method was chosen for its computational efficiency and conceptual simplicity (in contrast to other, more involved optimizations, such as Queyranne’s algorithm, that are sometimes used for this problem46). To ensure that the effective connectivity graphs were connected, a small amount of noise (on the order of 10^−6 bit) was added to each edge. The computation of the Fiedler vector was done using the NetworkX package.
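
A compact version of this heuristic, using NetworkX’s built-in Fiedler vector routine, is sketched below (M is assumed to be a symmetric numpy array of the pairwise effective connectivity values):

```python
import numpy as np
import networkx as nx

def fiedler_bipartition(M):
    """Approximate the minimum information bipartition: jitter the effective
    connectivity graph so it is connected, then split the nodes by the sign
    of the Fiedler vector."""
    W = M + 1e-6               # ~10^-6 bit of noise on every edge
    np.fill_diagonal(W, 0.0)   # no self-loops in the graph
    G = nx.from_numpy_array(W)
    v = nx.fiedler_vector(G, weight="weight")
    alpha = [i for i in range(len(v)) if v[i] >= 0]
    beta = [i for i in range(len(v)) if v[i] < 0]
    return alpha, beta
```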

The WMS in Φ_WMS refers to the idea that it is a “whole-minus-sum” measure: it is the difference between the predictive power of the whole system and the sum of the predictive powers of the component parts. The higher the value, the more integrated information there is in the system. Unlike many information-theoretic quantities, Φ_WMS is not strictly non-negative, and the interpretation of Φ_WMS < 0 was a standing question in the field for years. Recently, however, Mediano et al. showed that negative values occur when the dynamic redundancy shared by X^α and X^β overwhelms the integrated information.47 This prompted Mediano et al. to propose a corrected measure of integrated information: by adding back in the redundancy, it is possible to ensure a non-negative value. Several different temporal redundancy functions have been proposed;47,48 here, we use the minimum mutual information redundancy measure following Ref. 7,
$$\Phi_{R}(X) := \Phi_{WMS}(X) + \min_{\gamma, \delta \in \{\alpha, \beta\}} I(X^{\gamma}(t); X^{\delta}(t+1)).$$
(5)
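
Pulling the pieces together, a sketch of the full Φ_R computation over the maximum-entropy input distribution might look as follows. It reuses step(), entropy(), and marginalize() from the earlier sketches and exploits the fact that the dynamics are deterministic, so the joint distribution over (X(t), X(t+1)) puts mass 1/2^n on each pair (s, F(s)).

```python
from itertools import product

def phi_r(parents, tables, n, alpha, beta):
    """Whole-minus-sum integrated information, Eq. (4), plus the minimum
    mutual information correction, Eq. (5)."""
    # Joint distribution over concatenated (past, future) 2n-tuples.
    pairs = {}
    for s in product((0, 1), repeat=n):
        pairs[s + step(s, parents, tables)] = 1.0 / 2 ** n

    def mi(src, dst):
        # Time-lagged mutual information I(X^src(t); X^dst(t+1)).
        a = tuple(src)
        b = tuple(n + j for j in dst)
        return (entropy(marginalize(pairs, a))
                + entropy(marginalize(pairs, b))
                - entropy(marginalize(pairs, a + b)))

    wms = mi(range(n), range(n)) - mi(alpha, alpha) - mi(beta, beta)
    # Eq. (5): add back the smallest time-lagged MI across the partition.
    redundancy = min(mi(g, d) for g in (alpha, beta) for d in (alpha, beta))
    return wms + redundancy
```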

We selected the top 100 most redundant (highest O-information), most synergistic (lowest O-information), and most complex (highest TSE complexity) evolved boolean networks, and for each one, we computed the minimum information bipartition and the corrected integrated information Φ_R(X).

The first result verifies that the evolutionary optimization worked and successfully produced populations with non-trivial higher-order information structures. Figure 2 shows this: for positive O-information, negative O-information, and high TSE complexity, we successfully evolved populations that significantly deviated from the random initial conditions. The average O-information for random systems was −2.396 ± 0.198 bit, while for the high-redundancy systems, it grew to 3.246 ± 1.063 bit, and for the high-synergy systems, it fell to −7.02 ± 0.872 bit. Similarly, the random systems had a low TSE complexity (4.107 ± 0.285 bit), but after evolving for high complexity, the systems achieved an average of 13.919 ± 0.609 bit. The evolution for redundancy showed a curiously biphasic pattern: many of the populations transiently saturated around 0 bit before accelerating above zero in a second phase of growth.

Collectively, these results show that the evolutionary optimization works to construct systems with the desired higher-order information structures, and also that there is an apparent link between “randomness” and synergy: randomly initialized systems already have O-informations significantly lower than zero, and larger random systems have lower O-informations. This link will be extensively explored in subsequent results.

When characterizing the different classes of systems, we found markedly different dynamics. To compare different families of networks, we use the Mann–Whitney U test,49 a non-parametric significance test of whether one random variable is statistically greater than another by comparing the rank-sums of the different conditions. The test statistic U and the p-value are reported for completeness, although we note that, in simulated data, the p-value is of limited informativeness as the theoretical number of possible samples is unbounded. We also report the absolute value of the effect size |d| using Cliff’s Delta,50 which quantifies the degree to which the greater variable dominates the lesser.

The random systems had the highest average number of unique attractors (5.285 ± 2.715), followed very closely by the synergistic systems (4.56 ± 2.156): this small difference was nevertheless significant (U = 16893.5, p = 0.003, |d| = 0.155). The high-complexity systems came in third, with an average of 3.91 ± 2.607 attractors, itself a significant decrease from the synergistic systems (U = 15494.5, p < 4 × 10^−5, |d| = 0.225). Finally, the redundant systems had the fewest distinct attractors (2.13 ± 1.457), significantly fewer than the high-complexity systems (U = 10314.0, p < 3.77 × 10^−18, |d| = 0.484). These results suggest that redundancy has a “canalizing” effect:37 given that all networks have the same number of possible states (2^12), the low number of attractors in the redundant systems indicates that more initial conditions lead to the same final state than in the random or synergistic cases.

When comparing the joint entropy of the systems after maximum-entropy perturbation (see Sec. II C), we found that the random systems had the highest joint entropy (10.606 ± 0.126 bit), which was significantly greater than the next-highest class, the high-synergy systems (10.061 ± 0.295 bit, U = 1469.0, p < 4.08 × 10^−58, |d| = 0.927). The high-complexity systems had the next-lowest joint entropy (6.984 ± 0.488 bit), significantly lower than the high-synergy systems (U = 0.0, p < 2.42 × 10^−67, |d| = 1). Finally, the high-redundancy systems had the lowest joint entropy (4.523 ± 0.818 bit), significantly lower than the high-complexity systems (U = 803.0, p < 3.26 × 10^−62, |d| = 0.96). These results are consistent with the notion that redundancy is canalizing while synergy (and randomness) maintain “flatter” configuration landscapes, although unlike the number of attractors (which describes the limit behavior of the systems), these show a collapse in joint entropy a mere single time step after perturbation. This suggests dynamics that are not only canalizing in the limit, but rapidly canalizing in the short term as well.

We can see a similar result when we consider the time it takes the systems to reach their various attractor states (the transient times). The high-synergy systems had the longest transients (43.965 ± 16.405 steps), which was just barely significantly greater than the random systems (38.96 ± 16.816 steps, U = 16300.0, p = 0.0007, |d| = 0.185). The high-complexity systems had radically shorter transient times (6.671 ± 2.634 steps) compared to the random systems (U = 50.0, p < 5.112 × 10^−67, |d| = 0.998). Finally, the high-redundancy systems had very short transient times (U = 2146.5, p < 4.27 × 10^−54, |d| = 0.893).

The final measure of dynamic stability and chaosticity was the Derrida coefficient,39 which quantifies the extent to which small perturbations propagate through time. A Derrida coefficient greater than one typically indicates chaos (i.e., small perturbations grow), while a Derrida coefficient less than one indicates stability (i.e., perturbations die out). We found that the synergistic systems had the highest average Derrida coefficients (2.155 ± 0.108), indicating highly sensitive, chaotic dynamics. The random systems had the next-highest coefficients (1.972 ± 0.053), a significant decrease from the synergistic systems (U = 3106.5, p < 1.185 × 10^−48, |d| = 0.845). The high-complexity systems were closer to the critical point (1.428 ± 0.103), significantly lower than the random systems (U = 0.0, p < 2.414 × 10^−67, |d| = 1). Interestingly, the lowest coefficients were very near the critical boundary of one: the redundancy-dominated systems had average Derrida coefficients of 1.027 ± 0.156 (a significant decrease from the high-complexity systems, U = 1091.5, p < 2.02 × 10^−60, |d| = 0.945).

Collectively, these results point to a consistent picture of how higher-order information can influence the dynamical properties of a boolean network: the presence of synergy produces signatures of chaos, such as sensitivity to perturbation, long transients, and lower canalization. In contrast, redundancy had a clear canalizing effect, although the Derrida coefficients also placed the redundant systems at the critical boundary between subcritical and supercritical dynamics. The significance of this is currently unclear. The high-complexity systems, which use TSE complexity to balance integration and segregation, seemed to split the difference: generally more flexible than the highly redundant systems, but much more stable than the random or synergistic systems. For visualization of all results, see Fig. 3.

FIG. 3.

Dynamical differences between random, synergistic, redundant, and complex systems. Top right: The number of unique attractors for each network, for each class. Top left: The joint entropy of the global state intervention distribution after one time step. Middle left: The Derrida coefficient for each network, for each class. Middle right: The average length of the transients. Bottom: The integrated information capacity Φ R for each network for each class.


To assess how different higher-order information structures influenced the capacity of the system to integrate information, we selected the 100 most redundant (highest effective O-information), most synergistic (lowest effective O-information), and most complex (highest effective TSE complexity) evolved boolean networks and computed the Φ_R measure of integrated information. We also generated 100 random networks against which to compare the evolved systems.

We found that the random networks had the highest value of Φ_R (4.46 ± 0.35 bit). The second highest set was the synergistic set, with an average Φ_R value of 3.672 ± 1.45 bit. The Mann–Whitney U test found significant differences between these groups (U = 6476.0, p = 0.0003, |d| = 0.676). After the synergistic systems, the family with the next-highest average Φ_R value was the high TSE complexity boolean networks (1.08 ± 0.537 bit), which were significantly lower than the synergistic family (U = 1080.0, p < 10^−21, |d| = 0.946). Finally, the redundancy-dominated networks had the lowest capacity to integrate information, with an average Φ_R of 0.162 ± 0.31 bit. This was significantly lower than the TSE complexity group (U = 9308.5, p < 10^−25, |d| = 0.535). For visualization, see Fig. 3. These results are consistent with the previous dynamical results: the highly synergistic systems resemble the random systems, while the highly redundant systems are profoundly different, and the high-complexity systems split the difference, appearing between the two extremes. This also suggests that redundant systems are not particularly effective at integrating information. We should note, however, that these results hinge on the particular algorithm used to estimate the minimum information bipartition, in this case the Fiedler vector, which is based on a pairwise effective connectivity model (see Sec. II F). Since the minimum information bipartition is typically inaccessible to compute directly, future work may focus on how different heuristics do (or do not) change the relationship between integration and higher-order information.

After analyzing a variety of signatures of dynamical complexity, we can begin to qualitatively describe the impacts of evolving for redundancy, synergy, and complexity. Broadly speaking, synergistic boolean networks resemble random boolean networks: both classes of network are highly sensitive to perturbation and have long transient times before reaching an attractor state (both indicators of chaotic dynamics). They also maintain high entropy dynamics and are capable of integrating information (as evidenced by the high values of Φ R). In contrast, highly redundant systems are typically stable, with fewer unique attractors, short transient times, and a general robustness to perturbation (suggesting sub-critical dynamics). They do not, however, integrate much information: their “computational” or “information processing” complexity7 is very low. Finally, the systems that were evolved to have high TSE complexity32 seemed to sit between the redundant and the synergistic systems. They were more sensitive to perturbations and had slightly longer transients than the purely redundant systems, but also showed a greater capacity to integrate information.

These results suggest that there may be a fundamental trade-off between the capacity to integrate information and the stability of the system. Systems that are too stable (such as those evolved for high O-information) cannot support information integration, while those systems that can integrate significant information are chaotic and unstable. This trade-off is reminiscent of the well-known compromise between redundancy and efficiency in the economics of supply chains: a supply chain with many redundancies can more effectively absorb shocks than a just-in-time chain that is more efficient but buckles under unexpected perturbations.51 Here, however, rather than a financial consideration, the consideration is whether the system is capable of “integrating” information from multiple streams into a unified whole.

One possible implication of this trade-off is that, if one wanted to build a complex system that was both capable of integrating large amounts of information but was stable enough to be predictable (such as an animal nervous system), one possible avenue might be to expand the number of redundant elements enough to stabilize the information-integration capacity. This might partially explain the apparent upward pressure on the size of animal brains over the course of evolution, particularly, the recent expansion of the brain region associated with information integration:18 the capacity to engage in complex information-processing may require a “substrate” of redundant components to stabilize the more complex integrative processes. This is, however, a significant leap from a small population of boolean networks and requires considerably more research before any strong claims can be made.

Maintaining a population of high-redundancy elements comes with its own energetic costs, and therefore, a structure that innately balances the trade-off between stability and computational capacity may still be desirable. Our results suggest that a system with a high TSE complexity (which balances integration and segregation)32 may partially accomplish this. Across all measures, the systems evolved to maximize complexity (rather than high or low O-information) typically split the difference between the extremes of synergy and redundancy: being more stable than the high-synergy systems, but more flexible (and with greater computational capacity) than the purely redundant systems. These results are consonant with early findings by Sporns et al., who found that evolving for TSE complexity in simple systems produced topologies highly reminiscent of those seen in biological nervous systems.52 Future work further exploring the dynamical properties of the TSE complexity, and the evolutionary contexts in which it emerges, may be highly informative about the general properties of evolved information-processing systems like the brain.

Finally, the apparently similar behavior of synergy and randomness is an intriguing result. It is known that Ω(X) = 0 bit when X_i ⊥ X_j for all X_i and X_j in X.8 The fact that the random systems had O-information values significantly less than zero strongly suggests a link between random systems and synergistic ones. This is not the first paper to suggest as much: Orio, Mediano, and Rosas recently showed that adding small amounts of stochastic noise to elementary cellular automata can transiently increase the synergy present in the system (as measured with the O-information as well),27 and Varley et al. argued that synergistic entropy (in the context of the partial entropy decomposition5) corresponds to irreducible randomness.17 The nature of this link remains mysterious, as randomness is typically associated with independence (and by extension, minimal O-information). Where this “structure” comes from, then, is a question of significant interest.

In this paper, we have shown how evolving small complex systems (in this case, 12-node boolean networks) for the presence of different higher-order information structures (redundancy, synergy, complexity) can influence the kinds of dynamics and computational capabilities the systems display. We found that evolving for redundancy produced systems that were robust to perturbation, highly canalized, and had low capacity to integrate information. Conversely, systems evolved for synergy resembled random systems in many respects: chaotic, sensitive to perturbation, but with a high capacity to integrate information. Finally, systems evolved to be TSE-complex (balancing integration and segregation) combined aspects of both extremes. We propose that there is a fundamental trade-off between stability and computational capacity, and that complex systems combining local segregation with global integration may naturally balance the two extremes.

To replicate this analysis, we have included boolean_networks.pyx as supplementary material. This Cython package includes functions for the construction and evolution of boolean networks, as well as all information-theoretic analyses performed here.

The authors have no conflicts to disclose.

Thomas F. Varley: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Visualization (equal); Writing – original draft (equal). Josh Bongard: Funding acquisition (equal); Writing – review & editing (equal).

All analysis was done in silico. Scripts for recreating and analyzing boolean networks will be included as the supplementary material upon final publication.

1. T. F. Varley, “Information theory for complex systems scientists,” arXiv:2304.12482 (2023).
2. T. F. Varley and E. Hoel, “Emergence as the conversion of information: A unifying theory,” Philos. Trans. R. Soc. A: Math., Phys. Eng. Sci. 380(2227), 20210150 (2022).
3. P. A. M. Mediano, F. E. Rosas, A. I. Luppi, H. J. Jensen, A. K. Seth, A. B. Barrett, R. L. Carhart-Harris, and D. Bor, “Greater than the parts: A review of the information decomposition approach to causal emergence,” Philos. Trans. R. Soc. A: Math., Phys. Eng. Sci. 380(2227), 20210246 (2022).
4. P. L. Williams and R. D. Beer, “Nonnegative decomposition of multivariate information,” arXiv:1004.2515 (2010).
5. R. A. A. Ince, “The partial entropy decomposition: Decomposing multivariate entropy and mutual information via pointwise common surprisal,” arXiv:1702.01591 (2017).
6. T. F. Varley, “Generalized decomposition of multivariate information,” arXiv:2309.08003 (2023).
7. P. A. M. Mediano, F. E. Rosas, J. Carlos Farah, M. Shanahan, D. Bor, and A. B. Barrett, “Integrated information as a common signature of dynamical and information-processing complexity,” Chaos 32(1), 013115 (2022).
8. F. Rosas, P. A. M. Mediano, M. Gastpar, and H. J. Jensen, “Quantifying high-order interdependencies via multivariate extensions of the mutual information,” Phys. Rev. E 100(3), 032305 (2019).
9. A. E. Goodwell and P. Kumar, “Temporal information partitioning: Characterizing synergy, uniqueness, and redundancy in interacting environmental variables,” Water Resour. Res. 53(7), 5920–5942 (2017).
10. T. F. Varley and P. Kaminski, “Untangling synergistic effects of intersecting social identities with partial information decomposition,” Entropy 24(10), 1387 (2022).
11. T. M. S. Tax, P. A. M. Mediano, and M. Shanahan, “The partial information decomposition of generative neural network models,” Entropy 19(9), 474 (2017).
12. D. Alexander Ehrlich, A. Christian Schneider, V. Priesemann, M. Wibral, and A. Makkeh, “A measure of the complexity of neural representations based on partial information decomposition,” Transactions on Machine Learning Research (2023); see https://openreview.net/forum?id=R8TU3pfzFr.
13. A. M. Proca, F. E. Rosas, A. I. Luppi, D. Bor, M. Crosby, and P. A. M. Mediano, “Synergistic information supports modality integration and flexible learning in neural networks solving multiple tasks,” arXiv:2210.02996 (2022).
14. E. L. Newman, T. F. Varley, V. K. Parakkattu, S. P. Sherrill, and J. M. Beggs, “Revealing the dynamics of neural information processing with multivariate information decomposition,” Entropy 24(7), 930 (2022).
15. T. F. Varley, O. Sporns, S. Schaffelhofer, H. Scherberger, and B. Dann, “Information-processing dynamics in neural networks of macaque cerebral cortex reflect cognitive state and behavior,” Proc. Natl. Acad. Sci. U.S.A. 120(2), e2207677120 (2023).
16. T. F. Varley, M. Pope, J. Faskowitz, and O. Sporns, “Multivariate information theory uncovers synergistic subsystems of the human cerebral cortex,” Commun. Biol. 6(1), 451 (2023).
17. T. F. Varley, M. Pope, P. Maria Grazia, F. Joshua, and O. Sporns, “Partial entropy decomposition reveals higher-order information structures in human brain activity,” Proc. Natl. Acad. Sci. U.S.A. 120(30), e2300888120 (2023).
18. A. I. Luppi, P. A. M. Mediano, F. E. Rosas, N. Holland, T. D. Fryer, J. T. O’Brien, J. B. Rowe, D. K. Menon, D. Bor, and E. A. Stamatakis, “A synergistic core for human brain evolution and cognition,” Nat. Neurosci. 25(6), 771–782 (2022).
19. A. I. Luppi, P. A. M. Mediano, F. E. Rosas, J. Allanson, J. D. Pickard, G. B. Williams, M. M. Craig, P. Finoia, A. R. D. Peattie, P. Coppola, D. K. Menon, D. Bor, and E. A. Stamatakis, “Reduced emergent character of neural dynamics in patients with a disrupted connectome,” NeuroImage 269, 119926 (2023).
20. A. I. Luppi, P. A. M. Mediano, F. E. Rosas, J. Allanson, J. D. Pickard, R. L. Carhart-Harris, G. B. Williams, M. M. Craig, P. Finoia, A. M. Owen, L. Naci, D. K. Menon, D. Bor, and E. A. Stamatakis, “A synergistic workspace for human consciousness revealed by integrated information decomposition,” eLife 12 (2024); see https://doi.org/10.7554/eLife.88173.2.
21. M. Gatica, R. Cofré, P. A. M. Mediano, F. E. Rosas, P. Orio, I. Diez, S. P. Swinnen, and J. M. Cortes, “High-order interdependencies in the aging brain,” Brain Connect. 11(9), 734–744 (2021).
22. M. Gatica, F. E. Rosas, P. A. M. Mediano, I. Diez, S. P. Swinnen, P. Orio, R. Cofré, and J. M. Cortes, “High-order functional redundancy in ageing explained via alterations in the connectome in a whole-brain model,” PLoS Comput. Biol. 18(9), e1010431 (2022).
23. T. F. Varley, D. Havert, L. Fosque, A. Alipour, N. Weerawongphrom, H. Naganobori, L. O’Shea, M. Pope, and J. Beggs, “The serotonergic psychedelic N,N-dipropyltryptamine alters information-processing dynamics in cortical neural circuits,” arXiv:2310.20582 (2023).
24. D. P. Feldman and J. P. Crutchfield, “Measures of statistical complexity: Why?,” Phys. Lett. A 238(4), 244–252 (1998).
25. J. T. Lizier, The Local Information Dynamics of Distributed Computation in Complex Systems, Springer Theses (Springer, Berlin, Heidelberg, 2013).
26. B. Flecker, W. Alford, J. M. Beggs, P. L. Williams, and R. D. Beer, “Partial information decomposition as a spatiotemporal filter,” Chaos 21(3), 037104 (2011).
27. P. Orio, P. A. M. Mediano, and F. E. Rosas, “Dynamical noise can enhance high-order statistical structure in complex systems,” arXiv:2305.13454 (2023).
28. S. A. Kauffman, “Emergent properties in random complex automata,” Phys. D: Nonlinear Phenom. 10(1), 145–156 (1984).
29. A. Saadatpour and R. Albert, “Boolean modeling of biological regulatory networks: A methodology tutorial,” Methods 62(1), 3–12 (2013).
30. R. Albert and J. Thakar, “Boolean modeling: A logic-based dynamic approach for understanding signaling and regulatory networks and for making useful predictions,” Wiley Interdiscip. Rev.: Syst. Biol. Med. 6(5), 353–369 (2014).
31. F. Rosas, P. A. M. Mediano, M. Ugarte, and H. J. Jensen, “An information-theoretic approach to self-organisation: Emergence of complex interdependencies in coupled dynamical systems,” Entropy 20(10), 793 (2018).
32. G. Tononi, O. Sporns, and G. M. Edelman, “A measure for brain complexity: Relating functional segregation and integration in the nervous system,” Proc. Natl. Acad. Sci. U.S.A. 91(11), 5033–5037 (1994).
33. R. G. James, C. J. Ellison, and J. P. Crutchfield, “Anatomy of a bit: Information in a time series observation,” Chaos 21(3), 037109 (2011).
34. S. Watanabe, “Information theoretical analysis of multivariate correlation,” IBM J. Res. Dev. 4(1), 66–82 (1960).
35. E. P. Hoel, L. Albantakis, and G. Tononi, “Quantifying causal emergence shows that macro can beat micro,” Proc. Natl. Acad. Sci. U.S.A. 110(49), 19790–19795 (2013).
36. G. Tononi and O. Sporns, “Measuring information integration,” BMC Neurosci. 4(1), 31 (2003).
37. M. Marques-Pita and L. M. Rocha, “Canalization and control in automata networks: Body segmentation in Drosophila melanogaster,” PLoS One 8(3), e55946 (2013).
38. F. Xavier Costa, J. C. Rozum, A. M. Marcus, and L. M. Rocha, “Effective connectivity and bias entropy improve prediction of dynamical regime in automata networks,” Entropy 25(2), 374 (2023).
39. B. Derrida and D. Stauffer, “Phase transitions in two-dimensional Kauffman cellular automata,” Europhys. Lett. 2(10), 739 (1986).
40. S. Manicka, M. Marques-Pita, and L. M. Rocha, “Effective connectivity determines the critical dynamics of biochemical networks,” J. R. Soc. Interface 19(186), 20210659 (2022).
41. C. G. Langton, “Computation at the edge of chaos: Phase transitions and emergent computation,” Phys. D: Nonlinear Phenom. 42(1), 12–37 (1990).
42. D. Balduzzi and G. Tononi, “Integrated information in discrete dynamical systems: Motivation and theoretical framework,” PLoS Comput. Biol. 4(6), e1000091 (2008).
43. M. Fiedler, “Algebraic connectivity of graphs,” Czech. Math. J. 23(2), 298–305 (1973).
44. J. L. Gross and J. Yellen, Handbook of Graph Theory (CRC Press, 2003).
45. D. Toker and F. T. Sommer, “Information integration in large brain networks,” PLoS Comput. Biol. 15(2), e1006807 (2019).
46. J. Kitazono, R. Kanai, and M. Oizumi, “Efficient algorithms for searching the minimum information partition in integrated information theory,” Entropy 20(3), 173 (2018).
47. P. A. M. Mediano, F. E. Rosas, A. I. Luppi, R. L. Carhart-Harris, D. Bor, A. K. Seth, and A. B. Barrett, “Towards an extended taxonomy of information dynamics via integrated information decomposition,” arXiv:2109.13186 (2021).
48. T. F. Varley, “Decomposing past and future: Integrated information decomposition based on shared probability mass exclusions,” PLoS One 18(3), e0282950 (2023).
49. H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochastically larger than the other,” Ann. Math. Stat. 18(1), 50–60 (1947).
50. N. Cliff, “Dominance statistics: Ordinal analyses to answer ordinal questions,” Psychol. Bull. 114(3), 494–509 (1993).
51. D. L. Olson and S. R. Swenseth, “Trade-offs in supply chain system risk mitigation,” Syst. Res. Behav. Sci. 31(4), 565–579 (2014).
52. O. Sporns, G. Tononi, and G. M. Edelman, “Theoretical neuroanatomy and the connectivity of the cerebral cortex,” Behav. Brain Res. 135(1), 69–74 (2002).