Questions of causation are foundational across science and often relate further to problems of control, policy decisions, and forecasts. In nonlinear dynamics and complex systems science, causation inference and information flow are closely related concepts, whereby “information” or knowledge of certain states can be thought of as coupling influence onto the future states of other processes in a complex system. While causation inference and information flow are by now classical topics, drawing on methods from statistics and time series analysis, information theory, dynamical systems, and statistical mechanics, to name a few, important work remains in strengthening the theory and broadening the range of applications, especially given the ever-increasing abundance of data collected across many fields and systems. This Focus Issue considers different aspects of these questions, both in terms of founding theory and several topical applications.
A basic and fundamental pursuit in science is to infer causal interactions and relationships. In terms of dynamical systems science, one may ask which dynamic variables influence other variables, either directly or indirectly through intermediate variables, and which variables only appear to be related due to the influence of a common driver. While traditional approaches to scientific discovery of causation between variables are through close iterations of forming a hypothesis and conducting controlled real experiments, in the past few decades, novel data-oriented approaches have been proposed that attempt to detect causal relations from purely observational data, driven by a growing availability of large-scale datasets. Observational causal inference constitutes a challenging problem for complex dynamical systems, from theoretical foundations to practical computational issues. Papers in this Focus Issue cover both theory and applications to a broad range of problems in social, physical, and biological systems.
The gold standard of scientific discovery is validation through controlled experiments: For example, in a standard physics experiment, to test whether a variable X has an influence on Y, one physically intervenes in X, changes its state, and measures whether this has an effect on Y, while keeping all other conditions unchanged as much as possible. In many problems, however, such controlled experiments can be infeasible; for example, when Johannes Kepler discovered the laws of planetary motion, he relied on observations of orbital dynamics. In general, the central question that arises is how one might perform causal inference from purely observational data. The statistics of causation by interventions and observations is thoroughly described in Pearl’s extensive works, summarized in Ref. 36, which lays out a mathematical framework for causation and develops the conditions under which the effects of interventions can be predicted from observational data.
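The difference between observing and intervening can be illustrated with a minimal simulation. The sketch below is an assumption made for illustration, not an example from Ref. 36: a hidden common driver Z influences both X and Y, while X has no effect on Y. The function names and all coefficients are hypothetical.

```python
import math
import random


def correlation(a, b):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def sample(n, intervene, seed):
    """Draw n samples of (X, Y) from a toy model with a hidden common driver Z.

    If intervene is True, X is set by the experimenter (independent noise),
    which cuts the link from Z to X. Y never depends on X in this toy model.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                      # hidden common driver
        if intervene:
            x = rng.gauss(0, 1)                  # experimenter sets X
        else:
            x = z + 0.5 * rng.gauss(0, 1)        # X passively follows Z
        y = z + 0.5 * rng.gauss(0, 1)            # Y follows Z only
        xs.append(x)
        ys.append(y)
    return xs, ys


obs_corr = correlation(*sample(5000, intervene=False, seed=0))
int_corr = correlation(*sample(5000, intervene=True, seed=1))
print(f"observational correlation: {obs_corr:.2f}")
print(f"interventional correlation: {int_corr:.2f}")
```

In this toy model the observational data show a strong association between X and Y even though intervening on X leaves Y unchanged, which is the common-driver situation that observational methods have to account for.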
In counterpoint to controlled experiments, there is the notion of observing a “free-running” complex system without actively probing it, and from (passive) time dependent observations thereof, asking the comparable question of what variables and factors are causal of others. In this context, causality can be interpreted as a form of influence on predictability (or the lack of predictability), that is, if the knowledge of one time series is useful in forecasting another time series, then the former can be interpreted as potentially “causal” for the latter. This formulation was put forward specifically in the work of Granger16 that led to his 2003 Nobel memorial prize in Economic Sciences,19 which in fact was closely related to the work of Norbert Wiener that predated it by more than a decade.68 In analogy to the controlled experimental setting above, to test whether a variable X has an influence on Y, one builds a predictive model of Y from observational data (most commonly, a linear autoregressive model) based on Y’s and other covariates’ past and measures whether the inclusion of X in the model improves the predictability of Y. Here, other variables are not actively kept under control, but they are statistically controlled for.
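The model-comparison idea behind Granger causality can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the procedure of Ref. 16: it uses lag-1 linear autoregressive models, a simulated pair of processes in which X drives Y, and the log ratio of residual variances as the improvement measure. All function names and coefficients are hypothetical.

```python
import math
import random


def least_squares_mse(rows, targets):
    """Fit targets ~ rows by ordinary least squares; return mean squared residual."""
    k = len(rows[0])
    # Augmented normal-equation matrix [A | b] with A = R^T R and b = R^T y.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        w[i] = (m[i][k] - sum(m[i][j] * w[j] for j in range(i + 1, k))) / m[i][i]
    residuals = [t - sum(wi * ri for wi, ri in zip(w, r)) for r, t in zip(rows, targets)]
    return sum(e * e for e in residuals) / len(residuals)


def granger_index(source, target):
    """Log ratio of residual variances: model without vs. with the source's past."""
    future = target[1:]
    restricted = [[1.0, target[t]] for t in range(len(target) - 1)]
    full = [[1.0, target[t], source[t]] for t in range(len(target) - 1)]
    return math.log(least_squares_mse(restricted, future) / least_squares_mse(full, future))


def simulate(n=2000, coupling=0.8, seed=0):
    """Toy linear system (an assumption for illustration): X drives Y, not vice versa."""
    rng = random.Random(seed)
    x, y = [0.0], [0.0]
    for _ in range(n - 1):
        x_new = 0.5 * x[-1] + rng.gauss(0, 1)
        y_new = 0.5 * y[-1] + coupling * x[-1] + rng.gauss(0, 1)
        x.append(x_new)
        y.append(y_new)
    return x, y


x, y = simulate()
g_x_to_y = granger_index(x, y)
g_y_to_x = granger_index(y, x)
print(f"X -> Y: {g_x_to_y:.3f}   Y -> X: {g_y_to_x:.3f}")
```

In practice the lag order would be chosen by a model-selection criterion and the index would be assessed with a significance test; both are omitted here to keep the sketch short.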
Information-theoretic measures based on Shannon entropy and mutual information9 naturally allow for a very general characterization of dependencies in complex and dynamical systems from symbolic to continuous descriptions. Analogous to Wiener-Granger causality for linear systems, transfer entropy has become a highly popular way to consider questions of pairwise information transfer between nonlinear dynamical systems.53 In a basic sense, transfer entropy from X to Y measures how much information the past of X contains about Y, beyond Y’s own past, which shows the close analogy to Granger causality. In the special case that the systems being analyzed are linear Gaussian stochastic processes, it has been shown that Granger causality is equivalent to transfer entropy.3 Hence, the entropic approach is more general, as it is applicable in the context of general distributions and nonlinear influences. Notice also that information-theoretic measures are fundamentally probabilistic in nature, as is Granger causality, since they are premised on comparing probability distribution functions of states. So, while this framework is naturally appropriate for a stochastic process, including random dynamical systems29 or a stochastic differential equation,2 it can also describe a deterministic dynamical system well. To this end, we simply recast the perspective by considering the evolution of ensembles of initial conditions,2 which is foundational to ergodic theory63 and also transfer operators.5,6,29 Another aspect of causal inference that is often based on information-theoretic measures concerns notions of causal coupling strength.24,38,42,44 Information-theoretic measures for continuous variables can be most efficiently estimated from data using nearest-neighbor estimators,26 and permutation shuffle tests can be used for conditional-independence testing,41 that is, testing whether a (conditional) mutual information is zero.
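For discrete (symbolic) data, transfer entropy from a source X to a target Y with history length one is the conditional mutual information I(Y_{t+1}; X_t | Y_t). The sketch below estimates it with a simple plug-in (histogram) estimator and attaches a basic shuffle test. It is an illustration under stated assumptions: the binary toy process is invented for this example, the plug-in estimator is swapped in for the nearest-neighbor estimators of Ref. 26 because the data are discrete, and the plain shuffle of the source is a simplification that is only appropriate here because the source is independent in time; it is not the conditional-independence test of Ref. 41.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate (bits) of I(Y_{t+1}; X_t | Y_t) for discrete series."""
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    n = sum(triples.values())
    count_y1_y0 = Counter()  # counts of (y_next, y_now)
    count_y0_x0 = Counter()  # counts of (y_now, x_now)
    count_y0 = Counter()     # counts of y_now
    for (y1, y0, x0), c in triples.items():
        count_y1_y0[(y1, y0)] += c
        count_y0_x0[(y0, x0)] += c
        count_y0[y0] += c
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        # p(y1 | y0, x0) / p(y1 | y0) written in terms of counts
        ratio = c * count_y0[y0] / (count_y0_x0[(y0, x0)] * count_y1_y0[(y1, y0)])
        te += (c / n) * math.log2(ratio)
    return te


def shuffle_test(source, target, n_surrogates=100, seed=0):
    """Return (estimate, p-value) where surrogates are random shuffles of the source."""
    rng = random.Random(seed)
    observed = transfer_entropy(source, target)
    exceed = 0
    for _ in range(n_surrogates):
        surrogate = source[:]
        rng.shuffle(surrogate)
        if transfer_entropy(surrogate, target) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_surrogates + 1)


# Toy process (an assumption): Y copies the previous X, flipped with probability 0.1.
rng = random.Random(42)
n = 5000
x = [rng.randint(0, 1) for _ in range(n)]
y = [0]
for t in range(n - 1):
    y.append(1 - x[t] if rng.random() < 0.1 else x[t])

te_xy, p_xy = shuffle_test(x, y)
te_yx, p_yx = shuffle_test(y, x)
print(f"TE X->Y = {te_xy:.3f} bits (p = {p_xy:.3f})")
print(f"TE Y->X = {te_yx:.3f} bits (p = {p_yx:.3f})")
```

For this toy process the exact value of the transfer entropy from X to Y is 1 − H(0.1) ≈ 0.53 bits, where H is the binary entropy function, and the reverse direction is zero; the plug-in estimate carries a small positive bias that shrinks with sample size.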
The works of Arnhold, Hirata, Schiff, Sugihara, and others consider the causation problem through the perspective of dynamical attractors underlying nonlinear dynamical systems and the concept of generalized synchronization.1,17,20,21,39,51,56 Starting from the insight that higher-dimensional attractors can be reconstructed from univariate measured time series via time-delay embedding,60 these works utilize what can be termed the closeness principle. According to this principle, in causally related systems, states in the driver attractor that are concurrent with mutually close states in the response attractor should also be close to each other. As one implementation, convergent cross-mapping56 tests the closeness principle via the convergence of nearest neighbors on the attractors of the two systems as the number of samples from the attractor increases.
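A cross-mapping computation in the spirit of convergent cross-mapping can be sketched as follows. This is a simplified illustration, not the reference implementation of Ref. 56: the embedding dimension of 2, the delay of 1, the leave-one-out neighbor search, and the unidirectionally coupled logistic maps with their parameter values are all assumptions chosen for the example, and the function names are hypothetical.

```python
import math


def delay_embed(series, dim, tau=1):
    """Delay vectors (s[t], s[t - tau], ..., s[t - (dim - 1) * tau])."""
    start = (dim - 1) * tau
    return [tuple(series[t - k * tau] for k in range(dim))
            for t in range(start, len(series))]


def pearson(a, b):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def cross_map_skill(manifold_series, target_series, lib_size, dim=2, tau=1):
    """Estimate the target from neighbors on the other series' reconstructed manifold.

    For each delay vector, the dim + 1 nearest other vectors are found, and the
    concurrent target values are averaged with exponentially decaying weights.
    Returns the correlation between estimated and true target values.
    """
    points = delay_embed(manifold_series, dim, tau)[:lib_size]
    offset = (dim - 1) * tau
    truth = target_series[offset:offset + lib_size]
    estimates = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))), j)
            for j, q in enumerate(points) if j != i
        )[:dim + 1]
        d_min = max(dists[0][0], 1e-12)  # guard against zero distance
        weights = [math.exp(-d / d_min) for d, _ in dists]
        estimates.append(sum(w * truth[j] for w, (_, j) in zip(weights, dists)) / sum(weights))
    return pearson(estimates, truth)


# Toy system (an assumption): logistic map X drives logistic map Y, not vice versa.
x, y = [0.4], [0.2]
for _ in range(400):
    x_new = x[-1] * (3.8 - 3.8 * x[-1])
    y_new = y[-1] * (3.5 - 3.5 * y[-1] - 0.1 * x[-1])
    x.append(x_new)
    y.append(y_new)
x, y = x[100:], y[100:]  # discard the transient

lib_sizes = [50, 100, 200]
skill_x_from_y = [cross_map_skill(y, x, L) for L in lib_sizes]  # tests X -> Y
skill_y_from_x = [cross_map_skill(x, y, L) for L in lib_sizes]  # tests Y -> X
self_skill = cross_map_skill(x, x, 200)                         # sanity check
for L, a, b in zip(lib_sizes, skill_x_from_y, skill_y_from_x):
    print(f"L = {L}: X from M_Y = {a:.2f}, Y from M_X = {b:.2f}")
```

The quantity of interest is how the skill changes as the library size grows, rather than a single value. Strong forcing can lead to generalized synchronization, in which case cross-mapping may succeed in both directions, so the output of a sketch like this should be read with that caveat in mind.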
More from the statistics and machine learning community, the problem of causal inference is treated within the framework of structural causal models, related to the work of Pearl.36 The theory of structural causal models, summarized in Peters et al.,37 lays out the conditions under which certain causal models, for example, linear models with non-Gaussian error terms, or nonlinear models, can be identified from observational data, not necessarily time series data, as in the Granger and nonlinear dynamics context, which is the major topic of this Focus Issue.
More broadly, a multi-faceted perspective is desired, since while standard Granger causality and also transfer entropy are well suited for two-variable (or two-subsystem, two-component) settings of information flow, they are not designed to address questions involving three or more factors, which also raise the issue of the curse of dimensionality.45 For example, consider a system of three subsystems, X, Y, and Z, and ask: does X influence Z directly, or does X influence Z only through an intermediary Y? For these questions, one must cast a conditional variation of the above concepts. There exist conditional Granger causalities,7,12,34 conditional transfer entropies, and state conditioned transfer entropies,67 and a special variation leading to causation entropy.8,25,58,59 Furthermore, if one wishes to uncover coupling structure, then an algorithm premised on these computations is required; for example, the PC algorithm and momentary information-based causal discovery,46 which address inference of large-scale nonlinear causal networks in the presence of strong autocorrelation, or the optimal causation entropy (oCSE) approach,58,59 which is designed to uncover the network of direct information flow influences using causation entropy (CSE) as the underlying influence measure.
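The distinction between a direct influence and one that passes through an intermediary can be illustrated with a conditional Granger-type comparison. The sketch below is an assumption-laden illustration, not the method of Refs. 7, 12, or 34: it simulates a linear chain X → Y → Z, uses linear autoregressive models with two lags, and compares the pairwise index from X to Z with the index conditioned on the past of Y. All names and coefficients are hypothetical.

```python
import math
import random


def least_squares_mse(rows, targets):
    """Fit targets ~ rows by ordinary least squares; return mean squared residual."""
    k = len(rows[0])
    # Augmented normal-equation matrix [A | b] with A = R^T R and b = R^T y.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        w[i] = (m[i][k] - sum(m[i][j] * w[j] for j in range(i + 1, k))) / m[i][i]
    residuals = [t - sum(wi * ri for wi, ri in zip(w, r)) for r, t in zip(rows, targets)]
    return sum(e * e for e in residuals) / len(residuals)


def granger_index(source, target, conditions=(), max_lag=2):
    """Log ratio of residual variances without vs. with the source's past.

    The past of the target and of every series in `conditions` is included
    in both models, so only the extra contribution of the source is measured.
    """
    base_series = [target, *conditions]
    restricted, full, future = [], [], []
    for t in range(max_lag, len(target)):
        base = [1.0] + [s[t - k] for s in base_series for k in range(1, max_lag + 1)]
        restricted.append(base)
        full.append(base + [source[t - k] for k in range(1, max_lag + 1)])
        future.append(target[t])
    return math.log(least_squares_mse(restricted, future) / least_squares_mse(full, future))


def simulate_chain(n=3000, seed=1):
    """Toy linear chain (an assumption): X drives Y, Y drives Z, no direct X -> Z link."""
    rng = random.Random(seed)
    x, y, z = [0.0], [0.0], [0.0]
    for _ in range(n - 1):
        x_new = 0.5 * x[-1] + rng.gauss(0, 1)
        y_new = 0.5 * y[-1] + 0.8 * x[-1] + rng.gauss(0, 1)
        z_new = 0.5 * z[-1] + 0.8 * y[-1] + rng.gauss(0, 1)
        x.append(x_new)
        y.append(y_new)
        z.append(z_new)
    return x, y, z


x, y, z = simulate_chain()
pairwise_xz = granger_index(x, z)
conditional_xz = granger_index(x, z, conditions=(y,))
print(f"X -> Z (pairwise):       {pairwise_xz:.3f}")
print(f"X -> Z (given Y's past): {conditional_xz:.3f}")
```

The pairwise index flags an apparent influence of X on Z, while conditioning on Y removes it, which is the behavior expected for an indirect link. Conditioning on more variables enlarges the model at every step, which is one way the curse of dimensionality enters.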
While most of the approaches are data-oriented in detail, and many are information-theoretic, there is notably the Liang-Kleeman formalism,31 which takes a thermodynamic differential equations approach assuming known equations, in the spirit of information theory, and which has also been shown in certain settings to coincide with transfer entropy. In counterpoint, a direct theoretical transfer operator description can be posed6,29 that utilizes Jensen-Shannon divergence instead of Kullback-Leibler divergence; these perspectives are also highlighted in this issue.
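The practical difference between the two divergences can be seen on a small discrete example. The sketch below uses the standard definitions of Kullback-Leibler and Jensen-Shannon divergence; the two distributions are arbitrary toy choices, not the kernels considered in Refs. 6 and 29.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; infinite if q misses support of p."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: average KL to the mixture (p + q) / 2."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


# Two toy distributions whose supports only partly overlap.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]

kl_pq = kl_divergence(p, q)
js_pq = js_divergence(p, q)
js_qp = js_divergence(q, p)
print(f"KL(p || q) = {kl_pq}")
print(f"JS(p, q)   = {js_pq}, JS(q, p) = {js_qp}")
```

Because each distribution is compared with the mixture, which covers the support of both, the Jensen-Shannon divergence stays finite and symmetric (at most 1 bit) even when the Kullback-Leibler divergence is infinite.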
Several works have shown limitations of the Shannon information-theoretic framework in measuring dependence and causation. One aspect is that entropy is not invariant to change of variables,9 which was an early critique of differential entropy (Ref. 65). Another more recent work notes that there are stochastic processes for which the past is entirely shielded from the future.23 In terms of multivariate dependencies, James et al.22 elaborate on polyadic relationships that call into question, even fundamentally, the notion of a directed graph with pairwise links to represent what should be a hypergraph. Such polyadic, or synergistic, causal drivers can also be crucial in optimal time series prediction schemes.43
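The lack of invariance under a change of variables can be made explicit with a standard identity, stated here as a reminder rather than as the specific argument of Ref. 9 or Ref. 65. For a scalar continuous random variable $X$ with density $p_X$ and a smooth invertible map $g$ (the symbols $g$, $g_1$, $g_2$, $X_1$, and $X_2$ are introduced only for this illustration):

```latex
\begin{align}
h(X) &= -\int p_X(x)\,\log p_X(x)\,dx, \\
h\big(g(X)\big) &= h(X) + \mathbb{E}\!\left[\log\left|g'(X)\right|\right], \\
I\big(g_1(X_1);\,g_2(X_2)\big) &= I(X_1;\,X_2).
\end{align}
```

The second line shows that even a rescaling $g(x) = ax$ shifts the differential entropy by $\log|a|$, whereas the third line recalls that mutual information is unchanged when each variable is transformed by its own smooth invertible map.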
In the recent decade in particular, there has been a significant resurgence of interest in causation inference and the closely related concept of information flow. There are likely several underlying reasons, not least that the growth of this field is tied to the broader growth of data science, machine learning, and “big data” analytics, in particular in the context of complex systems. This boom can be attributed to the ever-growing availability of large-scale and more affordable data collection platforms, ever more powerful and cost-effective computing facilities and equipment, and an abundance of massive datasets from an unimaginable number of sensors everywhere. A question such as which parts of the earth’s atmosphere may be influential in forecasting other parts4,10,33,54,64,69 would be too overwhelmingly complex to consider analytically, but it is clearly important and may become tractable with a data-oriented approach on massive computing platforms. Data-oriented answers to general scientific questions are broadly enjoying a golden age, and causation inference and information flow are no exception. Applications span social,18,32,57,62 medical,35,40,49,61,66 earth science,13–15,27,28,47,48,52 engineering,50,55 and financial and economic11 problems, to name a small fraction of the rapidly growing literature on applications of these powerful scientific tools.
Many open problems remain, however, from spatial and temporal resolution effects to determinism, nonlinearity, and multi-element and synergistic interactions. In this Focus Issue, contributions include work on these topics in terms of theory as well as applications such as collective animal behavior, interdependent ecological dynamics, social communication and opinion diffusion, sensing and monitoring in mechanical and other engineering structures, and the global climate system, thus encompassing a broad range of problems in social, physical, and biological systems.
II. This Focus Issue
As the theory of causation inference evolves, the state of the art, algorithmic design, and applications are all advancing as reflected in the papers herein. Many of the general concepts discussed here are expanded upon, modernized, or reviewed in the articles of this Focus Issue. Below, we provide a brief summary of each of the papers that appear in this Focus Issue.
J. M. Amigo and Y. Hirata (this focus issue, Ref. 70) re-examine the method of the joint distance distribution to identify directional nonlinear couplings (“Detecting directional couplings from multivariate flows by the joint distance distribution”). The method lies at the intersection of nonlinear dynamics and time series analysis and utilizes the closeness principle according to which the states in the driver attractor that are concurrent with close states in the response attractor are also close to each other. Also, convergent cross-mapping falls into this framework. The paper insightfully illustrates advantages and pitfalls of the method, for example, the issue of phase synchronization.
H. Ashikaga and R. James (this focus issue, Ref. 71) explore asymmetry in the information flow across different spatial scales in a mathematical model of cardiac dynamics, aiming to determine the relationship between rotors and inter-scale information flow (paper title “Inter-scale information flow as a surrogate for downward causation that maintains spiral waves”). By comparing transfer entropy and intrinsic transfer entropy, the paper concludes that information flow from macro- to micro-scale is adequately captured by transfer entropy, and no synergistic effects are present. Using transfer entropy as a surrogate, the paper focuses on information flow from macro-scale behavior of the system to its corresponding micro-scale states and finds that such “downward causation” correlates with the number of rotors, which are rotation centers of cardiac spiral waves. However, a higher number of rotors was not significantly associated with higher downward information flow. Such a finding has the potential to challenge the existing paradigm in cardiac research that rotors are the causal factors that maintain atrial fibrillation.
J. Bagrow and L. Mitchell (this focus issue, Ref. 72) discuss a model of the social flow of written information (“The quoter model: A paradigmatic model of the social flow of written information”). The quoter model simulates the posting and sharing of short social media posts where information propagates over a graph via a quoting mechanism. The authors provide an in-depth information-theoretic analysis with analytical derivations of information flow validated by numerical experiments. The model can serve as a benchmark for how information flow measures applied to text deal with spurious interactions and confounds.
E. M. Bollt (this focus issue, Ref. 73) presents a new perspective on information flow in terms of directly inspecting the associated transfer operators. Since entropy is fundamentally measured in terms of inspecting the underlying probability distributions, the argument here is that information flow should consider the evolution of such densities, and therefore two competing versions of the evolutions of densities are inspected, one without and one with considering the possibility of the outside influence of an extra factor. These two competing possibilities of closed versus open as it turns out are elegantly considered in terms of either a standard Frobenius-Perron operator for the autonomous (closed) deterministic system or a stochastic kernel for the corresponding stochastic Frobenius-Perron operator for the nonautonomous (open) dynamical system, whereby the ensemble of collective perturbations can be thought of as a random influence. Then, directly measuring the differences between these kernels leads to the need for a Jensen-Shannon divergence due to proper inclusion, and this is called the forecastability quality metric.
The paper by J. Garland, A. M. Berdahl, J. Sun, and E. M. Bollt (this focus issue, Ref. 74), “Anatomy of leadership in collective behaviour,” studies causality and information flow in the form of leadership in collective behaviour of mobile animal groups, which is facilitated by the recent emergence of large datasets of trajectory time series of individuals in animal groups. Starting from the insight that heterogeneous individuals in such groups feature different types of influence over group behaviour, the authors develop an anatomy of leadership and provide a framework for evaluating and discussing leadership and models of animal group behaviour.
In “Causality and information flow with respect to predictability,” by X. S. Liang (this focus issue, Ref. 75), the author continues to advance the Liang-Kleeman formalism,30,31 which is a rigorous formalism for information transfer assuming the system’s dynamics is analytically described. Here, however, important thermodynamic issues related to over a time epoch are related in terms of Shannon information despite arbitrary dimensionality, and the property of so-called “nil causality” is considered, which classical methods may fail to verify.
U. Ozturk, N. Marwan, O. Korup, H. Saito, A. Agarwal, M. J. Grossman, M. Zaiki, and J. Kurths (this focus issue, Ref. 76) use event synchronization to construct complex networks of information flow to track extreme rainfall over Japan and surrounding seas (paper “Complex networks for tracking extreme rainfall during typhoons”). Directionality of information flow in the network is used to analyze regional sources and sinks of extreme weather patterns. In addition, the paper found, among several interesting results derived from a network and information flow perspective of the system, that for typhoon seasons, extreme rainfall tends to follow the southwest-northeast motion of typhoons and mean rainfall gradient of Japan.
M. Paluš, A. Krakovská, J. Jakubík, and M. Chvosteková (this focus issue, Ref. 77) set out to investigate how measured causality would change if one were to reverse the temporal order of observations (paper “Causality, dynamical systems and the arrow of time”). Using several distinct methods for measuring causality, they found that Granger’s first principle of causality, which states that “The cause occurs before the effect,” can in fact be violated for chaotic systems under time reversal. Even though such a violation occurs only in hypothetical situations, since chaotic processes in the real world are not reversible, it does send a warning signal to practitioners who wish to detect causality in experimental data not to rely solely on a single measure of causality, but instead to consider additional analysis, such as tests for nonlinearity, synchronization, as well as spectral and time-frequency analysis.
S. Roy and B. Jantzen (this focus issue, Ref. 78) present a novel method for detecting the direction of influence between two nonlinearly coupled dynamical processes (“Detecting causality using symmetry transformations”). The method utilizes the property of dynamical symmetries, which can be considered as the set of transformations of the system trajectories that commute with its time evolution. Comparisons with transfer entropy and convergent cross mapping show that the method is especially robust in the presence of observational noise. Currently, the method is limited to first-order autonomous systems leaving extensions to the important multivariate case to future work.
The paper by J. Runge, “Causal network reconstruction from time series: From theoretical assumptions to practical estimation” (this focus issue, Ref. 79), offers a comprehensive computational and theoretical perspective on what can be learned from observed experimental multivariate time series. The paper discusses the underlying assumptions and computational issues of the broad framework of conditional independence-based methods that encompasses Granger causality, transfer entropy, optimal causation entropy, and momentary information transfer, among others. Which effects occur if important assumptions are not satisfied, such as those due to unobserved variables, sampling issues, determinism, stationarity, nonlinearity, or measurement error? How are causal reconstructions affected by computational issues due to autocorrelation and high dimensionality? The article is intended to briefly review and accessibly illustrate the foundations and practical problems of time series-based causal discovery and stimulate further methodological developments.
In “Local causal states and discrete coherent structures,” Rupe and Crutchfield (this focus issue, Ref. 80) study computational mechanics and causal influences in spatiotemporal processes to extract so-called local causal states and uncover “local symmetries.” This causal approach is an original departure from many other set-oriented methods more often used in the study of coherent structures. The applications here are to cellular automata, which allow a rigorous discussion, but there is an outlook toward real-world application of this new approach.
The paper by Smirnov, “Transient and equilibrium causal effects in coupled oscillators” (this focus issue, Ref. 81), contrasts nonequilibrium (transient) effects with equilibrium effects from the viewpoint of causal influences. Coupling parameter variations and also on-off switching of coupling are considered in the context of Wiener-Granger causality, and two kinds of influences are categorized for unidirectional couplings.
This work was funded in part by the Simons Foundation (Grant No. 318812), the U.S. Army Research Office (Grant No. W911NF-16-1-0081), the U.S. Office of Naval Research (Grant No. N00014-15-1-2093), and DARPA. J.R. was funded by a postdoctoral fellowship from the James S. McDonnell Foundation.