Detecting causality between variables in a time series is a challenge, particularly when the relationship is nonlinear and the dataset is noisy. Here, we present a novel tool for detecting causality that leverages the properties of symmetry transformations. The aim is to develop an algorithm with the potential to detect both unidirectional and bidirectional coupling for nonlinear systems in the presence of significant sampling noise. Most of the existing tools for detecting causality can make determinations of directionality, but those determinations are relatively fragile in the presence of noise. The novel algorithm developed in the present study is robust and very conservative in that it reliably detects causal structure with a very low rate of error even in the presence of high sampling noise. We demonstrate the performance of our algorithm and compare it with two popular model-free methods, namely transfer entropy and convergent cross map. This first implementation of the method of symmetry transformations is limited in that it applies only to first-order autonomous systems.

Real world systems are composed of a number of interacting units. Identifying causality refers to detecting how these units are coupled and quantifying the strength of that coupling from time series data. Detecting causality is particularly important in understanding real-world systems, and hence has attracted researchers from different disciplines to develop a number of causality detecting tools. Here, we present a novel causality detecting algorithm that utilizes the property of dynamical symmetry. The application of the present version of the algorithm is limited to first-order autonomous systems. The performance of the algorithm is demonstrated with toy models from ecology and a real-world electronic system and is found to be very reliable in correctly identifying the coupling between variables. In other words, it successfully discovers the presence and direction of causal edges. We further discuss the scope of improvement of the present version of the algorithm that may find application in more general scenarios.

## I. INTRODUCTION

Detecting causality between observed time series refers to the identification of dominant coupling direction and the quantification of coupling strengths between the interacting time series variables. Identifying such causal relationships from observational or experimental data is also often referred to as “causal discovery.”^{1} Identifying causality is of great importance as it has applications in unraveling various real-world phenomena, ranging from climate networks^{2} to brain networks.^{3} The challenge of learning causal relations for such a diversity of systems has attracted researchers from various disciplines, and a variety of methods have been proposed. One early method is Granger causality,^{4} which relies on the idea of separability. This refers to the removal of information of the causative factors from the effects.^{5,6} Originally, the application of Granger causality was limited because it assumed linear dependence,^{3} whereas most real-world systems are non-linear. But recently, Granger causality has been extended to nonlinear systems.^{7} Another approach for detecting causality in linear systems is by computing cross-correlation^{8,9} between time series for a range of time lags, and identifying the maximum in correlation value at some non-zero lag. The leading time series variable is then inferred to be the cause.

To infer causality in non-linear systems, model-free methods (that do not assume a background model), such as transfer entropy,^{10} convergent cross map,^{5} recurrence plots,^{3,11} causation entropy,^{12} and phase dynamics^{13,14} are becoming increasingly popular. A tool to detect the direction of coupling in interacting oscillators is developed in Refs. 13 and 14 and works by observing an asymmetric failure of invariance in the time evolution of oscillator phases. Roughly, if oscillator 1 drives oscillator 2, then the change in phase of oscillator 2 over a fixed interval of time will depend upon the phase of oscillator 1 at the outset, but not vice versa. On the other hand, transfer entropy (TE) is based on an information-theoretic concept and employs the idea that causes predict their effects. In particular, it quantifies the information flow between time series variables and identifies the direction of causation via the dominant direction of information flow. Transfer entropy has already found applications in neuroscience,^{15} finance,^{16} social media,^{17} animal behavior,^{8,18,19} and public policy.^{20} The main limitation of TE is that the method requires long time series which may not be available in practical situations.

A relatively recent but popular method is the convergent cross map (CCM),^{5} which is based on a state space reconstruction technique. CCM introduces a new criterion for detecting causality. In this case, the effects cross-estimate the causes contrary to the Granger and TE sense.^{21} In Ref. 5, CCM is identified as a superior method to both TE and Granger causality in correctly detecting causality.^{5} CCM has already found application in diverse settings.^{5,22–25} The major limitation of CCM is that it can only detect low to moderate coupling and fails in the presence of strong coupling and for noisy data.^{26}

Despite their broad applicability, TE and CCM continue to struggle with noisy data or small datasets. We present a first version of an algorithm that, though of limited range of application, is robust for small, noisy datasets. This method detects the direction(s) of causal influence by utilizing the concept of dynamical symmetries.^{27}

The paper is organized as follows. In Sec. II, we describe dynamical symmetries, and how properties of these symmetries can be exploited in detecting causality. In Sec. III, we present the algorithm. In Sec. IV, we explain TE and CCM. In Sec. V, we introduce Lotka-Volterra models as toy target systems and demonstrate the performance of our algorithm in detecting unidirectional and bidirectional causality. We further compare the performance of our algorithm with that of TE and CCM on real world systems. Finally, we present the limitations of our algorithm and the scope of improvement in Sec. VI.

## II. Method

The concept of dynamical symmetry was first introduced in Ref. 28 and has proved to be a powerful tool for classifying dynamical systems according to higher-order properties of their causal structure.^{27} For purposes of time series analysis, we use the following definition of the concept:

**Dynamical symmetry:** Let $V$ be the set of states of a system of $n$ variables, and let $\Lambda t1,t0$ be the state evolution operator that takes the system from some state $v0\u2208V$ at time $t0$ to the corresponding evolved state $v1\u2208V$ at $t1$. The transformation $\sigma :V\u2192V$ is a dynamical symmetry with respect to time if and only if it has the following property: $\u2200v\u2208V\u2200t0\u2200t1>t0[\Lambda t1,t0(\sigma (v))=\sigma (\Lambda t1,t0(v))]$. In other words, the set of dynamical symmetries is the set of transformations of the system that commute with its time evolution.

For deterministic dynamics described by a system of differential equations, dynamical symmetries correspond to automorphisms on the space of solutions, which in the case of dynamical systems are trajectories. Consider two arbitrary trajectories such that at $t0$, the system is in state $s1,t0$ on trajectory 1 and state $s2,t0$ on trajectory 2. Then the map that pairs states $s1,t0+\delta $ with $s2,t0+\delta $ for any $\delta \u22650$ is equivalent to a dynamical symmetry. To see this, consider any paired states $s1,t0+\delta $ and $s2,t0+\delta $. If one evolves the state $s1,t0$ for time $\delta $ and then applies the map, one obtains $s2,t0+\delta $, exactly as if one had applied the map at $t0$ and then evolved through an interval of $\delta $. The map in question is also a function (generally a vector function) since, under the assumption that the system is deterministic, no trajectory can repeat a state (i.e., no trajectories intersect with themselves or others).

The set of dynamical symmetries of a system is determined by its causal structure. Furthermore, dynamical symmetries can be directly estimated from data without first learning a dynamical model.^{27} For these reasons, dynamical symmetry can be used to probe causal structure.

In the present study, we exploit the nature of dynamical symmetries to identify the causal influences in a system composed of two time-dependent variables that may be coupled in either or both directions. To illustrate our procedure, consider a unidirectionally coupled, autonomous, first-order system of the form:

A common representational tool in causal discovery is the structural equation model (SEM)^{29} (Chaps. 1 and 5). If a system such as Eq. 1 has solutions for the given $h1$ and $h2$, then we can equivalently represent it with a structural equation model.^{27} This is because one can express a flow of the dynamical system as a function of time and the present and past values of the dynamical variables. In this case, we have

Our method depends upon examining the dynamical symmetries of a system. More specifically, we look at differences in their one-variable projections. Given any two trajectories of a deterministic system of two dynamical variables ($x$ and $y$), there is (at least one) dynamical symmetry that maps one into the other. In general, these symmetries can be represented by two-component vector functions of system states. Each component of such a vector function can be viewed as a surface whose distance above or below a point in the $x,y$ phase plane is the value to which that component of the dynamical symmetry maps the point. Sequences of points in the phase plane are thus mapped by a dynamical symmetry to curves on this surface. Our method relies in effect on detecting differences in the shape of these surfaces along different basis directions. We do so by looking at the projection of curves on these surfaces onto the plane corresponding to one or the other dynamical variable. For example, the $x$-component of the vector function representing a symmetry may be written $\sigma x(x,y)$. Consider a curve in the phase plane given by $\u27e8r(t),s(t)\u27e9$. This is mapped by a dynamical symmetry to $\sigma x(r(t),s(t))=\u27e8u(t),v(t)\u27e9$, a curve on the surface dictated by $\sigma x(x,y)$. If we project this curve onto the plane containing the $x$-axis, we obtain a new curve, denoted $\sigma (x)$. In this case, when parameterized by time, this curve is given by $\sigma (x(t))=u(t)$. In some cases, the projected curve $\sigma (x)$ is in fact a proper function of $x$ and thus represents a dynamical symmetry of the system that involves intervention on only the variable $x$. However, it is often the case that $\sigma (x)$ is not a function. For this reason, we call these projections *pseudo-symmetries*.

To derive implicit equations for the pseudo-symmetries involving only $x$ [denoted $\sigma (x)$] and those involving only $y$ [denoted $\sigma (y)$] given the above assumptions, consider the series of operations summarized in Tables I and II, respectively. To find $\sigma (x)$ we start from $t=0$ then apply the transformation on $x$ followed by time evolution. Each row of the center column of the table indicates the state of the system after each of these operations as dictated by Eq. (2). Next, we start from $t=0$, but change the order of time evolution and the transformation on $x$. The resulting system states are shown in the right-hand column. This procedure yields a condition that must be satisfied for any transformation, $\sigma (x)$, to qualify as a pseudo-symmetry. From the final row of Table I, we can see that any transformation of $x$ that is a pseudo-symmetry must satisfy

Importantly, this condition entails that any symmetry $\sigma (x)$ is dependent upon both initial conditions, $x0$ and $y0$. That is, the set of pseudo-symmetries of $x$ are determined in part by the initial value of $y$.

$\Lambda \Delta ,0(\sigma (x))$ . | ||
---|---|---|

$t$ | $x$ | $y$ |

$0$ | $x0$ | $y0$ |

$0$ | $\sigma (x0)$ | $y0$ |

$\Delta $ | $f(\Delta ,g(\Delta ;y0);\sigma (x0))$ | $g(\Delta ;y0)$ |

$\sigma (\Lambda \Delta ,0(x))$ | ||

$t$ | $x$ | $y$ |

$0$ | $x0$ | $y0$ |

$\Delta $ | $f(\Delta ,g(\Delta ;y0);x0)$ | $g(\Delta ;y0)$ |

$\Delta $ | $\sigma (f(\Delta ,g(\Delta ;y0);x0))$ | $g(\Delta ;y0)$ |

$\Lambda \Delta ,0(\sigma (x))$ . | ||
---|---|---|

$t$ | $x$ | $y$ |

$0$ | $x0$ | $y0$ |

$0$ | $\sigma (x0)$ | $y0$ |

$\Delta $ | $f(\Delta ,g(\Delta ;y0);\sigma (x0))$ | $g(\Delta ;y0)$ |

$\sigma (\Lambda \Delta ,0(x))$ | ||

$t$ | $x$ | $y$ |

$0$ | $x0$ | $y0$ |

$\Delta $ | $f(\Delta ,g(\Delta ;y0);x0)$ | $g(\Delta ;y0)$ |

$\Delta $ | $\sigma (f(\Delta ,g(\Delta ;y0);x0))$ | $g(\Delta ;y0)$ |

$\Lambda \Delta ,0(\sigma (y))$ . | ||
---|---|---|

$t$ | $x$ | $y$ |

$0$ | $x0$ | $y0$ |

$0$ | $f(0,\sigma (y0);x0)$ | $\sigma (y0)$ |

$\Delta $ | $f(\Delta ,g(\Delta ;y0);x0)$ | $g(\Delta ;\sigma (y0))$ |

$\sigma (\Lambda \Delta ,0(y))$ | ||

$t$ | $x$ | $y$ |

$0$ | $x0$ | $y0$ |

$\Delta $ | $f(\Delta ,g(\Delta ;y0);x0)$ | $g(\Delta ;y0)$ |

$\Delta $ | $f(\Delta ,\sigma (g(\Delta ;y0));x0)$ | $\sigma (g(\Delta ;y0))$ |

$\Lambda \Delta ,0(\sigma (y))$ . | ||
---|---|---|

$t$ | $x$ | $y$ |

$0$ | $x0$ | $y0$ |

$0$ | $f(0,\sigma (y0);x0)$ | $\sigma (y0)$ |

$\Delta $ | $f(\Delta ,g(\Delta ;y0);x0)$ | $g(\Delta ;\sigma (y0))$ |

$\sigma (\Lambda \Delta ,0(y))$ | ||

$t$ | $x$ | $y$ |

$0$ | $x0$ | $y0$ |

$\Delta $ | $f(\Delta ,g(\Delta ;y0);x0)$ | $g(\Delta ;y0)$ |

$\Delta $ | $f(\Delta ,\sigma (g(\Delta ;y0));x0)$ | $\sigma (g(\Delta ;y0))$ |

We perform a similar exercise to obtain the condition that $\sigma (y)$ must satisfy. From Table II, it is clear that the pseudo-symmetries involving $y$ alone must satisfy

which is independent of the initial condition for $x$. The reason $\sigma (y)$ is independent of $x0$ is because the evolution of $y$ as given in Eq. (2) is not influenced by $x$. A similar argument to the above shows that if the causal influence is bidirectional—if $x$ influences $y$ and vice versa—then $\sigma (x)$ would depend upon the initial value of $y$*and*$\sigma (y)$ would depend upon the initial value of $x$.

The central idea of the present study is to exploit this pattern of dependence on initial conditions to infer the direction(s) of causal influence. By estimating $\sigma (y)$ for two different initial conditions of $x$, one can ascertain whether $\sigma (y)$ depends upon $x0$. If it does, then $x$ is a cause of $y$. Conversely, if $\sigma (x)$ is found to depend upon $y$, then $y$ is a cause of $x$, as in the example above. And if both one-variable pseudo-symmetries depend upon the initial values of the other variable, then the causal influence is mutual (i.e., $x$ and $y$ influence each other). Note, however, that this decision procedure does not indicate the degree to which one influences the other.

This approach depends upon comparing pairs of pseudo-symmetries, $\sigma (x)$ and $\sigma (y)$, for the dynamical system of interest. In practice, the information to do so can be gleaned from a single time series of sufficient length, but we assume here that multiple replicates of a dynamical system are used, as depicted in Fig. 1. In the first (System 1), we start by choosing two initial conditions for $x$ ($x0=\gamma $ and $x0=\delta $) while keeping the initial condition for $y$ fixed at $\alpha $. System 2 is set up similarly, except that $y0$ is fixed at $\beta $. The pseudo-symmetries of interest are the functions $x\alpha ~\u2192(x\alpha \u2192)$ and $x\beta ~\u2192(x\beta \u2192)$ that, respectively, map values of $x$ in the time series $x\u2192$ to the corresponding (same time) value in the time series vector, $x~\u2192$ (the subscripts $\alpha $ and $\beta $ refer to the initial value of $y$ for each system). These two functions are denoted $\sigma 1(x)$ and $\sigma 2(x)$ in Fig. 1(a). Similarly, we set up two additional systems, System 3 and System 4, to produce two time series for the variable $y$, each starting from a different initial condition, while the initial value of $x$ is set to either $\delta $ (for System 3) or $\gamma $ (for System 4). The relevant pseudo-symmetries of $y$ in this case are the functions denoted $\sigma 3(y)$ and $\sigma 4(y)$ in Fig. 1(b). The subscripts $\delta $ and $\gamma $ refer to initial value of $x$ in Systems 3 and 4, respectively. Table III presents a complete notation for the respective trajectories.

Initial conditions . | Notation for $x$ . | Notation for $y$ . |
---|---|---|

$[\gamma ,\alpha ]$ | $x\alpha \u2192$ | $y\gamma ~\u2192$ |

$[\gamma ,\beta ]$ | $x\beta \u2192$ | $y\gamma \u2192$ |

$[\delta ,\alpha ]$ | $x\alpha ~\u2192$ | $y\delta ~\u2192$ |

$[\delta ,\beta ]$ | $x\beta ~\u2192$ | $y\delta \u2192$ |

Initial conditions . | Notation for $x$ . | Notation for $y$ . |
---|---|---|

$[\gamma ,\alpha ]$ | $x\alpha \u2192$ | $y\gamma ~\u2192$ |

$[\gamma ,\beta ]$ | $x\beta \u2192$ | $y\gamma \u2192$ |

$[\delta ,\alpha ]$ | $x\alpha ~\u2192$ | $y\delta ~\u2192$ |

$[\delta ,\beta ]$ | $x\beta ~\u2192$ | $y\delta \u2192$ |

## III. The algorithm

The algorithm is based on a technique that was first introduced in Ref. 27, where the dynamical symmetries of two systems are compared to determine whether the systems share a similar dynamics. In case they do not share the same symmetry transformations, they are confidently determined to belong to “different kinds”; otherwise, they are provisionally taken to belong to the “same kind.” In the present study, a similar procedure is adopted to compare pairs of pseudo-symmetries and use the pattern of results to detect causal influences. To detect causality in this way, we proceed in four steps: (i) sampling, (ii) data transformation, (iii) comparison (of $\sigma 1$ with $\sigma 2$, and $\sigma 3$ with $\sigma 4$), and (iv) decision.

In step (i), we sample from four systems, where the governing equations and the parameters remain the same, but the initial conditions vary as depicted in Fig. 1. For System 1, two time series are measured using the same initial condition for the $y$ variable ($y0=\alpha $), different initial conditions for $x,$ namely $x0=\delta $ and $x0=\gamma .$ Similarly, for System 2, we set $y0=\beta ,$ and choose two initial conditions for $x,$ $x0=\delta $, and $x0=\gamma $. The trajectories are obtained by measuring the values of the variables at times $t0,\u2026,tm$ for each of the systems. Similarly, time series are acquired from System 3 and System 4. In this case, the initial value of $x$ is held constant, and measurements are made on each system for two different initial values of $y$. In all cases, measurements are made at the same set of times, $t0,t1,\u2026,tm$. We choose segments such that $x$ and $y$ vary monotonically. This guarantees that the pseudo-symmetries are functions over at least this interval, which allows us to use the comparative tools described below.

In step (ii), we combine pairs of time series acquired in step (i) to generate a sample of each of the functions $\sigma i$. Specifically, for System 1 we pair the vector of $x$-values, $x\alpha \u2192$ for the time series beginning with $x0=\gamma $ with the vector of $x$-values $x\alpha ~\u2192$ from the time series beginning with $x0=\delta $. The result is a two-column matrix (represented in Algorithm 1 as input $A\sigma 1,x$) for which the $i$-th row contains elements from each of the two time series at the time $ti$. The function mapping the values in the first column to those of the second [represented by $x\alpha ~\u2192(x\alpha \u2192)$] is the pseudo-symmetry $\sigma 1(x)$. We similarly pair the time series vectors of $x$ or $y$ variables for Systems 2–4.

Input: $A\sigma 1,x$—for System 1, an $m\xd72$ matrix where each row contains one element of $x\alpha \u2192$ and the corresponding element of $x\alpha ~\u2192$ to which $\sigma 1$ maps the system state. |

$A\sigma 3,y$—for System 3, an $m\xd72$ matrix where each row contains one element of $y\delta \u2192$ and the corresponding element of $y\delta ~\u2192$ to which $\sigma 3$ maps the system state. |

$B\sigma 2,x$—for System 2, an $m\xd72$ matrix where each row contains one element of $x\beta \u2192$ and the corresponding element of $x\beta ~\u2192$ to which $\sigma 2$ maps the system state. |

$B\sigma 4,y$—for System 4, an $m\xd72$ matrix where each row contains one element of $y\gamma \u2192$ and the corresponding element of $y\gamma ~\u2192$ to which $\sigma 4$ maps the system state. |

Output: $Comp1$—a boolean value indicating whether $\sigma 1$ and $\sigma 2$ are identical. |

$Comp2$—a boolean value indicating whether $\sigma 3$ and $\sigma 4$ are identical. |

1: randomize the rows of $A\sigma 1,x,A\sigma 3,y,B\sigma 2,x,B\sigma 4,y$ |

2: $Partition1\u2190$ divide the rows of $A\sigma 1,x$ into 10 segments |

3: $Partition2\u2190$ divide the rows of $B\sigma 2,x$ into 10 segments |

4: $Partition3\u2190$ divide the rows of $A\sigma 3,y$ into 10 segments |

5: $Partition4\u2190$ divide the rows of $B\sigma 4,y$ into 10 segments |

6: for $i=1$ to 10 do |

7: for $k=1$ to 4 do |

8: $Traink\u2190\u22c3p{Partitionk[p]|p\u2260i}$ |

9: end for |

10: for $j=1$ to 2 do |

11: $Model2j\u22121\u2190FitPolynomial(Train2j\u22121)$ |

12: $Model2j\u2190FitPolynomial(Train2j)$ |

13: $Model2j\u22121,2j\u2190FitPolynomial(Train2j\u22121\u222aTrain2j)$ |

14: $SE2j\u22121\u2190$ concatenate($SE2j\u22121$, squared errors of $Model2j\u22121$ with respect to $Partition2j\u22121[i]$) |

15: $SE2j\u2190$ concatenate($SE2j$, squared errors of $Model2j$ with respect to $Partition2j[i]$) |

16: $SE2j\u22121,2j\u2190$ concatenate($SE2j\u22121,2j$, squared errors of $Model2j\u22121,2j$ with respect to $Partition2j\u22121[i]\u222aPartition2j[i]$) |

17: end for |

18: end for |

19: $MSEsep1\u2190mean(SE1\u222aSE2)$ |

20: $MSEjoint1\u2190mean(SE1,2)$ |

21: $MSEsep2\u2190mean(SE3\u222aSE4)$ |

22: $MSEjoint2\u2190mean(SE3,4)$ |

23: if $MSEjoint1>MSEsep1+threshold(A\sigma 1,x,B\sigma 2,x)$ then |

24: return $TRUE$ and $Comp1=1$ |

25: else |

26: return $FALSE$ and $Comp1=0$ |

27: end if |

28: if $MSEjoint2>MSEsep2+threshold(A\sigma 3,y,B\sigma 4,y)$ then |

29: return $TRUE$ and $Comp2=1$ |

30: else |

31: return $FALSE$ and $Comp2=0$ |

end if |

Input: $A\sigma 1,x$—for System 1, an $m\xd72$ matrix where each row contains one element of $x\alpha \u2192$ and the corresponding element of $x\alpha ~\u2192$ to which $\sigma 1$ maps the system state. |

$A\sigma 3,y$—for System 3, an $m\xd72$ matrix where each row contains one element of $y\delta \u2192$ and the corresponding element of $y\delta ~\u2192$ to which $\sigma 3$ maps the system state. |

$B\sigma 2,x$—for System 2, an $m\xd72$ matrix where each row contains one element of $x\beta \u2192$ and the corresponding element of $x\beta ~\u2192$ to which $\sigma 2$ maps the system state. |

$B\sigma 4,y$—for System 4, an $m\xd72$ matrix where each row contains one element of $y\gamma \u2192$ and the corresponding element of $y\gamma ~\u2192$ to which $\sigma 4$ maps the system state. |

Output: $Comp1$—a boolean value indicating whether $\sigma 1$ and $\sigma 2$ are identical. |

$Comp2$—a boolean value indicating whether $\sigma 3$ and $\sigma 4$ are identical. |

1: randomize the rows of $A\sigma 1,x,A\sigma 3,y,B\sigma 2,x,B\sigma 4,y$ |

2: $Partition1\u2190$ divide the rows of $A\sigma 1,x$ into 10 segments |

3: $Partition2\u2190$ divide the rows of $B\sigma 2,x$ into 10 segments |

4: $Partition3\u2190$ divide the rows of $A\sigma 3,y$ into 10 segments |

5: $Partition4\u2190$ divide the rows of $B\sigma 4,y$ into 10 segments |

6: for $i=1$ to 10 do |

7: for $k=1$ to 4 do |

8: $Traink\u2190\u22c3p{Partitionk[p]|p\u2260i}$ |

9: end for |

10: for $j=1$ to 2 do |

11: $Model2j\u22121\u2190FitPolynomial(Train2j\u22121)$ |

12: $Model2j\u2190FitPolynomial(Train2j)$ |

13: $Model2j\u22121,2j\u2190FitPolynomial(Train2j\u22121\u222aTrain2j)$ |

14: $SE2j\u22121\u2190$ concatenate($SE2j\u22121$, squared errors of $Model2j\u22121$ with respect to $Partition2j\u22121[i]$) |

15: $SE2j\u2190$ concatenate($SE2j$, squared errors of $Model2j$ with respect to $Partition2j[i]$) |

16: $SE2j\u22121,2j\u2190$ concatenate($SE2j\u22121,2j$, squared errors of $Model2j\u22121,2j$ with respect to $Partition2j\u22121[i]\u222aPartition2j[i]$) |

17: end for |

18: end for |

19: $MSEsep1\u2190mean(SE1\u222aSE2)$ |

20: $MSEjoint1\u2190mean(SE1,2)$ |

21: $MSEsep2\u2190mean(SE3\u222aSE4)$ |

22: $MSEjoint2\u2190mean(SE3,4)$ |

23: if $MSEjoint1>MSEsep1+threshold(A\sigma 1,x,B\sigma 2,x)$ then |

24: return $TRUE$ and $Comp1=1$ |

25: else |

26: return $FALSE$ and $Comp1=0$ |

27: end if |

28: if $MSEjoint2>MSEsep2+threshold(A\sigma 3,y,B\sigma 4,y)$ then |

29: return $TRUE$ and $Comp2=1$ |

30: else |

31: return $FALSE$ and $Comp2=0$ |

end if |

In step (iii), we determine whether $\sigma 1=\sigma 2$ and whether $\sigma 3=\sigma 4$. To detect whether the pseudo-symmetries differ between Systems 1 and 2, we estimate the error of two competing models, *sep* and *joint*, using 10-fold cross-validation. In the *sep* model, data from System 1 and System 2 are treated separately, whereas for the *joint* model, the data from System 1 and System 2 are combined and treated as a single sample from the same function. In either case, we fitted polynomial functions to estimate the pseudo-symmetries. For a given dataset, the order of the polynomial functions to be fit is determined by an additional 10-fold cross-validation (represented by FitPolynomial in Algorithm 1), as was done in Ref. 27. This function starts by considering a polynomial of order 1 and computes the mean-squared error of fitting a first-order polynomial over 10 folds of the data. The function continues fitting the data with higher order polynomials and compares the mean-squared error with that of the preceding order. If the error increases, the function terminates and returns the order that minimized the mean square error. The polynomial coefficients are then evaluated by fitting the entire data set with the order as returned by FitPolynomial in Algorithm 1.

It is important to note that mean squared errors computed for the *sep* and *joint* models cannot be directly compared, because the error depends on how the data are partitioned and on the amount of observation noise present in the system. Following the procedure in Ref. 27, we quantify this background contribution to the mean squared errors of *sep* and *joint* models by dividing each of the systems into two subsystems, and using the same 10-fold cross-validation routine on each pair to identify the difference of the mean squared errors between *sep* and *joint* models that one should expect purely by chance. For example, System 1 is divided into two subsystems and mean squared errors are computed for joint and separate models fitted to these subsystems; the absolute difference in these errors is represented by $\mu 1$. Similarly, $\mu 2$ refers to the absolute difference in mean squared errors between *sep* and *joint* models built from System 2. In Algorithm 1, $threshold(A\sigma 1,x,B\sigma 2,x)=max(\mu 1,\mu 2).$ The hypothesis that $\sigma 1$ and $\sigma 2$ have the same functional form is rejected if the condition $MSEjoint>MSEsep+threshold(A\sigma 1,x,B\sigma 2,x)$ holds, and hence we infer $\sigma 1\u2260\sigma 2.$ The same calculations are done with the data from Systems 3 and 4 to decide whether $\sigma 3=\sigma 4$.

In the final step (iv), we reach a decision regarding causal influence. Specifically, if we detect $\sigma 1=\sigma 2$ and $\sigma 3\u2260\sigma 4$, we identify $x$ as the cause of $y$. Conversely, if we detect $\sigma 1\u2260\sigma 2$ and $\sigma 3=\sigma 4$, we identify $y$ as the cause of $x.$ If both pairs are unequal, this suggests mutual causal influence, and if both pairs are identical, then $x$ does not appear to influence $y$ or vice versa.

Note that the algorithm relies on randomizing the rows of $A\sigma 1,x,\u2026,B\sigma 4,y,$ and hence the decision may vary on different runs using the same data. Thus, for a given time series, we need to run Algorithms 1 and 2 multiple times, and select the majority decision.

Input: $Comp1$ and $Comp2$ from Algorithm 1. |

Output: Value $1$ if $x\u2192y,$ value $2$ if $y\u2192x$, value $3$ if $y\u2194x$ and value $0$ indicating no causal relationship detected. |

1: if $Comp1=1$ and $Comp2=0$ then |

2: return $2$ |

3: else if $Comp1=0$ and $Comp2=1$ then |

4: return $1$ |

5: else if $Comp1=1$ and $Comp2=1$ then |

6: return $3$ |

7: else |

8: return $0$ |

9: end if |

Input: $Comp1$ and $Comp2$ from Algorithm 1. |

Output: Value $1$ if $x\u2192y,$ value $2$ if $y\u2192x$, value $3$ if $y\u2194x$ and value $0$ indicating no causal relationship detected. |

1: if $Comp1=1$ and $Comp2=0$ then |

2: return $2$ |

3: else if $Comp1=0$ and $Comp2=1$ then |

4: return $1$ |

5: else if $Comp1=1$ and $Comp2=1$ then |

6: return $3$ |

7: else |

8: return $0$ |

9: end if |

## IV. Other causality detecting tools: Transfer Entropy (TE) and Convergent Cross Map (CCM)

In our present study, we choose two model-free methods, namely TE and CCM, to provide a benchmark for the performance of our algorithm in detecting causality. In drawing comparisons, it is important to note that the method introduced here requires four replicate time series data each starting from four different pairs of initial conditions, whereas both TE and CCM require a single time series data, without any prerequisite on the number of initial conditions needed.

### A. Transfer entropy

The idea of TE was first formalized in Ref. 10 to measure information flow between two time series variables. TE uses the concept of Shannon entropy. Shannon entropy quantifies the expected value of the information associated with the occurrence of an event and is defined as follows:

where $Pr[x]$ is the probability mass function for a time series variable $X$ taking the value $x$ and $X$ refers to the set containing all possible realizations of $X$. Now, given two time series variables $X$ and $Y,$ the TE from $Y$ to $X$ ($TY\u2192X$), quantifies information transfer from $Y$ to $X$ by measuring the reduction in entropy of $X$ when conditioned on $Y$. Hence, transfer entropy from $Y$ to $X$ is defined as

where $Pr[x(t+1)|x(t)]$ and $Pr[x(t+1)|x(t),y(t)]$ denote the probability of $x(t+1)$ conditioned on $x(t)$ alone, and on both $x(t)$ and $y(t)$, respectively; $Pr[x(t+1),x(t),y(t)]$ denotes joint probability. If $Y$ does not provide any additional information then $Pr[x(t+1)|x(t),y(t)]=Pr[x(t+1)|x(t)]$, and hence $TY\u2192X$ equals zero. In general, transfer entropy is an asymmetric quantity, and comparing $TY\u2192X$ with $TX\u2192Y$ detects the dominant direction of information flow, thus identifying the direction of causation.

We implement TE using the open-source JIDT (Java Information Dynamics Toolkit),^{30} which implements various information-theoretic measures. We used the Kraskov, Stogbauer, and Grassberger (KSG) method to calculate the probability mass functions (PMF) used in the above definitions because it uses dynamically altered kernel width in terms of $K$ nearest-neighbors ($knn$) and it has been shown to decrease errors in PMF estimation.^{31}

### B. Convergent cross map

CCM, first introduced in Ref. 5, is a technique for determining causality between sets of time series using a neighborhood method. It relies on the idea that if time series $X(t)$ causally influences $Y(t)$, then signatures (or imprints) of $X(t)$ inherently exist in $Y(t),$ and hence historical records of $Y(t)$ can be used to estimate the current state of $X(t).$ The estimated state of $X(t)$ is denoted by $X(t)^|MY,$ where $MY$ refers to a shadow version of the original manifold $M$ reconstructed from the projection of times series $Y(t)$. Finally, the accuracy of that estimation is evaluated using the correlation coefficient, calculated between the original time series and their cross-map estimates. If the correlation coefficient between the true values ($X(t)$) and the estimated values ($X(t)^|MY$) is high, then it is concluded that there is “enough” information stored in $Y$ that came from $X$ to indicate that $X$ is among the causes of $Y$. However, if $Y(t)$ does not influence $X(t),$ then estimate of $Y(t)$ using $X(t),$ denoted by $Y(t)^|MX,$ will be inferior. A more detailed description of the algorithm can be found in Ref. 32, and in the supplementary materials of Ref. 5.

The CCM estimates are unitless and the values range between $0$ and $1$. The cross map estimation is evaluated at different library sizes, which refers to the length of the time series used. The cross map estimation converges to a constant value with increasing library size, and the comparison of relative skill of cross-map estimations indicates the direction of causation. As the real-world datasets possess observational and process noise and may lack sufficient data, the asymptotic value of cross map estimation may not converge to a value of one.^{5} The parameters associated with CCM are lag, which is used for the construction of the shadow manifold, and the optimal embedding dimension. In the present study, we implement CCM using the rEDM package^{33} in R, where the lag was set to $1$ by default, and the optimal embedding dimension was evaluated following the Simplex Projection method.^{34}

## V. Results

### A. Detecting unidirectional coupling in competitive Lotka-Volterra model

To evaluate the performance of our algorithm, we chose to implement a two-species competitive Lotka-Volterra model. Nonlinear models of this type are used to describe the population dynamics of two species competing for resources. The governing equations are given as follows:

where $x$ and $y$ denote the population sizes, $r1$ and $r2$ the growth rates, and $K1$ and $K2$ the carrying capacities of species 1 and species 2, respectively. The parameter $\alpha 12$ represents the influence of species 2 on species 1, and similarly, $\alpha 21$ represents the influence of species 1 on species 2.

For the first set of tests, we set $r1=r2=12,$ $K1=K2=150,$ choose $\alpha 21=0,$ and vary $\alpha 12$ from $1$ to $8.$ By setting $\alpha 21=0,$ we ensure that $x$ has no influence on $y$, while increasing values of $\alpha 12$ correspond to an increasing strength of influence from $y$ to $x.$ Next, we implement our algorithm on the time series data generated by solving Eq. (6) using ode45 in Matlab. To implement our algorithm, we need four pairs of initial conditions for $x$ and $y,$ which we choose as follows: $[x0=\gamma =20,y0=\alpha =30],$ $[x0=\delta =40,y0=\alpha =30],$ $[x0=\gamma =20,y0=\beta =60],$ and $[x0=\delta =40,y0=\beta =60].$ We generate time series data of length $1000.$ We further add Gaussian observation noise with $0$ mean and varying standard deviation from $5$ to $10.$

We choose different coupling strengths and standard deviations of observation noise to verify the robustness of our algorithm in correctly detecting $y$ as the cause of $x$. Figure 2(a) demonstrates the results, where we vary the coupling strength ($\alpha 12$) from $1$ to $8,$ and for each choice of $\alpha 12$, we choose six different amounts of observation noise ranging from standard deviation $5$ to $10.$ Since our algorithm relies on shuffling the data while evaluating $\sigma $, the results obtained in the decision phase are not deterministic. Hence, we run the algorithm ten times on the same data and count the number of times $x$ is incorrectly detected as the cause of $y$, the number of times $y$ is correctly detected as the cause of $x$, and the number of times the algorithm indicates bidirectional causality. We plot the result as percentage in Fig. 2(a). Though some runs on a given data set incorrectly indicate bidirectional causation, we note that the majority decision is accurate for every combination of parameter values tested. Furthermore, in no case does it miss the influence of $y$ on $x$.

Next, to provide a benchmark for this performance we implement TE and CCM. Since TE and CCM do not require four different choices of initial conditions, we reuse the data set generated above with initial conditions $[x0=\gamma =20,y0=\alpha =30].$ We evaluate TE for $knn$ values ranging from $2$ to $15$, where $knn$ is the parameter used in estimating the PMF. However, the resolution width used in estimating TE should not be too large nor too small,^{31} so we fix $knn$ to $4.$ We find in Fig. 2(b) that many times TE falsely identifies $x$ as the sole cause, particularly as the coupling strength $\alpha 12$ and measured noise increase. Next, we implement CCM which is presented in Fig. 2(c). CCM relies on the asymptotic value of convergence at maximum library size, and Fig. 2(c) presents the causal direction as identified by CCM. The results demonstrate that CCM also performs poorly, as it too falsely identifies $x$ as a cause of $y$ for a number of values of $\alpha 12$ and noise.

#### 1. Bidirectional causality with equal coupling strength

So far, we only considered the presence of unidirectional coupling. Next, we investigate bidirectional coupling, beginning with the special case where the coupling strength for either direction is the same. Again we modeled a competitive Lotka-Volterra system and kept all parameter values the same, but set $\alpha 12=\alpha 21.$ We vary the coupling strength from values of $1$ to $8,$ and for each choice of coupling strength we vary the standard deviation for the Gaussian noise from $5$ to $10.$ Next, we implement our algorithm on the time series data generated by solving Eq. (6) using ode45 in Matlab. We keep the choice of initial conditions fixed as in the earlier example. Further, we compare the results of our algorithm with that of TE and CCM.

The performance of all three algorithms is presented in Fig. 3(a), where we observe that our algorithm can reliably detect the bidirectional coupling for all choices of coupling strength and increased measured noise. In this case, the output of all 10 runs was uniform for every combination of parameter values. The performance of TE and CCM in detecting the bidirectional coupling is presented in Figs. 3(b) and 3(c), respectively. This case is clearly more challenging for TE and CCM. At least in a minority of cases, TE correctly identifies bidirectional causation, but CCM never does. Note, however, that these methods output continuous measures—of information exchange in the case of TE and degree of correlation for CCM—for which there is no universal decision threshold. It is likely that adjustment of these thresholds would improve performance, but any such threshold would have to be justifiable in the absence of information about the system or the nature of the sampling noise. It is an advantage of the method presented here that a natural threshold is built into the binary decision process that requires no foreknowledge of the system to determine.

For reference, we have provided sample trajectories for both unidirectional and bidirectional case in Fig. 4.

#### 2. Bidirectional causality with unequal coupling strength

Finally, we investigate the general case of bidirectional causality, when both the variables influence each other but do not necessarily do so with equal coupling strength. We again choose the competitive Lotka-Volterra equations and fix the coupling strength from $x$ to $y$ by setting $\alpha 21=10,$ and the standard deviation for the Gaussian noise to $5.$ Next, we vary the coupling strength from $y$ to $x$ from values $1$ to $10,$ where small values of $\alpha 12$ correspond to weak coupling. To measure the performance, we run our algorithm 10 times for each choice of $\alpha 12$ (again, on the identical data set) and count the number of times $x$ and $y$ are identified as a causal driver, independently. Bidirectional coupling will be inferred if the algorithm detects both $x$ and $y$ as causal drivers. We present the result as percentage in Table IV. We notice that the algorithm detects the presence of bidirectional coupling for all choices of $\alpha 12,$ except when $\alpha 12=1.$ In other words, when the value of $\alpha 12$ is relatively small in comparison to $\alpha 21$ the algorithm correctly infers $x$ as a cause of $y$, but misses the relatively weak influence of $y$ on $x$.

$\alpha 12$ | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ | $7$ | $8$ | $9$ | $10$ |

$\sigma 1\u2260\sigma 2(y\u2192x)$ | 0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |

$\sigma 3\u2260\sigma 4(x\u2192y)$ | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |

$\alpha 12$ | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ | $7$ | $8$ | $9$ | $10$ |

$\sigma 1\u2260\sigma 2(y\u2192x)$ | 0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |

$\sigma 3\u2260\sigma 4(x\u2192y)$ | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |

### B. Accuracy

So far, we have found that the proposed algorithm performs better for a single noisy dataset in comparison to both TE and CCM. As noted above, the algorithm is indeterministic for a given dataset; the output may depend upon the shuffling that takes place in cross-validation. As suggested above, this can be mitigated when using the algorithm to decide causal structure by running multiple times on a given dataset and then taking a consensus of the results. For characterizing the accuracy of such a procedure, we accept the majority output as definitive and test against a sample of datasets drawn from the same Lotka-Volterra system. We repeat these tests for a range of systems and sampling noises. Specifically, we vary the standard deviation of the Gaussian sampling noise from $5$ to $50$. For a fixed value of noise, we vary the coupling strength $\alpha 12$ from $1$ to $8$. We set $\alpha 21$ to zero for the unidirectional case and vary both $\alpha 12$ and $\alpha 21$ for the bidirectional case, where $\alpha 21$ is set equal to $\alpha 12.$ For each set of system and noise parameters, we conduct ten trials, generating a distinct time series for each. Within each trial, we run our algorithm ten times and identify the causal direction by majority vote. We define the accuracy as the ratio of the number of times the algorithm correctly identifies all and only the correct causal edges to the total number of trials, which is equal to ten. We present the accuracy results in Fig. 5. We find that the accuracy decays with an increase in the noise, however, the effect of noise on performance reduces when the coupling strength is also increased. In other words, for relatively weak coupling the noise dominates the performance, but the performance gets better if coupling strength is increased proportionally. This pattern is prominent in the bidirectional case [Fig. 5(b)] and weakly present in the unidirectional case Fig. 5(a).

### C. Detecting bidirectional coupling in a Dixon system

In this section, we evaluate the performance of our algorithm by implementing it on a system that demonstrates chaos-like behavior. According to the Poincaré-Bendixson theorem, the attractor for any smooth two-dimensional bounded continuous-time autonomous system is either a stable equilibrium or a limit cycle.^{36} Since our algorithm is limited to first-order two-dimensional continuous-time autonomous systems, we cannot apply it to most of the well-known chaotic systems. However, there exists a system derived by Dixon *et al.* which does not satisfy the smoothness condition since it has a singularity at the origin. The governing equations are given as follows:

We choose the parameters $\xi 1=0$ and $\xi 2=0.7,$ for which the system displays “chaotic” behavior, which is more precisely referred to as “S-chaos” (singularity-chaos).^{36,37} Next, we implement our algorithm on the time series data generated by solving Eq. (7) using ode45 in Matlab. We choose four pairs of initial conditions for $x$ and $y$ as follows: $[x0=\gamma =0.5,y0=\alpha =1.5],$$[x0=\delta =0.15,y0=\alpha =1.5],$$[x0=\gamma =0.5,y0=\beta =2.5],$ and $[x0=\delta =0.15,y0=\beta =2.5].$ We further add Gaussian observation noise with $0$ mean and $0.1$ standard deviation. Next, we implemented our algorithm, and ran it ten times and noted the dominant causal direction that it detected. Interestingly, the algorithm detected the correct causal direction in each run, indicating bidirectional coupling.

### D. Real-world system

To evaluate the performance of our algorithm with respect to a real-world system, we built a nonlinear circuit in which an LC oscillator is used to drive an inverting amplifier. The detailed circuit diagram is provided in Fig. 6(a). We treated the output of the LC oscillator and the output of the inverting amplifier as our two dynamical variables. The signals recorded from these two channels are referred to as $X$ and $Y$ in Fig. 6(a). The input voltage VCC was used to intervene in the system in order to attain the desired initial conditions. We chose four pairs of initial conditions for $X$ and $Y$ as follows: $[X0=\gamma =\u22120.58V,Y0=\alpha =2.6V],$ $[X0=\delta =\u22120.48V,Y0=\alpha =2.6V],$ $[X0=\gamma =\u22120.58V,Y0=\beta =1.94V],$ and $[X0=\delta =\u22120.48V,Y0=\beta =1.94V]$. We recorded a sample of size $2496$ at an interval $4E\xd710\u22128$ s. In this experimental setup, $X$ drives $Y.$ The signals are plotted in Fig. 6(b) for reference. Next, we implemented our algorithm and ran it ten times and noted the dominant causal direction that it identified. Interestingly, the algorithm detected the correct causal direction in each run, identifying $X$ as the cause. We further implemented TE and CCM, to compare the performance with respect to our tool. TE correctly detected the coupling direction from $X\u2192Y$, whereas CCM incorrectly inferred bidirectional coupling.

## VI. Discussion and future scope

In the present work, we use the concept of pseudo-symmetries to detect unidirectional and bidirectional causality. We compare the performance of our algorithm to that of TE and CCM when applied to noisily sampled two-species competitive Lotka-Volterra systems. In this context, both TE and CCM can in principle detect the causal influences. However, tests demonstrate that both these tools fail often in the presence of significant noise, particularly with bidirectional coupling. On the other hand, the proposed algorithm based on comparisons of pseudo-symmetries is accurate across all of the conditions examined. To demonstrate the application of the present approach to real-world systems, we constructed a circuit in which one voltage drives another. The algorithm accurately identified the correct causal direction. The symmetry approach is therefore a viable alternative to existing tools when dealing with noisy real-world data that may exhibit any combination of causal influences between dynamical variables.

The present approach relies upon observing behavior for four pairs of distinct initial conditions. The simplest way to achieve this is by intervention on the system. But, given the possibility of intervention, an alternate approach to identifying causal direction suggests itself, namely a direct comparison of trajectories. For example, comparing $x\alpha ~\u2192$ with $x\beta ~\u2192$ and $x\alpha \u2192$ with $x\beta \u2192$ as in Figs. 1(a) and 1(b) would enable one to detect that $y$ drives $x$. The method we use to compare portions of pseudo-symmetries (that can be represented by polynomials) could be used to directly compare trajectories. Differences in, e.g., $x(t)$ for changes in the initial value of $y$ would then be indicative of causal direction (in this case, from $y$ to $x$). However, that method fails when we do not have exact control over initial conditions. To test this assertion, we compared the present method with that of such a trajectory comparison method by adding Gaussian noise to the initial conditions, and then letting a Lotka-Volterra system evolve in time. Under these conditions, the symmetry method performs better in detecting the causal direction than a direct comparison of trajectories. Further, we consider a chaotic Dixon system, to verify the performance with respect to noisy initial conditions. The symmetry method detected the correct causal direction. Although it is simpler to intervene, it is often not necessary. For a sufficiently long time series, a purely observational dataset can often be segmented in such a way that, when each segment is treated as a separate time series, the requisite set of initial conditions are at least approximately obtained. This is sufficient for applying the algorithm, albeit at some cost to accuracy.

This first version of the algorithm is, however, limited to first-order autonomous systems. It is also the case that the present version of the algorithm is applicable only to systems having two time-dependent variables. The most straightforward approach to extending this method (or the direct comparison of trajectories mentioned above) to systems with more variables would require a geometric increase in the number of initial conditions as a function of the number of variables. We see two approaches to generalizing our method that avoids this problem. On the one hand, it is possible to assess directional gradients of the surfaces that correspond to components of a dynamical symmetry in ways that do not involve comparing pseudo-symmetries. If this can be done with accuracy from typical time series data, then in principle we would not be limited by the large number of initial conditions demanded by a straightforward generalization of the current approach. On the other hand, it is possible to preserve causal information while reducing many variable systems to approximate two-variable systems.^{38} Both approaches are left for future work.

Finally, our algorithm, though it reliably detects bidirectional influence, yields only a binary decision; it does not indicate the relative degree of causal influence in either direction. But such information can be gleaned after a reliable detection is made by applying one or another continuous measure of the degree of causal influence.

Despite its limits, the algorithm presented is a demonstration of the potential of symmetry methods for the reliable detection of causality when data is noisy, causation may be bidirectional, and coupling strengths vary. This work only begins to open a wide scope for improvement, which is left for future studies.

## Acknowledgments

This work was supported by the National Science Foundation under Grant No. 1454190. We would like to thank Naveed Jantzen for his assistance in collecting data while conducting the experiment.

## References

*Artificial Intelligence and Statistics*(