Detecting causal relationships in complex systems from the time series of the individual units is a pressing area of research that has attracted the interest of a broad community. As an open area of study, this entails the development of methodologies to unravel causal relationships that evolve over time, such as switching of leader-follower roles in animal groups. Here, we augment the information theoretic measure of transfer entropy to establish a fitness function suitable for optimal partitioning of time series data to robustly detect leadership switches in collective behavior. The fitness function computes the information outflow from any agent in the group and rewards large sample sizes while normalizing with respect to available information. Our results indicate that for information-rich interactions, leadership switches within a group can be detected over relatively short time durations, with more than 90% accuracy. On a real soccer dataset, instances of leadership counted using the proposed approach are interestingly correlated with ball possession.

Who is a leader and who is a follower is not always encrypted in the DNA. In several animal species, it is hypothesized that individuals can switch their role in the group, such that they can take the role of a leader for a period of time and then return to be a follower. However, robust methodologies to infer switches in leadership from raw time series of individual motion are limited in their scope. Grounded in information theory, statistics, and optimization, this paper puts forward a novel methodology for dynamically detecting leaders. Through the systematic analysis of a classical mathematical model of collective behavior, we demonstrate the possibility of accurately identifying leaders for varying group sizes, individual speeds, degrees of coordination, and duration of the time series. Applying the method to real data from the motion of soccer players during a game supports the intuition that the tendency of a player to take the role of a leader in the team depends on the amount of time the player is in possession of the ball.

## I. INTRODUCTION

Model-free approaches are well suited to identify causality between time series, as they do not assume an underlying relationship—linear or nonlinear—between the processes. These include extreme-event synchronization,^{1} statistical methods,^{2} and information theoretic measures such as transfer entropy.^{3,4} Of these, transfer entropy has been shown to be especially accurate when inferring causal structures in collective behavior.^{5–10} From nonlinear, noisy time series of movement trajectories, transfer entropy successfully detects *a priori* known causal relationships among agents. Such relationships can be directly tied to instances of leadership, where one individual drives the motion of the group.^{11}

Despite successful applications across a wide range of fields,^{12} transfer entropy has been largely limited to the identification of causal relationships that do not evolve in time. This is in contrast to instances in nature,^{11} where group leaders could often switch roles, with one individual taking a leadership role for a portion of time and then switching back to a follower role. A reason why transfer entropy has resisted application toward detecting switching roles is that the computation of transfer entropy is grounded in the estimation of multidimensional probability distributions, where limited size datasets hamper a reliable inference. An ensemble version of transfer entropy that addresses this issue has been proposed by Gomez-Herrero *et al.*^{13}, considering multiple instances of the same time series to carry out the estimation of probability distributions. While the method can deal with short time series, it assumes that dynamic relationships will be consistently replicated over multiple instances, which is unlikely to be true in the study of animal behavior. A recent paper has looked at trends in cumulative transfer entropy along segmented datasets to assess leadership switching over long time series.^{14}

With no guarantee that causal relationships are preserved across multiple trials, an isolated estimate of transfer entropy for a given time series could be performed by maximizing the amount of information exchanged over chosen time segments. In this sense, the time evolution of leadership within a group might be studied by searching for switching patterns that maximize transfer entropy from the candidate leader to the followers. Thus, leaders could be identified by performing a brute-force search for partitions of the time series that maximize transfer entropy. This approach, however, does not address the inherent bias present in plug-in estimators of entropy or the dependence of variance on the size of the dataset,^{15} which may again limit the size of the partitions. Furthermore, owing to the computational expense and exponentially large possibilities in which a dataset can be partitioned, a brute-force search becomes nonviable for large datasets.

Optimal partitioning of time series^{16} can be used to identify leadership roles efficiently, as long as we construct a valid fitness function that rewards the partitions maximizing net transfer entropy. However, to implement an optimal partitioning scheme amenable to transfer entropy estimates, we must ensure that estimates are weighted by some measure of confidence. Without such a measure, a smaller partition with a few samples and an erroneous but large estimate could easily be chosen over a longer partition with a lower transfer entropy value that is closer to the true value.

The main contribution of this work is a statistically-principled, information-theoretic fitness function for detecting switching in information sharing within the group. The fitness function is constructed upon knowledge about the convergence properties of the plug-in estimator of entropy,^{15,17,18} toward an optimal partitioning algorithm that weights transfer entropy readings by their statistical accuracy. We demonstrate the performance of the approach in detecting switching leadership in noisy trajectories of collective motion of simulated particles.

## II. ENTROPY AND ITS CONVERGENCE PROPERTIES

Information entropy is defined as the amount of uncertainty contained in a random variable $X\u2208X$.^{19} Entropy is computed as $H(X)=\u2212\u2211p(x)log2\u2061p(x)$, where $p(x)$ denotes the probability of $X$ taking the value $x$, $p(X=x)$, and the sum is over the sample space $X$. For two random variables $X$ and $Y\u2208Y$, the joint and conditional entropies are defined as $H(X,Y)=\u2212\u2211p(x,y)log\u2061p(x,y)$ and $H(X\u2223Y)=\u2212\u2211p(x,y)log\u2061p(x\u2223y),$ respectively. The sum is now over the joint space $x\u2208X,y\u2208Y$. For stochastic processes ${Xt}t\u2208N$, where all $Xt$ are independent and identically distributed, the corresponding entropy rate $H{Xt}t\u2208N$ is simply the entropy of its random variables $H(Xt)$. If the process is a stationary first-order Markov process with known probability transition matrix, the entropy rate is a function of the stationary distribution $\mu $, assumed to exist and be unique, with the transition matrix $P$ so that $H(Xt)=\u2211ij\mu iPijlog\u2061Pij$.^{20}

For a stochastic process, transfer entropy measures the extent by which it departs from the Markov property.^{3} Given two processes, ${Xt}t\u2208N$ and ${Yt}t\u2208N$, transfer entropy from ${Yt}t\u2208N$ to ${Xt}t\u2208N$ is defined as

For Gaussian processes, this definition is equivalent to Granger causality.^{21,22} The process ${Yt}t\u2208N$ Granger-causes ${Xt}t\u2208N$ if knowledge of ${Yt}t\u2208N$ reduces the uncertainty in predicting a future state of ${Xt}t\u2208N$. Since entropy values rely on estimating the multidimensional probability distribution functions (pdfs), transfer entropy estimates themselves are data-intensive and any inference of causality must be tested for statistical significance.

The empirical distribution function of the histogram is a natural choice for estimating information-theoretic measures, because functions of empirical distributions have provable convergence properties. Specifically $p(j)=(#of samples of a random variable in binj)(total number of samples)$. The resulting “plug-in” estimate of entropy, $H^$, from the empirical distribution is known to be biased ($|E[H^]\u2212H|>\u03f5,\u03f5>0$, where $E(\u22c5)$ indicates expectation); no known unbiased estimator of entropy exists.^{15} However, the plug-in estimator is consistent, such that for a fixed number of bins $m$, as the number of samples increase, the estimator converges in probability almost surely to the true value of entropy.^{15}

Previous studies have shown that the variance of the estimator has a dependence on the number of samples $n$.^{15,18} This variance is bounded^{18} by $(log\u2061n)2/n$ with the order $1/n$, and therefore entropy converges to the true value at a rate of $1/n$. Since transfer entropy can be written as a sum of joint entropies, the same convergence results apply^{23} so that we weight the plug-in estimator of transfer entropy with $n$. In this way, we reward estimates with higher sample sizes when selecting the optimal partitions of a time series.

## III. REPRESENTATIVE SYSTEM OF A COLLECTIVE OF MOTILE AGENTS WITH SWITCHING LEADERS

To generate ground-truth data for testing the validity of our approach, we considered a variant of the classical self-propelled particle model in Vicsek *et al.*^{24} A similar approach was proposed in Mwaffo *et al.*^{9} although leadership roles were assumed to be constant therein. Briefly, each of the $N$ particles moves at a constant speed $s$ in two dimensions and updates its orientation to match the orientation of its neighbors, under the effect of additive noise. Given the two-dimensional position of a particle $i$ at time $t$ as $rti\u2208R2,i=1,\u2026,N$, the orientation $\theta ti$ is updated based on a function of the particle’s instantaneous neighborhood within a distance $d$ denoted by $Nti={j:\u2225rtj\u2212rti\u2225\u2264d}$ as^{24}

Here, the $arg\u2061(\u22c5)$ term captures social influence; $vti=cos\u2061(\theta ti)sin\u2061(\theta ti)T$; $|Nti|$ denotes the number of neighbors within the neighborhood set; $\eta i$, sampled from a uniform distribution $U(\u2212\eta i/2,\eta i/2)$, is the noise that captures environmental disturbances; and time $t+1$ is the next time index after $t$ in a series of regularly sampled time steps. The position is updated as $rt+1i=rti+svt+1i$.

Toward adapting this model to include leader-follower relationships, we simulated the $N$-particle system with a leader particle that updates its orientation without social influence. In other words, if $i$ is a leader particle, $\theta t+1i=\theta ti+\eta i$. The identities of leader particles could be switched randomly over a simulation. The turn rates $\Delta \theta ti$ of all the particles were used as the time series for the transfer entropy computations.

## IV. IDENTIFYING CAUSAL RELATIONSHIPS

We used an optimal partitioning algorithm which partitions a series of data to maximize a fitness function.^{16} Briefly, given a time series ${Xt,t=1,2,\u2026,T}$ defined on an interval $I$, the partitioning algorithm finds the set $P(I)={Bk,k=1,2,\u2026K}$ of blocks (or partitions) within the time series that maximizes an additive fitness function, such that the following conditions hold: the number of blocks are less than or equal to the number of samples ($0\u2264K\u2264T$); the union of the blocks results in the complete interval ($\u22c3kBk=I$); and no two blocks overlap ($Bk\u22c2Bk\u2032=\u2205$).

The partitioning algorithm serves to identify switching roles among the sets of simulated trajectories over an observation window of $T$ time steps. The input to the algorithm are the set of synchronized time series of the $N$ particles ${Xti,t=1,2,\u2026,T;i=1,\u2026,N}$ and an additive fitness function on the partitions $Bk$. The output of the algorithm is a set of successive, non-overlapping partitions that maximize an additive fitness function summed over all the partitions.

To detect when particles take a leader role in the group, we constructed an information-theoretic fitness function based on transfer entropy and weight it by the square root of the size of the partition, $Tk$, so that a partition with the same value of net information transfer but more samples have higher fitness. Specifically, we utilized the notion of net transfer entropy to quantify the degree of asymmetry in each pairwise interaction between any two particles in the group. Since the amount of information transferred in a collective system with a leader will also depend on the information available in the environment, we utilized a normalized version of transfer entropy, which is scaled by the product of the entropy of the two time series^{25,26} so that

where

is the plug-in estimate of the net transfer entropy in the interval $Bk$. Note that in entropy and transfer entropy calculations $0\u2264t\u2264|Tk|$. The proposed fitness function maximizes the total information outflow from any one particle to the rest of the group and rewards large sample sizes while normalizing with respect to available information. The leader particle was identified as the one which has the highest fitness within a partition.

The optimal partitioning algorithm performs operations on the order of $O(T2)$ to find the partition set that maximizes the global fitness $\u2211kg(Bk)$. To reduce the computational burden, we divided the dataset into equal-sized $C<T$ successive cells, each with the same $E=T/C$ amount of samples. Accordingly, this brings down the number of operations to $O(C2)$. The value of $|Tk|$ will therefore be an integral multiple of $E$ with the smallest number of samples in a partition equal to $E$.

To choose the number of bins $m$ in the plug-in estimator, we considered the bias-variance balance criteria. Specifically, for the special case when $n\u226bm,$ the relationship between bias and variance is captured by Ref. 15 in the form of the bias-variance balance function $=(variance/bias2)=n(log\u2061m)2m2$, where if $(variance/bias2)>1$, variance dominates the mean-square error in the estimate, otherwise bias dominates.^{15} Therefore, for single random variables $m<nlog\u2061m$ for variance to dominate mean square error. For the estimation of transfer entropy in (1), we should note that one must compute the joint entropy of single, pairs, or triplets of random variables. As a result, choosing one value of $n$ and of $m$ for each of the random variables will generally lead to different variance to bias ratios for each of the summand in (1). Based on numerical simulations on a few representative cases, we chose a common empirical scaling for all these summands, so that the number of bins $m=E$ to let variance dominate the bias effects for the smallest partition. With this choice, we implicitly assumed that any large bias contributions from entropy estimates of joint random variables will cancel out when the transfer entropy estimates are subtracted from each other in the fitness function (3). The value of $E$ itself should be small enough to capture the expected number of role switches.

We tested the approach over a range of parameters by varying the number of particles $N$, the severity of noise $\eta i$ (common to all the particles), the speed $s$, and the length of the time series $T$. Specifically, we simulated $N={2,4,6,8}$ particles with noise levels $\eta i={0,1,2,3,4}$, at speeds $s={0.01,0.05,0.1}$ over $T={200,400,800,1600}$ steps. The interaction radius was $d=0.5$, the domain was a square of unit size with periodic boundary conditions, and the cell size was $E=100$ with $m=10$. First 500 steps of the simulations were ignored to remove any transient effects. We applied a single switch of leadership role at $T/2$ between any two particles selected at random.

Figure 1 demonstrates the accuracy of detecting a leader over the range of parameters (five runs per parameter set). The error in leader identification is calculated as the fraction of the frames where the leader is identified incorrectly. Given $C$ cells, the probability of selecting the correct leader at random is $1/C$, and since leaders can be identified independently in each partition, the error based on chance selection is $1\u22121/C$.

As expected from past results,^{9} we find strong dependence on the severity of noise (top to bottom across all columns). Specifically, low noise values imply less information transfer and hence no clear leader; in contrast, high noise systems are easier to analyze. For high noise ($\eta i>2$), larger groups appear to aid in information transfer, likely because of higher densities in the fixed domain (left to right across last two rows). Particle speed does not affect the accuracy of the detection except at high noise values, where higher speeds give more accurate results due to more frequent interactions with the leader particle.

The length of time series also plays a role, where a switch in a longer time series is easier to identify. This can be attributed to the sample size $|Tk|$, with which a hundred samples per leader in a two-hundred sample dataset is small for a multi-dimensional probability estimate. However, for short time series of 200 time steps, the accuracy is better than chance levels for few of the high-noise systems. Although not shown, we tested the effect of cell sizes on the accuracy of detecting leader switches. This was based on the assumption that smaller cell sizes will aid in detecting switches in shorter time-series. However, we find that smaller cell sizes ($E=50$) result in worse performance. Specifically, we see many instances of false identities in single cell partitions where there should not have been any. This is likely because smaller cell sizes lead to high variance that cannot be overcome by weighting in the fitness function. As expected, larger cell sizes ($E=200$) could not partition datasets of the same size, but show improved performance in datasets with twice or more samples. Note also that some inaccuracy may also result due to the switch not occurring exactly between two cells. Assuming that the approach works perfectly, this could lead to an identification error of at most half a cell size through the whole sequence.

Figure 2 shows a sample run with four particles and three switches over 3200 time-steps, $\eta i=3$ and $s=0.05$. While all the leaders are correctly identified within the 200 time steps of when the switch takes place, this run also demonstrates that the approach can pick up false identities over short time periods.

## V. ANALYSIS OF A SOCCER DATASET

To demonstrate the use of the proposed approach on real-world data, we studied a dataset of player positions during a soccer game obtained from the literature.^{29} The dataset consists of nine (out of eleven) player positions of one of two teams sampled at 20 Hz in an inertial reference frame located at the corner of the soccer field. The ball position is available in terms of the image coordinates in pixels. We calibrated the camera parameters to minimize the difference in pixels between the projection of known locations on the field to those marked manually on the images. The camera parameters were utilized to project the player positions on the video frames. These were then used to manually encode the ball possession during each video frame by recording the identity of the player who controlled the ball during a play. Video frames during which the ball was either with the opposing team, out of field, or with an untracked own team player were encoded with a different value and, consistently, dismissed from the analysis.

The final dataset consisted of over 39 000 video frames which were sub-sampled to retain one sample every ten frames (2 Hz). Since manual observation revealed that ball possession did not last for more than 10 s and that the entire team may be able to change its playing style in 1 minute, a cell size of 50 (corresponding to 25 s) was selected to perform the analysis. An even smaller cell size would produce high variance in the entropy estimates. The player instantaneous turn rate was calculated from position data by taking the first derivative to compute velocities and then calculating the turn angle between successive directions of motion. By implementing our information-theoretic approach, we noted the players who were identified as leaders throughout the game. From this sequence, we calculated the frequency with which each player acted as the team leader.

As a first step toward exploring why leadership switches within the team, we correlated the number of occurrences that a player acted as the leader with the time that the same player possessed the ball, as shown in Fig. 3. In agreement with our intuition, we observe a positive correlation between these quantities ($p=0.059$, $R=0.646$), suggesting that players who possessed the ball for a longer fraction of the game were more likely to influence the behavior of the rest of the team, measured from their turn rate. In other words, players tended to be more responsive toward the teammate who was in possession of the ball. Interestingly, by looking at the position distribution of the players in the field from Ref. 29, we discover that the two players most frequently identified as leaders are the two central midfielders. The central midfielders are typically players that lead the defense in the midfield and launch attacks, thereby playing a critical role in the team.

In general, the behavior of players can be very complex, where individual actions depend on several local and global parameters including the score of the game that sets the attacking/defensive strategy for the whole team; the time of the game that controls the physical tiredness of the players; the specific role of the player that defines their strategic duties throughout the game; and the relative motion of teammates and opponents that should trigger immediate actions to take. All of these factors should play a role in controlling the switching of leadership among players in the group. To a first degree of approximation, our analysis hints at a simple relationship between the likelihood of a player taking a leader role in the group and their ball possession.

## VI. CONCLUSIONS

The problem of identifying causal relationships within arrays of time series is pervasive to the study of natural and engineered systems. Transfer entropy has emerged as a powerful tool to undertake this task, but several open problems limit our ability to reliably unveil causal relationships from temporal observations. A particularly elusive research question entails the possibility of identifying causal relationships that vary over time. For example, in the context of collective behavior of animal groups, we have made tremendous progress in the identification of leader-follower relationships,^{11} but our ability to detect role switches in a group remains limited. Recent studies that have addressed this problem include,^{30} where time series is partitioned into smaller sections and tested for Granger causality in a statistical sense, and,^{31} which looks for consistency in leader-follower relationships related via extreme-event synchronization over time. In this work, we have made a first, necessary step toward the inference of dynamic causal relationships by establishing a novel statistically-principled, information-theoretic fitness function to detect switching in leadership in collective behavior.

The proposed approach puts forward a new inference methodology that incorporates convergence properties of entropy estimators to weight both the quality and quantity of the information outflow from a single individual in the group. Specifically, the fitness function quantifies the overall information outflow from candidate leaders to the rest of the group over varying time partitions, weighting estimates by the length of the partition to penalize those that are from short time segments. We have illustrated our approach on a synthetic dataset consisting of a group of Vicsek self-propelled particles moving in a periodic domain. One of the particle, the leader, is unresponsive to the rest of the group. The follower particles, instead, average the orientation of their neighbors to update their motion in time. Using the Vicsek model as a ground truth, we demonstrate the performance of our approach on an exhaustive dataset where we vary the group size, noise intensity, sample size within which a role switch takes place, and particle speed. Upon gaining confidence in our approach, we analyze a real dataset, consisting of motion trajectory of a soccer team. In agreement with one’s intuition, we found that ball possession is a predictor of leadership, where the players who posses the ball are more likely to take leadership during the game.

While our approach provides a new lens through which one may infer time-evolving causal relationships, it does not come without a number of limitations that call for future research.

First, we have assumed that the group has a single leader at all times, while we know that in many situations^{32} collective behavior is determined by interactions with multiple leaders, who may also have conflicting behaviors. Our approach could be potentially extended to detect multiple leaders by searching for local optima in the fitness functions, thereby ranking individuals on the basis of their relative influence on the group in each time partition.

Second, we have hypothesized that leadership could be effectively inferred by tracking a single variable for each individual, like the turn angle for the Vicsek agents and the soccer players. However, it is tenable that multiple variables could differentially modulate leader-follower interactions within the group.^{33} For example, a soccer player could not only influence other team players by quickly changing the direction in which they are running, but also by suddenly speeding up or stopping. Our approach could also be to account for these multidimensional interactions, by either considering a vectorial representation of the time series of each individual^{34} or by encoding their behavior through a sequence of symbols, each associated with a specific behavior.^{35}

Third, the approach described here is demonstrated on first-order Markov processes; processes with longer memory will require a larger sample size to estimate transfer entropy consistently. A feasible approach to retain the present approach for processes of longer memory would be to subsample the data using expansion rates of the underlying processes.^{36}

Finally, we have assumed that most of the information flow within the group is determined by dyadic interactions that are well captured by computing transfer entropy between each pair of individuals. We cannot exclude that in some instances indirect coupling might play an important role in shaping the collective dynamics, thereby requiring the use of other information-theoretic measures that are conditioned on a third time series.^{37} Our fitness function can be be easily extended to incorporate such information-theoretic measures without significantly altering the structure of the weighting process. However, care should be placed to ensure that sufficiently long time series are available to attempt at this endeavor.

From schools of fish to human groups, there is a need for understanding how and why leadership emerges.^{38} This paper contributes a data-driven approach to address this question from raw measurements of individual trajectories, balancing quality and quantity of transfer entropy inferences over partitions of the experimental observation. Through the proposed framework, biologists may be able to illuminate the determinants of leadership and, on a lighter, less scientific side, soccer coaches identify winning strategies for their teams.^{39}

## ACKNOWLEDGMENTS

This work was supported by the National Science Foundation under Grant No. CMMI 1433670.

## REFERENCES

*Proceedings of the Pacific Visualization Symposium*(IEEE, 2011), pp. 99–106.

*X*with and without dependency on

_{t}*Y*. The VAR models are fit using ordinary least squares with a maximum time lag of 5 samples.

_{t}*Proceedings of the ACM Multimedia Systems Conference*(ACM, 2014), pp. 18–23.