In a complex system, the interactions between individual agents often lead to emergent collective behavior such as spontaneous synchronization, swarming, and pattern formation. Beyond the intrinsic properties of the agents, the topology of the network of interactions can have a dramatic influence over the dynamics. In many studies, researchers start with a specific model for both the intrinsic dynamics of each agent and the interaction network and attempt to learn about the dynamics of the model. Here, we consider the inverse problem: given data from a system, can one learn about the model and the underlying network? We investigate arbitrary networks of coupled phase oscillators that can exhibit both synchronous and asynchronous dynamics. We demonstrate that, given sufficient observational data on the transient evolution of each oscillator, machine learning can reconstruct the interaction network and identify the intrinsic dynamics.

Many complex systems in nature, science, and society can be modeled as a network of coupled oscillators. The interactions between neurons in the brain, generators in a power grid, and individuals within a crowd or social network can cause unexpected behaviors to emerge. Often, it is possible to observe the behavior of individual oscillators, but not to directly determine their intrinsic properties or the structural connections between them. In this paper, we consider the problem of reconstructing the structural interaction network, the functional form of the coupling, and the oscillator dynamics from time series data. We propose a method for computing the optimal reconstruction using techniques commonly used for training artificial neural networks. For systems that synchronize, we explore various minimally invasive types of perturbations that can be used to temporarily disrupt the synchronization, generating additional useful transient data for reconstruction. Our method can accurately reconstruct the model for systems of phase oscillators over a wide range of parameters and dynamic regimes using less data than alternative methods.

## I. INTRODUCTION

Nature and society brim with systems of coupled oscillators, including pacemaker cells in the heart, insulin-secreting cells in the pancreas, networks of neurons in the brain, cardiorespiratory synchronization, fireflies that synchronize their flashing, chemical reactions, Josephson junctions, power grids, metronomes, and applause in human crowds, to name merely a few.^{1–10} The dynamics of coupled oscillators in complex networks have been studied extensively. In particular, networks of interacting oscillators governed by the seminal Kuramoto model^{11–13} or its variants are known to exhibit a rich variety of behaviors including spontaneous synchronization, phase transitions, and pattern formation.^{14,15} The connections in the network can play a pivotal role in determining these dynamics.

When the governing equations and network topology for a system of coupled oscillators are known, it is fairly straightforward to explore the dynamics of the system using numerical methods and, in special cases, analytical methods. Unfortunately, in many systems, the topology of the interaction network and the intrinsic oscillator properties can be difficult or impossible to observe directly. Imagine, for example, experimental results that track neuronal cell network activity or gene regulatory network activity for a large number of oscillators, giving rise to time series data. One might like to infer models and connectivity information from these time series.

It is crucial to distinguish between functional network connections and structural network connections. Functional connectivity refers to the temporal correlation of the oscillators and is often directly observable. For instance, neurons that fire synchronously are functionally connected. In contrast, structural connectivity, also called network topology, refers to the underlying connections present in the network. For instance, neurons that interact via synapses or neurotransmitters are structurally connected. Unfortunately, these structural connections may not be observable without deconstructing the network.

Network reconstruction is an active area of research, both in theoretical work focused on observations from complex physical systems^{16–30} and in applied fields where simultaneously-recorded physiological signals are used to reconstruct physiological networks of interaction.^{31–33} Our primary contribution is a new algorithm for solving this inverse problem. We use observed time series data consisting of the phases from a network of coupled oscillators to reconstruct both the network topology, i.e., the structural network connections and the intrinsic oscillator dynamics. For the remainder of this paper, when we refer to inferring or reconstructing the model, we mean both of the aforementioned components: network connections and intrinsic oscillator properties.

In Sec. II B, we review the literature most closely related to our proposed model reconstruction technique. This technique addresses certain challenges of existing methods. One such challenge is that existing methods set up the inverse problem as a linear system that involves many unknown parameters. Consequently, a large amount of time series data is necessary. In contrast, our algorithm results in a system of equations that is smaller and is nonlinear. We solve these using optimization tools designed for training artificial neural networks. As a result, we are able to infer the model with a much smaller amount of data. A second challenge, as mentioned in Refs. 18,22,28, and 29, is that it is difficult to infer the model for networks that are synchronized. For such networks, we explore which perturbations of the synchronized state are needed to enable accurate inference of the model.

The mathematical setting of our study is oscillator networks composed of limit-cycle oscillators. These are also known as phase oscillators, meaning they are characterized by a single phase variable defined on the circle. We explore four different choices of oscillator models, all of which fall into the general framework for uniformly coupled phase oscillators developed in Ref. 34: the classic Kuramoto model,^{11} a Kuramoto-like model with square-wave coupling function, the Kuramoto-Sakaguchi model,^{35} and a phase-field reduction of weakly-coupled Hodgkin-Huxley oscillators^{36} (see Appendix A). We take all networks to be Erdös-Rényi random graphs.^{37} We attempt to infer the network topology, the intrinsic frequency of each model oscillator, and the so-called coupling function that specifies the influence of one oscillator on others.

To carry out the model reconstruction, we begin with simulated time series for the oscillator phases and estimated phase velocities. We set up an optimization problem involving the mean squared error for the predicted phase velocities generated by a system of nonlinear differential equations, which can be represented as a convolutional neural network. We then use numerical minimization methods to estimate the optimal parameters, thereby inferring the adjacency matrix for network connectivity, the oscillator coupling strength, the oscillator frequencies, and the Fourier coefficients of the coupling function. This approach is effective for a variety of different networks and model parameters. More specifically, we find the following:

Accurate inference of the coupling network, frequencies, and coupling functions is possible independent of the model when the system does not synchronize or when sufficient perturbations from synchronization are permitted.

For small data sets, model reconstruction with our method is more accurate than with alternative methods based on solving linear systems of equations.

Computational methods designed for training artificial neural networks, such as mini-batch gradient descent implemented in TensorFlow,

^{38}can be used to solve the optimization problem associated with model reconstruction.Synchronizing networks can be reconstructed using random phase resets or sufficiently large phase perturbations to a small, randomly chosen set of oscillators.

Synchronizing networks can also be reconstructed using perturbations to a sufficiently large fixed subset of oscillators.

The rest of this paper is organized as follows. In Sec. II, we introduce the Kuramoto model (a standard model for the dynamics of coupled oscillators), set up the inverse problem for reconstruction, and discuss approaches for solving this inverse problem. In Sec. III, we describe a series of experiments used to test the effectiveness of these reconstruction techniques over a wide range of model parameters. In Sec. IV, we present the results of the aforementioned experiments and discuss perturbation strategies for improving the reconstruction when the system synchronizes. Finally, in Sec. V, we summarize our primary findings and discuss extensions to our methodology.

## II. MODELS AND METHODS

The Kuramoto model is a standard model for the dynamics of coupled oscillators. We consider $N$ oscillators with phases $\theta k\u2208[0,2\pi )$ for $k=1,2,\u2026,N$, each with an intrinsic frequency $\omega k$. These oscillators are coupled through an interaction network with adjacency matrix $A$, with the entry $Akj\u2208{0,1}$ determining whether oscillators $k$ and $j$ are coupled. The dynamics of $\theta k$ are governed by the equation

Here, $\sigma kj$ represents the strength of the coupling between oscillators $j$ and $k$, and $\Gamma (\theta )$ represents the coupling function. If the coupling function $\Gamma (\theta )$ is continuous and $2\pi $-periodic, and $\Gamma \u2032(\theta )$ is piecewise continuous, then it can be represented using a uniformly convergent Fourier series

where the coefficients satisfy $an,bn\u2264M/n2$ for some $M$. Therefore, these coefficients decay to $0$ for large $n$, and this function can be approximated to arbitrary accuracy using a truncated Fourier series

with sufficiently large $m$. Note that within the model, one can set $a0=0$ without loss of generality, using the change of variables $\omega k+\u2211j=1N\sigma kjAkja0\u2192\omega k$.

In Kuramoto’s original formulation,^{11} the oscillators were globally coupled ($Akj=1$) with coupling strengths $\sigma kj=K/N$, where $K$ scales the global coupling strength. He used the coupling function $\Gamma (\theta )=sin\u2061(\theta )$ and showed that, for $K>KC$ where $KC$ is a critical value related to the width of the distribution of intrinsic frequencies, the oscillators begin to synchronize, achieving identical phase velocities with phases distributed around the population mean. Analogous results were obtained later for systems with periodic coupling functions possessing only odd harmonics^{12,34} and arbitrary complex networks.^{39}

### A. Simulated data generation

We investigate a method for inferring the parameters of the Kuramoto model for phase oscillators on random graphs with uniform coupling strengths $\sigma kj=K/N$. In our experiments, we focus on undirected Erdös-Rényi graphs, where each possible edge in the network is present with a probability of $p$. It is straightforward to extend our approach to arbitrary networks with nonuniform coupling strengths, though this would likely require a larger amount of time series data. The details of our method appear in Secs. II B and II C.

In order to test our method, we generate data according to the following procedure:

Generate an Erdös-Rényi network for a fixed connection probability $p$, where the nodes represent individual oscillators with natural frequencies sampled from a normal distribution with mean $\mu $ and standard deviation $\sigma $.

Use numerical integration to generate a time series for the evolution of this system for $t\u2208[0,tmax]$, starting from initial phases drawn from a uniform distribution on $[0,2\pi ]$. For numerical integration, we use an explicit Runge-Kutta method of order 5(4) (Ref. 40) or a stochastic Runge-Kutta method of order 2 (Ref. 41).

Compute the phases $\theta k(tn)$ at times $tn=n\Delta t$ with time step $\Delta t$ and for $n=0,1,\u2026,T+1$, where $T=tmax/\Delta t\u22121$, to obtain observations that are evenly spaced in time.

Estimate the phase velocities $vk(tn)=\theta \u02d9k(tn)$ for $n=1,2,\u2026,T$, using central differencing in time with Savitsky-Golay filtering.

^{42}We use a window length of $5$ with first degree polynomials.Repeat steps (2–4) $Nres$ times with different uniform random initial conditions to obtain sufficient data during the transient evolution of the system.

See Tables I and II for the network parameters and numerical solution parameters used in this procedure. For these default values, one typically finds that some of the oscillators synchronize with a collective rhythm, while others drift relative to this collective rhythm.

Parameter . | Description . | Value . | Sweep values . |
---|---|---|---|

N | Number of oscillators | 10 | {5, 10, 20, 40} |

μ | Average intrinsic frequency | 1.0 | N/A |

σ | Standard deviation of intrinsic frequencies | 0.5 | {0.01, 0.1, 1.0} |

p | Network connection probability | 0.5 | {0.1, 0.2, …, 0.9} |

Γ(θ) | Coupling function | sin (θ) | See Appendix A |

dyn_noise | Noise in system dynamics | 0 | {0, 10^{−5}, 10^{−4}, …, 10^{0}} |

Parameter . | Description . | Value . | Sweep values . |
---|---|---|---|

N | Number of oscillators | 10 | {5, 10, 20, 40} |

μ | Average intrinsic frequency | 1.0 | N/A |

σ | Standard deviation of intrinsic frequencies | 0.5 | {0.01, 0.1, 1.0} |

p | Network connection probability | 0.5 | {0.1, 0.2, …, 0.9} |

Γ(θ) | Coupling function | sin (θ) | See Appendix A |

dyn_noise | Noise in system dynamics | 0 | {0, 10^{−5}, 10^{−4}, …, 10^{0}} |

Parameter . | Description . | Value . | Sweep values . |
---|---|---|---|

t_{max} | Duration of each transient | 20 | {2, 5, 10, 20, 50} |

Δt | Sampling time step | 0.1 | N/A |

Noise | Observation noise | 0 | {0, 10^{−5}, 10^{−4}, …, 10^{0}} |

N_{res} | Number of transients observed | 10 | {1, 2, 5, 10, 20, 40} |

Parameter . | Description . | Value . | Sweep values . |
---|---|---|---|

t_{max} | Duration of each transient | 20 | {2, 5, 10, 20, 50} |

Δt | Sampling time step | 0.1 | N/A |

Noise | Observation noise | 0 | {0, 10^{−5}, 10^{−4}, …, 10^{0}} |

N_{res} | Number of transients observed | 10 | {1, 2, 5, 10, 20, 40} |

The aforementioned process is intended to produce simulated data mimicking that which experimentalists might collect when observing real world networks. Of course, it may not be possible to control the initial phases in an experiment. We, therefore, evaluate the reconstruction methods for varying $Nres$, as well as for cases in which small perturbations are used instead of different initial conditions (see Sec. IV A).

### B. Inverse problem formulation

We now formulate a set of equations whose least squares solution can be used to estimate the natural frequencies $\omega k$ of each oscillator, the adjacency matrix $Akj$ of the coupling network, the coupling strength $K$, and the coupling function $\Gamma (\theta )$, from observed phases $\theta k(tn)$ and phase velocities $vk(tn)$ generated using the method outlined above. This work builds on prior work by Shandilya and Timme,^{18} where the adjacency matrix for a network of oscillators is estimated from a time series. We summarize their approach below.

Define vectors $\theta j=[\theta 1(tj),\theta 2(tj),\u2026,\theta N(tj)]T$ and $vj=[\theta \u02d91(tj),\theta \u02d92(tj),\u2026,\theta \u02d9N(tj)]T$ consisting of the oscillator phases and phase velocities. Note that here $T$ corresponds to the matrix transpose. When the coupling strength $K$ and coupling function $\Gamma $ are known, the phase velocity for oscillator $k$,

is linear in the unknown parameters $\omega k$ and $Akj$. Therefore, one can infer both the natural frequencies and adjacency matrix by solving a linear system of $N\u22c5T$ equations, where $T$ is the number of time steps, and each time step $j$ provides $N$ equations of the form

Here, $L(\theta j)$ is an $N\xd7(N+N2)$ matrix where each row is determined from Eq. (4) for a particular oscillator, and $x$ is an $(N+N2)\xd71$ vector with the unknown natural frequencies $\omega $ and the entries in the adjacency matrix $A$. The number of unknowns in $A$ can be reduced to $N+N(N\u22121)/2$ if one assumes $Ajj=0$ and $Akj=Ajk$.

Since the number of equations in the linear system is determined by the number of observed time steps $T$ and the number of oscillators $N$, one can ensure that the system is overdetermined by collecting enough observations so that $T>1+N$. One can then estimate the parameters by minimizing a loss function such as the mean squared error,

where $vj$ denotes the observed value and $v^j$ denotes the predicted value of the velocity.

As long as the system remains far from synchronization, increasing the number of observations $T$ will provide additional linearly independent equations to aid with reconstruction. However, as networks synchronize, i.e., $(vj)k=\theta \u02d9k\u2192\theta \u02d9$, additional observations become (nearly) linearly dependent. This causes the system of equations to become highly ill-conditioned and makes numerical solutions sensitive to noise and rounding errors. As a result, it is often necessary to perturb the system away from equilibrium to ensure that a sufficient number of linearly independent observations can be collected.

Although the approach of Ref. 18 outlined above is effective, it is limited by the requirement that the coupling function be known *a priori*. This limitation was addressed in the context of pulse-coupled oscillators in Ref. 28 and phase-coupled oscillators in Ref. 29 by representing the unknown functions as a Fourier series so that it can also be estimated by solving an optimization problem. Unfortunately, one challenge remains. If one represents the coupling function as in Eq. (3), then the system of equations is no longer linear as terms of the form $Akjan$ and $Akjbn$ appear. Pikovsky^{29} circumvents this issue by defining distinct coupling functions

for each pair of oscillators and by setting $Akj=1$. In this formulation, if oscillators $k$ and $j$ are uncoupled, then $akjn=bkjn=0$ for all $n$. This modification preserves the linearity of the system of equations and allows for the description of a more general class of networks with distinct coupling functions for each pair of oscillators. However, it comes at a cost: the number of unknown parameters increases dramatically from $N+N2$ to $N+2mN2$. Therefore, even longer observation times are required. Additionally, as with the method of Ref. 18, if the system synchronizes, there may be numerical difficulties when inferring parameters. Finally, the large number of free parameters makes this model particularly prone to overfitting. As before, one could assume that connections in the network are symmetric, i.e., that $\Gamma kj(\theta )=\Gamma jk(\theta )$ and that self edges are not included $\Gamma jj(\theta )=0$, allowing for a reduction in the number of parameters to $N+mN(N\u22121)$.

We propose an alternative approach. We use a single coupling function represented by a Fourier series as described in Eq. (3) for all coupling terms and attempt to infer the $2m$ Fourier coefficients $a=[a1,a2,\u2026,am]T$ and $b=[b1,b2,\u2026,bm]T$ along with the intrinsic frequencies $\omega $, adjacency matrix $A$, and global coupling strength $K$. This leads to a total $2m+N+N2+1$ inferred parameters or $2m+N+N(N\u22121)/2+1$ with symmetry constraints and no self-connections. This is, therefore, a modest increase of $2m+1$ parameters over the case considered by Shandilya and Timme^{18} without requiring prior knowledge of the coupling function.

As mentioned previously, this leads to a set of nonlinear equations due to the appearance of terms of the form $KAkjan$ and $KAkjbn$ in Eq. (4). Since $K$ always appears in products involving $an$ and $bn$, these parameters are not structurally identifiable. One could set $K=1$ without loss of generality. Instead, we include $K$ as an inferred parameter and introduce penalty terms to the objective function we seek to minimize,

The inclusion of a penalty function ensures the existence of a nondegenerate local minimum. In this objective function, the first term represents the mean squared error. The remaining terms are penalty terms with weights $\lambda \Gamma =0.0001$, $\lambda A=10\u22126$, and $\lambda bd=105$. These hyperparameter values were tuned initially to provide satisfactory reconstruction performance and then kept constant in all subsequent experiments (see also Table III). Additional tuning of these parameters for specific networks could further improve the accuracy of the model reconstruction. The term with $\lambda \Gamma $ introduces $L2$ regularization, which favors smaller estimates of the parameter values $an$ and $bn$. This is useful for combating overfitting and promoting sparse representations and is analogous to using a Bayesian prior centered at 0 for the Fourier coefficients.^{43} The term with $\lambda A$ penalizes values in the adjacency matrix that are not close to 0 or 1. The last term with $\lambda bd$ penalizes adjacency values that are negative or greater than 1 to ensure that the estimates fall within the desired range of $[0,1]$. We do not penalize $K$ and instead allow it to be as large as necessary to counteract the parameter shrinkage caused by the penalty terms.

Parameter . | Description . | Value . |
---|---|---|

n_epochs | Iterations through the training data | 300 |

Batch size | Time steps of data per gradient descent batch | 100 |

m | Number of inferred Fourier coefficients | 5 |

$\lambda \Gamma $ | Weight of penalty on nonsparse coupling | 0.0001 |

λ_{A} | Weight of penalty on network connectivity | 10^{−6} |

λ_{bd} | Weight of penalty on $Akj\u2209[0,1]$ | 10^{5} |

Parameter . | Description . | Value . |
---|---|---|

n_epochs | Iterations through the training data | 300 |

Batch size | Time steps of data per gradient descent batch | 100 |

m | Number of inferred Fourier coefficients | 5 |

$\lambda \Gamma $ | Weight of penalty on nonsparse coupling | 0.0001 |

λ_{A} | Weight of penalty on network connectivity | 10^{−6} |

λ_{bd} | Weight of penalty on $Akj\u2209[0,1]$ | 10^{5} |

### C. Inverse problem solution

Minimizing Eq. (6) poses a number of challenges. First of all, due to the nonlinearity of Eq. (4), this function may not be convex, and there are no guarantees that local optimization methods will converge to a global minimum. Secondly, when the number of observations $T$ is large, this function may be costly to compute, and minimizing the number of function evaluations is paramount.

Fortunately, the theoretical challenge of nonconvexity does not prevent our method from obtaining consistently reliable model reconstruction, as we show in Sec. IV. The computational difficulties can be addressed by using tools designed for artificial neural networks and implemented in TensorFlow to efficiently compute gradients and perform minimization with a variant of mini-batch gradient descent.^{44}

To elucidate the connection with artificial neural networks, we comment that Eq. (3) can be viewed as a two-dimensional convolutional neural network with a convolution of size $1\xd71$ and stride one applied to a tensor of phase differences $\Delta \theta i,j,k=\theta k(ti)\u2212\theta j(ti)$ with $i=1,2,\u2026T$ and $k,j=1,2,\u2026N$. This network consists of an input layer with dimensions $T\xd7N\xd7N$ (the dimensions of the phase difference tensor), a hidden layer with dimensions $T\xd7N\xd7N\xd72m$ corresponding to each of the $2m$ harmonics of the form $cos\u2061n\Delta \theta i,j,k$ and $sin\u2061n\Delta \theta i,j,k$, followed by a $T\xd7N\xd7N$ output layer representing $\Gamma (\Delta \theta i,j,k)=\u2211n=1m[ancos\u2061n(\Delta \theta i,j,k)+bnsin\u2061n(\Delta \theta i,j,k)]$. A traditional fully connected neural network includes connections between all nodes in consecutive layers, each with a distinct weight. In contrast, a convolutional network uses sparse connections and shared weights that allow it to represent transformations that are equivalent to the mathematical operation known as a convolution. In our network, we only include connections between nodes that correspond to the same phase difference $\Delta \theta i,j,k$ and use weights that are independent of the indices $i$, $j$, and $k$. The weights in the first hidden layer are fixed at one, and the biases are fixed at zero. In the output layer, each harmonic is assigned a fixed bias of zero and a variable weight $an$ or $bn$ corresponding to the Fourier coefficients. Figure 1 provides a schematic of this transformation as a neural network. Once the coupling terms have been computed, the rest of Eq. (4) and the resulting loss function [Eq. (6)] are straightforward to compute using vectorized operations.

We initialize the inferred parameters as follows: the adjacency matrix $A$ has initial entries drawn from $N(0.5,1/N)$, the frequencies $\omega k$ are initialized from $N(0,1/N)$, the coupling strength $K$ is drawn from $N(1,1/N)$, and the Fourier coefficients $a,b$ of the coupling function are initialized at $0$.

TensorFlow maintains a computational graph for these operations, which allows one to automatically compute gradients. One can, therefore, use gradient descent methods to compute the optimal estimates ($a^n$,$b^n$,$\omega ^k$,$A^kj$,$K^$) for the parameters. We used a mini-batch gradient descent method with a batch size of $100$; i.e., we randomly assign the data to batches of 100 time steps. For each batch, we calculate the gradient of the loss function and update the estimated parameters by taking a small step in a direction determined from the gradient. Once this has been repeated for all batches (one epoch), we pass through the data again for a total of $200$ or more epochs. We determine the number of epochs experimentally by iterating until the mean squared error no longer decreases. Typically, 300 epochs are sufficient. However, for certain sweeps examining the role of $tmax$, $Nres$, and $\sigma $, the number of epochs required for convergence is as high as 10 000.

By using small random batches of time steps, the gradient can be estimated quickly. This has the added benefit of introducing stochasticity into gradients, which makes the algorithm less susceptible to getting caught in local minima. We use AdamOptimizer, which is a gradient descent method with momentum and an adaptive learning rate,^{45} with default parameter values for optimization in TensorFlow.

There is minor variability in the success of the algorithm due to the random initialization of the inferred parameters and to the batching of the observed data. In order to ensure an accurate network reconstruction, we can retrain the model with new initial values several times and choose the result that has the smallest mean squared error for the velocity predictions on the validation set, which was not considered during the training process. For any individual model network, this validation error is correlated with the accuracy of the inferred parameters. In all examples below, we attempt to reconstruct each network five times before choosing the best reconstruction.

### D. Postprocessing

The method described in Sec. II C produces continuous estimates for the parameters ($a^n,b^n,\omega ^k,A^kj,K^$), which could be used to predict the dynamics of the system under a variety of initial conditions. However, we are interested in evaluating whether these parameter estimates are an accurate reconstruction of the original model. Before these values can be compared to the model parameters used to generate the data, additional postprocessing is needed. We, therefore, redefine our parameter estimates via the transformations (in order) $K^A^kj\u2192A^kj$, $c0+c1\Gamma ^\u2192\Gamma ^$, where $c0$ and $c1$ are selected to minimize $\u222b02\pi |\Gamma (\theta )\u2212\Gamma ^(\theta )|d\theta $, and $\omega ^k\u2212c0c1N\u2211j=1NA^kj\u2192\omega ^k.$ These transformations are necessary to address the aforementioned identifiability issues with $K$ and to allow $\Gamma $ to have a nonzero mean.

In the original model, the adjacency matrix entries are either zero or one. However, we treat the entries as continuous variables during optimization and then choose a threshold $\u03f5$ so that $Akj<\u03f5$ is chosen to be 0 and $Akj\u2265\u03f5$ is chosen to be 1. In practice, one could fix $\u03f5$ or select $\u03f5$ so that the reconstructed model minimizes the mean squared error for the data set. However, we explore a range of threshold values using receiver operating characteristic (ROC) curves, which measure the quality of a classifier independent of any particular threshold (see Appendix B). We also report the reconstruction error rates using the value of $\u03f5$ that yields the largest $F1$ score for the adjacency matrix reconstruction. The $F1$ score is the harmonic mean of precision and recall and has a maximum value of 1, which corresponds to perfect reconstruction (see Appendix B). These optimal values are robust, and good performance is typically obtained over a wide range of thresholds (see Sec. IV for details).

## III. EXPERIMENTAL DESIGN

In order to validate the robustness of our approach, we test the reconstruction method on data simulated with a variety of networks and parameter values. We explore this high-dimensional parameter space by fixing all of the model parameters except for one and then sweeping the remaining parameter over a wide range of values. See Tables I and II for a list of default values as well as the ranges considered. For each set of parameter values, we compute 30 networks with random initial phases and intrinsic frequencies. We then attempt to reconstruct the underlying model for each network and calculate various performance metrics for each reconstruction.

### A. Coupling function

We evaluate the inferred coupling function $\Gamma ^(\theta )$ by comparing it to the true coupling function $\Gamma (\theta )$ via the “normalized difference in area” defined as follows:

This represents the area between the true and estimated coupling function curves, weighted by the area under the curve of the true function. This quantity serves as a measure of the error in the reconstruction. A perfect reconstruction would have a normalized difference in area of zero. The initial estimate $\Gamma ^(\theta )=0$ corresponds to a value of 1. Therefore, values significantly lower than 1 indicate progress toward a correct reconstruction.

### B. Intrinsic frequencies

We compare the inferred intrinsic frequencies $\omega ^k$ to the true intrinsic frequencies $\omega k$ using the “mean absolute deviation” defined as follows:

Values near zero indicate accurate reconstructions. We considered alternative metrics such as relative deviations and the correlation between true and inferred frequencies, but these were less informative because they tend to amplify errors when the intrinsic frequencies are close to zero.

### C. Adjacency matrix

We investigate several methods for evaluating the success of adjacency matrix reconstruction. The accuracy can be inspected visually by plotting the true and reconstructed matrices along with the absolute differences in a grid where values range from $0$ (black) to $1$ (white); see Figs. 2(a)–2(c). Since we are primarily interested in discrete adjacency values, we interpret reconstruction as a classification problem and compute three standard evaluation metrics: the $F1$ score, the classification “error rate” for the connections, and the “area under the ROC curve.” See Appendix B for precise definitions of these metrics.

## IV. RESULTS

Our method can successfully reconstruct models with a variety of coupling functions. Figure 3(a) and Table V show the results for four different coupling functions (see Appendix A for details), three of which come from popular coupled oscillator models: the Kuramoto model, the Kuramoto-Sakaguchi model, and a phase reduction of the Hodgkin-Huxley model, and a fourth, a square wave, which represents a generic discontinuous function with high frequency Fourier harmonics. For the square-wave coupling function, the accuracy of the reconstruction suffers since the reconstruction includes only five Fourier harmonics, and the true coupling function contains an infinite number of harmonics with amplitudes that decay slowly due to the discontinuities. Despite these limitations, the adjacency matrix was still estimated with a high degree of accuracy.

Counterintuitively, as the number of oscillators increases [Fig. 3(b) and Table VI], we find that reconstruction accuracy of the coupling function improves initially (from $N=5$ to $N=20$). The negative slope in the normalized difference in error is statistically significant with a $p$-value of $0.0001$. This can be explained by the observation that although increasing the number of oscillators increases the number of unknown parameters, it also provides a greater number of pairwise phase differences, which can aid in reconstruction of the coupling function.

In Fig. 4 and Table VII, we consider simulations where the standard deviation $\sigma $ of the oscillator frequencies ranges from $0.01$ to $1$. The normalized difference in area of the coupling function and the mean absolute deviation of the inferred frequencies both confirm that we can infer the network successfully for $\sigma \u22641$. For larger standard deviations, the quality of the reconstruction suffers slightly, since the large frequencies dominate the coupling terms within the phase velocity relationship.

We also explore the impact of noise on the model reconstruction since both observation noise and dynamic noise would be present in experiments. As in Ref. 18, we introduce dynamic noise using stochastic differential equations with Gaussian noise. We also add Gaussian noise to the observed phases after integrating the governing differential equations. In Tables VIII and IX, we demonstrate that our model reconstruction method is robust to moderate amounts of both types of noise. For both types of noise, we use mean $0$ and standard deviations between $0$ and $1$. Figure 5 shows that dynamic noise with a standard deviation less than $10\u22123$ helps with the reconstruction of both coupling function and frequencies. This can be explained by the fact that a small amount of noise keeps the system from reaching equilibrium, effectively increasing the duration of the transients that provide useful information about the structure of the network.

We carry out additional parameter sweeps with varying network connectivity $p$ (Table X), maximum simulation time $tmax$ (Table XI), and the number of simulation restarts (Table XII) with all other parameters set using the default values given in Tables I–III. In each case, we are consistently able to reconstruct the coupling function, the underlying network, and the intrinsic frequencies over a wide range of model parameters. Tables V–XVII in Appendix C illustrate averaged numerical results for 30 trials of each of these parameter sweeps in terms of evaluation metrics such as normalized difference in area, mean absolute deviation, error rate, area under the ROC curve, and a range of thresholds that yield $F1$ scores above $90%$ of the largest value.

### A. Results for perturbations of synchronous dynamics

As Refs. 18 and 29 point out, when a network remains synchronized, i.e., when $\theta \u02d91=\theta \u02d92=\cdots =\theta \u02d9N$, one cannot infer the model parameters due to the fact that the observed phases no longer provide linearly independent equations. Furthermore, even with precise knowledge of the adjacency matrix, one could not hope to reconstruct the coupling function $\Gamma (\theta k\u2212\theta j)$ without data over a wide range of phase differences $\theta k\u2212\theta j$.

We investigate a method for using small perturbations to introduce brief transients into the dynamics to allow for successful model reconstruction. To test this, we use nearly identical oscillators with frequency standard deviation $\sigma =0.0001$, which causes the system to quickly converge to a synchronized state for almost all initial conditions. We then initialize $\theta (0)=0$ so that the system begins from perfect synchrony. Then, at times $ktmax/Npert$ for $k=1,2,\u2026Npert$, we add a phase perturbation to a subset of the oscillators. Each perturbation causes some of the oscillators to briefly become desynchronized. In this way, the total number of observed phases remains constant, but the fraction of those observations that occur during transient dynamics is proportional to the number of perturbations, $Npert$. The observed phases during these transients provide meaningful data about the structure of the network. Figure 6 demonstrates that, as expected, with too few or too small perturbations, model reconstruction is unsuccessful. However, as the number of perturbations $Npert$ increases, the accuracy of the estimated coupling function, adjacency matrix, and the oscillator frequencies improves.

We explore two methods for selecting which oscillators to perturb: fixed subsets, in which the subset of perturbed oscillators is selected at the beginning of the experiment and these same oscillators are perturbed repeatedly, and random subsets, in which a random subset of the system is selected for each perturbation. Although a perturbation to the phase of a particular oscillator does propagate to the phases of neighboring oscillators through the coupling terms, those perturbations decay quickly and are, therefore, most effective for revealing the local structure of the network. As such, although perturbations to a fixed subset might be more practical in a physical experiment, one must typically perturb a larger fraction of the system to obtain comparable performance to that which is obtained with perturbations to random subsets.

We also consider two types of phase perturbations: phase resets, in which selected oscillators have their phases reset to 0, and phase shifts, in which selected oscillators have their phases modified by adding a random shift $\eta pert\u223cN(0,\sigma pert2)$. Phase resets may be more feasible from an experimental perspective but have the drawback of preserving the mutual synchrony of the subset of oscillators that are perturbed. As such, one will typically need to use random subsets in tandem with phase resets in order to be able to resolve the connections between the perturbed oscillators.

In Figs. 6(a)–6(c), we use phase shifts with $\sigma pert=0.01$ to a fixed subset of size 1, i.e., a single oscillator out of $10$. This gives poor reconstruction regardless of the number of perturbations due to the small size of the perturbation. On the other hand, in Figs. 6(d)–6(f), we perturb a fixed subset (3 out of 10 oscillators) with random phase shifts with $\sigma pert=10$. (These phase shifts are virtually indistinguishable from uniform perturbations $X\u223cU[\u2212\pi ,\pi ]$.) In this case, the perturbations affect a sufficiently large proportion of the oscillators, and performance begins to improve once there are 5 or more perturbations. The results in Figs. 6(g)–6(i) illustrate that one only needs to perturb 1 out of 10 oscillators to obtain similar performance when the oscillators selected for phase shifts are chosen randomly. In Figs. 6(j)–6(l), we show the results using phase resets to random subsets of 3 oscillators. Again, performance begins to improve dramatically once 5 or more perturbations are used. The tables summarizing the results of these perturbation strategies are provided in Appendix C.

### B. Comparison with previous work

The amount of transient data necessary for reconstruction is also useful in comparing our method to previous approaches. As discussed in Sec. II B, Pikovsky proposes an alternative method where distinct coupling functions are considered for the interaction of each pair of oscillators.^{29} This ensures that the system of equations is linear; however, it also increases the number of unknown coefficients significantly. Our approach relies on the assumption that the same coupling function $\Gamma $ is used for all pairs of oscillators. This is a reasonable approximation for physical systems where the physical mechanisms governing the interactions between oscillators are the same.

Figure 7 shows that when the coupling functions are identical, our method provides more accurate reconstructions of both the network topology and the coupling function with smaller amounts of transient data. As the amount of transient data increases, both methods ultimately achieve perfect network reconstruction, while the approach in Ref. 29 ultimately obtains slightly better estimates for the coupling function. Therefore, our approach would be preferred under circumstances where the amount of data is limited and when the coupling functions are nearly identical.

## V. DISCUSSION AND CONCLUSIONS

In this paper, we have designed a method for reconstructing models of coupled oscillator networks including both the network connections and the intrinsic oscillator properties. We hope analysts might investigate (non)convexity of our penalty function and properties of minimizers. These issues aside, after testing our method with many different parameters, we conclude that our algorithm can successfully infer the model.

A common challenge that our method and many related ones encounter is that the procedure fails if the system synchronizes too quickly. One remedy is to have a sufficient number of observations during the desynchronized transients. As we demonstrate, one can ensure that these data are available by repeating experiments with multiple initial conditions or by adjusting the model parameters to inhibit synchronization by instituting large frequency variability ($\sigma $) or dynamic noise. An alternative remedy is to move the system away from synchrony using perturbations, preferably ones that are physically realizable. The introduction of these perturbations provides useful transient data. By perturbing a sufficiently large subset of oscillators with a large enough change, we are able to infer the model accurately. Our hope is that this method will be adopted by experimentalists and used with experimental data to aid with the construction of interpretable models for the dynamics of networks of coupled oscillators.

Although the numerical experiments outlined here involve Erdös-Rényi networks, the method can be applied more generally to a broad class of networks. Preliminary testing suggests that similar results can be obtained for other topologies such as star, small-world, scale-free networks, and clique networks.

It is also straightforward to extend this approach to other models for coupled phase oscillators such as the Winfree model^{46} or even more general oscillator models such as the Stuart-Landau model^{47} in which both phase and amplitude variations are permitted. Indeed, any system where the unknown functions are periodic can be represented using our technique.

For models containing unknown functions that are not periodic such as the Hodgkin-Huxley model,^{48} a Fourier series representation is not possible. In these cases, one could represent the coupling functions using feed-forward neural networks, which are capable of representing continuous functions to arbitrary accuracy using a finite number of parameters.^{49} Given a sufficiently rich data set, one could still use our approach with backpropagation to learn the structure of the neural network approximation for the coupling function.

## ACKNOWLEDGMENTS

This material is based upon work supported by the Mathematics Research Communities of the American Mathematical Society, under the National Science Foundation (NSF) grant (No. DMS-1321794). The project was initiated during the Mathematics Research Community (MRC) on Agent-based Modeling in Biological and Social Systems (2018). A follow-up visit was also supported by a collaboration travel grant from the AMS MRC program. M.J.P. was supported by Hillsdale College. M.-V.C. was supported by The Ohio State University President’s Postdoctoral Scholars Program and by the Mathematical Biosciences Institute at The Ohio State University through NSF DMS-1440386. C.M.T. was supported by the National Science Foundation (Grant No. DMS-1813752) and by Williams College. B.X. was supported by the Robert and Sara Lumpkins Endowment for Postdoctoral Fellows in Applied and Computational Math and Statistics at the University of Notre Dame. We are grateful to Henry Adams, Kelsey Houston-Edwards, and Lori Ziegelmeier for contributions during the formative stages of this work.

### APPENDIX A: COUPLING FUNCTIONS

The coupling functions investigated are described in Table IV. The first three classes of functions were selected due to their use in popular coupled oscillator models. The last was an example of a generic coupling function with higher order harmonics that was tested to verify that reconstruction of the adjacency matrix is possible even with an imperfect approximation to the coupling function.

Name . | Function . | Reference . |
---|---|---|

Kuramoto | sin (θ) | 11 |

Kuramoto-Sakaguchi | sin (θ − 0.1) | 35 |

Hodgkin-Huxley | 0.383 + 1.379sin (θ + 3.93) + 0.568sin (2θ + 0.11) + 0.154sin (3θ + 2.387) | 36 |

Square wave | $sin\u2061(\theta \u2212\pi /4)|sin\u2061(\theta \u2212\pi /4)|$ | N/A |

### APPENDIX B: EVALUATION METRICS FOR CLASSIFICATION

The $F1$ score is defined as

Here, $precision$ is the fraction of true positives among all inferred positives, while $recall$ is the fraction of true positives among all positives. The $F1$ score is a value between $0$ and $1$; when reporting the error rate for the reconstructed adjacency matrix, we use the threshold $\u03f5=\u03f5max$ that corresponds to the largest $F1$ score. We also determine the interval of thresholds that yield $F1$ scores above 90% of this largest value. The width of this interval is reported in Tables V–XVII in Appendix C as “Interval width $(>90%)$.”

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width ( > 90%) . |
---|---|---|---|---|---|---|

Γ | Kuramoto | 0.0175 ± 0.0075 | 0.004 ± 0.001 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4338 |

Γ | Kuramoto-Sakaguchi | 0.0169 ± 0.0054 | 0.005 ± 0.002 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.3924 |

Γ | Hodgkin-Huxley | 0.0343 ± 0.0159 | 0.011 ± 0.004 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.7054 |

Γ | Square wave | 0.1518 ± 0.0044 | 0.006 ± 0.002 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4059 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width ( > 90%) . |
---|---|---|---|---|---|---|

Γ | Kuramoto | 0.0175 ± 0.0075 | 0.004 ± 0.001 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4338 |

Γ | Kuramoto-Sakaguchi | 0.0169 ± 0.0054 | 0.005 ± 0.002 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.3924 |

Γ | Hodgkin-Huxley | 0.0343 ± 0.0159 | 0.011 ± 0.004 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.7054 |

Γ | Square wave | 0.1518 ± 0.0044 | 0.006 ± 0.002 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4059 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

N | 5 | 0.026 ± 0.0125 | 0.0052 ± 0.003 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.1193 |

N | 10 | 0.0164 ± 0.007 | 0.0047 ± 0.0031 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4057 |

N | 20 | 0.015 ± 0.0057 | 0.0046 ± 0.0016 | 0.140 ± 0.601 | 0.9997 ± 0.0016 | 0.3041 |

N | 40 | 0.0189 ± 0.0104 | 0.0045 ± 0.0011 | 1.231 ± 1.018 | 0.9946 ± 0.0053 | 0.223 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

N | 5 | 0.026 ± 0.0125 | 0.0052 ± 0.003 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.1193 |

N | 10 | 0.0164 ± 0.007 | 0.0047 ± 0.0031 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4057 |

N | 20 | 0.015 ± 0.0057 | 0.0046 ± 0.0016 | 0.140 ± 0.601 | 0.9997 ± 0.0016 | 0.3041 |

N | 40 | 0.0189 ± 0.0104 | 0.0045 ± 0.0011 | 1.231 ± 1.018 | 0.9946 ± 0.0053 | 0.223 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

σ | 0.01 | 0.0198 ± 0.0052 | 0.0022 ± 0.0008 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.8142 |

σ | 0.1 | 0.0175 ± 0.0057 | 0.004 ± 0.0014 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.8095 |

σ | 1 | 0.0161 ± 0.0049 | 0.0042 ± 0.0021 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.7975 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

σ | 0.01 | 0.0198 ± 0.0052 | 0.0022 ± 0.0008 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.8142 |

σ | 0.1 | 0.0175 ± 0.0057 | 0.004 ± 0.0014 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.8095 |

σ | 1 | 0.0161 ± 0.0049 | 0.0042 ± 0.0021 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.7975 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

Noise | 0 | 0.0193 ± 0.0098 | 0.0048 ± 0.002 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4204 |

Noise | 1 × 10^{−05} | 0.0202 ± 0.0108 | 0.0053 ± 0.0024 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.3845 |

Noise | 0.0001 | 0.018 ± 0.0076 | 0.0048 ± 0.0023 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4365 |

Noise | 0.001 | 0.0186 ± 0.0081 | 0.0053 ± 0.002 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.3956 |

Noise | 0.01 | 0.0186 ± 0.0079 | 0.0052 ± 0.0025 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4135 |

Noise | 0.1 | 0.0385 ± 0.0138 | 0.0086 ± 0.0021 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.3544 |

Noise | 1 | 0.831 ± 0.1343 | 0.118 ± 0.0629 | 46.296 ± 8.579 | 0.5122 ± 0.0741 | 0.0472 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

Noise | 0 | 0.0193 ± 0.0098 | 0.0048 ± 0.002 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4204 |

Noise | 1 × 10^{−05} | 0.0202 ± 0.0108 | 0.0053 ± 0.0024 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.3845 |

Noise | 0.0001 | 0.018 ± 0.0076 | 0.0048 ± 0.0023 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4365 |

Noise | 0.001 | 0.0186 ± 0.0081 | 0.0053 ± 0.002 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.3956 |

Noise | 0.01 | 0.0186 ± 0.0079 | 0.0052 ± 0.0025 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4135 |

Noise | 0.1 | 0.0385 ± 0.0138 | 0.0086 ± 0.0021 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.3544 |

Noise | 1 | 0.831 ± 0.1343 | 0.118 ± 0.0629 | 46.296 ± 8.579 | 0.5122 ± 0.0741 | 0.0472 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

dyn_noise | 0 | 0.0211 ± 0.0067 | 0.0055 ± 0.0018 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.3814 |

dyn_noise | 1 × 10^{−05} | 0.0117 ± 0.0069 | 0.0032 ± 0.0023 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4048 |

dyn_noise | 0.0001 | 0.0146 ± 0.0049 | 0.0032 ± 0.0013 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.411 |

dyn_noise | 0.001 | 0.0241 ± 0.0056 | 0.0049 ± 0.0017 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4517 |

dyn_noise | 0.01 | 0.0633 ± 0.0189 | 0.0128 ± 0.0034 | 0.370 ± 1.025 | 0.9982 ± 0.0072 | 0.3282 |

dyn_noise | 0.1 | 0.2136 ± 0.065 | 0.0312 ± 0.0055 | 14.148 ± 5.003 | 0.8903 ± 0.0496 | 0.2224 |

dyn_noise | 1 | 0.9716 ± 0.0399 | 0.1552 ± 0.0971 | 42.444 ± 8.953 | 0.5177 ± 0.1004 | 0.0406 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

dyn_noise | 0 | 0.0211 ± 0.0067 | 0.0055 ± 0.0018 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.3814 |

dyn_noise | 1 × 10^{−05} | 0.0117 ± 0.0069 | 0.0032 ± 0.0023 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4048 |

dyn_noise | 0.0001 | 0.0146 ± 0.0049 | 0.0032 ± 0.0013 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.411 |

dyn_noise | 0.001 | 0.0241 ± 0.0056 | 0.0049 ± 0.0017 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4517 |

dyn_noise | 0.01 | 0.0633 ± 0.0189 | 0.0128 ± 0.0034 | 0.370 ± 1.025 | 0.9982 ± 0.0072 | 0.3282 |

dyn_noise | 0.1 | 0.2136 ± 0.065 | 0.0312 ± 0.0055 | 14.148 ± 5.003 | 0.8903 ± 0.0496 | 0.2224 |

dyn_noise | 1 | 0.9716 ± 0.0399 | 0.1552 ± 0.0971 | 42.444 ± 8.953 | 0.5177 ± 0.1004 | 0.0406 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

p | 0.1 | 0.1199 ± 0.2314 | 0.0054 ± 0.0047 | 3.111 ± 14.97 | 0.9688 ± 0.1548 | 0.3387 |

p | 0.2 | 0.023 ± 0.0098 | 0.004 ± 0.0022 | 0.074 ± 0.41 | 0.9999 ± 0.0006 | 0.5222 |

p | 0.3 | 0.0186 ± 0.0073 | 0.0047 ± 0.0025 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.5175 |

p | 0.4 | 0.0207 ± 0.0089 | 0.0047 ± 0.0022 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.5167 |

p | 0.5 | 0.0189 ± 0.0097 | 0.0051 ± 0.0024 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4669 |

p | 0.6 | 0.0183 ± 0.0072 | 0.006 ± 0.0019 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.349 |

p | 0.7 | 0.0169 ± 0.0071 | 0.005 ± 0.0025 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.3227 |

p | 0.8 | 0.0194 ± 0.0077 | 0.0057 ± 0.0026 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.2725 |

p | 0.9 | 0.0289 ± 0.0271 | 0.0077 ± 0.0056 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.2739 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

p | 0.1 | 0.1199 ± 0.2314 | 0.0054 ± 0.0047 | 3.111 ± 14.97 | 0.9688 ± 0.1548 | 0.3387 |

p | 0.2 | 0.023 ± 0.0098 | 0.004 ± 0.0022 | 0.074 ± 0.41 | 0.9999 ± 0.0006 | 0.5222 |

p | 0.3 | 0.0186 ± 0.0073 | 0.0047 ± 0.0025 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.5175 |

p | 0.4 | 0.0207 ± 0.0089 | 0.0047 ± 0.0022 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.5167 |

p | 0.5 | 0.0189 ± 0.0097 | 0.0051 ± 0.0024 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.4669 |

p | 0.6 | 0.0183 ± 0.0072 | 0.006 ± 0.0019 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.349 |

p | 0.7 | 0.0169 ± 0.0071 | 0.005 ± 0.0025 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.3227 |

p | 0.8 | 0.0194 ± 0.0077 | 0.0057 ± 0.0026 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.2725 |

p | 0.9 | 0.0289 ± 0.0271 | 0.0077 ± 0.0056 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.2739 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

t_{max} | 2 | 0.0204 ± 0.011 | 0.0042 ± 0.002 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.8225 |

t_{max} | 5 | 0.0185 ± 0.0064 | 0.0047 ± 0.0022 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.8186 |

t_{max} | 10 | 0.0193 ± 0.0066 | 0.004 ± 0.0013 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.8175 |

t_{max} | 20 | 0.0192 ± 0.0053 | 0.0041 ± 0.0013 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.565 |

t_{max} | 50 | 0.0191 ± 0.0093 | 0.0064 ± 0.0038 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.2964 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

t_{max} | 2 | 0.0204 ± 0.011 | 0.0042 ± 0.002 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.8225 |

t_{max} | 5 | 0.0185 ± 0.0064 | 0.0047 ± 0.0022 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.8186 |

t_{max} | 10 | 0.0193 ± 0.0066 | 0.004 ± 0.0013 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.8175 |

t_{max} | 20 | 0.0192 ± 0.0053 | 0.0041 ± 0.0013 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.565 |

t_{max} | 50 | 0.0191 ± 0.0093 | 0.0064 ± 0.0038 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.2964 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

N_{res} | 1 | 0.0513 ± 0.0266 | 0.0237 ± 0.0127 | 3.111 ± 3.531 | 0.9815 ± 0.027 | 0.447 |

N_{res} | 2 | 0.0343 ± 0.0222 | 0.0082 ± 0.007 | 0.148 ± 0.564 | 0.9986 ± 0.0067 | 0.6943 |

N_{res} | 5 | 0.019 ± 0.0094 | 0.0039 ± 0.002 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.806 |

N_{res} | 10 | 0.0201 ± 0.0084 | 0.0042 ± 0.0028 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.808 |

N_{res} | 20 | 0.018 ± 0.0033 | 0.0042 ± 0.0017 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.8295 |

N_{res} | 40 | 0.0192 ± 0.0055 | 0.0041 ± 0.0016 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.8178 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

N_{res} | 1 | 0.0513 ± 0.0266 | 0.0237 ± 0.0127 | 3.111 ± 3.531 | 0.9815 ± 0.027 | 0.447 |

N_{res} | 2 | 0.0343 ± 0.0222 | 0.0082 ± 0.007 | 0.148 ± 0.564 | 0.9986 ± 0.0067 | 0.6943 |

N_{res} | 5 | 0.019 ± 0.0094 | 0.0039 ± 0.002 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.806 |

N_{res} | 10 | 0.0201 ± 0.0084 | 0.0042 ± 0.0028 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.808 |

N_{res} | 20 | 0.018 ± 0.0033 | 0.0042 ± 0.0017 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.8295 |

N_{res} | 40 | 0.0192 ± 0.0055 | 0.0041 ± 0.0016 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.8178 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

N_{pert} | 1 | 0.9318 ± 0.082 | 0.118 ± 0.0719 | 48.667 ± 7.459 | 0.2934 ± 0.0999 | 0.0252 |

N_{pert} | 2 | 0.9054 ± 0.1176 | 0.3284 ± 0.0747 | 49.778 ± 6.920 | 0.2904 ± 0.0783 | 0.0306 |

N_{pert} | 5 | 0.7928 ± 0.1568 | 0.3853 ± 0.1243 | 51.852 ± 8.702 | 0.3281 ± 0.0895 | 0.0688 |

N_{pert} | 10 | 0.7828 ± 0.1698 | 0.2566 ± 0.1579 | 50.593 ± 5.110 | 0.3843 ± 0.0697 | 0.1403 |

N_{pert} | 20 | 0.9818 ± 0.0083 | 0.0024 ± 0.0011 | 43.630 ± 9.383 | 0.5153 ± 0.0959 | 0.1026 |

N_{pert} | 40 | 0.931 ± 0.0234 | 0.0019 ± 0.0007 | 47.556 ± 9.403 | 0.4909 ± 0.0935 | 0.0931 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

N_{pert} | 1 | 0.9318 ± 0.082 | 0.118 ± 0.0719 | 48.667 ± 7.459 | 0.2934 ± 0.0999 | 0.0252 |

N_{pert} | 2 | 0.9054 ± 0.1176 | 0.3284 ± 0.0747 | 49.778 ± 6.920 | 0.2904 ± 0.0783 | 0.0306 |

N_{pert} | 5 | 0.7928 ± 0.1568 | 0.3853 ± 0.1243 | 51.852 ± 8.702 | 0.3281 ± 0.0895 | 0.0688 |

N_{pert} | 10 | 0.7828 ± 0.1698 | 0.2566 ± 0.1579 | 50.593 ± 5.110 | 0.3843 ± 0.0697 | 0.1403 |

N_{pert} | 20 | 0.9818 ± 0.0083 | 0.0024 ± 0.0011 | 43.630 ± 9.383 | 0.5153 ± 0.0959 | 0.1026 |

N_{pert} | 40 | 0.931 ± 0.0234 | 0.0019 ± 0.0007 | 47.556 ± 9.403 | 0.4909 ± 0.0935 | 0.0931 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

N_{pert} | 1 | 0.9193 ± 0.1111 | 0.1161 ± 0.0643 | 49.778 ± 7.042 | 0.3189 ± 0.1051 | 0.0298 |

N_{pert} | 2 | 0.7919 ± 0.2112 | 0.343 ± 0.0724 | 49.852 ± 6.331 | 0.279 ± 0.0949 | 0.0409 |

N_{pert} | 5 | 0.4298 ± 0.1535 | 0.3328 ± 0.0691 | 39.778 ± 11.032 | 0.5271 ± 0.1261 | 0.2041 |

N_{pert} | 10 | 0.3211 ± 0.1187 | 0.1697 ± 0.0769 | 22.741 ± 10.975 | 0.7388 ± 0.0974 | 0.2111 |

N_{pert} | 20 | 0.1311 ± 0.0931 | 0.0288 ± 0.0302 | 4.667 ± 5.296 | 0.9536 ± 0.0601 | 0.3584 |

N_{pert} | 40 | 0.0447 ± 0.0311 | 0.0036 ± 0.0026 | 0.222 ± 0.679 | 0.9988 ± 0.0057 | 0.4824 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

N_{pert} | 1 | 0.9193 ± 0.1111 | 0.1161 ± 0.0643 | 49.778 ± 7.042 | 0.3189 ± 0.1051 | 0.0298 |

N_{pert} | 2 | 0.7919 ± 0.2112 | 0.343 ± 0.0724 | 49.852 ± 6.331 | 0.279 ± 0.0949 | 0.0409 |

N_{pert} | 5 | 0.4298 ± 0.1535 | 0.3328 ± 0.0691 | 39.778 ± 11.032 | 0.5271 ± 0.1261 | 0.2041 |

N_{pert} | 10 | 0.3211 ± 0.1187 | 0.1697 ± 0.0769 | 22.741 ± 10.975 | 0.7388 ± 0.0974 | 0.2111 |

N_{pert} | 20 | 0.1311 ± 0.0931 | 0.0288 ± 0.0302 | 4.667 ± 5.296 | 0.9536 ± 0.0601 | 0.3584 |

N_{pert} | 40 | 0.0447 ± 0.0311 | 0.0036 ± 0.0026 | 0.222 ± 0.679 | 0.9988 ± 0.0057 | 0.4824 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

N_{pert} | 1 | 0.8851 ± 0.1332 | 0.1203 ± 0.0716 | 50.0 ± 5.277 | 0.316 ± 0.1035 | 0.0293 |

N_{pert} | 2 | 0.7678 ± 0.215 | 0.2857 ± 0.0916 | 50.963 ± 6.879 | 0.315 ± 0.0915 | 0.0326 |

N_{pert} | 5 | 0.286 ± 0.0821 | 0.1973 ± 0.0625 | 30.963 ± 9.096 | 0.6136 ± 0.1118 | 0.1767 |

N_{pert} | 10 | 0.1567 ± 0.0624 | 0.0788 ± 0.0609 | 15.704 ± 4.246 | 0.7659 ± 0.0863 | 0.3186 |

N_{pert} | 20 | 0.0783 ± 0.056 | 0.0268 ± 0.06 | 10.815 ± 4.943 | 0.8699 ± 0.0777 | 0.4238 |

N_{pert} | 40 | 0.0262 ± 0.0191 | 0.0041 ± 0.0078 | 7.185 ± 4.837 | 0.9391 ± 0.0565 | 0.5419 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

N_{pert} | 1 | 0.8851 ± 0.1332 | 0.1203 ± 0.0716 | 50.0 ± 5.277 | 0.316 ± 0.1035 | 0.0293 |

N_{pert} | 2 | 0.7678 ± 0.215 | 0.2857 ± 0.0916 | 50.963 ± 6.879 | 0.315 ± 0.0915 | 0.0326 |

N_{pert} | 5 | 0.286 ± 0.0821 | 0.1973 ± 0.0625 | 30.963 ± 9.096 | 0.6136 ± 0.1118 | 0.1767 |

N_{pert} | 10 | 0.1567 ± 0.0624 | 0.0788 ± 0.0609 | 15.704 ± 4.246 | 0.7659 ± 0.0863 | 0.3186 |

N_{pert} | 20 | 0.0783 ± 0.056 | 0.0268 ± 0.06 | 10.815 ± 4.943 | 0.8699 ± 0.0777 | 0.4238 |

N_{pert} | 40 | 0.0262 ± 0.0191 | 0.0041 ± 0.0078 | 7.185 ± 4.837 | 0.9391 ± 0.0565 | 0.5419 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

N_{pert} | 1 | 0.891 ± 0.1231 | 0.1058 ± 0.0706 | 46.889 ± 8.444 | 0.3337 ± 0.1156 | 0.0299 |

N_{pert} | 2 | 0.9214 ± 0.088 | 0.3684 ± 0.0897 | 50.444 ± 7.031 | 0.2938 ± 0.0783 | 0.0474 |

N_{pert} | 5 | 0.2075 ± 0.0319 | 0.0313 ± 0.0179 | 23.704 ± 10.379 | 0.7769 ± 0.0874 | 0.2803 |

N_{pert} | 10 | 0.0294 ± 0.0061 | 0.0033 ± 0.0014 | 1.926 ± 2.592 | 0.9894 ± 0.0157 | 0.3855 |

N_{pert} | 20 | 0.0184 ± 0.003 | 0.0015 ± 0.0007 | 0.074 ± 0.406 | 0.9999 ± 0.0008 | 0.522 |

N_{pert} | 40 | 0.0116 ± 0.0028 | 0.0015 ± 0.0005 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.6206 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

N_{pert} | 1 | 0.891 ± 0.1231 | 0.1058 ± 0.0706 | 46.889 ± 8.444 | 0.3337 ± 0.1156 | 0.0299 |

N_{pert} | 2 | 0.9214 ± 0.088 | 0.3684 ± 0.0897 | 50.444 ± 7.031 | 0.2938 ± 0.0783 | 0.0474 |

N_{pert} | 5 | 0.2075 ± 0.0319 | 0.0313 ± 0.0179 | 23.704 ± 10.379 | 0.7769 ± 0.0874 | 0.2803 |

N_{pert} | 10 | 0.0294 ± 0.0061 | 0.0033 ± 0.0014 | 1.926 ± 2.592 | 0.9894 ± 0.0157 | 0.3855 |

N_{pert} | 20 | 0.0184 ± 0.003 | 0.0015 ± 0.0007 | 0.074 ± 0.406 | 0.9999 ± 0.0008 | 0.522 |

N_{pert} | 40 | 0.0116 ± 0.0028 | 0.0015 ± 0.0005 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.6206 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

N_{res} | 1 | 0.8188 ± 0.1236 | 8633.13 ± 25 993.8 | 45.852 ± 8.159 | 0.5239 ± 0.0886 | 2012.5215 |

N_{res} | 2 | 0.8895 ± 0.2663 | 3481.08 ± 9497.15 | 23.037 ± 11.635 | 0.7415 ± 0.1298 | 1.6593 |

N_{res} | 5 | 0.2267 ± 0.3482 | 2.5746 ± 11.3136 | 2.148 ± 7.063 | 0.9756 ± 0.0733 | 0.75 |

N_{res} | 10 | 0.0238 ± 0.0242 | 0.0061 ± 0.0078 | 0.074 ± 0.406 | 0.9979 ± 0.0114 | 0.8436 |

N_{res} | 20 | 0.0099 ± 0.0037 | 0.0024 ± 0.001 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.9131 |

N_{res} | 40 | 0.0077 ± 0.0033 | 0.0023 ± 0.001 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.9329 |

Param . | Value . | Normalized difference in area . | Mean absolute deviation . | Error rate . | Area under the ROC curve . | Interval width (> 90%) . |
---|---|---|---|---|---|---|

N_{res} | 1 | 0.8188 ± 0.1236 | 8633.13 ± 25 993.8 | 45.852 ± 8.159 | 0.5239 ± 0.0886 | 2012.5215 |

N_{res} | 2 | 0.8895 ± 0.2663 | 3481.08 ± 9497.15 | 23.037 ± 11.635 | 0.7415 ± 0.1298 | 1.6593 |

N_{res} | 5 | 0.2267 ± 0.3482 | 2.5746 ± 11.3136 | 2.148 ± 7.063 | 0.9756 ± 0.0733 | 0.75 |

N_{res} | 10 | 0.0238 ± 0.0242 | 0.0061 ± 0.0078 | 0.074 ± 0.406 | 0.9979 ± 0.0114 | 0.8436 |

N_{res} | 20 | 0.0099 ± 0.0037 | 0.0024 ± 0.001 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.9131 |

N_{res} | 40 | 0.0077 ± 0.0033 | 0.0023 ± 0.001 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.9329 |

The error rate is defined as the percentage of entries in $A$ that are incorrectly classified. We report the error rate for the optimal threshold $\u03f5$ at the end of Sec. II D.

The ROC curve, or receiver operating characteristic, is a parametric curve that uses the classification threshold as a parameter and uses the true positive rate and the false positive rate as the variables.^{50} The area under the ROC curve measures the quality of a classifier independent of any particular threshold. A value of $1/2$ is consistent with blind guesses, and a value of $1$ indicates a perfect classifier since it implies the existence of a threshold for which the rate of true positives is $1$ and the rate of false positives is $0$. Our ROC curves were generated using the scikit-learn package in Python.^{51}

### APPENDIX C: TABLES OF RESULTS

Tables V–XVII report the averaged numerical results of the experimental parameter sweeps described in Sec. III and reported in Sec. IV. The performance metrics used are defined in Appendix B. Values given in the tables represent the mean plus or minus one standard deviation of the corresponding performance metric over the 30 trials run at that parameter value. We note that some of the distributions, such as the error rate (which is nonnegative by definition), are skewed.

## REFERENCES

*International Symposium on Mathematical Problems in Theoretical Physics*(Springer, 1975), pp. 420–422.

*Journal of Physics: Conference Series*(IOP Publishing, 2015), Vol. 640, p. 012013.

*Emergent Complexity from Nonlinearity, in Physics, Engineering and the Life Sciences*(Springer, 2017), pp. 145–165.