The milestoning algorithm of Elber and co-workers creates a framework for computing the time scale of processes that are too long and too complex to be studied using simply brute force simulations. The fundamental objects involved in the milestoning algorithm are the first passage time distributions *K*_{AB}(*τ*) between adjacent conformational milestones *A* and *B*. The method proposed herein aims to further enhance milestoning (or other interface based sampling methods) by employing an artificially applied force, akin to a wind that blows the trajectories from their initial to their final states, and by subsequently applying corrective weights to the trajectories to yield the true first passage time distributions *K*_{AB}(*τ*) in a fraction of the computation time required for unassisted calculations. The re-weighting method is rooted in the formalism of stochastic path integrals. The theoretical basis for the technique and numerical examples are presented.

## I. INTRODUCTION

The task of calculating kinetic properties of complex systems from molecular dynamics simulations is challenging but also of considerable interest.^{1} In contrast to computational methods designed for thermodynamic equilibrium, in which the basic observables are averages over the conformational space with an invariant measure that does not need to obey the system’s exact dynamics, studies of kinetics require physically correct time-ordered trajectories to obtain time-correlation functions as the basic objects.^{2} Since each time-correlation function describes the relaxation under investigation as an average over all relevant trajectories, adequate sampling for accurate calculation of long-time kinetics can quickly become computationally intractable via direct simulation.^{3} This is because a direct and brute force method of this type would require simulation times that are long enough for the system to transition back and forth between all states of interest as many times as needed for the transition statistics to converge. In other words, the trajectory needs to be long enough to obtain a distribution of first passage times. Several computational methods have been developed to address the challenge of calculating kinetics, from the venerable transition state theory, Kramers rate theory, or Anderson’s reaction flux method,^{4–6} and continuing with more recent developments, such as transition path sampling (TPS),^{7} transition path theory (TPT),^{8} and a host of other recent methods based on path sampling reviewed, for example, in Ref. 9.

A common feature of several of these recent methods for the enhanced calculation of pathways is the assignment of interfaces that separate regions of conformational space. Trajectory segments are run between these interfaces, then the segments are combined all together to obtain the entire kinetics of interest. For example, in the weighted ensemble method^{10} (and its further developments^{11}), one runs trajectories that are re-spawned at interfaces. Other prominent examples are the transition interface sampling (TIS) method of van Erp, Moroni, and Bolhuis^{12} and the milestoning method of Elber and co-workers.^{13,14} In both milestoning and TIS (as well as in the forward flux sampling method,^{15} which is related to TIS, or in the boxed molecular dynamics method^{16}), one places interfaces and runs trajectories between the interfaces. The trajectory segments can be generated via integrating the natural dynamical equations of the system or can themselves be accelerated and reweighted to get the correct kinetics.^{17–21} Although for two-state systems transition state theory and related methods have been successfully used in the determination of the kinetics for many systems with well-defined reactant and product states, for which the dynamical bottleneck can be identified,^{22} there are many interesting problems in biophysics, and elsewhere, for which these assumptions do not hold. By contrast, transition path sampling approaches require no intuition for reaction mechanisms or advance knowledge of transition state, although the requirement of a dynamical bottleneck does persist.^{7,23} In the same category of methods is the milestoning algorithm of Elber and co-workers, which is a method for calculating kinetic properties, where the fundamental objects are the first passage time distributions *K*_{AB}(*τ*) between adjacent conformation-space milestone states (configurations *A* and *B* in this case), where the milestone states do not necessarily need to be meta-stable states as in transition state theory. The key feature of the milestoning method (and also of the other interfacing methods) is that long trajectory pathways for large scale configuration changes can be broken up into shorter trajectories for which a linear network of transition probabilities between milestones can be devised. The aforementioned linear networks of transition probabilities can then be solved for such quantities as first passage time between any pair of milestones, including those at the extreme ends of the space and the flux through a given milestone, *s*, as a function of time, written as *P*_{s}(*t*) in Eq. (1). Some of the key gains from this treatment are that breaking up these long trajectory pathways into a network of shorter trajectories leads to increased sampling of the would-be under-sampled areas and that speed-up of computational efficiency is possible due to the capacity to run these short trajectories in parallel.^{13} In practice, previous milestoning calculations have been limited to calculating the constant flux values representative of the system at equilibrium, which can be thought of as the long time flux values lim_{t→∞}*P*_{s}(*t*). A method for calculating the time-dependent flux through a given milestone *P*_{s}(*t*) can be found in Paper II.^{24} The aim of the technique we present in Paper I is to increase the computational speed of the milestoning method via the addition of an artificial constant force ($Fwind$) along the vector pointing from the initial state to the final state for each pair of milestones in the simulation, causing the system to arrive at the destination configuration in far fewer time steps than if it were left to Brownian dynamics alone. The key idea which makes this possible is the use of a re-weighting function we have introduced previously,^{19,20,25,26} which generates a re-weighting coefficient for each trajectory, thus allowing the true distribution of first passage times to be recovered from the artificially accelerated trajectories. Preliminary calculations conducted on model systems, described in Sec. III, have demonstrated a computation time speedup by a factor of nearly 40 using this method.

## II. THEORY

The central quantity in milestoning is the flux through a given milestone;^{13} it is prescribed by the probabilities

where *P*_{s}(*t*) is the probability of being at milestone *s* at time *t* (or, more specifically, arriving at any time *t*′ < *t* and not leaving before time *t*^{13}), *Q*_{s}(*t*) is the probability of a transition to milestone *s* at time *t*, and *K*_{s}(*τ*) is the probability of transitioning out of milestone *s* after an incubation time of *τ*. Thus $\u222b0t\u2212t\u2032Ks(\tau )d\tau $ is the probability of an exit from milestone *s* anytime between 0 and *t* − *t*′, which makes $1\u2212\u222b0t\u2212t\u2032Ks(\tau )d\tau $ the probability of there *not* being an exit from milestone *s* over that same time period. Since the probability of two independent simultaneous events happening concurrently is the product of the two events, the equation for *P*_{s}(*t*) is simply integrating the concurrent probabilities of arriving at milestone *s* and not leaving over the time frame from time 0 to *t*. In dissecting the meaning of *Q*_{s}(*t*), the first term, 2*δ*(*t*)*P*_{s}(0), simply represents the probability that the system is already occupying milestone *s* at time *t* = 0, where the factor of 2 is present since the *δ* function is centered at zero, meaning only half of its area would be counted without this factor. *Q*_{s±1}(*t*″) is the probability that the system transitioned into one of the two milestones adjacent to *s* at an earlier time *t*″. $Ks\xb11\u2213(t\u2212t\u2032\u2032)$ is the probability of a transition from milestones *s* ± 1 into milestone *s*. Thus the second term of the second line of Eq. (1) is another concurrent probability: the probability of the system entering an adjacent milestone at an earlier time, and then transitioning into milestone *s* between time *t* and 0. It is important to note that all functions *P*_{s}(*t*) and *Q*_{s}(*t*) are calculated using the respective values of *K*_{s}(*τ*) between adjacent milestones, and thus the set of *K*_{s}(*τ*) between all milestones of interest contains all the information needed to calculate kinetics using the milestoning method.

The essential connection to make in regard to combining the milestoning method with re-weighting of artificially accelerated trajectories is that a *K* function between two milestones located at *x* = *A* and *x* = *B*, *K*_{AB}(*τ*), is nothing more than a probability distribution as a function of lifetime describing the conditional probability that a system found in state *A* at time *t* = 0 will be found, for the first time, in state *B* at time *t* = *τ*,

Given this relationship, we can now begin to make the connection between milestoning and re-weighting of artificially accelerated trajectories. Assuming Langevin dynamics with the addition of a *wind* force,

where *γ* is the friction coefficient, *V*(*x*) is the potential, *ξ*(*t*) is the random force, and $Fwind$ is a constant force applied in the direction of the goal milestone for each run; conditional probabilities reflecting first passage transitions from milestone *A* to *B* can be expressed as

In this equation, *W*[*ξ*(*t*)] is the probability distribution representing the joint probabilities of all possible series of random kicks, so multiplying by the delta function *δ*(*x*(*τ*) − *x*_{B}) and integrating select for only the portion of the distribution which represents a series of random kicks, which results in a transition from state *A* to state *B* given an incubation time *τ*. It then follows suit that the integral in this equation is simply the expectation value for the probability of a transition from *A* to *B* for each incubation time point *τ*, which, again, is equivalent to *K*_{AB}(*τ*). Because of fluctuation dissipation, ⟨*ξ*(*t*)*ξ*(*t*′)⟩ = 2*k*_{B}*Tmγ*, and assuming a normally distributed stochastic term, the random force in Langevin dynamics is a Gaussian distribution with variance $w$ ≡ 2*k*_{B}*Tmγ*. Thus it can be shown that

With *W*[*ξ*(*t*)], our weighting function for joint probabilities of random kick sequences in terms of our random force *ξ*(*t*) in hand, we can now write the noise history *ξ*(*t*) in terms of the trajectory *x*(*t*), it generates assuming overdamped Langevin dynamics

We are ultimately interested in measuring conditional probability distributions in configuration space, *x*(*t*), not random force space, *ξ*(*t*). Additionally considering the fact that the functional Jacobian for the transformation is constant^{27} and can be thought of as built into the measure of $Dx$, we can define *S*[*x*(*t*)] thusly

We can then express our conditional probability using the Wiener formalism of path integrals^{28} as

where

is the weight of each trajectory *x*(*t*). We can now define the re-weighting factor for obtaining the true weight of an artificially accelerated trajectory as

where the *f* subscript indicates a function generated under the influence of the artificial $Fwind$ force. In practice, once a trajectory *x*(*t*) is generated (in the presence of the wind force), the actions are calculated in discrete numerical form using

using the Ito discretization scheme^{29} for stochastic calculus. Although in the continuum limit, depending on the rules of stochastic calculus (i.e., Ito vs. Stratonovich), one may need to account for an extra term involving the second derivative of the potential in the integrand of the action employed in Eq. (7), the discrete Ito version utilized in this work does not require it and can be used as expressed in Eq. (11).^{30}

The re-weighting factor is calculated from Eq. (10) and stored in an array. When post-processing to compute the *K*_{AB}(*τ*) distribution for a particular pair of milestones *A* and *B* by histogramming trajectories by lifetime *τ*, instead of adding 1 to a particular bin each time the lifetime of a particular trajectory falls within the bounds of that bin, the weight *W*[*x*(*t*)] corresponding to that trajectory is instead added. It is clear from Eq. (10) that as *S*_{f}[*x*(*t*)] for the artificially accelerated trajectory approaches *S*[*x*(*t*)], the weight of the trajectory approaches unity, and thus the method reduces to an unweighted histogram for $Fwind=0$ as it should.

In practice, given a well-chosen wind coordinate that significantly accelerates trajectories from the initial milestone to the final milestone, it is not surprising that shorter and lower weight trajectories are sampled with higher frequency. This can lead to re-weighted transition time distributions that have excellent convergence and accuracy in the regime of fast transitions and poorer convergence and accuracy in the tails (the integrand of the action integral is positive definite so the weights of trajectories increase exponentially with time), which can be seen in some of the following numerical demonstrations.

## III. NUMERICAL DEMONSTRATION

### A. Model system in one dimension

A simple two well potential (see the inset of Fig. 1) of the equation *y* = (*x* − 1)^{2}(*x* + 1)^{2} was chosen to be the model system upon which the wind-assisted milestoning methodology could be developed. In running wind-assisted milestoning, the potential to which the particle is being subjected is first divided into any number of milestones, in this case, 7 milestones, thus 6 separate inter-plane spaces. Next, numerous Langevin trajectories are run both from left to right and right to left between each pair of adjacent milestones. The number of time steps required to go from the starting milestone to the destination milestone for each trial of each pair and the weight of each trajectory is then stored in an array as mentioned in Sec. II. As shown in Figs. 1 and 2, this method has demonstrated a roughly fivefold speedup with no discernible sacrifice to accuracy and a more than tenfold speedup if one is willing to sacrifice some accuracy in the tail of the distribution.

### B. Model system in one dimension with distortion

Thus far, we have approached the wind-assisted reweighted milestoning (WARM) method from the standpoint of speeding up the calculation by pushing $Fwind$ until the *K*(*τ*) functions begin to distort. Here we will explore the possibility that even slightly distorted *K*(*τ*) functions can yield useful information, allowing for even greater computational speedup. The flux value for a given milestone *s*, *P*_{s}(*t*), should approach the probability predicted by the Boltzmann distribution generated from the configurational partition function as time approaches infinity. Given a discrete space in *x*, subject to our 1D potential *y* = (*x* − 1)^{2}(*x* + 1)^{2}, the Boltzmann distribution function can be obtained in the usual way, shown in Eq. (12) below,

where *N*_{s} is the total number of milestone configurations and *x*_{n} signifies the spatial position of each milestone. This discrete space approximation for the equilibrium flux values is utilized below as a test for accuracy in Fig. 3 (dashed lines). Section III consists of dividing the space for the bistable 1D potential between *x* = −2 and *x* = 2 into 11 subspaces bounded by 12 milestones. First hitting trajectories were run between each pair of adjacent milestones, and then each pair of *K*(*τ*) functions describing a transition away from each milestone was normalized (e.g., for milestone 3, all trajectories must terminate at either milestone 2 or milestone 4, therefore $\u222b0\u221eK32(\tau )d\tau +\u222b0\u221eK34(\tau )d\tau =1$). The normalized *K*(*τ*) functions are then integrated over all time, and these values are placed in a matrix, **K**, of equilibrium transition probabilities. The equilibrium flux values for the vector representing the set of milestones, **P**, are then found by numerically solving for the eigenvector: **P·K** = **P**, with reflecting boundary conditions implemented at the extremes.^{31} By using this method to determine equilibrium flux values, it is demonstrated in Figs. 3 and 4, that even when $Fwind$ is set to a value strong enough to distort the *K*(*τ*) functions, accurate equilibrium flux values can still be calculated.

### C. Model systems in two dimensions

Two additional test systems for the WARM technique were implemented for further validating the method in two dimensions. Both systems have double well shapes; however for one well, the barrier to transition from one well to the other is primarily energetic, while the other is primarily entropic (see Fig. 5). The potential with the energetic barrier is a generalization of the 1D potential from Sec. III B to two dimensions, and the potential with the entropic barrier is the same potential implemented by Elber and Faradjian in their original paper on milestoning.^{13} As can be seen in the data below, the WARM method successfully re-weighted first passage time distributions (*K*(*τ*)) generated using artificially accelerated trajectories to yield the true first passage time distributions which would have resulted from trajectories in the absence of the *wind* force (Figs. 6 and 7). In both cases, the method achieved more than 60% faster computation times with very little sacrifice in terms of accuracy (Figs. 8 and 9).

### D. Model system in eleven dimensions

In order to demonstrate that the WARM method possesses no inherent limitations due to scaling, the method was applied to an 11 dimensional hyperspace. For this model, the 11*D* potential was defined as

where the first term is the same bistable potential in *x*_{1} used in the first one dimensional example, the second term couples motion in the 10 dimensions orthogonal to barrier height in *x*_{1}, and the third term simply confines the system to a reasonably sized configurational space using a quartic potential. In order to develop some intuition for this potential, see Fig. 10, then just imagine that there are nine other dimensions which have the same effect as *y* on barrier height in *x*_{1}.

Accordingly, the milestones must be defined as hyperplanes, given the general definition of a hyperplane

To keep things as simple as possible, we set *a*_{2} through *a*_{11} equal to zero, and *a*_{1} = 1, allowing us to define two hyperplanes as *x*_{1} = −1 and *x*_{1} = 1. In this scenario, the features of interest are the transitions between the wells at *x*_{1} = −1 and *x*_{1} = 1, and thus the 11D $Fwind$ is applied with zero components in all dimensions except for *x*_{1} where it is used to push the system over the barrier between wells. Trajectories were initialized at the point {−1, 0, 0, …, 0} and terminated upon reaching their hyperdimensional “finish line,” i.e., when the hyperplane *a*_{1}*x*_{1} + *a*_{2}*x*_{2} + *a*_{3}*x*_{3} + ⋯ + *a*_{n}*x*_{n} < 1 evaluates to false. The WARM method was successfully applied to this 11 dimensional potential, and a speedup by a factor of 4.5 was observed (Figs. 11 and 12).

### E. Wind force as a vector field

In all of the preceding examples, $Fwind$ was applied to the system as a constant force applied in a straight line, perpendicular to the parallel milestone hyperplanes, but $Fwind$ can be defined any way we choose. This is particularly useful when the milestones are carved using Voronoi polyhedra or other more complicated tesselations. This section demonstrates a method whereby the directionality of $Fwind$ is defined by a vector field which allows $Fwind$ to blow in a curved path between two nearly orthogonal milestones (see Figs. 13 and 14); i.e., our wind has become a tornado. In order to define this vector field, the point of intersection between the two planes was determined, then a function was created which finds the straight line connecting the current position to this point of intersection, and then defines $Fwind$ at that point to be a vector both orthogonal to that straight line and pointing in a clockwise direction. When first passage times were calculated going from the milestone shown in red toward the milestone shown in green (Fig. 13), the vector field is simply multiplied by −1 to cause our tornado to spin counterclockwise. Using a wind force defined in this manner, we obtain an efficient directionality for $Fwind$ which biases the system toward both leaving its initial milestone in the right direction and approaching its destination milestone, regardless of the positioning of the milestones in configuration space. Another advantage of this scheme is that our curved vector field of $Fwind$ can be defined without any knowledge of the system itself, we only need to know the positions of the milestones, which are always known in milestoning calculations. Figures 13–18 illustrate the application and results of this method using two different 2D potentials, the Muller-Brown potential,^{32} and a simpler Muller-inspired potential with two Gaussian wells we will refer to as our Gaussian potential. The Gaussian potential is defined as

and the Muller potential is defined as

where

## IV. CONCLUDING DISCUSSION

We have proposed and tested WARM, a wind-assisted reweighting method for accelerating milestoning calculations, whereby the true probability density functions for first passage transition times between milestones, *K*_{AB}(*τ*), are recovered from artificially accelerated trajectories via the re-weighting method described in Sec. II. These *K*_{AB}(*τ*) functions are central to milestoning calculations, and thus the WARM method presented herein shows potential for broad applications.

Our method has been shown to be effective in tests on one and two dimensional potentials with both energetic and entropic barriers, as well as on multi-dimensional models, implying that the method should have no scaling limitations. Thus the next step will be to test the method on more complex biomolecular transitions. The simplest extension would be to apply a force vector to a single atom which steers the system toward the configurational change of interest. In this case, the re-weighting factors *S*[*x*(*t*)] and *S*_{f}[*x*(*t*)] could be calculated by including the 3d force components both with and without the components of the applied force, respectively.

It should be noted that our application of the WARM method to both the high-dimensional model and our vector field-based $Fwind$ implementation demonstrate that this technique can be applied to systems that are too complex to intuit the placement of the artificial forces. One such example for which the vector-field method can be applied concerns rotational local dynamics in DNA and RNA.^{33–35} Given an initial and a final milestone configuration, one could determine the vectors curving around the field from each atom’s initial position to its final position. Artificial forces, $Fwind$, could then be placed upon all atoms in the system pointing in the direction of these vectors and with a magnitude proportional to the length of the vectors. A zero cutoff could also be added so as not to waste computational resources on applying and accounting for $Fwind$ force atoms which are beginning at a position fairly close to their destination. Another way to construct clever wind force biases could involve guessing the directions of a slow manifold, e.g., by averaging out the momenta over times that are longer than fast periods but smaller than the slow periods, as in momentum-enhanced self-guiding methods.^{21,36,37} The wind force could then be applied only in the direction of the slow manifold.

The main limitation of the WARM method, regardless of the number of dimensions present, is obtaining good re-weighting in the longer *τ* range. This is simply a matter of under-sampling, which leads to non-overlap between the probability of the bundles of trajectories with and without wind (probability non-overlap, by the way, is also the limiting factor for umbrella sampling and related techniques in the realm of thermodynamics). If $Fwind$ is pushing the system to the next milestone so quickly that longer values of *τ*, relevant to the true *K*_{AB}(*τ*) distribution, are not being sampled, then there just is not enough density present (or even none at all, in numerical implementations) to re-weight. This is why the accuracy in the low tau regime is often still quite good when too high of an $Fwind$ has caused the latter portion of the distribution to turn to noise. Thus far, the limitations on the WARM method appear to be solely dependent on whether or not we have pushed the force so hard that trajectories in the longer *τ* region of the true *K*_{AB}(*τ*) are even being sampled. For this reason, systems whose true *K*_{AB}(*τ*) distribution functions possess thick tails place the greatest limitations on the degree of computational speedup achievable by WARM. This issue can be addressed in a couple of different ways. One approach is to simply define more milestones in the space, the other is to combine the WARM method with some sort of artificial heating methods, both modifications which will yield *K*_{AB}(*τ*) functions which decay more rapidly after their peak.

We believe that, upon implementation into molecular dynamics packages on top of modules that perform interface sampling, the WARM method has the potential to be a useful tool for the determination of the long-time relaxation properties of macromolecules.

## ACKNOWLEDGMENTS

Funding from the National Institutes of Health, Grant No. R01GM089846, and from the National Science Foundation, Award No. CMMI-1404818, is kindly acknowledged.