We consider the recently developed weighted ensemble milestoning (WEM) scheme [D. Ray and I. Andricioaei, J. Chem. Phys. **152**, 234114 (2020)] and test its capability of simulating ligand–receptor dissociation dynamics. We performed WEM simulations on the following host–guest systems: Na^{+}/Cl^{−} ion pair and 4-hydroxy-2-butanone ligand with FK506 binding protein. As a proof of principle, we show that the WEM formalism reproduces the Na^{+}/Cl^{−} ion pair dissociation timescale and the free energy profile obtained from long conventional MD simulation. To increase the accuracy of WEM calculations applied to kinetics and thermodynamics in protein–ligand binding, we introduced a modified WEM scheme called weighted ensemble milestoning with restraint release (WEM-RR), which can increase the number of starting points per milestone without adding additional computational cost. WEM-RR calculations obtained a ligand residence time and binding free energy in agreement with experimental and previous computational results. Moreover, using the milestoning framework, the binding time and rate constants, dissociation constants, and committor probabilities could also be calculated at a low computational cost. We also present an analytical approach for estimating the association rate constant (*k*_{on}) when binding is primarily diffusion driven. We show that the WEM method can efficiently calculate multiple experimental observables describing ligand–receptor binding/unbinding and is a promising candidate for computer-aided inhibitor design.

## I. INTRODUCTION

Protein–ligand interactions play crucial roles in modulating the biological processes inside the cell. Understanding the molecular details of ligand binding and unbinding is necessary not only to gain fundamental insight into molecular recognition but also to facilitate rational design of drugs and inhibitors, thereby tuning the functionality of a particular protein in a desired way.^{1,2} Molecular dynamics (MD) simulation, with its use of physics-based models to represent the interatomic forces and propagate the dynamics in time, has become a centerpiece method for studying the dynamics of biomolecules in atomistic detail.^{3} Pioneering work—much of it done with the CHARMM simulation package^{4} celebrated in this journal issue—has spawned applications that revealed the role of atomic motions in conformational switching, folding, and ligand binding in proteins and other bio-molecules.^{5}

Atomistic MD simulation has been shown to capture well the physics of ligand–receptor binding and has been successfully used to calculate the binding free energies,^{6} residence times,^{7} and binding and unbinding pathways^{8} for a variety of systems.

However, the majority of the biologically interesting protein–ligand systems are characterized by ligand residence times that span long intervals, from a few milliseconds to multiple hours. Such binding or unbinding processes fall in the realm of rare events.^{9,10} Rare events are particularly challenging from a molecular simulation point of view for primarily two reasons. First, the timescales involved are often beyond the capacity of currently available computer power, which can at best simulate up to multiple microseconds for protein systems. Second, because of the long waiting time before a transition is to occur, most of the invested computational power is wasted in the free energy (local) minimum corresponding to the initial state. Simulations of the binding process also encounter the challenge of needing to find one (or several) entry path(s) via which the ligand diffuses into the binding pocket; this necessitates additional significant investment of computation effort.^{10–12}

To overcome these difficulties, effort has been expanded on the development of enhanced sampling methods that apply artificial biases to accelerate events of interest. Methods such as umbrella sampling (US),^{13,14} steered molecular dynamics (SMD),^{15} metadynamics (MtD),^{16} adaptive biasing force (ABF),^{17,18} and accelerated molecular dynamics (aMD)^{19} could sample the conformational space of bio-molecules at significantly lower computational cost and, consequently, gained popularity and became widespread in computational biophysics. Although the free energy landscape along a chosen set of collective variables can be computed from these methods, recovering kinetic information is extremely difficult because the dynamics of the system is rendered artificial by the application of unnatural biases. As such, the biases distort the unbinding mechanism to a significant degree.^{20}

An attempt to solve the kinetics problem using a master equation based approach, from unbiased simulation data, is commonly referred to as Markov state modeling (MSM)^{21,22} in the MD simulation literature.

MSMs have experienced considerable success in ligand binding/unbinding simulations over the past two decades owing to the rapid development of fast computers and the consequent availability of a large amount of MD simulation data.^{23,24} However, this approach often falls short in the presence of very high free energy barriers, where a low number of transitions results in overestimation of the transition rates (*over*-estimation because one misses the slower transitions that make up tail of the first passage time distribution; this is a particularly severe problem in MSM studies that use only a handful of transition events to infer timescales).

The weighted ensemble (WE) method, introduced by Huber and Kim more than 20 years ago,^{25} is a pioneering example of a technique that tackled the problem of computing *time-dependent* properties (e.g., rate constants and mean first-passage times) that, for a complex dynamical system, could not be derived directly from free energy landscapes obtained via the otherwise enhanced sampling techniques^{13,14} available at that time. Zhang *et al.* recently showed that WE is statistically exact and applied both MD and MC sampling to study a wide range of processes modulated by rare events.^{26} The implementation of this technique in user friendly packages such as WESTPA has led to an extensive use of weighted ensemble based methods in biophysical problems.^{27–31}

The binding and unbinding dynamics of many host guest systems have been studied using WE. The examples include association of the Na^{+}/Cl^{−} ions, K+ binding to 18-crown-6 ether,^{32} protein–peptide binding,^{28,29} and protein–protein binding.^{30,31} Recently, Saglam and Chong used a steady state weighted ensemble methodology to calculate the rate constant of a multi-microsecond protein–protein association in explicit solvent,^{31} spending only <1% of the simulation time required to construct a converged MSM for the same system^{33} from conventional MD simulations.

Novel variants of the weighted ensemble method have been used to study multiple long timescale protein–ligand unbinding kinetics and pathways.^{20,34–37} Donyapour and co-workers used WExplore^{38} and resampling of ensembles by variation optimization (REVO)^{35} to calculate the unbinding time of the benzamidine ligand from trypsin within an order of magnitude accuracy. In an impressive attempt, Lotz and Dickson used the steady state WExplore scheme to simulate the very slow unbinding process of 1-trifluoromethoxyphenyl-3-(1-propionylpiperidin-4-yl)-urea or TPPU from soluble epoxide hydrolase (sEH) from microsecond simulations. The calculated residence time (42 s) was only a little over an order of magnitude away from the experimental result (11 min).^{36} Recently, a multi-microsecond REVO simulation study of PK-11195 ligand dissociation from translocator protein (TSPO), a membrane bound receptor, has revealed a lipid assisted unbinding mechanism. The calculated residence times (4.1 min–260 min) starting from different binding poses were within one order of magnitude from the experimental residence time of 34 min.^{37}

Developed independently about 16 years ago, the milestoning technique of Faradjian and Elber^{39} is another powerful method that can be used to compute timescales, as well as free energy profiles along suitably chosen directions in configuration space. In a recent work, the mathematical details of milestoning theory have been clarified,^{40} allowing the calculation of many thermodynamic and kinetic properties including mean first passage times (MFPTs),^{41,42} free energy profiles,^{43–45} committor probabilities,^{46} and time-correlation functions.^{47}

Recently, the milestoning method has been implemented in the SEEKR package, which led to a number of studies on the binding–unbinding kinetics of host–guest systems^{48–50} involving a combination of molecular dynamics (MD) and Brownian dynamics (BD) simulation.^{51} A major limitation of the milestoning scheme is that the reaction coordinate and the milestone positions must be predefined. In a recent study, principal component based reaction coordinates were used in milestoning to determine the millisecond scale residence time of the protein–ligand system.^{52} For placing milestones on rugged free energy landscapes with multiple minima, i.e., when intuition for a good reaction coordinate is lacking, an approach has been developed for identifying best milestones using machine learning.^{53}

The milestones need to be placed sufficiently far apart so that the memory of the starting milestone is lost before the trajectory hits one of the adjacent milestones.^{41} However, it requires considerable computational effort to converge the transition statistics for milestones placed too far from each other. This problem motivated our previous effort to accelerate milestoning trajectories using biasing forces.^{54} The resulting approach, wind assisted re-weighted milestoning (WARM),^{54} helps climbing steep energy landscapes and trajectory reweighting recovers the correct, bias-free kinetics. However, when the free energy surface is flat, i.e., in the diffusive regime, the reweighted averages and standard deviations of observables are not significantly better than normal milestoning.^{55}

As an alternative to improve sampling during milestoning, we suggested that weighted ensemble (WE) based methods can provide a great opportunity to enhance milestone-to-milestone transitions without applying artificial biasing forces.^{11} In our previous work, we blended the two methods and developed a combined weighted ensemble milestoning (WEM) scheme that performs equilibrium WE simulations^{56} in between milestones. WEM improves conventional milestoning by providing quicker convergence of the transition kernel, thereby allowing one to place the milestones sufficiently far from each other.^{11} The method brings in the mathematical framework of milestoning into the weighted ensemble framework for the calculation of free energy, kinetics, and time correlation function from a master equation-like approach. Moreover, it facilitates the parallelization of WE simulation over multiple milestones, significantly reducing the wall clock time for the simulation. Together, the WEM approach improves over both milestoning and weighted ensemble to compute many important experimental observables from short, easily obtainable MD simulations.

In this paper, we apply our WEM scheme by addressing existing ligand–receptor models regularly studied in the biophysics MD community. As a proof of principle, we first look into a simple model of the Na^{+}/Cl^{−} ion pair, which has previously been used as a primary test case for both WE^{32} and milestoning.^{51} Next, we study the biologically relevant system of 4-hydroxy-2-butanone (BUT) ligand dissociation from FK506 binding protein (FKBP).

FKBP is present in a wide range of eukaryotic cells and functions as a protein folding chaperone for proline containing proteins.^{57} FKBP was previously crystallized bound to BUT with 1.85 Å resolution. The study also reported the dissociation constant (*K*_{D}) to be 500 *μ*M and the binding free energy (Δ*G*) to be 18.9 kJ/mol, using fluorescence quenching measurement of the Trp59 residue near the binding pocket.^{58} The FKBP protein is one of the protein–ligand complexes that have been most widely studied with MD simulations.^{2,20,59,60} The residence time of the BUT ligand is of the order of 10 ns, making exhaustive sampling of reversible binding and unbinding events possible through brute force MD simulation.^{60} In addition, there is no significant protein conformational change coupled to the ligand release pathway. This provides a great opportunity to study new simulation methods to test for their accuracy and efficiency in comparison to regular MD.

An extensive computational study of the dissociation of multiple ligands including BUT from FKBP protein has been performed by Huang and Caflisch^{59} using the CHARMM22 force field^{61} for the protein and the CHARMM general force field (CGenFF)^{62} for the BUT ligand. Conformations sampled from multiple short MD runs were used to perform a complex network analysis from which the cut-based free energy profile (cut-based FEP) was computed following the method proposed by Krivov and Karplus.^{63} The unbinding events were characterized by a distance of more than 15 Å between the ligand center of mass and the FKBP binding site. The system was considered bound when that distance was less than 10 Å. Huang and Caflisch could calculate the residence times and binding Δ*G* in agreement with the experiment^{58} along with the contribution of electrostatic and van der Waals interactions to the binding affinity.^{59} The network analysis of the trajectory data also revealed multiple ligand poses of BUT inside the binding pocket and identified the residues, Asp37, Ile56, and Tyr82, as directly interacting with the ligand in its bound state.^{59}

Dickson and Lotz used their newly developed WExplore method^{38} to perform steady state simulations to calculate the unbinding time and to elucidate the ligand release mechanism and pathways for BUT along with two other ligands.^{20} They used the CHARMM36 force field^{64} for the protein, CGenFF force field^{65,66} for the ligand, and a reaction coordinate based on the root mean squared distance (RMSD) between the bound state of the ligand and the coordinates of the ligand during the simulation. Their results showed quantitative agreement with the work by Huang and Caflisch.^{59}

The most extensive work done so far is by Pan *et al.*, who performed 30 *µ*s of brute force MD simulation on the Anton supercomputer, starting from the FKBP–BUT crystal structure, to sample 277 reversible binding and unbinding events from which Δ*G*, *K*_{D}, and rate constants for binding (*k*_{on}) and unbinding (*k*_{off}) have been obtained.^{60} They also obtained identical results for binding free energy using the free energy perturbation (FEP) method.^{67} Their work was performed using the AMBER ff99SB^{*}-ILDN force field^{68–70} for protein and the generalized AMBER force field (GAFF)^{71} for the ligand. They used the distance between the Trp59 residue and the ligand as the reaction coordinate, and trajectories were considered unbound if the value of the distance is larger than 6 Å.

Pramanik *et al.* also performed multiple microseconds of conventional MD and metadynamics simulation for the FKBP–BUT complex to understand the reliability of kinetic observables calculated from the infrequent metadynamics approach.^{2} They used identical force field parameters as Pan *et al.*^{60} but a complex reaction coordinate obtained using the spectral gap optimization of order parameters (SGOOP) method.^{72–74} Their distance criterion for unbinding is as low as 1.8 Å of separation between Trp59 and BUT.^{2} Their calculated unbinding time and free energy values are consistent with the work of Pan *et al.* but largely different from the experimental numbers^{58} and the extensive simulation results by Huang and Caflisch.^{59}

The availability of both computational and experimental studies on the FKBP–BUT complex motivated us to choose this particular system for our current study. Here, we aim to assess the accuracy and efficiency of the WEM method for simulating ligand–receptor binding–unbinding dynamics.

The rest of our paper is organized as follows: Section II provides a brief description of the WEM scheme and methods of error estimation followed by the computational details in Sec. III. In Sec. IV, we discuss our results for the Na^{+}/Cl^{−} and FKBP/BUT systems and put them into the context of previous studies. Finally, in Sec. V, we conclude by mentioning the key advantages of WEM simulation along with possible directions of improvement and future research.

## II. THEORY

### A. Weighted ensemble milestoning (WEM) scheme

In our previous paper, we introduced the weighted ensemble milestoning (WEM) scheme and provided a detailed description of the algorithm.^{11} Here, we only provide a brief overview.

In the original milestoning method, high dimensional interfaces are placed along the reaction coordinate. They confer on configuration space a stratified structure in which multiple short MD trajectories are propagated in between the interfaces. The probability of transition between milestones is estimated by counting the fraction of the trajectories that start from a given milestone and reach any of the nearest adjacent milestones. The lifetime of a milestone is defined as the average amount of time spent by trajectories staring from it before reaching nearby milestones.

The transition probability between milestones *i* and *j* (*K*_{ij}) and the lifetime of milestone *i* ($T\xafi$) are given by $Kij=Ni\u2192jNi$ and $T\xafi=\u2211l=1NtlNi$, where *N*_{i} is the number of trajectories starting from milestone *i*, *N*_{i→j} is the number of such trajectories ending at milestone *j*, and *t*_{l} is the time spent by the *l*th such trajectory before hitting either of the nearest milestones.^{39–41} Along a one-dimensional reaction coordinate, which is the case considered in this paper, *j* = *i* ± 1.

The gist of the WEM method is to further stratify the space between milestones by performing weighted ensemble simulations therein. We put an integer number of WE bins in between milestones. Trajectories are propagated starting from a given milestone. If the trajectories enter a new bin at the end of an iteration time *δt*, they are split into multiple trajectories or merged into one, maintaining an equal number of trajectories in each bin. The weights of the trajectories are redistributed during this splitting/merging process to conserve the total probability. A stochastic thermostat—such as the Langevin thermostat used herein—ensures that multiple trajectories generated from the same initial configuration follow different paths upon trajectory splitting (although subtleties regarding the choice of random seeds need to be considered^{75}).

Apart from counting the number of trajectories hitting nearby milestones, we now also record their weights as they have different probabilities. The lifetime of milestone *i* in WEM is given by

where the sum is over all trajectories hitting any of the adjacent milestones. The *t*_{k} is the time spent by the *k*th trajectory before reaching a different milestone and *w*_{k} is its weight. The elements of the transition kernel **K** are given by

where Γ(*i* → *j*) is the set of trajectories starting from milestone *i* and reaching at *j* before any other milestone. The *K*_{ij} is zero for milestones that are not adjacent to each other because a trajectory cannot reach further milestones in 1D before reaching the nearest milestones.

The stationary flux through each milestone *q*_{i} can be obtained from the elements of the left principal eigenvector of the transition kernel,^{40}

where superscript *T* denotes transpose. Equation (3) is equivalent to solving the linear equations $\u2211iqiKij=qj$. It may seem counter-intuitive that the principal eigenvector of **K** gives fluxes and not equilibrium probabilities as in the case of a Markov process. A rational behind this is discussed in our previous work.^{11}

The equilibrium probability of milestone *i*, defined as the probability of a trajectory that has started from milestone *i* and has not crossed any other milestones, can be obtained by

and subsequently, the free energy of milestone *i* is computed as

where *P*_{eq,0} is a reference probability usually chosen to be that of the most probable milestone.

Before the application of Eq. (3), a reflective boundary condition has to be implemented in matrix **K** to avoid draining of the probabilities through the last milestone. For example, in a 3-milestone system, the original transition kernel is modified in the following way:^{43,76}

For calculating the rate constant or mean first passage time, a steady state flux has to be established. To achieve that, the **K** matrix is modified in the following way (for a 3-milestone case) where the flux passing through the last milestone is fed back into the starting milestone,^{43,76}

After Eq. (3) is used to obtain the steady state flux, the mean first passage time can be obtained from it using the following expression:^{40,43}

where *q*_{f} denotes the steady state flux through the final or product milestone and the summation runs over all the milestones.

The mean first passage time of the reverse process can be obtained from a new transition kernel **K**′ and lifetime vector $T\xaf\u2032$, where $Kij\u2032$ = *K*_{M−i,M−j} and $T\xafi\u2032=T\xafM\u2212i$ for all *i*, *j*, where *M* is the total number of milestones. Basically, the first milestone is relabeled as the last milestone, the second one is relabeled as the second last, and so on.

So, for a protein–ligand system, we can obtain both unbinding and binding times ⟨*τ*⟩_{off} and ⟨*τ*⟩_{on} using the milestoning framework. We can use them to obtain the unbinding and binding rate constants *k*_{off} and *k*_{on} using the following expression from Pan *et al.*:^{60}

where *v* is the volume of the equilibrated simulation box, *c*^{0} is the standard molar concentration, i.e., 1.0M, and *N*_{av} is Avogadro’s number. The dissociation constant *K*_{D} is then calculated as

which gives the standard binding free energy of

where we infer *K*_{D} to be a unitless ratio relative to the standard concentration of 1M.^{77}

Alternatively, the binding free energy can be obtained from the free energy profile or the potential of mean force (PMF).^{78} For reaction coordinates based on the radial distance [such as the ligand–receptor distance (*r*) used in the current work], the PMF can be computed from the free energy profile *G*(*r*) by correcting for the Jacobian factor,

From the PMF, the binding constant $Kbind(or\u20091KD)$ is computed as

where *r*_{c} is the cutoff distance for the unbinding event. The standard binding free energy equivalent to Eq. (11) is given by

For standard concentration, the value of *C*^{0} is (1/1660) Å^{−3}.^{77,78}

Recent developments in milestoning theory opened up the possibility of calculating the committor distribution along the transition coordinate.^{46} The committor *C*_{i} of a milestone *i* is defined as the probability of a trajectory starting from milestone *i* to eventually reach the product milestone before going to the reactant milestone. To calculate committor values, the transition kernel **K** has to be modified with such a boundary condition that trajectories reaching the product milestone will “stick” there and will not return to previous milestones.^{43,46} For a 3 milestone case, this can be illustrated as

The committor vector **C** is then calculated as

where **e**_{p} is a unit vector whose all elements are zero except for the one corresponding to the final milestone. Numerically, multiple powers of **K**_{C} are computed until the committor converges. We considered the committors converged when the change in the norm of the **C** vector is less than 10^{−3}.

### B. Error analysis

There are multiple ways to perform error analysis for the WE and milestoning methods. We calculated the uncertainties in our results by generating an ensemble of transition matrices sampled from a Bayesian type conditional probability described in Refs. 80, 81, and 51. We described this method in detail in our previous work.^{11} A brief outline is provided here.

The rate matrix **Q** is defined as

Given a set of transition counts and lifetimes, the probability of obtaining the matrix **Q** is given by

where *P*(**Q**) is a uniform prior and *N*_{i} is the total number of trajectories starting from milestone *i*, out of which *N*_{ij} trajectories hit milestone *j*. The **Q** matrices are sampled from the distribution in Eq. (18) using a non-reversible element exchange Monte Carlo scheme.^{82} In each step, one randomly chosen element and the diagonal element of the corresponding row of **Q** are updated as

where Δ is a random number sampled from an exponential distribution of range [−*Q*_{ij}, *∞*) with mean zero. The new matrix **Q**′ is accepted with a probability,

As only one element is modified in one step, the sampled matrices are highly correlated.^{82} So, we performed 10 000 sampling steps out of which some changes are accepted and the others are rejected. We only selected the matrices every 500 steps for our analysis. This provided us with 20 distinct **Q** matrices. From each **Q**, the transition kernel **K** and lifetime vector $T\xaf$ are extracted as

Then, all the calculations described in Sec. II A are performed on each set of **K** and$T\xaf$. The mean and 95% confidence interval on these datasets have been reported throughout the manuscript unless mentioned otherwise.

## III. COMPUTATIONAL METHODS

### A. Systems studied

In this work, we studied the association of sodium and chloride ions in solution as well as the binding of the 4-hydroxy-2-butanone (BUT) ligand to FK506 binding protein (FKBP). Here, we give the simulation details for both systems.

### B. NaCl in water

For our studies of association of sodium chloride in water, all molecular dynamics simulations were performed using the NAMD 2.13 package.^{79} A 40 × 40 × 40 Å^{3} water box was prepared with TIP3P water.^{83} A Na^{+} ion was placed at the center of the box. A Cl^{−} ion was placed along the x axis at the following distances from the Na^{+} ion: 2.45 Å, 2.7 Å, 3.6 Å, 4.6 Å, 5.6 Å, 7.0 Å, 9.0 Å, 11.0 Å, and 13.0 Å. These are the positions of the milestones in this study. The CHARMM36 force field^{61} was used to model the sodium and chloride ions. WE bins were placed in between milestones at 0.05 Å interval between 2.45 Å and 2.7 Å, 0.2 Å interval between 2.8 Å and 3.6 Å, and 0.5 Å interval afterward except for a 0.4 Å interval between 5.6 Å and 6.0 Å. At distances above 6 Å, all bins were 0.5 Å apart. Our choice of milestone positions was influenced by previous studies that suggest that the bound state minima of the Na^{+}/Cl^{−} ion pair is at 2.7 Å distance and the transition state of dissociation is at 3.6 Å.^{84} Initially, all the structures were minimized for 10 000 steps using the conjugate gradient algorithm, followed by an equilibration of 200 ps, with fixed ion positions. A time step of 1 fs was used. All simulations were performed in the NPT ensemble. The temperature was kept constant at 298 K using a Langevin thermostat with a coupling constant of 5 ps^{−1}, and the pressure was maintained constant at 1 atm using a Langevin barostat.^{85} Particle-mesh Ewald summation^{86} was used for electrostatics, with a real space cutoff of 12 Å for the non-bonded interactions.

All WEM trajectories were started from the endpoints of the equilibration simulations. Five trajectories were propagated in each of the occupied bins. Multiple (500–2000) iterations were performed with *δt* = 0.1 ps until results converged. Convergence plots for *K*_{ij} values are shown in the supplementary material. Trajectories were split and merged if they crossed the bin boundaries after each iteration, preserving the total probability. The weights and propagation times of the trajectories reaching either of the nearest milestones were recorded and analyzed according to the WEM scheme (Sec. II A).

Because stopping a trajectory exactly at the milestone interface^{40} is difficult in the current framework, we allow the trajectories to propagate until the end of the iterations. This incurs small additional computational costs but avoids the need to externally monitor the simulation at every MD time step. We do not split or merge the trajectories after crossing the milestones, and we exclude the excess part of the trajectory from our analysis. This is not a problem as we are not doing exact milestoning that requires restarting trajectories from the same phase space point at which a previous trajectory has crossed a milestone. In short, we performed a single iteration of the exact milestoning but accelerating the convergence of the transition kernel and lifetime matrix by applying WE.

The stationary flux, free energy profile, and mean first passage time (MFPT) were computed using Eqs. (3), (5), and (8), respectively. MFPT of the reverse process was also calculated from which information about *k*_{on} can be obtained. To validate our results, a 300 ns long equilibrium MD simulation was performed starting from the bound state of NaCl with identical simulation parameters. The distance between Na^{+} and Cl^{−} is stored every 500 fs, and the free energy is computed from the distribution. Mean first passage times and rates (*k*_{on} and *k*_{off}) were computed from ten dissociation and association events observed during the simulation. The sampling is not exhaustive but provides an order of magnitude estimate of the binding and unbinding kinetics.

### C. FKBP–BUT complex

To study a protein–ligand system, we focused on binding and unbinding of the BUT ligand to the FKBP protein. We obtained the initial structure from the crystal structure in the RCSB PDB database (PDB ID: 1D7J).^{58} The protein is modeled by the CHARMM36m force field with CMAP correction.^{64} The parameters for the ligand were generated using CGenFF.^{65,66} The protein was solvated in a water box of size 89 × 68 × 76 Å^{3} to provide adequate space for the ligand to release. The additional space was also needed to observe the convergence of the free energy profile of the ligand as a function of distance from the protein. One Cl^{−} ion was added to the system to neutralize the charge (following Ref. 20) as necessary for the particle mesh Ewald (PME) method. All input files were generated using the CHARMM-GUI web server^{87} and Psfgen package of VMD.^{88} All simulations were performed using NAMD 2.13.^{79}

The ligand bound FKBP was first minimized by 50 000 steps using the conjugate gradient algorithm followed by a short equilibration with gradually decreasing harmonic restraint on the protein and ligand heavy atoms over 600 ps. The system is further equilibrated for 1 ns in the NPT ensemble with no restraint on heavy atoms. The time step for equilibration was kept at 1 fs, and bonds to hydrogen atoms were constrained using the SHAKE^{89} algorithm. From the final structure, a production run, with 2 fs time step, is performed for 20 ns to generate a ligand unbinding trajectory. From this trajectory, the anchors (seed structures) for individual milestones were obtained. For all simulations, a Langevin thermostat with a coupling constant of 1 ps^{−1} is used to maintain the temperature at 298 K. A Langevin barostat kept the pressure constant at 1 atm. The simulation parameters were identical for all further simulation.

The reaction coordinate for ligand dissociation was chosen to be the distance (*r*) between the center of mass of the protein and the ligand, following previous work.^{59} The bound state from the crystal structure is characterized by *r* ≈ 6 Å. Three trials of independent WEM simulations were performed. Milestones were placed at *r* = 5 Å and at 2 Å interval between *r* = 6 Å and 28 Å. Starting structures for each protein–ligand distance (milestone) was sampled from the ligand release trajectory. Each structure was equilibrated for 100 ps with strong harmonic restraint along the reaction coordinate (*r*) with a force constant gradually increasing up to 80 kcal mol^{−1} Å^{−2} over the first 50 ps and kept constant for the rest 50 ps. The final frame was used to propagate WE simulation. Due to the stochastic nature of the thermostat, repetition of this exact same procedure three times generates three different sets of WEM input structures per milestone, which we used for three trials.

We tried different milestone spacings (results not shown in this paper), but 2 Å spacing is found to be optimum. The trajectories need to lose the memory of their starting point before reaching a different milestone. The loss of memory can be quantified by calculating the velocity autocorrelation function of the reaction coordinate. All the trajectories starting from any given milestone should spend longer than the de-correlation time before reaching another milestone. In the supplementary material, we show that a 2 Å spacing in milestones satisfies this criterion. If a shorter interval is used, the trajectories fail to lose memory of the starting structure, while a larger spacing provides a very sparse free energy profile that is hard to interpolate. WE bins were placed at 0.5 Å interval, and each iteration of WE calculation has *δt* = 2 ps. The bin width and *δt* pair satisfy the condition that a fraction of the trajectories cross the bin boundary in about one iteration.^{90} The number of trajectories per WE bin was kept constant at 5.

To compare the residence time and *k*_{off} results, 20 independent conventional MD trajectories, with different starting velocities, were propagated starting from the equilibrated bound FKBP–BUT complex. Each trajectory was propagated for 20 ns. Out of them, 17 trajectories showed ligand release events. The mean and 95% confidence interval were calculated for the residence time from those trajectories, and *k*_{off} was computed using Eq. (9). All the previous studies with this system used either a different force field or different reaction coordinates and distance criteria for binding/unbinding. To make a fair comparison with the binding free energy data from WEM, a well-tempered metadynamics simulation was performed starting from the equilibrated bound state structure and with simulation parameters identical to those used in the rest of this paper. The free energy profile was obtained as a function of protein–ligand distance, and the PMF was computed by correcting for the radial Jacobian factor. From the PMF, the binding free energy was calculated using Eqs. (13) and (14). Further details about the metadynamics simulation are provided in the supplementary material.

### D. Harmonic restraint release

We obtained very high variability of the free energy and *k*_{off} values for the FKBP–BUT complex using our standard WEM scheme,^{11} which uses only one starting structure per milestone for each independent WEM run. We realized the necessity to have multiple starting structures for obtaining appropriate statistics although we can get multiple hitting points from a single starting structure. One way to achieve that is to perform multiple rounds of WEM simulation. The combined statistics from independent runs for each individual milestone could then be used to construct the transition kernel (**K**) and lifetimes ($T\xaf$). This process was performed for the three independent WEM trials, and all the observables were computed using the combined (or averaged) **K** and $T\xaf$ using milestoning protocol described in Sec. II A. The results of this exercise are referred to as “WEM average.”

However, this did not improve our results significantly, as described in Sec. IV. Increasing the number of starting structures to a large extent is not practical because in the weighted ensemble method, multiple daughter trajectories are generated through splitting of a small number of initial parent trajectories. Another possibility is to discard the first few iterations of the WEM trajectories for each milestone. In this way, we get a set of trajectories with different starting points. However, most of such trajectories have their starting points in a bin that is far from the milestone interface. So, most of the simulation effort in the initial iterations would be wasted in generating irrelevant starting structures. To address this, we used a harmonic restraint around the milestone interface and performed a few (10 in this case) WE iterations starting from one starting structure. As we allow five trajectories per bin, we eventually get a total of ten trajectories, five slightly left of the milestone and five slightly to the right. This is a result of the fact that a WE bin boundary coincides with the milestone interface. We chose the harmonic restraint to be 50 kcal mol^{−1} Å^{−2} for all milestones except for the one at *r* = 5 Å, which had 100 kcal mol^{−1} Å^{−2}. The advantage of this technique is that, even in the presence of a very steep energy landscape, we get an equal number of starting points on both sides of the milestone. This almost never happened in the standard WEM protocol, and the one starting point is generally on the side of the lower free energy. Having an equal number of points on both sides does not cause any errors in probability calculation as these structures will have different weights to begin with. This modification, which we call weighted ensemble milestoning with restraint release (WEM-RR) (Fig. 1), has been applied to study the FKBP–BUT complex. Both procedures were initiated from the same structures on each milestone in order to measure the benefit gained from the applied restraint-release step. The averaging of the WEM-RR results was also performed for three independent trials, following the same protocol used for the regular WEM method, and the results were reported as “WEM-RR average.”

## IV. RESULTS

### A. NaCl in water

Figure 2 shows the free energy profiles as a function of Na^{+} to Cl^{−} distance (*r*) computed from both conventional MD and WEM simulation. For conventional MD, the value of *r* was sampled from a 300 ns MD simulation and the free energy profile was computed from the histogram of the data. There is a deep minimum at *r* = 2.7 Å indicating the bound state, which is followed by a sharp free energy barrier at *r* = 3.6 Å. At a slightly larger separation *r* = 5.2 Å, there is a shallow minimum indicating the formation of a second hydration shell between the ion pair.^{91,92} The free energy profile is flat after a separation of 7 Å and is devoid of any interesting features. So, we define a state to be unbound when *r* > 7 Å and bound when *r* < 3.6 Å.

The free energy profile obtained from all three sets of weighted ensemble milestoning (WEM) simulation shows excellent agreement with the results from conventional MD. The free energy minima and the barrier heights could be estimated with <1*k*_{B}*T* accuracy (Fig. 2). The Jacobian corrected PMF for the system has also been calculated using Eq. (12), as shown in Fig. 2. Small structural features such as the formation of a second hydration shell is not very apparent from the WEM free energy profiles because in milestoning based methods, the free energy can only be calculated at the milestone interfaces, leading to values of the PMF being defined only at intervals that are well separated, making it hard to distinguish the finer structure. To obtain more detailed features of the PMF, we would need to place milestones at smaller intervals. However, the interval must be large enough such that the process remains memoryless. Because of this, there is an effective limit to the level of detail the PMF can represent. If finer details of the ruggedness in between milestones are needed, non-milestoning methods (such as umbrella sampling or other biased methods) can be used locally.

We computed the mean first passage times (MFPTs) for transitions from the bound milestone (minimum at *r* = 2.7 Å) to all other milestones (Fig. 2). The results of three trial WEM runs agree with each other. The unbinding time computed from the continuous MD trajectory is in accord with the MFPT from the milestone at *r* = 2.7 Å to *r* = 7.0 Å calculated using the WEM method (Table I). Since we only observed 10 transition events between the bound and unbound states in the 300 ns conventional MD simulation, the residence time (⟨*τ*⟩_{off}) has high uncertainty (Table I).

Simulation . | ⟨τ⟩_{off} (ns)
. | ⟨τ⟩_{on} (ns)
. | k_{off} (×10^{6} s^{−1})
. | k_{on}^{a} (×10^{9} M^{−1} s^{−1})
. | K_{D} (M)
. | ΔG°^{b} (k_{B}T)
. |
---|---|---|---|---|---|---|

WEM trial 1 | 1.02 ± 0.09 | 30.5 ± 5.00 | 910 ± 87 | 1.2 ± 0.19 | 0.8 ± 0.14 | 0.2 ± 0.2 |

WEM trial 2 | 1.23 ± 0.14 | 57.9 ± 8.78 | 810 ± 93 | 0.61 ± 0.09 | 1.3 ± 0.25 | −0.3 ± 0.2 |

WEM trial 3 | 0.81 ± 0.07 | 45.0 ± 3.55 | 1200 ± 110 | 0.78 ± 0.06 | 1.5 ± 0.18 | −0.4 ± 0.1 |

WEM average | 0.97 ± 0.08 | 37.3 ± 5.04 | 1030 ± 85 | 0.90 ± 0.13 | 1.1 ± 0.19 | −0.1 ± 0.2 |

Long MD | 1.25 ± 0.81 | 285.8 ± 193.6 | 800 ± 520 | 0.12 ± 0.08 | 7.0 ± 6.2 | −1.9 ± 0.9 |

Simulation . | ⟨τ⟩_{off} (ns)
. | ⟨τ⟩_{on} (ns)
. | k_{off} (×10^{6} s^{−1})
. | k_{on}^{a} (×10^{9} M^{−1} s^{−1})
. | K_{D} (M)
. | ΔG°^{b} (k_{B}T)
. |
---|---|---|---|---|---|---|

WEM trial 1 | 1.02 ± 0.09 | 30.5 ± 5.00 | 910 ± 87 | 1.2 ± 0.19 | 0.8 ± 0.14 | 0.2 ± 0.2 |

WEM trial 2 | 1.23 ± 0.14 | 57.9 ± 8.78 | 810 ± 93 | 0.61 ± 0.09 | 1.3 ± 0.25 | −0.3 ± 0.2 |

WEM trial 3 | 0.81 ± 0.07 | 45.0 ± 3.55 | 1200 ± 110 | 0.78 ± 0.06 | 1.5 ± 0.18 | −0.4 ± 0.1 |

WEM average | 0.97 ± 0.08 | 37.3 ± 5.04 | 1030 ± 85 | 0.90 ± 0.13 | 1.1 ± 0.19 | −0.1 ± 0.2 |

Long MD | 1.25 ± 0.81 | 285.8 ± 193.6 | 800 ± 520 | 0.12 ± 0.08 | 7.0 ± 6.2 | −1.9 ± 0.9 |

^{a}

*v* = 58 500 Å^{3}.

^{b}

Estimated using Eq. (11).

Using the above comparison, the binding time (⟨*τ*⟩_{on}) from the WEM simulations was computed as the mean first passage time for the transition from milestone at *r* = 7.0 Å to the bound interface (*r* = 2.7 Å). The average binding and unbinding times were found to be ∼37 ns and ∼1 ns (Table I), respectively. Compared to the conventional simulation, the unbinding time showed good agreement, but the binding times differed by an order of magnitude. We believe this is primarily because of the poor statistics of binding events captured in the conventional simulation.

We used Eq. (9) to estimate the unbinding and binding rate constants *k*_{off} and *k*_{on}, respectively. The volume of the equilibrated water box fluctuated between 58 000 Å^{3} and 59 000 Å^{3} in the constant pressure simulation. We used the average volume *v* = 58 500 Å^{3} in Eq. (9). The values of *k*_{off} and *k*_{on} estimated from WEM simulation are 10^{9} s^{−1} and 10^{9}M^{−1} s^{−1}, respectively. The mean first passage times and rate constants are within one order of magnitude of those obtained from NaCl dissociation and association studies using the AMBER force field.^{51,91,93} The WEM simulations produced a dissociation constant *K*_{D} close to 1M and a binding free energy ∼0 [using Eq. (10)]. These numbers are within 1*k*_{B}*T* of the Δ*G* values calculated from separate classical CHARMM and *ab initio* simulations reported by Timko *et al.*^{84}

In essence, the application of the WEM method to the dissociation of the Na^{+}/Cl^{−} ion pair produced a free energy profile and residence time consistent with conventional MD simulation. Additionally, the binding time, rate constants, and dissociation constant could also be calculated.

### B. FKBP–BUT complex

Next, we apply WEM to a protein/ligand unbinding process with a relatively complex energy landscape. Although not as simple as the case of NaCl, we show that WEM-RR is still able to produce results comparable to expensive, extended MD simulations at a fraction of the computational cost.

The binding thermodynamics and kinetics of the ligand BUT to the enzyme FKBP were studied using WEM sampling along the center of mass separation. The free energy profile of BUT ligand dissociation from the FKBP protein is illustrated in Fig. 3. All trials showed a clear minimum corresponding to the bound state at about 6 Å. This is consistent with the protein–ligand distance in the crystal structure. The free energy converges to a constant value after 14 Å, with fluctuations <1*k*_{B}*T*, indicating that the ligand is unbound, with little influence of the protein on the ligand. From this observation, we used the milestone at 16 Å as the point where BUT is considered unbound for the purposes of calculating kinetics, specifically *k*_{off}. Similarly, for calculating residence time, we report the mean first passage time of transition from the milestone at *r* = 6 Å to the milestone at *r* = 16 Å. The value of *r*_{c} in Eq. (13) for integrating the PMFs was also chosen to be *r* = 16 Å.

The free energy profiles calculated from three independent WEM trials showed a large variance of binding free energy between 8 and 12*k*_{B}*T*. Similarly, for the ligand residence time in Fig. 4, we observed results spanning over two orders of magnitude. To investigate these issues with the aim to increase the accuracy of the WEM method, we focused on ways to improving the initial conditions of the method. Since the WEM approach is based on the approximate form of milestoning,^{41} compared to the much more expensive exact approach,^{40} we found that our calculations were sensitive to the starting coordinates defined on each milestone. To alleviate the sampling issues, we show that inclusion of a “restraint-release” cycle prior to WEM application allows for a more thorough sampling of initial conditions (see Sec. III for details).

We generated multiple starting points per milestone using harmonic restraint release. These starting points were used to propagate unbiased WEM trajectories for further analysis. This approach significantly reduces the variance of the data both for free energy profiles (see Fig. 3) and for the residence time (Fig. 4). The ten starting structures sampled from the harmonically restrained first few WE iterations for one of the milestones are shown in the supplementary material. Multiple distinct orientations of the ligand were sampled within these ten starting structures, which enhanced the accuracy of the milestoning calculation.

The residence time and unbinding rate constants were calculated using WEM and WEM-RR simulations and compared against the average value of 20 conventional MD trajectories. The mean first passage time to all milestones have been calculated and depicted in Fig. 4. The results of WEM-RR show significantly better agreement with the conventional MD results compared to the regular WEM, which overestimated the timescales by a few orders of magnitude. The binding free energy was also computed from the PMFs obtained from WEM and WEM-RR simulation (Fig. 3) using Eqs. (13) and (14). The results were compared against the Δ*G*^{0} from the PMF obtained from well tempered metadynamics simulation (see the supplementary material).

Comparing the results with previous simulation studies should be done carefully because different force fields, reaction coordinates, and distance criteria for unbinding have been used. The residence time and binding free energy of our work (WEM-RR) agrees better with the results of Huang and Caflisch, and Lotz and Dickson; both groups used a CHARMM force field in their simulation. Additionally, the ligand residence time from our work is within the same order of magnitude with that of the AMBER ff99SB^{*}-ILDN + GAFF simulations^{2,60} (Table II). Our reaction coordinate (RC) is similar to the binding-pocket to ligand distance used by Huang and Caflisch but different from the RCs used in other studies. For example, Dickson and Lotz used a ligand RMSD based RC, Pan *et al.* used the distance between Trp59 and the ligand, while Pramanik *et al.* constructed a novel RC based on a linear combination of multiple inter-atomic distances involving the protein and the ligand. Despite the different definitions for RC, our kinetic results are consistent with the existing literature.

. | . | . | k_{off} (×10^{6})
. | k_{on}^{a} (×10^{9})
. | $kondiff$ (×10^{9})
. | K_{D}
. | $KDdiff$ . | ΔG°^{b}
. | ΔG°^{,diff}
. | ΔG°^{c}
. |
---|---|---|---|---|---|---|---|---|---|---|

Simulation . | ⟨τ⟩_{off} (ns)
. | ⟨τ⟩_{on} (ns)
. | (s^{−1})
. | (M^{−1} s^{−1})
. | (M^{−1} s^{−1})
. | (μM)
. | (mM) . | (k_{B}T)
. | (k_{B}T)
. | (k_{B}T)
. |

WEM trial 1 | 63.7 ± 18.2 | 0.55 ± 0.10 | 16 ± 4.5 | 410 ± 75 | … | 40 ± 13 | … | 10.1 ± 0.3 | … | 9.3 |

WEM trial 2 | 1976 ± 491 | 0.74 ± 0.08 | 0.5 ± 0.13 | 300 ± 33 | … | 1.7 ± 0.5 | … | 13.3 ± 0.3 | … | 12.9 |

WEM trial 3 | 475 ± 122 | 0.81 ± 0.07 | 2.1 ± 0.54 | 280 ± 24 | … | 8 ± 2 | … | 11.7 ± 0.2 | … | 11.6 |

WEM average | 242 ± 66.4 | 0.64 ± 0.10 | 4 ± 1.1 | 350 ± 55 | … | 11 ± 3.6 | … | 11.4 ± 0.3 | … | 10.9 |

WEM-RR trial 1 | 8.11 ± 2.38 | 0.63 ± 0.08 | 120 ± 36 | 410 ± 52 | 1.1 | 290 ± 95 | 109 ± 33 | 8.1 ± 0.3 | 2.2 ± 0.3 | 8.4 |

WEM-RR trial 2 | 6.84 ± 1.63 | 0.45 ± 0.02 | 150 ± 35 | 500 ± 22 | 0.48 | 300 ± 71 | 312 ± 73 | 8.1 ± 0.2 | 1.2 ± 0.2 | 8.3 |

WEM-RR trial 3 | 13.4 ± 3.73 | 0.53 ± 0.05 | 70 ± 21 | 480 ± 45 | 0.56 | 150 ± 46 | 125 ± 37 | 8.8 ± 0.3 | 2.1 ± 0.3 | 9.1 |

WEM-RR average | 9.25 ± 2.56 | 0.53 ± 0.04 | 110 ± 31 | 420 ± 35 | 0.71^{d} | 260 ± 77 | 170 ± 26 ^{e} | 8.3 ± 0.3 | 1.8 ± 0.2 | 8.7 |

Conventional MD | 7.15 ± 2.95 | … | 140 ± 58 | … | … | … | … | … | … | 2.3^{f} |

Experiment^{58} | … | … | … | … | … | 500 | … | 7.6 | … | … |

Huang and Caflisch^{59} | 8 ± 2 | … | … | … | … | … | … | 7.33^{g} | … | … |

Dickson and Lotz^{20} | 4 ± 1, 7 ± 2 | … | … | … | … | … | … | … | … | … |

Pramanik et al.^{2} | 21.3 ± 0.2, | … | … | … | … | … | … | … | … | 2.0 |

27.3 ± 0.1 | ||||||||||

Pan et al.^{60} | … | … | 45 ± 4 | … | 1.23 ± 8 | … | 38 ± 4 | … | 3.35 ± 10.0 | … |

. | . | . | k_{off} (×10^{6})
. | k_{on}^{a} (×10^{9})
. | $kondiff$ (×10^{9})
. | K_{D}
. | $KDdiff$ . | ΔG°^{b}
. | ΔG°^{,diff}
. | ΔG°^{c}
. |
---|---|---|---|---|---|---|---|---|---|---|

Simulation . | ⟨τ⟩_{off} (ns)
. | ⟨τ⟩_{on} (ns)
. | (s^{−1})
. | (M^{−1} s^{−1})
. | (M^{−1} s^{−1})
. | (μM)
. | (mM) . | (k_{B}T)
. | (k_{B}T)
. | (k_{B}T)
. |

WEM trial 1 | 63.7 ± 18.2 | 0.55 ± 0.10 | 16 ± 4.5 | 410 ± 75 | … | 40 ± 13 | … | 10.1 ± 0.3 | … | 9.3 |

WEM trial 2 | 1976 ± 491 | 0.74 ± 0.08 | 0.5 ± 0.13 | 300 ± 33 | … | 1.7 ± 0.5 | … | 13.3 ± 0.3 | … | 12.9 |

WEM trial 3 | 475 ± 122 | 0.81 ± 0.07 | 2.1 ± 0.54 | 280 ± 24 | … | 8 ± 2 | … | 11.7 ± 0.2 | … | 11.6 |

WEM average | 242 ± 66.4 | 0.64 ± 0.10 | 4 ± 1.1 | 350 ± 55 | … | 11 ± 3.6 | … | 11.4 ± 0.3 | … | 10.9 |

WEM-RR trial 1 | 8.11 ± 2.38 | 0.63 ± 0.08 | 120 ± 36 | 410 ± 52 | 1.1 | 290 ± 95 | 109 ± 33 | 8.1 ± 0.3 | 2.2 ± 0.3 | 8.4 |

WEM-RR trial 2 | 6.84 ± 1.63 | 0.45 ± 0.02 | 150 ± 35 | 500 ± 22 | 0.48 | 300 ± 71 | 312 ± 73 | 8.1 ± 0.2 | 1.2 ± 0.2 | 8.3 |

WEM-RR trial 3 | 13.4 ± 3.73 | 0.53 ± 0.05 | 70 ± 21 | 480 ± 45 | 0.56 | 150 ± 46 | 125 ± 37 | 8.8 ± 0.3 | 2.1 ± 0.3 | 9.1 |

WEM-RR average | 9.25 ± 2.56 | 0.53 ± 0.04 | 110 ± 31 | 420 ± 35 | 0.71^{d} | 260 ± 77 | 170 ± 26 ^{e} | 8.3 ± 0.3 | 1.8 ± 0.2 | 8.7 |

Conventional MD | 7.15 ± 2.95 | … | 140 ± 58 | … | … | … | … | … | … | 2.3^{f} |

Experiment^{58} | … | … | … | … | … | 500 | … | 7.6 | … | … |

Huang and Caflisch^{59} | 8 ± 2 | … | … | … | … | … | … | 7.33^{g} | … | … |

Dickson and Lotz^{20} | 4 ± 1, 7 ± 2 | … | … | … | … | … | … | … | … | … |

Pramanik et al.^{2} | 21.3 ± 0.2, | … | … | … | … | … | … | … | … | 2.0 |

27.3 ± 0.1 | ||||||||||

Pan et al.^{60} | … | … | 45 ± 4 | … | 1.23 ± 8 | … | 38 ± 4 | … | 3.35 ± 10.0 | … |

Other important properties of ligand–receptor interaction, such as *k*_{off}, *k*_{on}, *K*_{D}, and Δ*G*, were calculated, and the results are reported in Table II. The value of ⟨*τ*⟩_{on} was calculated as the MFPT of the transition from milestone *r* = 16 Å to milestone *r* = 6 Å. The *k*_{on} was then obtained using Eq. (9). The binding affinity *K*_{D} is obtained as a ratio of *k*_{off} and *k*_{on}. The ⟨*τ*⟩_{off} and *k*_{off} values computed from WEM-RR agree very well with the conventional MD results. It is difficult to compare the numbers for ⟨*τ*⟩_{on} and *k*_{on} as they are relatively hard to obtain from direct MD simulation because of the large computational cost as demonstrated for the Na^{+}/Cl^{−} ion pair. As our residence times agree with that of conventional MD simulation as well as literature results and our *K*_{D} (*k*_{off}/*k*_{on}) shows order-of-magnitude similarity with the experimental *K*_{D}, we can say with some degree of confidence that our binding times and rate constants are reasonably accurate. We observed a discrepancy by two orders of magnitude in our *K*_{D} value with those obtained from equilibrium simulations by Pan *et al.*^{60} We address this issue and propose modifications to our current analyses in the next two paragraphs. In short, our results show that the calculated kinetic parameters are sensitive to how the diffusive regime is modeled.

It can be argued that while the binding rate constant is often diffusion limited, the effect of diffusion is ignored when *k*_{on} is computed from the transition timescale from one milestone to another. To the best of our knowledge, there are no experimental data on *k*_{on} for the FKBP–BUT system to compare to. The only available literature value is the result obtained by Pan *et al.* using very different criteria for binding or unbinding and at the same time exhibiting a very large variance. As a consistency check of the numbers, we calculated the MFPT to all other milestones from *r* = 6 Å and the MFPT to *r* = 6 Å from all other milestones. Using this information, we calculated *k*_{on} and *k*_{off} for all milestones (considering every milestone with *r* > 6 Å one at a time as the cutoff for unbinding). The value of binding affinity *K*_{D}, computed as the ratio *k*_{off}/*k*_{on}, converges rapidly with increasing *r* and remains invariant at *r* > 12 Å (see the supplementary material). This indicates that the distance criterion for unbinding in our calculation is a matter of choice beyond 12 Å as the ratio of *k*_{on} and *k*_{off} will remain unchanged. All results we reported in Table II uses *r* = 16 Å as the distance criterion for unbinding.

The effect of diffusion in the *k*_{on} values obtained from our work is worth consideration. We used an analytical model^{95} to account for the effect of diffusion in the *k*_{on} values. The method is described in detail in the Appendix. The diffusion dependent arrival rate $kondiff$ on milestone *r* = 16 Å is two orders of magnitude smaller than the *k*_{on} values computed for transition from *r* = 16 Å to *r* = 6 Å. It indicates that the arrival by diffusion is the rate determining step in ligand binding to FKBP. So, we directly reported the $kondiff$ in Table II and did not try to combine these values with previously calculated *k*_{on} numbers. The $kondiff$ values such obtained agree very well with the results obtained by Pan *et al.*^{60} who, in their simulation, allowed the ligand to diffuse around in the solvent after a dissociation event and included the time period of ligand diffusion into the *k*_{on} calculation. We also calculated the diffusion corrected binding affinity, $KDdiff$, as the ratio $koff/kondiff$ (Table II). They are within an order of magnitude agreement with the *K*_{D} values obtained by long equilibrium simulation,^{60} although this number is far from the experimental value. The average binding affinity from diffusion corrected results (Δ*G*°^{,diff}) is 1.8 ± 0.2*k*_{B}*T* (Table II), which is within 2*k*_{B}*T* of the results of Pan *et al.*^{60} and within 1*k*_{B}*T* of the results of Pramanik *et al.*^{2} and the well tempered metadynamics results from our current work.

If the effect of diffusion dependent arrival time is not included in the calculation, the *k*_{on} we obtain leads to a *K*_{D} that is consistent with the experimental value, the binding free energy computed by Huang *et al.*, and the long distance limit of the WEM-RR free energy profile (see Fig. 3). However, if the effect of diffusion is included (our calculations suggest that the binding process is diffusion dominant), our calculated $kondiff$ leads to the $KDdiff$ results in Table II. The resulting Δ*G*°^{,diff} obtained using Eq. (11) is in quantitative agreement with our own metadynamics result and the results of Pan *et al.* and Pramanik *et al.* This Δ*G* of binding is however different by 5–6*k*_{B}*T* from the experimental value of binding affinity and the long distance limit of the free energy profile obtained from WEM-RR simulation (Fig. 3). The values of *k*_{off} and ⟨*τ*⟩_{off} are similar to the previous MD simulation studies with different force fields. So, any discrepancies are likely the result of the *k*_{on} numbers. Our hypotheses behind this discrepancy stem from the following realization. In the fluorescence quenching experiment,^{58} the binding affinity is estimated using the following equation:

where *δF* is the change of Trp59 fluorescence due to ligand induced quenching and [*L*] is the ligand concentration. This equation is only valid when the ligand concentration is much higher than the protein concentration.^{96} At such high concentration, the FKBP protein will be surrounded by many ligands and the binding kinetics will be dominated by ligands situated close to the binding pocket. The effect of diffusion-dependent arrival from infinite distance may not be relevant in the context of the experiment. In the simulation work by Huang and Caflisch, the trajectories were terminated after each ligand unbinding event, preventing the ligand to diffuse around in the solvent.^{59} Consequently, their binding free energy results agree with the experimental values^{58} but differ from the results of extensive equilibrium MD^{60} and metadynamics simulations^{2} where ligand diffusion was taken into account. Our WEM and WEM-RR calculations also suffered from this problem, so the correction for the diffusion effect was necessary.

The committor probability of ligand dissociation for trajectories starting at different milestones was calculated using the procedure described in Sec. II A. The results for the regular WEM and WEM-RR method [Fig. 5(a)] show very different absolute values but a common trend. The committor probability to reach the unbound state is near zero up to *r* = 8 Å, where it then shows a sharp increase until *r* = 12 Å, after which it becomes linear. In the theoretical limit of a sharp *δ*-function barrier, the committor follows a step function dependence, switching its value from zero to one at the location of the barrier. Conversely, a linearly increasing committor function is the signature of a flat free energy surface. The results in Fig. 5(a) are consistent with the free energy profile in Figs. 3(a) and 3(b). There is a sharp increase in *G* at 6 Å < *r* < 12 Å followed by an essentially flat landscape. Because of the difference in the depth of the free energy minimum between WEM and WEM-RR calculation, the absolute values of the committor probability are different. As we identify *r* = 12 Å to be the beginning of the diffusive regime, we chose *r* = 16 Å as our distance criterion for unbinding.

We also plotted the committors considering *r* = 16 Å as the unbound state [Fig. 5(b)], where BUT is assumed to be transitioning to the bulk solvent. In the bulk solvent, BUT is subject to simple diffusion processes, where the free energy surface is relatively flat. The committor to *r* = 16 Å, therefore, models the transition from protein-mediated dynamics to bulk-solvent dynamics. Now, the committor distribution function of both WEM and WEM-RR methods agrees with each other. Both indicate that the transition state for unbinding (with C = 0.5) should be between milestones *r* = 10 Å and *r* = 12 Å.

In Table III, we showed the computational cost of the WEM simulation with harmonic restraint release. The values reported exclude the first ten iterations for each starting point generation. They contributed about 10 × 10 × 2 ps = 200 ps, which is negligible compared to the total simulation time. Apparently, converged results from WEM simulation can be obtained using a total simulation time of just ∼10⟨*τ*⟩_{off} (Tables II and III). The convergence plots for the matrix elements of **K** are shown in the supplementary material. On the average, only one or two binding/unbinding events could be observed from a continuous equilibrium MD simulation of that length for the same system.^{60}

. | Total simulation . | r = 6 Å
. | r = 6 Å molecular
. |
---|---|---|---|

Method . | time (ns) . | milestone (ns) . | time (ns) . |

WEM-RR trial 1 | 56.64 | 16.97 | 0.54 |

WEM-RR trial 2 | 63.80 | 21.57 | 0.69 |

WEM-RR trial 3 | 83.15 | 26.32 | 0.80 |

. | Total simulation . | r = 6 Å
. | r = 6 Å molecular
. |
---|---|---|---|

Method . | time (ns) . | milestone (ns) . | time (ns) . |

WEM-RR trial 1 | 56.64 | 16.97 | 0.54 |

WEM-RR trial 2 | 63.80 | 21.57 | 0.69 |

WEM-RR trial 3 | 83.15 | 26.32 | 0.80 |

In Table III, we also present the total simulation time of the bound state (*r* = 6 Å) as it requires the highest amount of total computational time to converge transition probabilities. A longer simulation time is often required for convergence for a milestone buried in deep free energy minimum, as discussed in Ref. 11.

In addition, the individual molecular time (i.e., the time describing continuous phase space evolution as defined in Ref. 97) needed to converge the results for the bound state milestone is less than 1 ns (Table III). It indicates that in the presence of abundant computational resources, when we can parallelize each weighted ensemble trajectory for each milestone into a different core, we need to spend a wall clock time worth less than 1 ns to perform the entire WEM simulation. WEM, thus, provides a significant reduction of computational cost without applying any biasing force.

The distribution of the ligand around the FKBP protein for different milestones has been depicted in Fig. 6. It indicates that the WEM simulation samples a large part of the configurational space around the binding site, potentially including multiple binding and unbinding pathways. Various ligand poses of BUT in the pocket and in the release site of the FKBP protein could also be sampled from WEM simulations pertaining to different milestones.

We clustered the structures associated with the milestone *r* = 6 Å (bound state) in the WEM-RR simulation using the root mean squared difference of inter-atomic distances to nearby residues computed with the UCSF Chimera package,^{98} which implements the method proposed in Ref. 99. The ligand poses are shown in Fig. 7. BUT is observed to be interacting with the Asp37 side-chain, Ile56 backbone, and Tyr82 side-chain through hydrogen bonding. The ligand poses are in agreement with previous computational studies.^{20,59} This process can be repeated for other milestones near the binding pocket to search for additional meta-stable states.

## V. CONCLUDING DISCUSSION

In summary, we applied and refined our combined weighted ensemble milestoning (WEM) scheme to study the binding and unbinding dynamics of protein–ligand systems. We studied a simple model of a Na^{+}/Cl^{−} ion pair and a realistic model of BUT ligand bound to FKBP protein. The regular WEM method, proposed in our original publication,^{11} could calculate the residence time, binding and unbinding rate constants, binding affinity, and free energy for the NaCl system with reasonable accuracy. However, the results for the FKBP–BUT system are far from accurate and call for serious modifications in our methodology. Particularly, the residence time is often 2–3 orders of magnitude different from the results obtained by conventional MD simulation and literature values, while the binding free energy is incorrect by several *k*_{B}*T* units. The large variance of the results across multiple WEM runs is also a cause of concern.

Here, we introduced a modification and extension of the WEM procedure to address these issues. Particularly, the regular WEM procedure was limited by the absence of multiple starting structures per milestone and led to a strong dependence of the calculated values of the observables on the initial conditions. One possible solution could be to sample multiple structures from an umbrella sampling simulation with a progress variable confined to the hyperplane of the milestone and use them as starting points for multiple WE simulations. Indeed, such an approach is followed in some of the recent milestoning-based host–guest binding studies.^{48,49} However, in the weighted ensemble method, a small number of trajectories are split into a very large number of trajectories with smaller weights. So, this umbrella sampling based approach would increase the computational cost for WEM simulation to a large extent. Previous studies showed that the majority (∼80%) of the total simulation time for milestoning simulations is spent in sampling initial structures on the milestone interface.^{44,48,49} This was necessary because in regular milestoning, the number of transition events sampled per milestone is, at most, equal to the number of starting trajectories. Because of the capability of the WE method to spawn new trajectories on the fly, it is possible to obtain hundreds of transition events even from a single initial starting structure per milestone.^{11} However, as we realized that using only one starting point per milestone results in large variance in computed observables, we generated ten different structures for each milestone. We achieved that by modifying the WEM scheme by incorporating a harmonic restraint on the reaction coordinate for a few initial iterations of the weighted ensemble. It resulted in additional initial conditions from which unbiased WE simulation could be started. This approach, which we refer to as WEM restraint release (WEM-RR), involves a very simple workflow, only requiring the stopping and restarting of the WE simulation after the first few iterations and discarding the initial iterations from the analysis (so that the effect of the restraint is removed). However, the number of iterations required to generate the initial coordinates can vary from system to system, depending on the presence of slow orthogonal degrees of freedom.

We showed that the WEM-RR method can be used to accurately calculate not only the residence time and binding free energy but also the binding and unbinding rate constants and the binding affinity. The entire binding coordinate can be simultaneously sampled due to both the fine-grained parallelization of the WE procedure and the statistical independence of the milestones. The binding free energy for the protein–ligand system is overestimated because of limited sampling of the milestone hypersurface (Fig. 8) even when the ligand is far from the binding pocket. This ignores the possibility that the ligand can approach the protein or explore regions of 3D space at any solid angle when it is not bound. This effect is clear when we compare the PMF computed from WEM or WEM-RR simulations (Fig. 3) to the one computed using metadynamics (see the supplementary material). The metadynamics PMF decreases to a lower value at a larger ligand–receptor distance because of the additional sampling of ligand configurations in the solvent. Consequently, while calculating the binding time, WEM runs were also unable to take into account the time spent by the ligand diffusing in the solvent before arriving near the protein. This prompted the need to include this diffusion effect into our methodology. We proposed an approximate analytical method, which utilizes the WEM transition kernel, to treat the effect of diffusion in the calculation of binding time and rate constant. This technique could successfully reproduce the *k*_{on} and Δ*G*^{0} values consistent with extensive MD simulation studies from the literature and the enhanced sampling simulation independently performed herein.

Our WEM procedure for the FKBP–BUT binding process required a net CPU time of less than 100 ns simulation. The fact that such a short simulation length produces kinetic results similar to microsecond simulations highlights the ability of the WEM procedure to rapidly evaluate a protein–ligand binding–unbinding process. With sufficient computational resources, the simulations for each milestone can be performed in parallel, providing significant gain in the wall clock time. Yet, the comparison of the simulation time with the literature should be performed with caution. Except for the work by Pan *et al.*,^{60} all previous studies have calculated only one or two properties out of a list containing the binding affinity, dissociation constant, and binding and unbinding kinetic constants. For example, Dickson and Lotz could compute the ligand residence time of the same system from 240 ns of WExplore simulation.^{20} However, they could not report binding kinetics or binding free energy. One of the advantages of WEM (and milestoning based methods, in general) is that the kinetics of *both forward and backward* processes can be computed from one set of simulation data, leading to the availability of both the binding and unbinding rate constants and the binding affinity. We therefore compared our simulation time only to the extensive equilibrium MD study by Pan *et al.*

For both regular milestoning and weighted ensemble based methods, the gain in computational efficiency is relatively higher for complex systems involving longer timescales (*μ*s–ms), compared to systems with fast kinetics (ps–ns). Consequently, we believe that the WEM method will also be highly efficient for simulating the binding–unbinding dynamics of more potent (low micromolar to nanomolar) ligands with longer residence time.

This higher efficiency comes at the cost of complexity of the simulation protocol, involving an initial biased/unbiased MD to generate milestone anchors, performing WE simulations starting from those anchors and monitoring the trajectories for crossing of milestones. However, due to the user friendly implementation of the weighted ensemble method through the WESTPA^{27} package, the overall complexity of the method is equivalent to that of regular milestoning simulation. We also share, with this paper, the codes of WEM and WEM-RR (see the data availability statement). The implementation is very generic, requiring only minor modifications for switching to a completely different system, and also includes scripts to automate most of the processes described in this work.

In the study of ligand–receptor dynamics, the choice of reaction coordinate (RC) is often a crucial factor. Our current methodology does not directly facilitate the choice of RC as it has to be picked beforehand. We used a single geometric distance based RC in our current study that could sufficiently reproduce the experimental observables from the literature. However, for more complex systems, additional degrees of freedom, such as solvation, protein conformational change, and loop motion, can play an important role in ligand binding or unbinding, and their effect should be taken into consideration.

Although it is theoretically possible to generalize the WEM protocol into higher dimension to include these collective variables, the number of milestones scales exponentially with the number of dimensions.^{46} So, it will lead to a significant increase in the complexity of the methodology, reducing its applicability in all practical purpose. We plan to address this issue in future work, by developing an easier protocol to include additional collective variables alongside the milestoning coordinate. Moreover, once one- or multi-dimensional milestones are defined, enhanced sampling protocols inside the milestones can be used to accelerate relaxation of slow degrees of freedom perpendicular to the reaction coordinate manifold.

Other possible directions for further development of the current methodology are to perform simplified, Brownian dynamics (BD) simulation for distant milestones.^{48,49} This use of simplified dynamics for milestones far away from the binding site will further reduce the overall computational cost and increase the accuracy of *k*_{on} and related properties. Importantly, this will also allow one to include the position dependence of the ligand diffusion constant using, e.g., a Rotne–Prager diffusion tensor.^{100} We plan to include this improvement into WEM in future work.

Overall, the weighted ensemble milestoning (WEM) method can be used as a computationally inexpensive yet reliable tool to study kinetics and thermodynamics of molecular processes using all-atom MD simulation. The accuracy of the results is primarily limited by the choice of force field and the reaction coordinate, although an apparently simple reaction coordinate (the center of mass distance between the protein and the ligand) could provide results in agreement with experimental and simulation values. We believe that the WEM method, with its efficiency inherited from the weighted ensemble and its ability to calculate free energy and kinetics derived from milestoning, has the potential to become an essential tool for high level screening of protein inhibitors and to play a key role in computational drug design.

## SUPPLEMENTARY MATERIAL

See the supplementary material for the justification of milestone placing, the details of metadynamics simulation, the convergence plots of *K*_{D}, the sampled starting structures for WEM-RR simulation, and the convergence plots of the elements of transition kernel for both the systems.

## ACKNOWLEDGMENTS

The authors thank Alex Dickson for providing the BUT ligand parameters. D.R. thanks Ron Elber for a stimulating discussion and also for suggesting the idea to use the velocity auto-correlation function to determine the appropriate spacing between milestones. I.A. acknowledges the support of the National Science Foundation (NSF) via Grant No. MCB 2028443. D.R. was supported partially by the Molecular Science Software Institute seed fellowship funded by NSF via Grant No. OAC-1547580. The work has benefited from the computational resources of the UC Irvine High Performance Computing (HPC) cluster and San Diego Supercomputer Center (SDSC).

The authors declare the following competing financial interest: D.L.M. is a current member of the Scientific Advisory Board of OpenEye Scientific Software and an Open Science Fellow with Silicon Therapeutics.

## DATA AVAILABILITY

The software codes of the WEM implementation, input files, and analysis scripts specific to this work are available from the following github repository: https://github.com/dhimanray/WEM-Ligand.git. The code for calculating the surface coverage using the Monte Carlo method (see the Appendix) is available from https://github.com/trevorgokey/misc/tree/master/sphere_SA_population. The data that support the findings of this study are available within the article and its supplementary material. Additional data files are available from the corresponding author upon reasonable request.

### APPENDIX: DIFFUSION EFFECT ON *k*_{on}

The inward flux *k*(*r*) of the ligand at a spherical surface of radius *r* is given by^{95}

BUT trajectories arriving at all points of the sphere are not equally likely to propagate to the bound state. So, we used a Monte Carlo based estimator to calculate the fraction of surface area of the sphere of radius *r*, explored by the ligand while being at milestone corresponding to radial distance *r* (Fig. 8). Let us say this fraction is *α*. Some of the trajectories at that milestone will possibly propagate toward the unbound state. The probability of traveling toward the binding pocket was estimated from the row-normalized milestoning transition matrix element *K*_{ii−1}. Including these two factors, the flux of binding trajectories through milestone at distance *r* is

In order to estimate the $kondiff$ due to diffusion in M^{−1} s^{−1} unit, considering that the milestone at *r* is the binding surface, we need to multiply with Avogadro’s number *N*_{av},

Hydrodynamics studies showed that at room temperature, the diffusion constant of small organic molecules is around 1 × 10^{5} cm^{2} s^{−1} in water.^{101} Using this value for *D* and after proper unit conversion, the $kondiff$, in M^{−1} s^{−1} unit, is given by

where *r* is measured in Å.

## REFERENCES

_{on}using direct simulation of protein-protein association with flexible molecular models

^{+}/Cl

^{−}, methane/benzene, and K

^{+}/18-Crown-6 ether