Empirical force fields employed in molecular dynamics simulations of complex systems are often optimized to reproduce experimentally determined structural and thermodynamic properties. In contrast, experimental knowledge about the interconversion rates between metastable states in such systems is hardly ever incorporated in a force field due to a lack of an efficient approach. Here, we introduce such a framework based on the relationship between dynamical observables, such as rate constants, and the underlying molecular model parameters using the statistical mechanics of trajectories. Given a prior ensemble of molecular dynamics trajectories produced with imperfect force field parameters, the approach allows for the optimal adaption of these parameters such that the imposed constraint of equally predicted and experimental rate constant is obeyed. To do so, the method combines the continuum path ensemble maximum caliber approach with path reweighting methods for stochastic dynamics. When multiple solutions are found, the method selects automatically the combination that corresponds to the smallest perturbation of the entire path ensemble, as required by the maximum entropy principle. To show the validity of the approach, we illustrate the method on simple test systems undergoing rare event dynamics. Next to simple 2D potentials, we explore particle models representing molecular isomerization reactions and protein–ligand unbinding. Besides optimal interaction parameters, the methodology gives physical insights into what parts of the model are most sensitive to the kinetics. We discuss the generality and broad implications of the methodology.
I. INTRODUCTION
Molecular dynamics (MD) is a powerful tool that can, in principle, rigorously fully characterize molecular structures, thermodynamics, and kinetics. This scope is delimited by the accuracy of the atomistic molecular dynamics force field. Coarse-graining quantum mechanical molecular interactions in force fields enables modeling many-body molecular systems; yet, this process transfers errors associated with the model selection bias and parameterization based on sparse experimental datasets,1,2 such as vaporization data for small fragments.1,3 Some of these limitations are lifted by modern approaches employing open science and machine learning to coarse-grain electronic degrees of freedom4–7 or by constraining to experimental nuclear-magnetic resonance (NMR) data.8–10
Current force fields can reach experimental accuracy in determining highly populated structures and their thermodynamics,11 but can differ vastly in predictions of kinetic properties.2,12 This systematic error is caused by the fact that during force field parameterization, ensemble averages are very sensitive to the free energy differences of the metastable states emerging from the potential energy function, but not so much to the free energies of the barriers between these states. Thus, by parameterizing against thermodynamic ensemble averages, it is unlikely to obtain kinetically accurate force fields. Several systematic approaches to optimize force field parameters aiming at reproducing thermodynamics have been developed throughout the years, e.g., force field parameterization for large biomolecules against solution NMR data13–15 or against distances between labeled sites.16 However, so far, there is no systematic strategy to optimize the force field parameters for the underlying kinetics such that an improved force field would match the target (experimental) interconversion rates directly, although procedures for model optimization for dynamical trajectories have been recently proposed.17–19 Such force fields, capable of representing experimental kinetics, would have a large impact on molecular dynamics since they would accurately report on transition state structures and populations of processes that are impossible to resolve by experiments,20 thus offering a leverage in, for instance, protein regulation and design (e.g., by mutations) or design of materials with specified kinetic properties.
In this work, we explore an efficient way to infer the relationship between molecular (force field) model parameters and kinetic data using only prior ensembles of reference trajectories and employing techniques to reweight these paths. (Note that in this work, trajectory and path refer to the same concept: an ordered sequence of states representing the dynamical evolution of a system.) This avoids the computationally expensive step of recomputing the (rare event) rate constant and moving randomly in the high-dimensional force field parameter space (e.g., a typical protein force field contains 100’s of parameters). Recently, Donati et al. have explored such path reweighting techniques, which explicitly compute the change in path action based on a force field perturbation.21–23 We combine this path reweighting technique with the continuum path ensemble maximum caliber (CoPE-MaxCal) approach20,24 in order to impose the dynamical constraint and at the same time select the best solution among multiple sets of parameters that all give the correct kinetics. This selection thus corresponds to a minimal perturbation, i.e., the change in the force field that causes the smallest possible perturbation to the entire path ensemble, while still obeying the constraint.20 As the computation of the reference path ensemble (at the current force field) and the rate constant by direct MD is inefficient, we employ trajectory sampling methodology, in particular transition path sampling (TPS),25,26 and its descendent single replica transition interface sampling (SRTIS) to efficiently obtain path ensembles.27 Our method is applicable to any rare event method that gives access to the reference path ensemble (e.g., Refs. 28 and 29).
The aim of this paper is to lay the theoretical foundations for the above-mentioned framework and to illustrate the proof of principle of our methodology on several simple particle-based systems undergoing stochastic rare event dynamics. The remainder of this paper is organized as follows. In Sec. II, we introduce and develop the reweighted estimators for the CoPE-MaxCal approach and show how these equations can be used to optimize the underlying force field parameters. In Sec. III, we describe the model systems and give simulation details. We discuss the results of the optimization procedure in Sec. IV. We end with conclusions in which we give an outlook on future directions.
II. THEORY
A. Continuum path ensemble maximum caliber
B. The reweighted path Lagrange function
C. Derivatives of the Lagrange function
To find the optimal parameters a of the potential energy function for a single constraint, we determine the stationary point of the Lagrange function, i.e., we solve for all a1, …am, leading to m + 1 constraint equations.
Since the path expected values in Eqs. (11) and (13) can be estimated from a set of trajectories generated at the reference potential V0(x), these two equations provide the basis for optimizing the parameters of the modified potential V(x, a). So far, the logarithm of the relative path probability w(x, a) = lnW[x, a] and the path observable s(x) have been abstract functions. Next, we derive explicit expressions for constraining the rate constant and expressions for w and w′ for paths generated by the EM integrator.
D. The rate constant estimate
In the constraint term, the path ensemble average is calculated using the reweighted path probability (indicated by subscript W) over all trajectories that leave A (indicated by subscript A). The effect of s[x], as defined above, is to select the reactive AB paths, i.e., those that reach B, from the set of all transition paths. With this choice of s[x], Eq. (12) becomes a reweighted path ensemble average over the set of reactive AB trajectories. Thus, we can write , where subscript AB indicates that the path ensemble average is calculated with the path probability over all trajectories that leave A. Note the similarities to the temperature derivative of the rate constant in, e.g., Refs. 43 and 44.
The term between brackets in Eq. (15) denotes the derivative of the rate constant with respect to the force field parameters and can serve as a first sanity check whether the reweighting approach actually works.
E. Path reweighting using the adapted force field
III. SIMULATION METHODS
A. Triatomic system
Sampling 10 000 cycles with SRTIS,27 where each cycle consisted of 100 shots, 100 interface exchanges, and 100 state swaps, resulted in a path ensemble for each interface. The measured crossing histograms were joined with the weighted histogram analysis method (WHAM),46 which together with the effective positive flux through the first interface27,39,41 leads to rate constant estimates over the barrier. The WHAM procedure also allowed us to assign a weight to each trajectory in the path ensemble, yielding the RPE. Each of these RPE trajectories can be evaluated in terms of the path action and its derivatives.
B. Dissociation model
The switching at the minimum of the potential is done to avoid numerical problems with the rather steep repulsive part of the potential arising for the high values of ϵ1 required to bind the particle. Therefore, we chose to keep the repulsive part of the potential equal to the standard WCA potential, i.e., not scale with ϵ. In this way, only when the particles are in the attractive part of the potential, they contribute to the path action derivatives, making the evaluation of the path action more robust. Note that this does not change the generality of our approach.
Subsequently, we allow the other particles to move as well. To keep the cluster together, we apply an additional potential that binds the non-red particles to the central blue particle by an additional strong LJ interaction with ϵ3 = 20. While this keeps the cluster intact, rearrangements are still possible. We avoid these by imposing an additional weak harmonic spring between neighboring particles with a spring constant k = 1 (this, of course, excludes the red particle).
The order parameter λ used for SRTIS is the center-to-center distance r between the red and blue particles. Stable states were defined as r < 21/6 and r > 2.5 for the initial and final states, respectively. (Note that beyond r = 2.5, the ligand cannot yet be considered escaped to the bulk. While it is possible to take this into account, see, e.g., Refs. 47 and 48, we assume here for simplicity that the ligand is dissociated.) Interfaces were positioned at λ = 0.15, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70, 0.80, 1.00 with respect to the minimum distance rA = 21/6. We integrate the equations of motion using the EM integrator with a time step of dt = 0.001 and a friction of 2.5.
Finally, in order to show that our methodology easily extends to 3D systems, we consider a 3D version of the LJ-model, now considering 13 LJ particles as depicted in Fig. 3. Here, four outer green particles bind the ligand with an attraction ϵ2.
IV. RESULTS AND DISCUSSION
We illustrate our methodology and system setup for several models, including a bistable potential (see the supplementary material). Here, we explore the 2D molecular isomerization reaction from Fig. 1 and the particle dissociation from a 2D cluster of seven Lennard-Jones (LJ) particles shown in Fig. 2, which can be viewed as analogous to the protein–ligand unbinding, an important biophysical process49,50 or dissociation for small nano-clusters.51–56 In addition, we present and discuss 3D LJ particle systems, which are remarkably similar to the 2D LJ particle systems, showing the robustness of the results.
A. Triatom isomerization
1. Path ensembles
For the triatom model (see Sec. III), we perform SRTIS with in total 106 shooting and replica exchange moves. Acceptance ratios for the shooting move ranges from 0.4 for the first interface to 0.15 for the last interface. Replica exchange moves were accepted around 50%. Path lengths vary from 50 time steps for the first interface to a few thousand for the last interface. From this reference path ensemble, we estimate the rate constant for the unmodified potential as lnkAB ≈ − 9. (See the supplementary material for more details.)
Before embarking on the full problem of model parameter optimization, we first checked the path reweighting method by computing the derivative of the rate constant, i.e., the term in brackets in Eq. (15). In the supplementary material, we show that the rate constant derivatives are well-predicted by our path reweighting approach for both the triatom and a simpler diatomic system.
2. Optimization of the Lagrange function
Now, we can turn to the optimization of the molecular (force field) model parameters. The trimer system has two parameters that can both be tweaked to reproduce the imposed rate. To optimize the Lagrange function [Eq. (9)], we have to be able to compute it as a function of parameters not only at zero perturbation but also for the reweighted path ensemble W[x; a]. Moreover, we need to be able to take its derivatives at nonzero perturbation.
For a particular set of values of a = 20 and req = 1.5, we computed the path ensemble using SRTIS. In Fig. 4, we show DKL and log rate constant lnk[a, r;Δa, Δr] on top of each other as a function of Δa and Δr. As expected, DKL is minimal for the reference value Δa = 0, Δr = 0, that is, at zero perturbation.
The reweighted rate constant is identical to the predicted rate constant from the prior path ensemble for Δa = 0, Δr = 0, but clearly varies if the system is perturbed.
Note that the rate contour lines in Fig. 4 are roughly parallel to the Δreq axis, implying that, for a given spring constant a + Δa, the rate is rather insensitive to the choice of equilibrium distance. At the transition state, the bond between atoms 1 and 2 is quite stretched, whereas the other two bonds are compressed, which explains why the spring constant has greater influence on the rate than the equilibrium distance. In addition, DKL is less sensitive to Δa than to Δreq, indicating that it is probably better to adjust a than req.
When we change the rate constant from the observed value of lnkAB = −9 to the new value lnkexp, we need to compute the derivatives and optimize the Lagrange function. To do so, we apply the method of Ref. 57. This iterative method starts at certain initial values and slowly converges to the solution. The solution for the most optimal set of {Δa, Δr} is shown in Table I. In Table I, we show several sets of optimal solutions as a function of imposed . The first set shows the optimal solution for both Δa and Δr. The other two sets we optimized for only one of the two parameters, and set the other to zero. From the value of DKL (which is identical to the negative of the Lagrange function when the constraints are obeyed), it is clear that this is always less optimal than the two parameter solution, thus illustrating the need for multi-parameter force field optimization. In addition, note that the further the imposed is from −9, the more and, thus, DKL deviate.
. | μ . | Δa . | Δr . | DKL . |
---|---|---|---|---|
−8 | −0.162 56 | −3.034 52 | −0.002 90 | 0.083 18 |
−12 | 0.275 89 | 8.439 88 | 0.014 61 | 0.454 86 |
−16 | 0.303 01 | 19.057 9 | 0.044 08 | 1.685 25 |
−20 | 0.280 86 | 29.106 6 | 0.066 00 | 2.879 69 |
−8 | −0.164 47 | −3.064 28 | 0 | 0.084 08 |
−12 | 0.317 65 | 8.842 44 | 0 | 0.489 06 |
−16 | 1.000 39 | 19.867 5 | 0 | 2.981 42 |
−20 | 0.572 78 | 28.773 4 | 0 | 6.958 17 |
−12 | 0.587 05 | 0 | 0.258 99 | 4.181 48 |
−16 | 0.286 30 | 0 | 0.385 95 | 5.617 00 |
−20 | 0.547 23 | 0 | 0.490 97 | 6.965 69 |
. | μ . | Δa . | Δr . | DKL . |
---|---|---|---|---|
−8 | −0.162 56 | −3.034 52 | −0.002 90 | 0.083 18 |
−12 | 0.275 89 | 8.439 88 | 0.014 61 | 0.454 86 |
−16 | 0.303 01 | 19.057 9 | 0.044 08 | 1.685 25 |
−20 | 0.280 86 | 29.106 6 | 0.066 00 | 2.879 69 |
−8 | −0.164 47 | −3.064 28 | 0 | 0.084 08 |
−12 | 0.317 65 | 8.842 44 | 0 | 0.489 06 |
−16 | 1.000 39 | 19.867 5 | 0 | 2.981 42 |
−20 | 0.572 78 | 28.773 4 | 0 | 6.958 17 |
−12 | 0.587 05 | 0 | 0.258 99 | 4.181 48 |
−16 | 0.286 30 | 0 | 0.385 95 | 5.617 00 |
−20 | 0.547 23 | 0 | 0.490 97 | 6.965 69 |
B. Dissociation from a LJ cluster
Having shown that we can apply our framework to model systems for molecular (isomerization) reactions, we explore in this section a slightly more elaborate system, which also has interesting physical properties, namely, particle dissociation from a cluster of LJ particles. Such a process is reminiscent of ligand unbinding, which is an important problem in biophysics.49,50 Moreover, it can be seen as dissociation for small nano-clusters.51–56
We describe this dissociation transition using a model of Lennard-Jones (LJ)-like particles. We first consider a seven particle setup in two dimensions, indicated in Fig. 2. In the simplest instance of this model, all particles are kept fixed except the red particle, which can dissociate from the cluster and diffuse away.
Here, the only important interactions are that of the red particle with the six other particles. In the second model, we allow the other particles to move as well, resulting in a fluctuating cluster of six particles, which can expel the red particle. During the dissociation, the green particles can move closer to each other, gaining in entropy. We can interpret this simple model as representing a ligand unbinding reaction, e.g., of a protein, in which the protein binding pocket slightly rearranges upon (un)binding. We stress that this is just an analogy.
Finally, in order to show that our methodology easily extends to 3D systems, we consider a 3D version of the model with 13 LJ particles.
1. Path ensembles
We first start exploring the fixed seven-particle model. Setting the interaction with the central particle to ϵ1 = 10 and with the green outer particle to ϵ2 = 1, we sample unbinding transitions using SRTIS (see Sec. III).
In total, we perform 105 shooting and replica exchange moves. Acceptance ratios for the shooting move ranges from 0.4 for the first interface to 0.15 for the last interface. Replica exchange moves were accepted around 50%. Path lengths vary from 50 time steps for the first interface to a few thousand for the last interface. The crossing probability of the final interface (obtained from WHAM) is lnP(λB|λ1) = −6.567. The flux of the first interface is ϕ = 0.003 550. The total rate constant is thus kAB = 5 × 10−6 in units of time steps. Note that, when optimizing the rate constant, we assume that the fluxes are not altered much (as above), and we only have to consider the crossing probability.
We also performed similar runs for the flexible 2D cluster.
2. Caliber/DKL and rate constant predictions
Our framework then gives the rate constant predictions for the altered parameters, as well as the caliber or DKL. Figure 5 represents both predictions as contour plots. The green contours delimit the (log of the) dissociation rate constant (in fact, the crossing probability) predictions, while the blue-ochre contours depict DKL. Several conclusions follow from Fig. 5. The first is that for the fixed cluster, the DKL contours show some anti-correlation in the two parameters. This indicates that the path ensemble is least disturbed when an increase in ϵ2 is compensated by a decrease of ϵ1. The DKL contours also are slightly asymmetric, showing a larger sensitivity to negative values of ϵ2. As this parameter is set to the relatively low value of ϵ2 = 1, reduction below Δϵ2 < −1 will, therefore, reverse the sign of the attractive interaction, which, of course, completely alters the systems. The flexible cluster in Fig. 5(b) shows no such anti-correlation in DKL, indicating that the compensating effect has largely disappeared.
The second observation is that the green rate constant contours are roughly linear with a negative slope. This indicates that both increasing ϵ1 and ϵ2 have similar effects for a large variety of values. For a similar decrease in the rate constant, one could choose to increase either ϵ1 or ϵ2. Note that the slope of the contour is roughly −0.5, as there are two outer particles (green) and only one central particle, so changing their interaction strengths ϵ2 has, therefore, twice the effect. This observation indicates that ϵ1 and ϵ2 are more or less interchangeable, and one can choose many combinations for arriving at the same rate constant.
3. Optimization of the Lagrange function
Next, we would like to find the optimal choice for the force field parameters ϵ1,2. As before, optimizing the parameters for a given rate constant amounts to following a green rate constant contour until DKL is minimal. Using the optimization procedure outlined in Sec. IV A 2, we arrive at a prediction given by the red points in Fig. 5. This prediction corresponds, thus, to the most optimal force field parameters, which minimizes the change in the path ensemble with respect to the original force field.
In all cases in Fig. 5, a trivial observation is that the curve passes through the origin, as there, the original force field reproduces the original rate constant most optimally. For the fixed cluster in Fig. 5(a), the optimal curve is roughly vertical, with only a relatively small deviation in ϵ1. This means that that whether enhancing or reducing the rate, it is always better to change ϵ2 rather than ϵ1. Remarkably, this trend even holds for Δϵ2 < −1, which changes the interaction from attraction to repulsion. This is likely a consequence of the immobility of the particles in this case.
For the flexible cluster cases in Figs. 5(b) and 5(c), the most striking feature is perhaps the L-shaped curve, indicating an asymmetry concerning reducing the rate or enhancing the rate. For a rate change of about one natural log unit, the red curve is roughly diagonal, that is, ϵ1,2 are contributing to the rate constant in equal proportions. However, when reducing the rate constant further, the curve bends over more vertically, indicating that it is better to change ϵ2 instead of ϵ1. In contrast, when increasing the rate, the curve bends over horizontally, indicating that it is now better to change ϵ1 instead of ϵ2. This conclusion also holds for Fig. 5(c), where ϵ1 = 15 and ϵ2 = 5.
We can interpret this behavior as follows. The best parameters are those that perturb the path ensemble as little as possible. When reducing the rate constant, it is better to adjust the ϵ2 parameter than the ϵ1 parameter, even if they both can lead to the same rate constant predictions. This can be interpreted by realizing that changing the central particle interaction ϵ1 will alter the entire reweighted path ensemble: all interface ensembles will be affected. In contrast, the change of ϵ2 will mostly affect only the interface ensembles further out. Since distant interfaces have a (much) lower weight in the ensemble, the perturbation, as measured by DKL/caliber will be smaller. In contrast, for increasing the rate, changing ϵ2 will not get you very far, and a substantial change of ϵ1 is also necessary. These effects can be shown by plotting the total crossing probability P(λ|λ0) for the different settings in Fig. 6. For the curves for Δϵ1 = 4 and Δϵ2 = 2, the final rate constant predictions (crossing probs) are almost equal, but the intermediate crossing probability and, hence, the path ensembles are very different. Clearly, the Δϵ2 = 2 case is much closer to the original dataset (black solid line), especially in the beginning of the crossing probability where the path ensemble is most dominant.
4. Comparison with rate constant predictions
While the prediction of the optimal parameters is already providing valuable insight, the ultimate goal is to establish a better force field model. To assess the quality of the predictions, we can compare the predicted rate constants with independent calculations at these different force field parameters. Figure 7 shows this comparison for the flexible 2D unbinding. The agreement is good, especially up to two kBT from the reference point (ϵ1 = 12, ϵ2 = 2.0).
Note that Fig. 7 compares just the reweighted crossing probability part of the rate to validate the rate prediction of our method. Thus, in this comparison, we have not included possible differences in flux factor. In this particular example, we verified that this difference is relatively small (≲20%), which would translate in a small correction on the logarithmic scale of less than 0.18, but in general, we encourage to recalculate the flux factor through the first interface. This is usually not the most expensive part.
5. Physical insight from the optimization
These results also reveal several physical aspects: While the central particle is more important for increasing the rate, outer particles are more important for decreasing the rate, as the entire path ensemble is mostly influenced by the central particle’s interaction change, whereas outer particles only affect the barrier region. Outer particles, therefore, are prime targets for modulating when reducing the dissociation rate constants. Translating to real proteins this amounts to engineering mutations or post-translational modifications58 of binding pocket residues close to the surface or modulation of the ligand chemistry in order to bind better to encounter complex sites at the surface. Of course, this extrapolation to realistic systems is currently no more than a hypothesis, which requires further testing. Yet, the general principle is likely to be robust.
C. Dissociation from a 3D LJ cluster
In Fig. 8, we plot the results for the 3D systems. They are remarkably similar to the 2D systems, showing the robustness of the results. One striking difference is the slope of the green rate constant contours, which is now −0.25 (for the flexible 3D case), as we now have four outer particles. Hence, a change in ϵ2 has a fourfold effect on the rate constant. In addition, the DKL contours appear different and a bit more skewed compared to the more circular ones in the 2D case. Remarkably, the optimal solution for the parameters (the red curves) looks again qualitatively similar to those of the 2D cases. Only for very strong interaction Δϵ2 > 2 of the outer particles, shown in Figs. 8(b) and 8(c), the red curve bends over.
V. CONCLUSION
In this paper, we have introduced a Continuum Path Ensemble Maximum Caliber (CoPE MaxCal) based method to optimize force field parameters in order to impose an dynamical constraint, in particular a rate constant in a complex molecular transition. Without any computationally expensive recalculation of the kinetics, the method yields the optimal change in the parameters that leads to an imposed rate constant after the path reweighting, while making the least possible perturbation to the prior trajectory ensemble as measured by the caliber or KL divergence.
We show that the path reweighting leads to meaningful prediction of the rate constants, which agrees with the direct calculation for the new force field, even up to more than an order of magnitude suppression of the rate constant in the case of the unbinding model. While in this work we develop the methodology and applied it only to simple models, we expect the method to be generally applicable.
Our method is especially aimed at force fields with multiple parameters. If there are only one or two parameters to change in a system, such as in a monoatomic Lennard-Jones fluid, adding kinetic constraints might overdetermine the parameter space, prohibiting a sensible solution. Instead, we focus on the other extreme, that of underdetermined systems, where the best solution should be found among many candidate parameters.
Besides a corrected force field, we find that the optimization for multiple parameters allows for an interpretation that gives physical insights in which parts of the path ensemble are affected by the parameter changes. Thus, the method provides a powerful tool to inspect rare event trajectory ensembles and extract valuable information from them. Trajectory ensembles provide us not only with mechanisms, rate constants, and transition states for rare event dynamics, but they can also inform on how such properties change with the model parameters. Importantly, they can predict the change in rate constants and, moreover, point us in the direction of parameters that least affect the original path ensembles. This gives the possibility to extend the optimization of force fields that are already optimized for thermodynamics to kinetics and, moreover, gives us pointers to what parts of the systems are most sensitive and thus most sensible to adapt or mutate.
One concern is how altering the potential might change the thermodynamics (or other static properties). When the force field perturbation only affects the barrier region between the two stable states and does influence the stable states themselves, this effect will be minimal (see the trimer example). However, if, for instance, changing force field parameters makes one of the states relatively more stable, this has the (most likely undesired) side-effect of changing the free energy difference. This issue might be resolved by including an additional constraint, e.g., by demanding that the forward and backward rates are modified with the same factor so that the free energy difference is unchanged in the optimized force field. Note that other thermodynamic or dynamic properties not targeted in the optimization might also be affected and might need an additional reevaluation.
Looking toward the future, we consider several directions for further research. First of all, we have used only the simplest of path actions, i.e., the Onsager–Machlup action for the EM integrator. It is known that EM is not a great integrator, and an obvious extension of the methods would be to extend it to underdamped Langevin integrators.23
Second, we only have considered up to two parameters. We envision that extension to many parameters is, in principle, straightforward, but some bookkeeping issues arise, e.g., keeping track of all the cross terms in the square gradient. Nevertheless, the method should be equally effective for optimization in higher dimensions.
Third, the method can and should be applied in a realistic force field and included in a MD engine. This is non-trivial since the evaluation of the path action should be done in the integrator. Nevertheless, we believe that this is a worthwhile and necessary research direction to make our approach useful. One of the issues is the functional form of the potential perturbation. To start, one could try simple functional forms, such as dihedral terms, and Lennard-Jones-like interactions. In a more advanced exploration, one could try to use machine learning to learn the functional form of the perturbation.
Fourth, the methods can be pushed beyond the standard atomistic force fields into coarse-grained force fields, for which rate constants are notoriously difficult to reproduce. Such an approach would go beyond a simple adjustment of the diffusion constant or friction.
Fifth, the method should be also applicable to other dynamical observables, such as diffusion and viscosity. In fact, all observables based on time-correlation functions of a trajectory ensemble should be treatable.
Sixth, parameters beyond the force field might be optimized, including the diffusion, effective mass, and even molecular topology. The holy grail in chemistry, material science, and biology is to accurately determine the link between microscopic degrees of freedom, such as the chemistry, the microscopic mechanisms, transition rates, and structure populations with macroscopic thermodynamic or kinetic properties related to the function. Having an efficient algorithm that is able to provide the link between kinetic/thermodynamic macroscopic properties and force field parameters, reporting on the underlying physics could be potentially very useful in protein engineering and design of small molecules and of materials with tunable macroscopic properties.49–56 Examples of such design applications are coarse-grained molecular force fields where the sole parameters are the identity of the aminoacid, which are a.o. employed in protein aggregation and protein folding.59,60 For such problems, one could apply our approach to suggest changes in the aminoacid sequence in the direction of promoting or depressing the rate constant of, e.g., aggregation, protein folding, or liquid/liquid phase separation. In this way, one could optimize the design of the novel (bio)material to be further tested in the wet lab, thus bypassing expensive and timely optimization protocols in the wet lab.
Moreover, our method could be particularly useful in chemical reactions by correcting the kinetics of imperfect while computationally cost-effective calculations by reactive force fields61 instead of doing calculations with accurate albeit computationally expensive density functional theory potentials. Another utility is the generality of the method, giving the possibility to combine it with deep learning potentials that are known to be very flexible and have already been used when calibrating force fields for thermodynamics in material science.5,62
Finally, our methodology of constrained optimization based on the maximum entropy principle is not limited to molecular systems and could even potentially contribute to various problems of time-series, where the underlying dynamics can be modeled as stochastic, undergoing rare events and with a potential energy function reporting on interactions of agents, such as in modeling of stock-market, fluid-flows, and health-pandemics.63–65
SUPPLEMENTARY MATERIAL
The supplementary material comprises an appendix with several theory sections containing detailed explanations and derivations mentioned in the text and results and discussion sections that are provided using the rate constant derivative as a test, applied to a diatomic and triatomic system.
ACKNOWLEDGMENTS
The authors thank Christoph Dellago and Pieter Rein ten Wolde for useful feedback and helpful discussions. B.G.K. acknowledges the funding by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through Grant No. SFB 1449–431232613, sub-project C02. The ideas for this contribution were initiated during the workshop “Accelerating the Understanding of Rare Events” (6-10 September 2021) at the Lorentz Center in Leiden, NL.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Peter G. Bolhuis: Conceptualization (equal); Data curation (lead); Investigation (lead); Methodology (equal); Writing – original draft (equal); Writing – review & editing (equal). Z. Faidon Brotzakis: Conceptualization (equal); Investigation (supporting); Methodology (equal); Writing – original draft (equal); Writing – review & editing (equal). Bettina G. Keller: Conceptualization (equal); Investigation (supporting); Methodology (equal); Writing – original draft (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.