Several enhanced sampling techniques rely on the definition of collective variables to effectively explore free energy landscapes. The existing variables that describe the progression along a reactive pathway offer an elegant solution but face a number of limitations. In this paper, we address these challenges by introducing a new path-like collective variable called the “deep-locally non-linear-embedding” (DeepLNE), which is inspired by principles of the locally linear embedding technique and is trained on a reactive trajectory. The variable mimics the ideal reaction coordinate by automatically generating a non-linear combination of features through a differentiable generalized autoencoder that combines a neural network with a continuous k-nearest neighbor selection. Among the key advantages of this method are its ability to automatically choose the metric for searching neighbors and to learn the path from state A to state B without the need to handpick landmarks a priori. We demonstrate the effectiveness of DeepLNE by showing that the progression along the path variable closely approximates the ideal reaction coordinate in toy models such as the Müller-Brown potential and alanine dipeptide. We then use it in molecular dynamics simulations of an RNA tetraloop, where we highlight its capability to accelerate transitions and estimate the free energy of folding.

Atomistic molecular dynamics (MD) simulations have proven to be a powerful tool for investigating intricate aspects of physical, chemical, and biological systems.1 The power of MD investigations has increased with advances in computational resources and refinement of force field accuracy. Still, many phenomena of importance exhibit timescales well beyond the reach of conventional unbiased MD, even when run on state-of-the-art supercomputing platforms. In response to this temporal constraint, a variety of enhanced sampling algorithms have been developed over time to augment the sampling and reconstruct the corresponding free energy landscapes.2–6 Two of the most widely used families of enhanced sampling algorithms are based on the definition of a set of collective variables (CVs) that approximate the reaction coordinate7 or on paths that connect two end states of the process of interest.8 Each of these families of methods has its strengths and limitations.

The CV-based algorithms crucially depend on the identification of slow degrees of freedom that capture the transition under investigation and are typically limited to biasing a maximum of three variables at the same time.9,10 Path-based methods can collect a rather large number of degrees of freedom into a feature space to be used in the path-building process. However, the effectiveness of this process generally depends on a proper definition of the end states and on the quality of the path search algorithm. The original path-collective variables (PATHCVs) combine the various aspects of the two families of approaches in an effective manner.11 They are typically used in combination with an enhanced sampling algorithm, and thus, they can be used to explore paths that are separated by a high free energy barrier from the initial (guess) path. Because of their efficiency in exploring path space, PATHCVs have been successfully used to reconstruct the free energy profiles associated with many complex systems.12–16 In addition, their combination with partial path transition interface sampling was shown to produce accurate numerical predictions of rate constants.17 Recent developments include approaches where the path is updated adaptively18 or where multiple paths are taken into account simultaneously.19 Such approaches make the construction of the path more robust to user input but, at the same time, more costly because of the path optimization.

Still, the effectiveness of PATHCVs depends on the choice of the metric used to define the path, as well as on the number and position of the milestones and other parameters. We have recently explored a machine learning approach based on a spectral gap optimization procedure to find optimal linear combinations of different features for the metric of PATHCVs.20 However, the approach is limited by the fact that a fixed linear combination of different features does not always provide an optimal metric for all the milestones defining complex paths. A typical example is the binding of a ligand to a protein. In such a case, the role of ligand hydration and dehydration is crucial at a given milestone (when the ligand enters or exits the cavity), while it is less relevant in the remaining path (corresponding to the solvated ligand being free in solution or finding its way to the cavity). This was clearly shown by quantifying the relevance of various descriptors, including ligand hydration, at different stages of ligand binding.21 In that context, it was shown that a CV that combines a number of descriptors in a non-linear way22 is able to capture the relevant slow degrees of freedom (including hydration) at the right time. Thus, using a fixed metric to define the distance from the milestones defining the path can result in sub-optimal performance. In recent years, several methods have been proposed to learn CVs directly from data, using either supervised or unsupervised methods.22–36 Some of the learned variables were based on atomic representation vectors, e.g., using variational autoencoders, and were applied to complex reactions.37 Others were shown to be close to paths connecting metastable states,38,39 without this condition being explicitly imposed. In addition, a data-driven variant of PATHCV based on kernel ridge regression has been proposed, in which the committor probability is approximated.40

In this paper, we use machine learning (ML) techniques to overhaul the path CV approach and overcome the limitations associated with previous implementations. The resulting variable, called deep-locally non-linear-embedding (DeepLNE), offers a path-like-behavior when trained on a time series or synthetic data describing a path connecting A to B in phase space. It offers an automatic procedure of building an optimal 1D description of the training data, inspired by the formalism of locally linear embedding41 (LLE) and the PATHCV. Essential elements of the algorithm are a dimensionality reduction operated by an artificial neural network (ANN), a differentiable k-nearest neighbor selection (k-NN), and an encoding of the neighborhood into a 1D latent space via an ANN. Starting from a predefined feature set describing the system dynamics of interest, the DeepLNE algorithm creates a representation of each datapoint based exclusively on its neighbors in the training dataset. This metric is then transformed into a path-like CV locally anchored by the data provided during the training phase.

We test the DeepLNE CV on the simulations of a particle in the Müller-Brown potential, alanine dipeptide, and an RNA 8-mer using different variants of the on-the-fly probability enhanced sampling42–45 (OPES) method. We show that the application of DeepLNE is successful not only in the simplest examples but also in the cases of high-dimensional inputs or the combination of a set of heterogeneous features into an optimal path-like CV.

Prior to introducing the DeepLNE method, we provide a brief overview of the PATHCVs and LLE frameworks, highlighting the areas of inspiration for our work.

In the original PATHCVs approach,11 one encodes a path connecting two end states, A and B, into a progress along the path CV s and a distance from the path z. Given a D-dimensional feature space X, one can parametrize it along the path as X(θ), so that X(0) = XA and X(1) = XB. Formally, the PATHCVs s and z are two functions of X(θ), which are defined as
s(X) = \lim_{\lambda \to \infty} \frac{\int_0^1 \theta \, e^{-\lambda (X - X(\theta))^2} \, d\theta}{\int_0^1 e^{-\lambda (X - X(\theta))^2} \, d\theta},
(1)
z(X) = \lim_{\lambda \to \infty} -\frac{1}{\lambda} \ln \int_0^1 e^{-\lambda (X - X(\theta))^2} \, d\theta.
(2)

In practice, this PATHCV formulation cannot be employed directly to deposit bias during simulations because of the crucial requirement of a CV to be smooth and differentiable. One has to resort to a number of assumptions and approximations. First, the parametric path is replaced by a discretized version Xi constructed on a series of m path milestones, such that the integrals transform into finite sums. Choosing a number of high-quality milestones that are equidistant in X is a non-trivial task that requires empirical optimization.

Another challenge is the choice of λ. Here, λ → ∞ leads to a step-wise path dominated by contributions for which (XXi)2 is minimal and whose derivative is ill-behaved. A finite optimal λ strikes a balance between high values corresponding to sharp PATHCVs and low values that sacrifice resolution in favor of differentiability.
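To make the discretized formulation concrete, the following sketch (a minimal NumPy illustration under the assumptions above, not the PLUMED implementation) evaluates s and z for a finite λ given a set of milestones; the milestone array and the value of λ are placeholders.

import numpy as np

def path_cvs(X, milestones, lam):
    """Discretized PATHCVs for one configuration X given m milestones and a finite lambda."""
    m = len(milestones)
    d2 = np.sum((X - milestones) ** 2, axis=1)   # squared distances to every milestone
    w = np.exp(-lam * d2)                        # milestone weights
    progress = np.arange(1, m + 1) / m           # fractional progress assigned to each milestone
    s = np.sum(progress * w) / np.sum(w)         # progress along the path, Eq. (1) discretized
    z = -np.log(np.sum(w)) / lam                 # distance from the path, Eq. (2) discretized
    return s, z

# toy usage with placeholder milestones interpolating between two 2D end states
milestones = np.linspace([0.0, 0.0], [1.0, 1.0], 10)
s, z = path_cvs(np.array([0.4, 0.5]), milestones, lam=50.0)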

An often overlooked challenge is the choice of X itself. For a PATHCV to be effective in enhanced sampling simulations, the feature space must contain as many degrees of freedom as possible, so that the relevant ones pertaining to the transition between states A and B are captured. At the same time, the number of dimensions of X should be reasonably small, as the Euclidean norm used in Eqs. (1) and (2) quickly loses resolution with increasing dimensionality of X. Navigating these two conflicting requirements is a complex optimization problem, and solving it requires a difficult compromise that is often problem dependent and cannot be easily generalized.

In order to construct the PATHCVs, one typically follows a multi-step procedure. The starting point is a trajectory of the system of interest in which at least one transition from A to B is observed. Next, a feature vector X(θ) is determined, which should discriminate, as uniquely as possible, the state of the system at any given time θ. Then, one selects snapshots along the transition that define the path X and picks a finite λ. For reasons of computational efficiency, the number of snapshots m must be much smaller than the number of sampled datapoints θ. The milestone selection can be performed in a number of ways but is typically empirically crafted, requiring expert knowledge of the system and CV-building experience. We will see how some of these steps can be automated with the help of machine learning.

Locally linear embedding (LLE)41 is a non-linear dimensionality reduction technique that can be used to transform high-dimensional data into a lower-dimensional space with the constraint of preserving the local structure (i.e., the neighborhood) of the data. This is achieved in two steps.

Assuming that each datapoint Xi possesses k neighbors that lie on a locally linear patch of its manifold, the LLE method reconstructs Xi from its neighbors via a linear regression. First, one minimizes the cost function,
\varepsilon(W) = \sum_i \left| X_i - \sum_j W_{ij} X_j \right|^2
(3)
with the constraints that \sum_j W_{ij} = 1 and that each datapoint Xi is reconstructed exclusively from its neighbors. By doing so, one determines the optimal weight matrix W, where Wij represents the contribution of the jth datapoint to the ith reconstruction. For our purposes, this corresponds to assuming that each point in a trajectory used to generate a PATHCV can be represented locally by k other points.
In the second step, LLE builds a lower-dimensional neighborhood-preserving mapping in such a way that the local linear relationships are approximately preserved. This implies that the same weights Wij that reconstruct the ith datapoint in D dimensions should also be able to reconstruct its embedded manifold coordinates xi (of dimension d). This is accomplished by minimizing the embedding cost function,
\Phi(x) = \sum_i \left| x_i - \sum_j W_{ij} x_j \right|^2,
(4)
where Wij is now fixed.
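For reference, the standard LLE procedure described above is available in scikit-learn; the snippet below is a minimal usage sketch with illustrative hyperparameters and random placeholder data.

import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

X_train = np.random.rand(500, 20)                 # placeholder (m, D) feature matrix

# both the number of neighbors k and the embedded dimension d must be chosen a priori
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
x_embedded = lle.fit_transform(X_train)           # (m, 2) neighborhood-preserving coordinates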

Constructing a PATHCV by applying the standard LLE method still involves the application of the Euclidean metric on the feature space [compare Eqs. (1) and (4)]. The higher the dimension of the space, the more the computed Euclidean distances and the low-dimensional mapping suffer from degeneracy, i.e., a number of different feature space vectors produce analogous compressed values. Therefore, in both contexts, one faces the same challenge: the choice of an optimal metric. Furthermore, while for PATHCVs one has to define the total number m of milestones, in LLE one also has to make an a priori choice of the number of neighbors k that locally represent each datapoint.

The DeepLNE CV that we develop here (see Fig. 1) aims at representing a high-dimensional dataset with a single dimension by incorporating concepts from PATHCV and LLE into a neural network architecture. Its starting point is the trajectory of a system going from state A to state B and a set of physical features X evaluated along it. Its objective is to build a directional CV, s, that can describe the progress along the transition through a non-linear combination of such a feature vector. The architecture of DeepLNE can be seen as a generalized autoencoder, with the first part composed of an artificial neural network (ANN) that performs a preliminary dimensionality reduction, followed by a continuous k-nearest neighbor (k-NN) step on each datapoint, which is then passed to an encoder that compresses the neighborhood into a one-dimensional representation. This 1D CV, s, is then used to reconstruct the original input via a decoder, which is also used for computing an accessory perpendicular distance CV, z.
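The following PyTorch sketch illustrates the forward pass just described under simplifying assumptions; layer sizes, hidden widths, and names are illustrative and do not reproduce the released implementation.

import torch
import torch.nn as nn

class DeepLNESketch(nn.Module):
    """Illustrative generalized autoencoder: reduce X -> x, soft k-NN, encode to s, decode to X_hat."""

    def __init__(self, D, d, k, x_train, t=0.1):
        super().__init__()
        self.reduce = nn.Sequential(nn.Linear(D, 16), nn.Tanh(), nn.Linear(16, d))    # X -> x
        self.encode = nn.Sequential(nn.Linear(k * d, 16), nn.Tanh(),
                                    nn.Linear(16, 1), nn.Sigmoid())                   # neighborhood -> s in [0, 1]
        self.decode = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, D))    # s -> X_hat
        self.register_buffer("x_train", x_train)   # (m, D) training configurations
        self.k, self.t = k, t

    def soft_neighbors(self, x, x_ref):
        # continuous k-NN: iteratively select k soft neighbors from the reduced training data
        a = -torch.cdist(x, x_ref)                 # negative Euclidean distances, (n, m)
        neigh = []
        for _ in range(self.k):
            w = torch.softmax(a / self.t, dim=1)   # soft selection of the next neighbor
            neigh.append(w @ x_ref)                # (n, d) weighted average of training points
            a = a + torch.log1p(-w)                # suppress already-selected points
        return torch.cat(neigh, dim=1)             # concatenated neighborhood, (n, k*d)

    def forward(self, X):
        x = self.reduce(X)                                 # (n, d) reduced representation
        x_ref = self.reduce(self.x_train)                  # (m, d) reduced training data
        s = self.encode(self.soft_neighbors(x, x_ref))     # (n, 1) progress along the path
        X_hat = self.decode(s)                             # (n, D) reconstruction used for z
        return s, X_hat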

FIG. 1.

Schematic of the DeepLNE CV architecture; the colored circles depict the weights and biases of the individual transformer components. We construct a “generalized neighborhood encoder” by using an ANN that performs a first dimensionality reduction, followed by a differentiable k-nearest neighbor selection (“k-NN”) and another ANN that compresses the identified neighborhood representation into a single dimension, the s variable. The original choice of input features Xi for a single datapoint (out of m total training data) with dimensionality D is reduced to a d-dimensional vector x. In this non-linearly transformed manifold, for each datapoint, the k-nearest neighbor datapoints are selected with a weight matrix based on the Euclidean distance. We concatenate the features x1, …, xk of these k-nearest neighbors to construct the vector x_i^{k-NN}. This vector is the input to a subsequent ANN used to compress the neighborhood into a one-dimensional vector, the DeepLNE variable s, denoting the progression along the path. The decoder tries to optimally reconstruct the input X from s, resulting in the vector X̂. The perpendicular distance from the path z is then computed as a function of the Euclidean distance from X̂. The flowchart below the network architecture reports the vector dimensionality of a single training datapoint at each step along the DeepLNE CV construction.


This strategy is built upon the satisfaction of a number of constraints. The first one requires that the initial m datapoints X can be optimally reconstructed from their one-dimensional representation s. We set this restriction in place with an asymmetric autoencoder architecture, where the reconstruction loss between the original data X and the decoded data X̂ is minimized.

Second, in analogy with the PATHCVs and LLE, each datapoint i must be exclusively represented by its neighbors. In order to satisfy this constraint, we implement a continuous and differentiable relaxation of the k-NN selection rule (detailed in Subsection I D). Notably, the compressed latent space representation si is not directly obtained from a transformation of the input features of the datapoint Xi but from those of its k neighbors. Thus, the resulting CV is more robust to extrapolation as it is anchored to the local description of the training datapoints at all times. As a third constraint, the neighbors of each datapoint are found in a different manifold of arbitrary dimension, in our case in a lower dimension d compared to the original feature space dimension D. This is achieved by transforming the initial inputs using an ANN. The PATHCV and LLE methods discriminate between datapoints by applying the Euclidean distance directly on the original feature space; therefore, their compressed representation may suffer from a degeneracy issue for large D. Instead, DeepLNE computes Euclidean distances exclusively in the reduced dimensional space d during the k-NN step, thus alleviating the degeneracy issue.

The fourth and last constraint of DeepLNE is to represent each datapoint via a non-linear combination of the neighbors’ features. This aspect is also a novelty as PATHCV and LLE do not offer the possibility to automatically construct the metric as a non-linear combination of the original input features. We expect this increased flexibility to play a major role in cases where the sampled data are sparse, such as in high dimensions or in the vicinity of sampling bottlenecks. In analogy with the seminal PATHCV framework, the DeepLNE CV can provide a measure of both the progress along the path (s) and the distance from it (z), which can be used for driving the system along the process and controlling it. While s is computed as the output of the encoder part of the DeepLNE architecture described above, the z variable is defined from the decoded s variable. For a new datapoint Y, the perpendicular distance to the path z is computed as
z(Y) = \lim_{\lambda \to \infty} -\frac{1}{\lambda} \ln \sum_{i=1}^{m} e^{-\lambda (Y - \hat{X}_i)^2},
(5)
which is analogous to Eq. (2), where X(θ) is replaced by the decoded training data X̂ (shown in the scheme of Fig. 1). By not comparing the new data directly with the training datapoints, but with their decoded versions (i.e., passed through neighbor selection and then reconstructed), we can more effectively measure the distance from the average learned path. Note that in this definition, the z variable might still suffer from the high dimensionality problem of X̂. In this regard, we checked whether the variable z can be calculated with respect to the reduced dimension x instead of X̂. However, in the numerical examples, we found that the z computed in this alternative way did not improve upon our recommended definition in Eq. (5), and in some cases, it even decreased the sampling efficiency along the transition of interest.
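In practice, for a finite λ, Eq. (5) can be transcribed directly; the NumPy sketch below assumes the decoded training set X̂ is available as an array (names and the value of λ are placeholders).

import numpy as np

def z_from_decoded(Y, X_hat_train, lam=50.0):
    """Perpendicular distance z of a new point Y from the decoded training data, Eq. (5) with finite lambda."""
    d2 = np.sum((Y - X_hat_train) ** 2, axis=1)       # squared distances to the decoded training points
    return -np.log(np.sum(np.exp(-lam * d2))) / lam   # smooth minimum distance from the learned path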

In summary, the DeepLNE algorithm makes it possible to construct a variable that approximates a transition path between states, while overcoming multiple issues of PATHCVs, as follows:

  • A lower-dimensional metric of dimension d is learned automatically, reducing the degeneracy encountered when computing the Euclidean norm to identify the local neighborhood of a datapoint.

  • Local descriptions of the high dimensional dataset do not have to be empirically chosen but are found in an automatic way by constructing the neighborhood of each datapoint using a differentiable k-NN step.

  • Heterogeneous input features are permitted, e.g., we can combine distances, angles, and contact maps into X.

DeepLNE implements the continuous and differentiable k-NN in the spirit of Ref. 46. First, we describe the 1-NN selection rule: we define a = −r(xi, xj), the negative of the Euclidean distance r between the query point xi and the transformed training datapoints xj. The expectation \bar{w}^1 of the first index vector taking a value i ∈ I is given by
\bar{w}_i^{1} = P\left[w^{1} = i \mid a^{1}, t\right] = \frac{e^{a_i^{1}/t}}{\sum_{i' \in I} e^{a_{i'}^{1}/t}},
(6)
where t is the temperature scaling factor and P is the nearest neighbor probability for point xi.
Then, the distances a are updated, such that
\bar{a}_i^{\,j+1} = \bar{a}_i^{\,j} + \log\left(1 - \bar{w}_i^{\,j}\right).
(7)
These updated distances are then used to compute the expectation over the next index vector,
\bar{w}_i^{\,j+1} = P\left[w^{j+1} = i \mid \bar{a}^{\,j+1}, t\right] = \frac{e^{\bar{a}_i^{\,j+1}/t}}{\sum_{i' \in I} e^{\bar{a}_{i'}^{\,j+1}/t}}.
(8)
The continuous k-nearest neighbors (x^{1,k\text{-NN}}, \ldots, x^{k,k\text{-NN}}) of x_i, obtained from the expectations \bar{w}^{\,j}, follow as
x^{j,k\text{-NN}} = \sum_{i \in I} \bar{w}_i^{\,j} \, x_i, \qquad j = 1, \ldots, k.
(9)
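A literal per-datapoint transcription of Eqs. (6)–(9) might look like the PyTorch sketch below (names and the temperature value are illustrative; the batched version used in practice may differ in details).

import torch

def soft_knn(x_query, x_train, k, t=0.1):
    """Continuous k-NN selection in the spirit of Eqs. (6)-(9) for a single query point.

    x_query : (d,) reduced representation of one datapoint
    x_train : (m, d) reduced representations of the training data
    """
    a = -torch.linalg.norm(x_train - x_query, dim=1)   # negative Euclidean distances
    neighbors = []
    for _ in range(k):
        w = torch.softmax(a / t, dim=0)                # expectation over the next index vector, Eqs. (6) and (8)
        neighbors.append(w @ x_train)                  # soft neighbor as a weighted sum of training points, Eq. (9)
        a = a + torch.log1p(-w)                        # down-weight already-selected points, Eq. (7)
    return torch.stack(neighbors)                      # (k, d) continuous k-nearest neighbors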
A computational speed up during the training and application of DeepLNE can be achieved by decreasing the size of the weight matrix \bar{w}. To this end, we use farthest point sampling (FPS),47 selecting a subset of the training datapoints x to be used only for the k-NN selection and the construction of x_i^{k-NN}.
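A greedy farthest point sampling routine of the kind used for this sub-selection can be sketched as follows (a generic illustration, not the implementation of Ref. 47).

import numpy as np

def farthest_point_sampling(X, n_select, seed=0):
    """Greedily pick n_select maximally diverse rows of the (m, D) array X."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(X)))]                  # random starting point
    dmin = np.linalg.norm(X - X[selected[0]], axis=1)       # distance of every point to the selected set
    for _ in range(n_select - 1):
        nxt = int(np.argmax(dmin))                          # farthest point from the current selection
        selected.append(nxt)
        dmin = np.minimum(dmin, np.linalg.norm(X - X[nxt], axis=1))
    return selected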

An important ingredient for the deployment of the DeepLNE CV in the enhanced sampling context is the adoption of a continuous and differentiable approximation of the k-NN selection rule. Here, we focus on the choice and interpretation of two hyperparameters: the number of neighbors k and the temperature t, which controls the sparsity of the selection matrix. Concerning the number of neighbors k, we have the following limiting cases:

  • k → 1: leads to a sharp local description of the training data and to high variability of the DeepLNE CV when extrapolating out of distribution, since each datapoint is described exclusively by a single neighbor.

  • k → ∞: in this limit, we have a loss of locality and, therefore, loss of path-like behavior, especially in the regions that are sparsely described or under-sampled, such as the transition state region, which is crucial for capturing the dynamics of the process. Furthermore, the interpolation and extrapolation of variables s and z become more degenerate as a given datapoint is described by many training samples.

The second hyperparameter t determines the smoothness of the k-NN step by adjusting whether only the first k neighbors or a larger set of neighbors contribute (see Fig. SI 1). Its role is comparable to the one of λ in the PATHCV framework. We single out the following limiting cases:

  • t → 0 imposes exact nearest neighbor selection, in which a datapoint is exclusively described by the features of the selected neighbors. The effect is analogous to the case where λ → ∞ for the PATHCV. In this limit, vanishing gradients occur as a result of the applied softmax function in the k-NN (compare Fig. SI 2).

  • t → ∞ implies that a datapoint is described by a function of all the training datapoints, strongly lowering the resolving power of s, in analogy with λ → 0 for the PATHCV. As a consequence, each new point gets transformed into a point at the center of the training set.

Based on these limiting cases, we suggest that k should be rather small (k < 10) to retain the path-like behavior of the CV, while t should be small but finite (t ≈ 0.1) to ensure numerical stability during enhanced sampling simulations.

For all the systems studied in this paper, we followed the same protocol: (1) sampling of the transition of interest, (2) training the DeepLNE CV on a set of input features, and (3) performing enhanced sampling MD simulation using the DeepLNE CV for bias deposition. In the following, we outline all the key steps of our recommended strategy for training the DeepLNE CV.

1. Training data generation

Starting trajectories must describe the transition from state A to B fully and as densely as possible. In the simplest cases, where the energy barrier between states is small, they can be unbiased; otherwise, biased trajectories can be used. Among the biasing procedures to generate input trajectories, we would single out the ratchet-and-pawl restraint,48,49 steered MD,50 and OPES in exploration mode.44 Since the computational cost of DeepLNE increases with an increasing number of datapoints, one can rely on FPS to select a maximally diverse subset of the original data, such that only m frames are retained.

2. Neural network architecture

In the examples reported in this study, we use two feed-forward ANNs, each containing at most two hidden layers with hyperbolic tangent activation functions. The ANN architecture can be adjusted as desired by the user. We recommend choosing d such that the dimensional reduction per hidden layer and the total number of fitting parameters strike a balance. While the expected application scenario for complex biological systems with many degrees of freedom is d ≪ D, cases in which d > D can also be envisioned. Furthermore, in order for the s variable to describe the progress along the path, we scale it using a sigmoid activation in the final encoder layer, restricting its values to the range of 0–1.

3. Neural network optimization

The network parameters are optimized using the mean square error between the original inputs and the reconstructed ones,
\mathcal{L} = \frac{1}{m} \sum_{i=1}^{m} \left| X_i - \hat{X}_i \right|^2
(10)
via gradient descent using the ADAM optimizer with a learning rate of 10⁻³ for 5000 epochs, using the machine learning library PyTorch,51 with the help of the mlcolvar package.38
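For a model exposing the interface sketched earlier (returning s and the reconstruction X̂), the optimization amounts to a standard reconstruction-loss loop; the sketch below is illustrative, whereas the actual training is carried out through mlcolvar.

import torch

def train(model, X_train, epochs=5000, lr=1e-3):
    """Minimize the mean square reconstruction error of Eq. (10) with ADAM."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        _, X_hat = model(X_train)
        loss = torch.mean((X_train - X_hat) ** 2)   # Eq. (10)
        loss.backward()
        opt.step()
    return model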

4. Inspecting the results

It can be instructive to plot the DeepLNE CV against a few important physical descriptors or against the first principal components of the input features X. A more detailed analysis can be obtained via a sensitivity analysis to identify the features that contribute the most22 or by using linear models to interpret the neural network-based CV.52

5. Exporting the model into PLUMED

After the training is finalized, the model is compiled using TorchScript. The DeepLNE CV is evaluated on the fly in MD simulations by loading the model into the PLUMED plugin53 as a PYTORCH CV.38 An implementation of the method and usage tutorials are available at https://github.com/ThorbenF/DeepLNE.
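The export step can be sketched as follows; the model and example batch are stand-ins so the snippet runs, and the file name and the exact PLUMED action are placeholders (see the repository above for the actual export script and input files).

import torch

# stand-ins for a trained DeepLNE-style module and a representative input batch
model, X_example = torch.nn.Linear(4, 1), torch.rand(8, 4)

scripted = torch.jit.trace(model, X_example)   # compile the forward pass with TorchScript
scripted.save("deeplne_model.ptc")
# The saved file is then loaded from the PLUMED input through its PyTorch interface,
# passing the chosen input features as arguments to the compiled model.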

To test the introduced algorithm, we report simulation results for a particle in the Müller-Brown potential, alanine dipeptide, and an RNA tetraloop (Fig. 2).

FIG. 2.

DeepLNE CVs are tested on three toy models: (a) Müller-Brown potential used as a two-state potential energy landscape for the simulation of a particle moving in two dimensions. (b) Structure of the well-studied biomolecule alanine dipeptide, a system that can be sufficiently described via its dihedral angles ϕ and ψ. (c) Structure of an RNA tetraloop made up of 8 nucleotides (the RNA with sequence “CCGAGAGG” is colored in 5′ to 3′ direction: orange, cyan, purple, pink, lime, yellow, and green). We show the correctly folded structure.


1. Müller-Brown

The simulations of a particle subjected to the Müller-Brown potential are performed using the simple Langevin dynamics code contained in the VES module of PLUMED,54 and the biased simulations are performed using the OPES42 method with a pace of 200 steps, the automatic bandwidth selection, and a barrier parameter equal to 15 kBT. The particle is moving in two dimensions under the action of the three-state potential shown in Fig. 2, built out of a sum of Gaussians. During step (1), we choose to employ the ratchet and pawl ABMD method48 biasing the y-coordinate to sample the transition between the two states A and B. Then, in step (2), the DeepLNE CVs (s and z) are trained using the x and y positions of the particle as input features. For step (3), a new simulation is run where s is used as a CV to deposit bias via OPES. In addition, a half parabolic harmonic constraint with a cutoff value of 0.1 (UWALL) and κ = 1000 kBT is applied on the z variable. We estimate statistical errors via block analysis using ten blocks.

2. Alanine dipeptide

For the extensively studied FES of alanine dipeptide, three transition paths are investigated considering the dihedral angles ϕ and ψ. During step (1), we apply ABMD biasing both dihedrals so that five independent transitions across the FES barrier of interest are observed. To test the dimensionality reduction capability of DeepLNE, we also employ all inter-atomic distances in the molecule as initial features, which corresponds to D = 190. To save computational cost, we use the FPS tool of Ref. 47 to reduce the number of training datapoints (m = 3000). After training the DeepLNE CV, we use the s variable as a CV to bias in combination with OneOPES45 and eight replicas. We choose a pace of 500 steps, the automatic bandwidth selection, and a barrier parameter equal to 60 and 2.5 kJ/mol for OPES-METAD and OPES-EXPLORE, respectively. In addition, a harmonic constraint (UWALL) with κ = 20 000 kJ/mol is applied on the z variable. Simulations are carried out with the DES-Amber force field.55 We analyze replica 0 and estimate the statistical errors using ten blocks of length 2 ns.

3. RNA tetraloop

We select an RNA 8-mer as a model for a system with biological relevance and configurational complexity in order to perform a prediction task using DeepLNE. The system consists of eight nucleotides with the sequence “CCGAGAGG” containing the GAGA tetraloop motif (Fig. 2) and has previously been studied with extensive enhanced sampling MD56 using 24 replicas and a simulation length of 1 µs per replica. As proposed by the authors of that study, we included recent corrections to the van der Waals parameters and a more accurate water model. Consequently, we used the standard OL3 RNA force field57–60 for our simulation with the van der Waals modification of the phosphate oxygens developed in Ref. 61 without adjustment of the dihedral parameters. As a water model, we chose OPC.62 This combination was originally proposed in Ref. 63 and has already been tested on RNA systems.64,65 The simulations were run at salt concentrations corresponding to a system with neutralized total charge using potassium with the Joung–Cheatham ion parameters66 optimized for TIP4P-Ewald. To sample the transition from the natively folded state A to the unfolded state B, we use the ABMD method biasing 3 different CVs: (1) the eRMSD metric67 with respect to state A, (2) a contact map (CMAP) constructed using the heavy atom distances between the four stem nucleotides (1 and 2 as well as 7 and 8) with a cutoff value of 5 Å, inspired by Ref. 68, and (3) a contact map (CMAP) constructed using the heavy atom distances between the four loop nucleotides (3 and 4 as well as 5 and 6) with a cutoff value of 3.5 Å. We start ABMD runs from the unfolded state at an eRMSD >2.1 and stop the simulation upon reaching the natively folded state at an eRMSD <0.2. We reduce the number of training datapoints (m = 2250) via FPS and choose the same distance metrics used for the ABMD simulations as input features in the DeepLNE training. In the enhanced sampling simulation using DeepLNE, we use the s variable to deposit bias with OneOPES. For OPES-EXPLORE, we choose a pace of 20 000 steps, automatic bandwidth selection, and a barrier parameter equal to 50 kJ/mol. The temperatures for the 8 replicas are selected from a distribution ranging from 300 to 400 K, and each replica is simulated for 500 ns. As done in the previous study,56 the free energy differences were calculated by integrating the probability of being in the folded and unfolded states, \Delta F = -k_B T \left[ \log\left( \sum_{i \in \text{folded}} p(x_i) \right) - \log\left( \sum_{i \in \text{unfolded}} p(x_i) \right) \right]. Structures with an eRMSD >0.7 from the natively folded structure are considered unfolded. Replica 0 is analyzed excluding the first 200 ns. The statistical errors were estimated using 10 blocks of 30 ns.
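Given reweighted frame probabilities and the eRMSD of each frame, the free energy of folding defined above can be evaluated as in the following sketch (array names are placeholders).

import numpy as np

def folding_free_energy(p, ermsd, kBT, cutoff=0.7):
    """Delta F between folded and unfolded states from reweighted frame probabilities p."""
    folded = ermsd <= cutoff                           # frames within 0.7 eRMSD of the native fold
    p_fold, p_unfold = np.sum(p[folded]), np.sum(p[~folded])
    return -kBT * (np.log(p_fold) - np.log(p_unfold))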

Simulations of alanine dipeptide and GAGA tetraloop are performed using the GROMACS 2022.5 engine69 patched with the PLUMED 2.9 plugin53 with the Hamiltonian replica exchange algorithm.70 

After the simulations of the Müller-Brown toy model and alanine dipeptide were completed, we performed committor analysis to investigate whether the identified DeepLNE variable s approximates the real reaction coordinate well. This analysis protocol is meant to estimate the committor values near the maximum of the FES. For the Müller-Brown potential, the transition region is split into 24 bins along the s CV in the region of the maximum in the FES and, for each bin, 50 particle positions are extracted 10 times via bootstrapping and independent simulations are started from them. In the case of alanine dipeptide, we split the FES of replica 0 in the vicinity of the transition state region into 11 bins along the s variable and extract 100 structures with the lowest values of the z variable. Then, we start new independent simulations from these structures, recording which state (A or B) they visit first. We repeat this 5 times with randomized starting velocities to obtain averages and error estimates.
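The bookkeeping step of this committor protocol can be outlined as below; run_unbiased is a hypothetical helper standing in for launching a short MD run with randomized velocities and reporting which state is reached first.

import numpy as np

def committor_estimate(structures, run_unbiased, n_repeats=5):
    """Average fraction of short runs reaching state B first, with a spread over repeats."""
    per_repeat = []
    for _ in range(n_repeats):
        hits_b = [run_unbiased(x) == "B" for x in structures]   # randomized velocities in each repeat
        per_repeat.append(np.mean(hits_b))
    return np.mean(per_repeat), np.std(per_repeat)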

We demonstrate how to train and apply the DeepLNE CV to enhanced sampling MD simulations. In Fig. 3, we show the collected results of the three steps along the proposed DeepLNE algorithm. We start by performing two simulations biasing the y-coordinate via ABMD using different spring constants κ = 0.5 kBT and κ = 1.25 kBT. As shown in Fig. 3(a), both runs start from the same state A and are stopped as soon as they reach state B. We show the portion of trajectory that describes the transition from state A to state B. We use this reduced set of datapoints to compute the input features for the DeepLNE CV training step. In Fig. 3(b), we show the results of biasing the particle simulation using the s variable of the DeepLNE model and constraining perpendicular movement via the z variable. One can appreciate that the desired exploration moving from state A to state B is achieved and that s describes the intended progression along the path.

FIG. 3.

Results of training DeepLNE for a particle simulation in the Müller-Brown potential. (a) Training datapoints for DeepLNE obtained using ABMD (biasing the y-coordinate with different spring constants κ in kBT), describing the movement from state A to state B. (b) Sampled configurations during the biased simulation using OPES applied to the trained DeepLNE CV s and a harmonic constraint applied on z. The colors of the datapoints correspond to the DeepLNE s variable, showing a path-like behavior. (c) Free energy with respect to the s variable and the committor probability for state B. We calculated estimates and uncertainties with a 10-block analysis and compared them to the FES obtained by biasing the x and y coordinates as a reference. Uncertainties in both FES are below 0.1 kBT. (d) FES and committors in the subsection around the transition state. The maximum of the FES and the committor value of 0.47 ± 0.04 coincide in the same s bin. Committors and their error bars are obtained by bootstrapping 50 starting configurations 10 times from the simulation biased using DeepLNE. The dashed horizontal line indicates a committor of 0.5.


In Fig. SI 3(a), we show the decoded training datapoints X̂ of DeepLNE. Since all datapoints used for training the DeepLNE CV describe a transition across the same barrier, X̂ approximates well the path along the minimum of the Müller-Brown FES. Consequently, applying a harmonic constraint on z during enhanced sampling MD allows for frequent transitions while sampling the movement between states A and B, as shown in Fig. SI 3(b). We can appreciate that the entire spectrum of the s variable is explored. Since we exclusively sample the transition and not the entire free energy landscape [Fig. 3(b)], we can confirm that the applied harmonic constraint on z is effective in limiting the exploration perpendicular to the path-like variable s. In Fig. 3(c), we show the results of the free energy estimation as well as the committor analysis based on the estimated FES, detailed further in panel (d). We compute the committor probability for state A in 24 different bins along s in the region of the highest free energy. Along increasing values of the s variable, one can appreciate that the committor value changes from 0 to 1 and takes a value of 0.47 ± 0.04 in the proximity of the maximum of the FES barrier, indicating that the DeepLNE CV approximates the ideal reaction coordinate well. In Fig. SI 3(c), we show the individual committor values for each of the 10 estimations, which result in the average committors reported herein.

For the trained DeepLNE model, we benchmark the computation time for evaluating the variables s and z (a single forward pass) for an increasing number of training datapoints kept in memory and used during the k-NN step. The computation time scales sub-linearly for fewer than 400 training datapoints and linearly beyond (Fig. SI 4).

Figure 4 shows the results of constructing the DeepLNE CVs for three different transitions of alanine dipeptide and estimating the FES via OneOPES simulations using the s variable to deposit the bias potential. We note that the DeepLNE CVs are a non-linear function of the 190 inter-atomic distances in alanine dipeptide. Figures 4(a)–4(c) show the datapoints initially sampled from short ABMD runs across three different barriers in the reference FES obtained using OPES-METAD, biasing ϕ and ψ over 200 ns. We color the FPS-reduced training datapoints based on the DeepLNE variable s. The variable describes the intended progression from one state A to another state B across the different barriers in all three cases.

FIG. 4.

Sampling of different pathways in alanine dipeptide using DeepLNE. In (a)–(c), the scatter points represent the ABMD trajectories crossing three distinct barriers. The DeepLNE variable is trained on those datapoints using 190 interatomic distances as input features instead of dihedral angles. Next, the DeepLNE CVs are employed to run OneOPES simulations, and the new data are used to calculate the progress along the path, computed through the expectation value ⟨s⟩_{ϕ,ψ} shown in the colored regions. The black lines below denote the reference FES, obtained by biasing ϕ and ψ through OneOPES. The basin at ϕ = −1.5, ψ = 2 is referred to as C7eq, while the basin at ϕ = 1, ψ = −1 is referred to as C7ax. The new OneOPES runs yield FES estimates, with uncertainties calculated using a 10-block analysis, shown in panels (d)–(f). The FES maxima coincide with committor probabilities close to 0.5, computed by running short simulations from the configurations in the s bins shaded in orange. Here, we also compare to the reference FES by reweighting the simulation with respect to DeepLNE. For the committor analysis, we extract 100 starting configurations from the biased simulation using DeepLNE. Averages and error bars are computed by performing 5 simulations with randomized velocities for each starting configuration. The dashed horizontal line indicates a committor of 0.5.


Having constructed three path-like variables, we estimate the FES by performing OneOPES simulations along them and also perform committor analysis for each case [Figs. 4(d)–4(f)]. Before going into more detail, we note that in Fig. SI 5, we collected the expectation value of the z variable and the time evolution of the s variable. The analysis of ⟨z⟩_{ϕ,ψ} [see Figs. SI 5(a)–5(c)] confirms that path-like CVs are learned, which exhibit low z values along the transition of interest. Importantly, frequent transitions occur, and the entire spectrum of the s variables is explored during the sampling of the three different transition paths [Figs. SI 5(d)–5(f)]. Figure 4(a) superimposes 5 ABMD simulations starting from the same configuration in the state C7eq and then progressing to state C7ax over a barrier at ϕ = 2, ψ = −2.5. From the reference FES, it can be seen that there exists a single transition state region, which is quantified, as shown in Fig. 4(d), by performing a OneOPES simulation. The transition region is located between s values of 0.13 and 0.25, and accordingly, configurations from 10 bins in this range are extracted to perform committor analysis. We find that s values in the range 0.17–0.19 of the DeepLNE CV correspond to the transition state, with a committor value of 0.49 ± 0.05. In Fig. 4(b), we show another transition between the states C7eq and C7ax, this time across a barrier located close to ϕ = 0, ψ = 0. Superimposing the training data obtained from 5 ABMD runs and coloring them by the learned s variable shows that the transition should be expected for s values in the range 0.53–0.65, and accordingly, we analyze the corresponding OneOPES simulation shown in Fig. 4(e). It turns out that, in this case, s values in the range between 0.59 and 0.61 lead to committor probabilities of 0.52 ± 0.01, allowing us here as well to identify the configurations corresponding to the transition state. Finally, in Fig. 4(c), we train the DeepLNE model on a third possible transition path in the alanine dipeptide FES. In this case, the system starts from configurations of the state C7eq with ϕ = −1.5, ψ = 0.5. The transition, which passes through a transition region close to ϕ = −2, ψ = −1, is considered complete when ϕ = 3, ψ = −3 is approached. Here too, OneOPES simulations are performed using the DeepLNE s variable as the biased CV, which leads to the FES shown in Fig. 4(f). The maximum of the free energy occurs for s values close to 0.65, and through committor analysis in the range 0.57–0.75, we find committor probabilities of 0.49 ± 0.04 for s values of 0.64–0.67. Furthermore, Figs. SI 6(a)–6(c) show a closer examination of the subsections of the s variable targeted by the committor analysis, reporting the individual committor estimates for all three transition paths.

In addition, the relevance of the input features X can be estimated by accumulating the gradients of the DeepLNE s variable with respect to X. This is especially insightful in scenarios where the input feature dimension is very high. We report this analysis in Fig. SI 7, which shows that, for all three path-like DeepLNE CVs, the input features differ in relevance [Figs. SI 7(a)–7(c)]. Importantly, the most relevant inputs identified are sensitive to the dihedral angle changes required to perform the respective transitions [Figs. SI 7(d)–7(f)].
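Such a relevance estimate can be obtained by accumulating input gradients of the learned s variable, e.g., as in this PyTorch sketch (model and data names are placeholders for a module returning s and X̂).

import torch

def feature_relevance(model, X):
    """Mean absolute gradient of the DeepLNE s variable with respect to each input feature."""
    X = X.clone().requires_grad_(True)
    s, _ = model(X)
    s.sum().backward()                      # accumulate ds/dX for every frame
    return X.grad.abs().mean(dim=0)         # (D,) average relevance per input feature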

Finally, since the size of the feature vector X is very high for this system, we analyzed the effect of the dimensionality reduction operated by the first ANN (D → d) for all three transition paths. To this end, we inspected the distribution of Euclidean distances between the training data in both the high- and low-dimensional representations X and x (Fig. SI 8). The latter leads to broader distributions and, consequently, reduced degeneracy when defining the local neighborhood, thus providing a better input for the nearest-neighbor selection algorithm.

In conclusion, these results show that despite the challenge of being trained on high-dimensional input features, the DeepLNE algorithm can successfully guide a simulation from state A to B in free energy space and help identify the configurations belonging to the transition state.

In Fig. 5(a), we compare the FES estimation of the RNA GAGA tetraloop after 500 ns of simulation using OneOPES in combination with the eRMSD or DeepLNE variable s as the CV for bias deposition. As in Ref. 56, the eRMSD is computed with respect to the natively folded conformation. The DeepLNE CV is trained on the eRMSD and two contact maps providing additional information about the contacts formed between the four nucleotides in the stem and the loop, respectively.

FIG. 5.

Free energy surface and free energy of folding (ΔF) for the GAGA tetraloop based on 500 ns of OneOPES simulations using different biasing CVs (eRMSD, DeepLNE variable s). (a) Comparison of the FES along the eRMSD CV with respect to the natively folded conformation. (b) ΔF over simulation time for the OneOPES simulations biasing the eRMSD or the DeepLNE variable s, respectively. The simulation using the DeepLNE variable s as the biased CV shows a constant ΔF estimate throughout the simulation time, contrasting with the drifting trend observed in the estimation for the eRMSD-biased simulation.


Our OneOPES simulation biasing the eRMSD for 500 ns results in an FES estimate that is in disagreement with previous studies in which extensive enhanced sampling simulations were carried out. In the regions corresponding to high eRMSD values, the FES exhibits high statistical errors and strongly underestimates the propensity of these conformations with respect to what has been previously observed (compare the FES in Ref. 56). Instead, the OneOPES simulation using the trained DeepLNE variable s to deposit a bias potential over time results in an FES that shows the expected relative population between natively folded and misfolded states. The statistical errors over the entire FES are consistently low. The source of the high statistical uncertainty in the case of the simulations biased via the eRMSD is shown in Fig. SI 9, where we compare the exploration of the biased CV (eRMSD, DeepLNE) over the course of the simulation. The analysis confirms that the eRMSD does not reliably explore its entire range of values, whereas for the simulations biased with the DeepLNE s variable, the entire spectrum of s values is explored and frequent transitions occur.

In Fig. 5(b), we show the estimated free energy of folding over the last 300 ns of simulation time. In agreement with what has been discussed above, the OneOPES simulation biasing the eRMSD starts with a ΔF that overestimates the free energy of folding and shows a drift over the course of the simulation. Instead, the folding free energy estimate of the OneOPES run employing the DeepLNE CV s for bias deposition approaches, within the first 200 ns, the value obtained after 500 ns of enhanced sampling. The ΔF estimate plateaus at 11.5 kJ/mol. Since our simulations are performed using a force field that includes the van der Waals modification of the phosphate oxygens and the more accurate OPC water model, neither of which was used in Ref. 56, our ΔF estimate is expected to be similar, but not identical, to the previous findings. We note that for the GAGA tetraloop system, even though the folded state is experimentally expected to be energetically favored, the opposite is observed in simulations because of force field inaccuracies. Interestingly, our simulation performed with the corrections envisioned in Ref. 56 indeed lowers the free energy of folding. Nevertheless, the misfolded state still represents the minimum in the FES.

In this work, we propose a strategy to construct a path-like CV using a data-driven approach. The DeepLNE method builds a 1D representation of reactive trajectories that can be used to capture and especially accelerate rare events, with a focus on biophysical systems. Through multiple examples, we have shown how this CV well approximates the ideal reaction coordinate and can be effectively used to improve sampling using enhanced sampling methods, such as OPES or its recently developed OneOPES variant.

Our methodology begins with obtaining a first reactive trajectory that captures the transition from state A to state B. Herein, these starting trajectories have been obtained with a ratchet-and-pawl restraint,48 which provided training data of sufficient quality for all the systems investigated. For more complex systems, an input reactive trajectory that samples the transition state regions more extensively might result in a DeepLNE CV that captures the region of the free energy barrier more precisely. To this end, an iterative biasing procedure, such as the one introduced in Ref. 71 or the generation of synthetic training data as in Ref. 72, can be considered in future applications.

The input trajectory is used to define a feature vector that distinguishes not only state A from state B but also the intermediate states encountered. With this featurized trajectory, one can automatically derive path-like variables by applying the DeepLNE CV method introduced in this study. Input features are initially subjected to dimensionality reduction. In this lower dimensional manifold, we compute Euclidean distances to search for a small number of k-nearest neighbors that describe the local neighborhood of each datapoint. Compressing this neighborhood representation even further finally yields the DeepLNE variable s, which, upon decoding, can in turn be used to generate the DeepLNE CV z. The interpretation of s and z is analogous to the PATHCVs, as they describe the progress along the path and the perpendicular distance from it, respectively.

The proposed method alleviates the issue of degeneracy that occurs when comparing datapoints in a high dimensional space via the Euclidean metric. It does so by identifying the local neighborhood of a datapoint in a lower dimensional representation instead. At the same time, it automates the process of constructing the CVs since the s and z CVs are generated without making the empirical choices of selecting a set of milestones and a metric that have a strong impact both on the quality of the CV and on the time required to fine-tune it. We show that non-linear combinations of hundreds of descriptors can easily be handled with DeepLNE, providing high flexibility and expressiveness. In addition, the input features can be selected by using data-driven methods.32 Unlike standard neural network-based CVs, the DeepLNE CV exclusively represents new datapoints through their neighbors in the training set. This feature is fundamental in preventing erroneous extrapolations when unknown data regions are visited.

Our method is not limited to two-state systems and can be applied to more complex systems that are characterized by multiple metastable states. A DeepLNE CV can be successfully trained as long as the training trajectories visit all the states of interest and their transition regions. In the systems studied here, we did not impose directionality on the CV in order to show that path-like CVs can be successfully derived even in a completely unsupervised manner. If needed, directionality can be ensured by adding a supervised term to the optimization (such as a cross entropy or a discriminative term), which can be done straightforwardly in a multi-task framework.38 We believe that DeepLNE provides a powerful and flexible method to automatically create efficient path-like CVs to accelerate the sampling of complex phenomena.

The supplementary material contains additional information on the influence of hyperparameters on the neighborhood representation and on the phenomenon of vanishing gradients in the continuous k-nearest neighbors; further analysis of the particle in the Müller-Brown potential and of alanine dipeptide, incorporating committor analysis; a computational benchmark; an importance ranking analysis; and a comparison of the distributions of the Euclidean norm in the input dimensions X and the reduced dimension x. In addition, the exploration of the biased CVs in the enhanced sampling simulations of the RNA GAGA tetraloop is provided.

We acknowledge Simone Aureli for providing several useful suggestions. F.L.G., V.R., and T.F. acknowledge the Swiss National Supercomputing Center (CSCS) for large supercomputer time allocations Project ID: s1228. They also acknowledge the Swiss National Science Foundation and BRIDGE for financial support (Project Nos. 200021_204795, CRSII5_216587, and 40B20_203628). L.B. acknowledges funding from the Federal Ministry of Education and Research, Germany, under the TransHyDE research network AmmoRef (Support Code: 03HY203A).

The authors have no conflicts to disclose.

Thorben Fröhlking: Conceptualization (equal); Data curation (lead); Formal analysis (lead); Investigation (lead); Methodology (equal); Software (lead); Validation (lead); Visualization (lead); Writing – original draft (equal); Writing – review & editing (equal). Luigi Bonati: Conceptualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Valerio Rizzi: Methodology (equal); Writing – original draft (equal); Writing – review & editing (equal). Francesco Luigi Gervasio: Funding acquisition (lead); Project administration (lead); Resources (lead); Supervision (lead); Writing – original draft (equal); Writing – review & editing (equal).

At https://github.com/ThorbenF/DeepLNE, we provide the DeepLNE class with a corresponding tutorial, together with scripts that allow one to reproduce the figures and the results of this study.

1. D. Frenkel and B. Smit, Understanding Molecular Simulation, 3rd ed. (Academic Press, Inc., USA, 2023).
2. A. Laio and F. L. Gervasio, "Metadynamics: A method to simulate rare events and reconstruct the free energy in biophysics, chemistry and material science," Rep. Prog. Phys. 71, 126601 (2008).
3. R. C. Bernardi, M. C. Melo, and K. Schulten, "Enhanced sampling techniques in molecular dynamics simulations of biological systems," Biochim. Biophys. Acta, Gen. Subj. 1850, 872–877 (2015), part of Special Issue: Recent developments of molecular dynamics.
4. O. Valsson, P. Tiwary, and M. Parrinello, "Enhancing important fluctuations: Rare events and metadynamics from a conceptual viewpoint," Annu. Rev. Phys. Chem. 67, 159–184 (2016).
5. C. Camilloni and F. Pietrucci, "Advanced simulation techniques for the thermodynamic and kinetic characterization of biological systems," Adv. Phys.: X 3, 1477531 (2018).
6. J. Hénin, T. Lelièvre, M. R. Shirts, O. Valsson, and L. Delemotte, "Enhanced sampling methods for molecular dynamics simulations [article v1.0]," Living J. Comput. Mol. Sci. 4, 1–60 (2022).
7. A. Laio and M. Parrinello, "Escaping free-energy minima," Proc. Natl. Acad. Sci. U. S. A. 99, 12562–12566 (2002).
8. P. G. Bolhuis, D. Chandler, C. Dellago, and P. L. Geissler, "Transition path sampling: Throwing ropes over rough mountain passes, in the dark," Annu. Rev. Phys. Chem. 53, 291–318 (2002).
9. F. Pietrucci, "Strategies for the exploration of free energy landscapes: Unity in diversity and challenges ahead," Rev. Phys. 2, 32–45 (2017).
10. K. Palacio-Rodriguez, H. Vroylandt, L. S. Stelzl, F. Pietrucci, G. Hummer, and P. Cossio, "Transition rates and efficiency of collective variables from time-dependent biased simulations," J. Phys. Chem. Lett. 13, 7490–7496 (2022).
11. D. Branduardi, F. L. Gervasio, and M. Parrinello, "From A to B in free energy space," J. Chem. Phys. 126, 054103 (2007).
12. A. Berteotti, A. Cavalli, D. Branduardi, F. L. Gervasio, M. Recanatini, and M. Parrinello, "Protein conformational transitions: The closure mechanism of a kinase explored by atomistic simulations," J. Am. Chem. Soc. 131, 244–250 (2009).
13. J. Fidelak, J. Juraszek, D. Branduardi, M. Bianciotto, and F. Gervasio, "Free-energy-based methods for binding profile determination in a congeneric series of CDK2 inhibitors," J. Phys. Chem. B 114, 9516–9524 (2010).
14. M. Fribourg, J. L. Moreno, T. Holloway, D. Provasi, L. Baki, R. Mahajan, G. Park, S. K. Adney, C. Hatcher, J. M. Eltit, J. D. Ruta, L. Albizu, Z. Li, A. Umali, J. Shim, A. Fabiato, A. D. MacKerell, V. Brezina, S. C. Sealfon, M. Filizola, J. González-Maeso, and D. E. Logothetis, "Decoding the signaling of a GPCR heteromeric complex reveals a unifying mechanism of action of antipsychotic drugs," Cell 147, 1011–1023 (2011).
15. G. Saladino, L. Gauthier, M. Bianciotto, and F. Gervasio, "Assessing the performance of metadynamics and path variables in predicting the binding free energies of p38 inhibitors," J. Chem. Theory Comput. 8, 1165–1170 (2012).
16. E. Cignoni, M. Lapillo, L. Cupellini, S. Acosta-Gutiérrez, F. L. Gervasio, and B. Mennucci, "A different perspective for nonphotochemical quenching in plant antenna complexes," Nat. Commun. 12, 7152 (2021).
17. J. Juraszek, G. Saladino, T. S. van Erp, and F. L. Gervasio, "Efficient numerical reconstruction of protein folding kinetics with partial path sampling and pathlike variables," Phys. Rev. Lett. 110, 108106 (2013).
18. A. Pérez de Alba Ortíz, A. Tiwari, R. C. Puthenkalathil, and B. Ensing, "Advances in enhanced sampling along adaptive paths of collective variables," J. Chem. Phys. 149, 072320 (2018).
19. A. Pérez de Alba Ortíz and B. Ensing, "Simultaneous sampling of multiple transition channels using adaptive paths of collective variables," arXiv:2112.04061 (2021).
20. L. Hovan, F. Comitani, and F. L. Gervasio, "Defining an optimal metric for the path collective variables," J. Chem. Theory Comput. 15, 25–32 (2019).
21. V. Rizzi, L. Bonati, N. Ansari, and M. Parrinello, "The role of water in host-guest interaction," Nat. Commun. 12, 93 (2021).
22. L. Bonati, V. Rizzi, and M. Parrinello, "Data-driven collective variables for enhanced sampling," J. Phys. Chem. Lett. 11, 2998–3004 (2020).
23. A. Ma and A. R. Dinner, "Automatic method for identifying reaction coordinates in complex systems," J. Phys. Chem. B 109, 6769–6779 (2005).
24. G. Díaz Leines and B. Ensing, "Path finding on high-dimensional free energy landscapes," Phys. Rev. Lett. 109, 020601 (2012).
25. G. Pérez-Hernández, F. Paul, T. Giorgino, G. De Fabritiis, and F. Noé, "Identification of slow molecular order parameters for Markov model construction," J. Chem. Phys. 139, 015102 (2013).
26. P. Tiwary and B. J. Berne, "Spectral gap optimization of order parameters for sampling complex molecular systems," Proc. Natl. Acad. Sci. U. S. A. 113, 2839–2844 (2016).
27. M. M. Sultan and V. S. Pande, "tICA-Metadynamics: Accelerating metadynamics by using kinetically selected collective variables," J. Chem. Theory Comput. 13, 2440–2447 (2017).
28. J. M. L. Ribeiro, P. Bravo, Y. Wang, and P. Tiwary, "Reweighted autoencoded variational Bayes for enhanced sampling (RAVE)," J. Chem. Phys. 149, 072301 (2018).
29. D. Mendels, G. Piccini, and M. Parrinello, "Collective variables from local fluctuations," J. Phys. Chem. Lett. 9, 2776–2781 (2018).
30. W. Chen and A. L. Ferguson, "Molecular enhanced sampling with autoencoders: On-the-fly collective variable discovery and accelerated free energy landscape exploration," J. Comput. Chem. 39, 2079–2102 (2018).
31. P. Gkeka, G. Stoltz, A. Barati Farimani, Z. Belkacemi, M. Ceriotti, J. D. Chodera, A. R. Dinner, A. L. Ferguson, J.-B. Maillet, H. Minoux, C. Peter, F. Pietrucci, A. Silveira, A. Tkatchenko, Z. Trstanova, R. Wiewiora, and T. Lelièvre, "Machine learning force fields and coarse-grained variables in molecular dynamics: Application to materials and biological systems," J. Chem. Theory Comput. 16, 4757–4775 (2020).
32. L. Bonati, G. Piccini, and M. Parrinello, "Deep learning the slow modes for rare events sampling," Proc. Natl. Acad. Sci. U. S. A. 118, e2113533118 (2021).
33. F. Hooft, A. Pérez de Alba Ortíz, and B. Ensing, "Discovering collective variables of molecular transitions via genetic algorithms and neural networks," J. Chem. Theory Comput. 17, 2294–2306 (2021).
34. E. Trizio and M. Parrinello, "From enhanced sampling to reaction profiles," J. Phys. Chem. Lett. 12, 8621–8626 (2021).
35. L. Sun, J. Vandermause, S. Batzner, Y. Xie, D. Clark, W. Chen, and B. Kozinsky, "Multitask machine learning of collective variables for enhanced sampling of rare events," J. Chem. Theory Comput. 18, 2341–2353 (2022).
36. D. Ray, E. Trizio, and M. Parrinello, "Deep learning collective variables from transition path ensemble," J. Chem. Phys. 158, 1–22 (2023).
37. M. Šípka, A. Erlebach, and L. Grajciar, "Constructing collective variables using invariant learned representations," J. Chem. Theory Comput. 19, 887–901 (2023).
38. L. Bonati, E. Trizio, A. Rizzi, and M. Parrinello, "A unified framework for machine learning collective variables for enhanced sampling simulations: mlcolvar," J. Chem. Phys. 159, 014801 (2023).
39. T. Lelièvre, T. Pigeon, G. Stoltz, and W. Zhang, "Analyzing multimodal probability measures with autoencoders," J. Phys. Chem. B 128, 2607–2631 (2024).
40. A. France-Lanord, H. Vroylandt, M. Salanne, B. Rotenberg, A. M. Saitta, and F. Pietrucci, "Data-driven path collective variables," J. Chem. Theory Comput. 20, 3069–3084 (2024).
41. S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science 290, 2323–2326 (2000).
42. M. Invernizzi and M. Parrinello, "Rethinking metadynamics: From bias potentials to probability distributions," J. Phys. Chem. Lett. 11, 2731–2736 (2020).
43. M. Invernizzi, P. M. Piaggi, and M. Parrinello, "Unified approach to enhanced sampling," Phys. Rev. X 10, 041034 (2020).
44. M. Invernizzi and M. Parrinello, "Exploration vs convergence speed in adaptive-bias enhanced sampling," J. Chem. Theory Comput. 18, 3988–3996 (2022).
45. V. Rizzi, S. Aureli, N. Ansari, and F. L. Gervasio, "OneOPES, a combined enhanced sampling method to rule them all," J. Chem. Theory Comput. 19, 5731–5742 (2023).
46. T. Plötz and S. Roth, "Neural nearest neighbors networks," in Advances in Neural Information Processing Systems, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Curran Associates, Inc., 2018), Vol. 31.
47. A. Goscinski, V. Principe, G. Fraux, S. Kliavinek, B. Helfrecht, P. Loche, M. Ceriotti, and R. Cersonsky, "Scikit-matter: A suite of generalisable machine learning methods born out of chemistry and materials science," Open Res. Eur. 3, 81 (2023).
48. C. Camilloni, R. A. Broglia, and G. Tiana, "Hierarchy of folding and unfolding events of protein G, CI2, and ACBP from explicit-solvent simulations," J. Chem. Phys. 134, 045105 (2011).
49. S. a Beccara, T. Škrbić, R. Covino, and P. Faccioli, "Dominant folding pathways of a WW domain," Proc. Natl. Acad. Sci. U. S. A. 109, 2330–2335 (2012).
50. H. Grubmüller, B. Heymann, and P. Tavan, "Ligand binding: Molecular mechanics calculation of the streptavidin-biotin rupture force," Science 271, 997–999 (1996).
51. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in PyTorch," in NIPS 2017 Workshop on Autodiff, 2017.
52. P. Novelli, L. Bonati, M. Pontil, and M. Parrinello, "Characterizing metastable states with the help of machine learning," J. Chem. Theory Comput. 18, 5195–5202 (2022).
53. G. A. Tribello, M. Bonomi, D. Branduardi, C. Camilloni, and G. Bussi, "PLUMED 2: New feathers for an old bird," Comput. Phys. Commun. 185, 604–613 (2014).
54. O. Valsson and M. Parrinello, "Variational approach to enhanced sampling and free energy calculations," Phys. Rev. Lett. 113, 090601 (2014).
55. S. Piana, P. Robustelli, D. Tan, S. Chen, and D. E. Shaw, "Development of a force field for the simulation of single-chain proteins and protein–protein complexes," J. Chem. Theory Comput. 16, 2494–2507 (2020).
56. S. Bottaro, P. Banáš, J. Šponer, and G. Bussi, "Free energy landscape of GAGA and UUCG RNA tetraloops," J. Phys. Chem. Lett. 7, 4032–4038 (2016).
57. W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz, D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. A. Kollman, "A second generation force field for the simulation of proteins, nucleic acids, and organic molecules," J. Am. Chem. Soc. 118, 2309 (1996).
58. J. Wang, P. Cieplak, and P. A. Kollman, "How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules?," J. Comput. Chem. 21, 1049–1074 (2000).
59. A. Pérez, I. Marchán, D. Svozil, J. Sponer, T. E. Cheatham, C. A. Laughton, and M. Orozco, "Refinement of the AMBER force field for nucleic acids: Improving the description of α/γ conformers," Biophys. J. 92, 3817–3829 (2007).
60. M. Zgarbová, M. Otyepka, J. Šponer, A. Mládek, P. Banáš, T. E. Cheatham, and P. Jurečka, "Refinement of the Cornell et al. nucleic acids force field based on reference quantum chemical calculations of glycosidic torsion profiles," J. Chem. Theory Comput. 7, 2886–2902 (2011).
61. T. Steinbrecher, J. Latzer, and D. A. Case, "Revised AMBER parameters for bioorganic phosphates," J. Chem. Theory Comput. 8, 4405–4412 (2012).
62. S. Izadi, R. Anandakrishnan, and A. V. Onufriev, "Building water models: A different approach," J. Phys. Chem. Lett. 5, 3863–3871 (2014).
63. C. Bergonzo and T. E. Cheatham III, "Improved force field parameters lead to a better description of RNA structure," J. Chem. Theory Comput. 11, 3969–3972 (2015).
64. C. Bergonzo, A. V. Grishaev, and S. Bottaro, "Conformational heterogeneity of UCAAUC RNA oligonucleotide from molecular dynamics simulations, SAXS, and NMR experiments," RNA 28, 937–946 (2022).
65. T. Fröhlking, M. Bernetti, and G. Bussi, "Simultaneous refinement of molecular dynamics ensembles and forward models using experimental data," J. Chem. Phys. 158, 214120 (2023).
66. I. S. Joung and T. E. Cheatham, "Determination of alkali and halide monovalent ion parameters for use in explicitly solvated biomolecular simulations," J. Phys. Chem. B 112, 9020–9041 (2008).
67. S. Bottaro, F. Di Palma, and G. Bussi, "The role of nucleobase interactions in RNA structure and dynamics," Nucleic Acids Res. 42, 13306–13314 (2014).
68. K. Rahimi, P. M. Piaggi, and G. H. Zerze, "Comparison of on-the-fly probability enhanced sampling and parallel tempering combined with metadynamics for atomistic simulations of RNA tetraloop folding," J. Phys. Chem. B 127, 4722–4732 (2023).
69. M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, and E. Lindahl, "GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers," SoftwareX 1–2, 19–25 (2015).
70. G. Bussi, "Hamiltonian replica exchange in GROMACS: A flexible implementation," Mol. Phys. 112, 379–384 (2014).
71. P. Kang, E. Trizio, and M. Parrinello, "Computing the committor with the committor: An anatomy of the transition state ensemble," arXiv:2401.05279 [physics.comp-ph] (2024).
72. S. Yang, J. Nam, J. C. B. Dietschreit, and R. Gómez-Bombarelli, "Learning collective variables for protein folding with labeled data augmentation through geodesic interpolation," arXiv:2402.01542 [physics.chem-ph] (2024).
