Deep learning path-like collective variable for enhanced sampling molecular dynamics

Several enhanced sampling techniques rely on the definition of collective variables to effectively explore free energy landscapes. Existing variables that describe the progression along a reactive pathway offer an elegant solution but face a number of limitations. In this paper, we address these challenges by introducing a new path-like collective variable called the ‘Deep-locally-non-linear-embedding’, which is inspired by principles of the locally linear embedding technique and is trained on a reactive trajectory. The variable mimics the ideal reaction coordinate by automatically generating a non-linear combination of features through a differentiable generalized autoencoder that combines a neural network with a continuous k-nearest-neighbor selection. Among the key advantages of this method is its capability to automatically choose the metric for searching neighbors and to learn the path from state A to state B without the need to handpick landmarks a priori . We demonstrate the effectiveness of DeepLNE by showing that the progression along the path variable closely approximates the ideal reaction coordinate in toy models such as the Müller-Brown-potential and alanine dipeptide. We then use it in molecular dynamics simulations of an RNA tetraloop, where we highlight its capability to accelerate transitions and converge the free energy of folding. Atomistic molecular dynamics (MD) simulations have

Several enhanced sampling techniques rely on the definition of collective variables to effectively explore free energy landscapes.Existing variables that describe the progression along a reactive pathway offer an elegant solution but face a number of limitations.In this paper, we address these challenges by introducing a new path-like collective variable called the 'Deep-locally-non-linear-embedding', which is inspired by principles of the locally linear embedding technique and is trained on a reactive trajectory.The variable mimics the ideal reaction coordinate by automatically generating a non-linear combination of features through a differentiable generalized autoencoder that combines a neural network with a continuous k-nearest-neighbor selection.Among the key advantages of this method is its capability to automatically choose the metric for searching neighbors and to learn the path from state A to state B without the need to handpick landmarks a priori.We demonstrate the effectiveness of DeepLNE by showing that the progression along the path variable closely approximates the ideal reaction coordinate in toy models such as the Müller-Brown-potential and alanine dipeptide.We then use it in molecular dynamics simulations of an RNA tetraloop, where we highlight its capability to accelerate transitions and converge the free energy of folding.
Atomistic molecular dynamics (MD) simulations have proven to be a powerful tool for investigating intricate aspects of physical, chemical and biological systems. 1 The power of MD investigations has increased with advances in computational resources and refinement of force field accuracy.Still, many phenomena of importance exhibit timescales well beyond the reach of conventional unbiased MD, even when used on state-of-the-art supercomputing platforms.3][4][5][6] Two of the most widely used families of enhanced sampling algorithms are based on the definition of a set of collective variables (CVs) that approximate the reaction coordinate 7 or on paths that connect two end states of the process of interest. 8Each of these families of methods have their strengths and limitations.
CV-based algorithms crucially depend on the identification of slow degrees of freedom that capture the transition under investigation and are typically limited to biasing a maximum of three variables at the same time. 9,10Path-based methods can collect a rather large number of degrees of freedom into a feature space to be used in the path-building process.However, the effectiveness of this process generally depends on a proper definition of the end states and the effectiveness of the path search algorithm.The original Path-collective variables (PATHCVs) combine in an effective manner various aspects of the two families of approaches. 11They are typically used in combination with an enhanced sampling algorithm and thus they can be used to explore paths that are separated by high free energy barrier from the initial (guess) path.3][14][15][16] In addition, their combination with partial path transition interface sampling was shown to produce accurate numerical prediction of rate constants. 17Recent developments include approaches where the path is updated adaptively 18 or where multiple paths are taken into account simultaneously. 19ill, the effectiveness of PATHCVs depends on the choice of the metric used to define the path, as well as on milestones number, position and other parameters.We have recently explored a machine learning approach based on a spectral gap optimization procedure to find optimal linear combinations of different features for the metric of PATHCVs. 20However, the approach is limited by the fact that a fixed linear combination of different features not always provides an optimal metric for all the milestones defining a complex paths.A typical example of this is provided for instance by the binding of a ligand to a protein.In such a case, the role of ligand hydration and dehydration is crucial at a given milestone (when the ligand enters or exits the cavity), while it is less relevant in the remaining path (corresponding to the solvated ligand being free in solution or finding its way to the cavity).This was clearly shown by quantifying the relevance of various descriptors, including ligand hydration, at different stages of ligand binding. 21In that context, it was shown that a CV that combines in a non-linear way a number of descriptors 22 is able to capture the relevant slow degrees of freedom (including hydration) at the right time.Thus, using a fixed metric to define the distance from the milestones defining the path can result in a sub-optimal performance.
3][24][25][26][27][28][29][30][31][32][33][34][35][36] Some of the learned variables were shown to be close to paths connecting metastable states 37,38 , without this condition being explicitly imposed.Also, a datadriven variant of PATHCV based on kernel ridge regression has been proposed, in which the committor probability is approximated. 39n this paper, we use machine learning (ML) techniques to overhaul the path CV approach and overcome the limitations associated with previous implementations.The resulting variable, called Deep-locally-non-linear-embedding (DeepLNE), offers a path-like-behavior when trained on a short timeseries of a system moving from A to B in phase space.It offers an automatic procedure of building an optimal 1D description of the training data, inspired by the formalism of locally linear embedding 40 (LLE) and the PATHCV.Essential elements of the algorithm are a dimensionality reduction operated by an artificial neural network (ANN), a differentiable k-nearestneighbor selection (k-NN) and an encoding of the neighborhood into a 1D latent space via an ANN.Starting from a predefined feature set describing the system dynamics of interest, the DeepLNE algorithm creates a representation of each datapoint based exclusively on its neighbors in the training dataset.This metric is then transformed into a path-like CV locally anchored by the data provided during the training phase.
We test the DeepLNE CV on simulations of a particle in the Müller-Brown potential, alanine dipeptide, and an RNA 8-mer using different variants of the On-the-fly Probability Enhanced Sampling [41][42][43][44] (OPES) method.We show that the application of DeepLNE is successful not only in the simplest examples but also in the cases of high-dimensional inputs or the combination of a set of heterogeneous features into an optimal path-like CV.

I. METHODS
Prior to introducing the DeepLNE method, we provide a brief overview of the PATHCVs and LLE frameworks, highlighting the areas of inspiration for our work.

A. The path collective variables
In the original PATHCVs approach 11 , one encodes a path connecting two end state A and B into a progress along the path CV s and a distance from the path z.Given a Ddimensional feature space X, one can express it parametrically X(t) along the path so that X(0) = X A and X(1) = X B .Formally, the PATHCVs s and z are two functions of X(t) that are defined as In the practice, this PATHCV formulation cannot be employed directly to deposit bias during simulations because of the crucial requirement of a CV to be smooth and differentiable.One has to resort to a number of assumptions and approximations.First, the parametric path is replaced by a discretised version X i constructed on a series of m path milestones, such that the integrals transform into finite sums.Choosing a number of high-quality milestones that are equidistant in X is a non-trivial task that requires empirical optimization.
Another challenge is the choice of λ .A λ → ∞ leads to a step-wise path that is dominated by contributions for which (X − X i ) 2 is minimal and whose derivative is ill-behaved.A finite optimal λ strikes a balance between high values corresponding to sharp PATHCVs and low values that sacrifice resolution in favour of differentiability.
An often overlooked challenge is the choice of X itself.For a PATHCV to be effective in enhanced sampling simulations, the feature space must contain as many as possible degrees of freedom so that the relevant ones pertaining to the transition between states A and B are captured.At the same time, the number of dimensions of X should be reasonably small, as the Euclidean norm used in Eq. 1-2 quickly loses resolution for an increasing dimensionality of X.Here the conundrum lies in navigating these two conflicting requirements which represent a complex optimization problem.Attempting to solve it requires finding a difficult compromise that is often problem dependent and cannot be easily generalized.
In order to construct the PATHCVs one typically follows a multi-step procedure.The starting point is a trajectory of the system of interest in which at least one transition from A to B is observed.Next, a feature vector X(t) is determined, that should discriminate as uniquely as possible the state of the system at any given time t.Then, one selects snapshots along the transition that define the path X and picks a finite λ .For reasons of computational efficiency, the number of snapshots m must be much smaller than the number of sampled datapoints t.The milestone selection can be performed in a number of ways but is typically empirically crafted, requiring expert knowledge of the system and CV-building experience.We will see how some of these steps can be automated with the help of machine learning.

B. The Locally Linear Embedding method
Locally Linear Embedding (LLE) 40 is a nonlinear dimensionality reduction technique that can be used to transform high-dimensional data into a lower-dimensional space with the constraint of preserving the local structure (i.e. the neighborhood) of the data.This is achieved in two steps.
Assuming that each data point X i possesses k neighbors that lie on a locally linear patch of its manifold, the LLE method reconstructs X i from its neighbors via a linear regression.At first, one minimizes the cost function with the constraints that ∑ j W i j = 1 and that each data point X i is reconstructed exclusively from its neighbors.By doing so, one determines the optimal weight matrix W , where W i j represents the contribution of the j th data point to the i th reconstruction.For our purposes, this corresponds to assuming that each point in a trajectory used to generate a PATHCV can be represented locally by k other points.In a second step, LLE builds a lower-dimensional neighborhood-preserving mapping in such a way that the local linear relationships are approximately preserved.This implies that the same weights W i j that reconstruct the i th data point in D dimensions should be able to reconstruct also its embedded manifold coordinates x i (of dimension d).This is accomplished by minimizing the embedding cost function: where W i j is now fixed.Constructing a PATHCV by applying the standard LLE method still involves the application of the Euclidean metric on the feature space (compare Eq. 1 and Eq. 4).The higher dimensional the space, the more the computed Euclidean distances and the low-dimensional mapping suffer from degeneracy, i.e. a number of different feature space vectors produce analogous compressed values.Therefore in both contexts, one faces the same challenge: the choice of an optimal metric.Furthermore, while for the PATHCVs one has to define the total number of m milestones, in LLE one has also to make an a priori choice of the number of neighbors k that locally represent each datapoint.

C. The Deep-locally-non-linear-embedding method
The DeepLNE CV that we develop here (see Fig. 1) aims at representing a high dimensional dataset with a single dimension, by incorporating concepts from PATHCV and LLE into a neural network architecture.Its starting point is a trajectory of a system going from state A to state B and a set of physical features X evaluated along it.Its objective is to build a directional CV s that can describe the progress along the transition through a non-linear combination of such a feature vector.The architecture of DeepLNE can be seen as a generalized autoencoder, with the first part being composed by an artificial neural network (ANN) to perform a preliminary dimensionality reduction, followed by a continuous knearest neighbor (k-NN) step on each datapoint, which is then passed to an encoder that compresses the neighborhood into a one-dimensional representation.This 1D CV s is then used to reconstruct the original input via a decoder, that is also used for computing an accessory perpendicular distance z CV.
This strategy is built upon the satisfaction of a number of constraints.The first one requires that the initial m datapoints X can be optimally reconstructed from their one-dimensional representation s.We set this restriction in place with an asymmetric autoencoder architecture where the reconstruction loss between the original data X and the decoded data X is minimized.
Second, in analogy with the PATHCVs and LLE, each datapoint i must be exclusively represented by its neighbors.In order to satisfy this constraint, we implement a continuous and differentiable relaxation of the k-NN selection rule (see Sec. SI 1 and Ref. 45).Notably, the compressed latent space representation s i is not directly obtained from a transformation of the input features of a datapoint X i , but from those of its k neighbors.The resulting CV is thus more robust to extrapolation as it is anchored to the local description of the training data points at all times.
As a third constraint, the neighbors of each datapoint are found in a different manifold of arbitrary dimension, in our case in a lower dimension d compared to the original feature space dimension D. This is achieved by transforming the initial inputs using an ANN.The PATHCV and LLE methods discriminate datapoints by applying the Euclidean distance directly on the original feature space, therefore their compressed representation may suffer from a degeneracy issue for large D. Instead DeepLNE computes Euclidean distances exclusively in the reduced dimensional space d during the k-NN step, thus alleviating the degeneracy issue.
The fourth and last constraint of DeepLNE is to represent each datapoint via a non-linear combination of the neighbors' features.This aspect is also a novelty, as PATHCV and LLE exclusively rely on linear transformations of the original datapoints.We expect this increased flexibility to play a major role in cases where the sampled data are sparse, such as in high dimensions or in the vicinity of sampling bottlenecks.
In analogy with the seminal PATHCV framework, the DeepLNE CV can provide a measure of both the progress along the path (s) and the distance from it (z), which can be used for driving the system along the process and controlling it.While s is computed as the output of the encoder part of the DeepLNE architecture described above, the z variable is defined from the decoded s variable.For a new datapoint Y , the perpendicular distance to the path z is computed as which is analogous to Eq. 2, where X(t) is replaced by the decoded training data X that are schematically depicted in Fig. 1.By comparing the new data not directly with the training datapoints, but with their decoded versions (i.e., passed through neighbor selection and then reconstructed), we can more effectively measure the distance from the average learned path.Note that, in this definition, the z variable might still suffer from the high dimensionality problem of X.In this regard, we checked whether the variable z can be calculated with respect to the reduced dimension x instead of X.However, we found in the numerical examples that the z computed in this alternative way did not improve upon our recommended definition Eq. 5, and in some cases, it even decreased the sampling efficiency along the transition of interest.In summary, the DeepLNE algorithm makes it possible to construct a variable that approximates a transition path between states, while overcoming multiple issues of PATHCVs: Generalized Neighborhood Encoder FIG.1: Schematic of the DeepLNE CV architecture.We construct a 'Generalized Neighborhood Encoder' by using an ANN, performing a first dimensionality reduction, a differentiable k-nearest-neighbor selection ('k-NN') followed by another ANN in order to compress the identified nieghborhood representation into a single dimension, the s variable.The original choice of input features X i for a single datapoint (out of m total training data) with dimensionality D is reduced to a d-dimensional vector x.In this non-linearly transformed manifold, for each datapoint, k-nearest neighbor datapoints are selected with a weight matrix based on the Euclidean distance.We combine the features x j of k-nearest-neighbors to construct the vector x k−NN i .This vector represents the input to a subsequent ANN that is used to compress the neighborhood into a 1D vector, the DeepLNE variable s, denoting the progression along the path.The decoder tries to optimally reconstruct the input X from s, resulting in vector X.The perpendicular distance from the path z is then computed as a function of the Euclidean distance from X.The flowchart below the network architecture reports the vector dimensionality of the single training datapoint at each step along the DeepLNE CV construction.
and reduces the degeneracy when computing the Euclidean norm to identify the local neighborhood of a datapoint.
• Local descriptions of the high dimensional dataset do not have to be empirically chosen, but are found in an automatic way by constructing the neighborhood of each data point using a differentiable k-NN step.
• Heterogeneous input features are permitted, e.g.we can combine distances, angles and contact maps into X.

D. Differentiable nearest neighbors selection
An important ingredient for the deployment of the DeepLNE CV in the enhanced sampling context is the adoption of a continuous and differentiable approximation of the k-NN selection rule.While the technical details of the implementation are given in Section SI 1, here we focus on the choice and the interpretation of two hyperparameters: the number of neighbors k and the sparsity of the selection matrix t.Concerning the number of neighbors k, we have the following limiting cases: • k → 1: leads to sharp local description of training data and high variability of the DeepLNE CV when extrapolating out-of-distribution.This is due to the fact that each datapoint is exclusively described by a single neighbor.
• k → ∞: in this limit, we have a loss of locality and therefore loss of path-like behavior, especially in the regions that are sparsely described or under-sampled, such as the transition state region which is crucial for capturing the dynamics of the process.Furthermore, the inter-and extrapolation of variables s and z become more degenerate, as a given datapoint is described by many training samples.
The second hyperparameter t determines the smoothness of the k-NN step by adjusting whether only the first k neighbors or a larger set of neighbors contribute (see Fig. SI 1).Its role is comparable to the one of λ in the PATHCV framework.We single out the following limiting cases: • t → 0 imposes exact nearest neighbor selection in which a datapoint is exclusively described by the features of the selected neighbors.This behavior is problematic in biased trajectories where, during the phase space exploration in s, the identity of the neighbors would sharply change causing discontinuities in its derivative.The effect is analogous to the case where λ → ∞ for the PATHCV.
• t → ∞ implies that a datapoint is described by a function of all the training datapoints, strongly lowering the resolving power of s, in analogy with λ → 0 for the PATHCV.As a consequence, each new point gets transformed into a point in the center of the training set.
Based on these limiting cases, we suggest that k should be rather small (k < 10), to retain the path-like behavior of the CV, while t should be small but finite (t ≈ 0.1) to ensure numerical stability during enhanced sampling simulations.

E. Training the DeepLNE CV
For all the systems studied in this paper, we followed the same protocol: 1) sampling of the transition of interest, 2) training the DeepLNE CV on a set of input features, 3) performing enhanced sampling MD simulation using the DeepLNE CV for bias deposition.In the following, we outline all the key steps of our recommended strategy for training the DeepLNE CV.
Training data generation.Starting trajectories must describe fully and as dense as possible the transition from state A to B. They can be unbiased in the simplest of cases where the energetic barrier between states is small or they can be biased data.Among the biasing procedures to generate input trajectories, we would single out the ratchet-and-pawl restraint 46,47 , steered MD 48 and OPES in exploration mode 43 .Since the computational cost of DeepLNE increases with an increasing number of datapoints, one can rely on Farthest Point-Sampling (FPS) 49 to select a maximally diverse subset of the original data, such that only m frames are retained.
Neural network architecture.
In the examples reported in this manuscript, we use a 2 feed-forward ANN both containing at most 2 hidden layers and the hyperbolic tangent activation function.The ANN architecture can be adjusted as desired by the user.We recommend choosing d such that dimensional reduction per hidden layer and the total number of fitting parameters strike a balance.While the expected application scenario for complex biological systems of many degrees of freedom is d << D, cases in which d > D can also be envisioned.Furthermore, in order for the s variable to describe the progress along the path, we scale it using a sigmoid activation in the final encoder layer, restricting its values to the range of 0 to 1.
Neural network optimization.The network parameters are optimized using the mean square error between the original input and the reconstructed ones: via gradient descent using the ADAM optimizer with a learning rate of 10 −3 and 5000 epochs, using the machine learning library PyTorch 50 , with the help of the mlcolvar package 37 .
Inspecting the results.Furthermore, it can be instructive to plot the DeepLNE CV along a few important physical descriptors or against the first principal components of the input features X.A more detailed analysis can be obtained via a sensitivity analysis to identify the features that contribute the most 22 or by using linear models to interpret the neural network-based CV. 51 Exporting the model into PLUMED.After the training is finalized, the model is compiled using TorchScript.The DeepLNE CV is evaluated on the fly in MD simulations by loading the model into the PLUMED plugin 52 as a PY-TORCH CV. 37 An implementation of the method and usage tutorials are available at https://github.com/ThorbenF/DeepLNE.

F. Simulation details
To test the introduced algorithm, we report simulation results for a particle in the Müller-Brown potential, alanine dipeptide and an RNA tetraloop (Fig. 2).
Müller-Brown.The simulations of a particle subjected to the Müller-Brown potential are performed using the simple Langevin dynamics code contained in the VES module of PLUMED 53 , and the biased simulations are performed using the OPES 41 method with a pace of 200 steps, the automatic bandwidth selection, and a barrier parameter equal to 10 k B T .The particle is moving in two dimensions under the action of the three-state potential depicted in Fig. 2 built out of a sum of Gaussians.During step 1) we choose to employ the ratchet and pawl ABMD method 46 biasing the y-coordinate to sample the transition between 2 states A and B. Then in step 2) the DeepLNE CVs (s and z) are trained using the x and y position of the particle as input features.For step 3) a new simulation is run where s is used as a CV to deposit bias via OPES.Also a harmonic constrained with a cutoff value of 0.02 (UWALL) is applied on the z variable.We estimate statistical errors via block analysis using 3 blocks.
Alanine dipeptide.For the extensively studied FES of alanine dipeptide, 3 transition paths are investigated considering the dihedral angles φ and ψ.During step 1) we apply ABMD biasing both dihedrals so that 5 independent transitions across the FES barrier of interest are observed.To test the dimensionality reduction capability of DeepLNE, we also employ all inter-atomic distances in the molecule as initial features, which corresponds to D = 190.To save computational costs we use the FPS tool of Ref. 49 to reduce the number of training datapoints (m = 3000).After training the DeepLNE CV, we use the s variable as a CV to bias in combination with OneOPES 44 and 8 replicas.We choose a pace of 500 steps, the automatic bandwidth selection, and a barrier parameter equal to 60 k B T and 2.5 k B T respectively for OPES-METAD and OPES-EXPLORE.Also a harmonic constrained with cutoff value and κ = 20000 (UWALL) is applied on the z variable.Simulations are carried out with the DES-Amber ff. 54We analyze replica 0 and estimate statistical errors using blocks of 2 ns.
RNA tetraloop.We select an RNA 8-mer as a model for a system with biological relevance and configurational complexity.The system consists of 8 nucleotides with the sequence 'CCGAGAGG' containing the GAGA Tetraloop motif (Fig. 2) and has been previously studied using extensive enhanced sampling MD 55 .We use the FES obtained by using 24 replicas with a simulation length of 1 µs per replica for comparison.As proposed by the authors in this previous study we will include recent corrections to the van der Waals parameters and a more accurate water model.Consequently for our simulation we used the standard OL3 RNA ff [56][57][58][59] with the vander-Waals modification of phosphate oxygens developed in Ref. 60 without adjustment of dihedral parameters.As a water model we chose OPC. 61This combination has been originally proposed in Ref. 62 and already tested on RNA systems. 63,64he simulations were run at salt concentrations corresponding to a system with neutralised total charge using the Joung-Cheatham ion parameters 65 optimized for TIP4PEwald.To sample the transition from natively folded state A to unfolded state B we use the ABMD method biasing 3 different CVs: 1) the eRMSD metric 66 with respect to state A, 2) a contact map constructed using the heavy atom distances between the 4 nucleotides in the stem 1,2 and 7,8 with cutoff 5 Å (CMAP) inspired by Ref. 67 , 3) a contact map constructed using the heavy atom distances between the 4 nucleotides in the loop 3,4 and 5,6 with cutoff 3.5 Å (CMAP).We start ABMD runs from the unfolded state at an eRMSD > 2.1 and stop the simulation upon reaching the natively folded state at an eRMSD < 0.2.We reduce the number of training datapoints (m = 2250) via FPS and choose the same 3 CVs used for the ABMD simulations as input features in the DeepLNE training.In the enhanced sampling simulation using DeepLNE we use the s variable to deposit bias with OneOPES.For OPES-EXPLORE we choose a PACE of 20000 steps, automatic bandwidth selection, and a barrier parameter equal to 50 k B T .The temperatures for the 8 replicas are selected from a distribution ranging from 300 K to 400 K and each replica is simulated for 500 ns.As done in the comparison study of the GAGA-Tetraloop 55 , replica 0 is analyzed excluding the first 200 ns.Statistical errors were estimated using blocks of 100 ns.
Simulations of Alanine dipeptide and GAGA-tetraloop are performed using the GROMACS 2022.5 engine 68 patched with the PLUMED 2.9 plugin 52 with the Hamiltonian replica exchange algorithm. 69fter simulations of the toy models Müller-Brown and the alanine dipeptide were completed, we performed a committor analysis to investigate whether the identified DeepLNE variable s approximates well the real reaction coordinate.For the Müller-Brown potential the transition region is split into 5 bins along the s CV in the region of the maximum in the FES and for each bin 300 particle positions are extracted and independent simulations are performed.In the case of alanine dipeptide, we split the FES of replica 0 in the vicinity of the transition state region into 10 bins along the s variable and extract 100 structures with the lowest values of the z variable.We then start new independent simulations from these structures, bookkeeping which state they visit first (A or B).

A. Particle in Müller-Brown-Potential
We demonstrate how to train and apply the DeepLNE CV to enhanced sampling MD simulations.In Fig. 3 we collected the results of the 3 steps along the proposed DeepLNE algorithm.We start by performing 2 simulations biasing the ycoordinate via ABMD using different spring constants κ = 0.5 and κ = 1.25.As can be seen from Fig. 3 (a), both runs start from the same state A and are stopped as soon as they reach state B. We show the portion of trajectory that describes the transition from state A to state B. We use this reduced set of datapoints to compute the input features for the DeepLNE CV training step.In Fig. 3 (b) we see the results of biasing the particle simulation using the s variable of the DeepLNE model and constraining perpendicular movement via the z variable.One can appreciate that the desired exploration moving from state A to state B is achieved and that s describes the intended progression along the path.
In We can appreciate that the entire spectrum of the s variable is explored.Since we exclusively sample the transition and not the entire free energy landscape (Fig. 3 (b)) we can confirm that the applied harmonic constraint on z is effective, limiting the exploration perpendicular to the path-like variable s.In Fig. 3 (c) we show the results of a free energy estimation as well as the committor analysis based on the estimated FES.We compute the committor probability for state A in the 5 different bins along s in the region of the highest free energy.Along increasing values of the s variable, one can appreciate that the committor value changes from 0 to 1 and takes a value of 0.52 in proximity of the maximum barrier value in the FES, indicating that the DeepLNE CV approximates well the ideal reaction coordinate.Having constructed 3 path-like variables, we estimate the FES performing OneOPES simulations along them and also perform a committor analysis for each case (Fig. 4 (d), (e), (f)).Before going into more detail, we note that in Fig. SI  From the reference FES it can be seen that there exist a single transition state region, which is quantified as shown in Fig. 4 d) by performing a OneOPES simulation.The transition region is located between s values 0.14 to 0.25 and accordingly configurations from 10 bins in this range are extracted to perform a committor analysis.We find that the s values in range 0.17 to 0.2 of the DeepLNE CV correspond to the transition state with committor values close to 0.5.

B. Alanine dipeptide
In the corresponding OneOPES simulation in Fig. 4 e).It turns out that in this case s values in the range between 0.58 and 0.6 lead to committor probabilities close to 0.5, allowing us also here to identify the configurations corresponding to the transition state.Finally, in Fig. 4 c) we train the DeepLNE model on a third possible transition path in the alanine dipeptide FES.We can see for this case that the system starts from configurations of the state C 7eq with φ = −1.5, ψ = 0.5.The transition along a transition region close to φ = −2, ψ = −1 is considered completed when φ = 3, ψ = −3 is approached.Performing also in this case OneOPES simulations using the DeepLNE s variable as the biased CV leads to the FES shown in Fig. 4 f).The maximum of the free energy occur for s values close to 0.65 and through the committor analysis in the range 0.63 to 0.68 we find committor probabilities close to 0.5 for s values 0.65-0.66.These results show that despite the challenge of being trained on high-dimensional input features the DeepLNE algorithm can successfully guide a simulation from A to B in free energy space and help identify the configurations belonging to the transition state.
Additionally, the relevance of the input features X can be estimated by accumulating the gradients of X with respect to the DeepLNE s variable.This is especially insightful in a scenario where the input feature dimension is very high.We report this analysis in Fig. SI 4 showing that for all 3 path-like DeepLNE CVs there exist differences in relevance between input features (Fig. SI 4 (a), (b), (c)).Importantly the most relevant inputs identified are sensitive to the relevant dihedral angle changes required to perform the respective transitions (Fig. SI 4 (d), (e), (f)).

C. RNA GAGA-tetraloop
In Fig. 5 a) we compare the FES estimation of the RNA GAGA-Tetraloop after 500 ns of simulation using OneOPES in combination with eRMSD or DeepLNE variable s as the CV for bias deposition.As in the study by Ref. 55, the eRMSD is computed with respect to the natively folded conformation.The DeepLNE CV is trained on the eRMSD and 2 contact maps providing additional information about the contacts formed between the 4 nucleotides in the stem and the loop respectively.
Our OneOPES simulation biasing the eRMSD for 500 ns results in a FES estimate that is in disagreements with previous studies where extensive enhanced sampling simulations were carried out.The FES in regions of conformations corresponding to high eRMSD values exhibit high statistical errors In Fig. 5 b) we report the estimated free energy of folding over the last 300 ns of simulation time.In agreement with what has been discussed above the OneOPES simulation biasing the eRMSD is starting with a ∆F very different from previous studies and shows a drift over the course of the simulation.Instead, the OneOPES run using the DeepLNE CV s for bias deposition already starts with an estimate of the folding free energy close to 16.1 ± 0.5 kJ/mol found by Ref. 55.The ∆F estimate plateaus around 11.5 kJ/mol.Since our simulations are performed using a force-field including van-der-Waals modification of phosphate oxygens and more accurate OPC water model, which were both not used in the study by Ref. 55, our ∆F estimation is expected to be similar, but not identical to the previous findings.We note specifically for the GAGA-tetraloop system, that even though experimentally the folded state is expected to be energetically favored in simulations the opposite is observed in simulations because of force field inaccuracies.Interestingly our simulation performed with the corrections envisioned by Ref. 55   indeed lowers the free energy of folding.Nevertheless, the misfolded state still represents the minimum in the FES.

III. DISCUSSION
In this work, we propose a strategy to construct a pathlike CV using a data-driven approach.The DeepLNE method builds a 1D representation of reactive trajectories that can be used to capture and especially accelerate rare events, with a focus on biophysical systems.Through multiple examples, we have shown how this CV approximates well the ideal reaction coordinate and can be effectively used to improve sampling using enhanced sampling methods such as OPES or its recently developed OneOPES variant.
Our methodology begins with obtaining a first reactive trajectory that captures the transition from state A to state B. Herein, these starting trajectories have been obtained with a ratchet-and-pawl restraint 46 , which provided training data of sufficient quality for all systems investigated.For more complex systems, an input reactive trajectory that samples the transition state regions more extensively might result in a DeepLNE CV that captures the region of the free energy barrier more precisely.To this end, an iterative biasing procedure such as the one introduced in Ref. 70 might be helpful.
The input trajectory is used to define a feature vector to distinguish not only state A from state B but also the intermediate states encountered.With this featurized trajectory, one can automatically derive path-like variables applying the DeepLNE CV method introduced in this study.Input features are initially subjected to dimensionality reduction.In this lower dimensional manifold we compute Euclidean distances to search for a small number of k-nearest neighbors that describe the local neighborhood of each datapoint.Compressing this neighborhood representation even further finally yields the DeepLNE variable s, which, upon decoding, can be used to generate in turn the DeepLNE CV z.The interpretation of s and z is analogous to the PATHCVs, as they describe the progress along the path and its perpendicular distance, respectively.
The proposed method alleviates the issue of CV degeneracy that occurs when calculating Euclidean distances in a high dimensional space.At the same time, it automates the process of constructing the CVs, since the s and z CVs are generated without making empirical choices such as selecting a set of milestones or a metric that have a strong impact on the quality of the CV and on the time required to fine-tune it.Unlike standard neural network-based CVs, the DeepLNE CV exclusively represents new data points through their neighbors in the training set.This feature is fundamental in preventing erroneous extrapolations when unknown data regions are visited.
Our method is not limited to two state systems and can be applied as is to more complex systems characterized by multiple metastable states.A DeepLNE CV can be successfully trained as long as the training trajectories visit all the states of interest and their transition regions.In the systems studied here, we did not impose a directionality on the CV in order to show that path-like CVs can be successfully derived even in a completely unsupervised manner.If needed, this can be ensured by adding a supervised term to the optimization (such as cross entropy, or a discriminative term), which can be done straightforwardly in a multi-task framework. 37ll in all, we believe that DeepLNE provides a powerful and flexible method to automatically create efficient path-like CVs to accelerate sampling of complex phenomena.

Fig. SI 2
(a), we show the decoded training datapoints X of DeepLNE.Since all datapoints used for training the DeepLNE CV describe a transition across the same barrier, X approximates well the path along the minimum in the Müller-Brown FES.Consequently applying a harmonic constraint on z during enhanced sampling MD allows for frequent transitions while sampling the movement between the state A and B as shown in Fig. SI 2 (b).

Fig. 4 FIG. 2 :FIG. 3 :
Fig.4reports the results of constructing the DeepLNE CVs for 3 different transitions of alanine dipeptide and estimating the FES via OneOPES simulations using the s variable to deposit the bias potential.We note that the DeepLNE CVs are 3 we collected the expectation value of the z variable as well as the time evolution of the s variable.The analysis of < z|(φ , ψ) > (see Fig. SI 3 (a), (b), (c)) confirms that path-like CVs are learned, that exhibit low z values along the transition of interest.Importantly, frequent transitions occur and the entire spectrum of the s variables is explored during the sampling of the 3 different transition paths (Fig. SI 3 (d), (e), (f)).Fig. 4 a) superimposes 5 ABMD simulations starting from the same configuration in the state C 7eq and then progressing to state C 7ax over a barrier at φ = 2, ψ = −2.5.

Fig. 4 bFIG. 4 :
FIG.4: Sampling of different pathways in alanine dipeptide using DeepLNE.In (a), (b) and (c), the scatter points represent ABMD trajectories crossing three distinct barriers.The DeepLNE variable is trained on those datapoint using 190 interatomic distances as input features instead of dihedral angles.Next, the DeepLNE CVs are employed to run OneOPES simulations and the new data is used to calculate the progress along the path, computed through the expectation value < s|(φ , ψ) > shown in the colored regions.The black lines below denote the reference FES, obtained by biasing φ and ψ through OneOPES.The basin at φ = −1.5, ψ = 2 is referred to as C 7eq , while the basin at φ = 1, ψ = −1 is referred to as C 7ax .The new OneOPES runs yield FES estimates of which subsections around the transition states are shown in panels (d), (e) and (f).The FES maxima coincide with committor probabilities close to 0.5 computed by running short simulations from the configuration in the same s bins highlighted as vertical lines.

FIG. 5 :
FIG. 5: Free energy surface and free energy of folding (∆F) for the GAGA-Tetraloop based on 500 ns of OneOPES simulations using different biasing CVs (eRMSD, DeepLNE variable s).(a) Comparison of the FES along the eRMSD CV with respect to the natively folded conformation.(b) Comparison of the ∆F over simulation time comparing the OneOPES simulations using 8 replicas to a previous study of Ref. 55, in which also the eRMSD was used as the biasing CV, but instead 24replicas were simulated for 1 µs each.Also a different force-field and TIP3P water was used instead.

SI 2 .
FIG SI. 1. Depiction of the x k-NN i of a single datapoint x i within its neighborhood for k = 3, sampled from a bivariate normal distribution with zero mean and a covariance of 0.4 between dimensions.Various values of the hyperparameter t are considered, and we visualize the weighted connections between datapoints using the selection matrix w (computed as described in Sec.SI 1).(a) For t = 0.001, exactly three neighbors contribute to x k-NN i .(b) With t = 0.1, those three neighbors of x i again exhibit high w values, however additional datapoints also contribute significantly.(c) Increasing the hyperparameter to t = 1 results in many datapoints contributing significantly to the neighborhood representation.

FIG SI. 3 .FIG SI. 4 .
FIG SI. 3. Since X is of high dimensionality and can not be conveniently visualized we superimpose the expectation value < z|(φ , ψ) > of the DeepLNE CV based on the new alanine dipeptide simulations using OneOPES simulations with the reference FES of φ and ψ.In this way we try to visualize the perpendicular distance to the path-like CV automatically learned from the DeepLNE algorithm.(a) < z|(φ , ψ) > based on datapoints of the simulation along the path between C 7eq and C 7ax over a barrier at φ = 2, ψ = −2.5.(b) < z|(φ , ψ) > estimated for the transition between the states C 7eq and state C 7ax , across a barrier located close to φ = 0, ψ = 0. (c) <z|(φ , ψ) > based on configurations of a path across the barrier close to φ = −2, ψ = −1 beginning and ending in the state C 7eq .The resulting time-evolution of the biased DeepLNE CV s when depositing a bias potential on it via OneOPES along the 3 different transition paths can be found in the row below (d), (e), (f), respectively.