The heat shock protein 90 (Hsp90) is a molecular chaperone that controls the folding and activation of client proteins using the free energy of ATP hydrolysis. The Hsp90 active site is in its N-terminal domain (NTD). Our goal is to characterize the dynamics of NTD using an autoencoder-learned collective variable (CV) in conjunction with adaptive biasing force Langevin dynamics. Using dihedral analysis, we cluster all available experimental Hsp90 NTD structures into distinct native states. We then perform unbiased molecular dynamics (MD) simulations to construct a dataset that represents each state and use this dataset to train an autoencoder. Two autoencoder architectures are considered, with one and two hidden layers, respectively, and bottlenecks of dimension *k* ranging from 1 to 10. We demonstrate that the addition of an extra hidden layer does not significantly improve the performance, while it leads to complicated CVs that increase the computational cost of biased MD calculations. In addition, a two-dimensional (2D) bottleneck can provide enough information of the different states, while the optimal bottleneck dimension is five. For the 2D bottleneck, the 2D CV is directly used in biased MD simulations. For the five-dimensional (5D) bottleneck, we perform an analysis of the latent CV space and identify the pair of CV coordinates that best separates the states of Hsp90. Interestingly, selecting a 2D CV out of the 5D CV space leads to better results than directly learning a 2D CV and allows observation of transitions between native states when running free energy biased dynamics.

## I. FREE ENERGY BIASING, COLLECTIVE VARIABLES, AND AUTOENCODERS

Efficiently sampling the full configuration space of complex systems in molecular dynamics (MD) remains a challenge due to the long timescales involved in the transition events from one metastable state to another. Large proteins are historic and still particularly relevant examples of a metastable system. Direct numerical simulation of proteins is limited to physical times of a few nanoseconds, exceptionally (fraction of) milliseconds.^{1,2} Biologically relevant conformational changes are rare events on this timescale. The dynamics needs to be biased in order to favor otherwise unlikely transitions. Metastability is caused by the usually very irregular shape of the energy function that contains many local minima (metastable states) separated by high energy barriers (transition states).

We recall in this section one convenient way to bias the dynamics and increase the frequency of transition events from one metastable state to another, namely the use of a biasing force corresponding to (an approximation of) the derivative of the free energy. We start by recalling in Sec. I A the principle of free energy biasing and then discuss in Sec. I B the key element in this method, namely the choice of the collective variable (CV). This function was chosen mostly based on domain knowledge and physical intuition until a few years ago, but, as we discuss in Sec. III, recent advances in machine learning (ML) techniques, and in particular deep learning methods, have revolutionized the way CVs are constructed.

### A. Free energy biasing

Adaptive biasing methods are important sampling techniques where the free energy *F* is simultaneously estimated and used to bias the potential. More precisely, the biasing relies on the use of a collective variable *ξ*, which maps the system from a high dimensional molecular space to a much smaller dimensional space $Rd$, which effectively describes the metastability of the dynamics. The potential energy function of the system is then replaced by the modified potential *V* − *F*◦*ξ*, where *F* is the free energy associated with *ξ*, in order to eliminate metastability along *ξ* and, thus, help accelerate sampling of transitions between metastable states. Here and in the following, ◦ is the function composition operator i.e., *F*◦*ξ*(*x*) = *F*(*ξ*(*x*)). We refer to Refs. 3–7 for instance for reviews and pedagogical introductions to free energy calculation methods.

However, the free energy *F* is of course unknown in general. Adaptive free energy biasing methods replace the free energy *F* by an estimated function *F*_{t} in the biased dynamics at time *t*. The potential becomes *V* − *F*_{t}◦*ξ*, where the estimate *F*_{t} is updated on the fly and converges to *F* as the sampling proceeds. There are two categories of adaptive biasing techniques:^{8} (i) adaptive biasing potentials, e.g., metadynamics,^{9} where the free energy *F*_{t} is estimated and its gradient, the so-called mean force, is then derived and used in the dynamics and (ii) adaptive biasing force (ABF) methods,^{10,11} where the free energy derivative, i.e., the mean force, is estimated directly as a vector Γ_{t}, and the free energy is subsequently obtained by numerical integration of the mean force.^{12}

By design, ABF requires the knowledge of second order derivatives of the CV *ξ* to compute the local mean force *f*. The analytical expression of this quantity is cumbersome for most choices of reaction coordinates, especially when *ξ* is vector valued. To overcome this limitation, a method known as extended system ABF (eABF)^{13} was devised. A fictitious degree of freedom *λ* is added to the configurational space, as in early versions of metadynamics. The new extended mean force does not depend on the second (or any) derivatives of the CV *ξ*. Only the gradient of *ξ* is needed for computing the gradient of the associated extended potential *V*^{ext}. ABF can therefore easily be applied to the new extended system.

### B. Choosing collective variables

The collective variable *ξ* greatly impacts the physical relevance of the computed free energy differences. As mentioned in the introductory lines of this section, the dynamics are often metastable in biomolecular systems. As a consequence, a CV *ξ* should be chosen to explore these metastable states that often describe key biologically relevant biomolecular conformations and dynamics. Classical examples of reaction coordinates are combinations of well-defined simple functions of the positions *q*, such as distances, dihedrals, or contacts.

Moreover, the choice of CV also impacts the computational efficiency of the free energy adaptive biasing procedure. When the CV *ξ* is able to describe the slow motions of interest, the process *ξ*(*q*_{t}) is also metastable, i.e., its value may stay trapped inside some region of the space $Rd$ before crossing to another region, indicating a transition of the system from one metastable state to another. As discussed in Sec. I A, the free energy associated with *ξ* can be used to bias the potential of the system so as to make the process [*ξ*(*q*_{t})] no longer metastable. In fact, the marginal distribution along *ξ* under the potential *V* − *F*◦*ξ* is uniform. This emphasizes even more the importance of the choice of CV: The biased potential *V* − *F*◦*ξ* is only as effective at sampling metastable motions of interest as the CV *ξ* is at describing them.

With a poor choice of CV, the free energy cannot provide efficient biasing of the dynamics and cannot be used for analyzing important motions of the system. It is thus important to use a CV that encompasses the metastability of the system, a notion that can be mathematically quantified.^{14} In general, the choice of CV can be made somewhat intuitively for small and/or extensively studied systems. However, the more complex and/or larger the system is the less trivial it is to manually select a CV. The idea of automatically selecting or constructing the collective variable thus becomes attractive. For this purpose, many methods have been devised to construct CVs using sampled configurations of a given system. In particular, as the recent years have seen a surge in popularity for machine learning (ML) techniques in various fields, the discovery of collective variables using machine learning has gained growing interest. There are now various reviews on this lively topic, see, in particular, Refs. 15–19.

There are two main classes of ML methods to find CVs: (i) those seeking high variance CVs, which aim at reproducing overall features of the Boltzmann–Gibbs distribution at hand, and (ii) those seeking slowly evolving CVs (such as tICA^{20,21} for instance). Both classes can be separated into linear and nonlinear methods. Our focus in this work is on high variance CVs. ML-guided methods in this context can be separated into (i) linear algorithms, e.g., principal component analysis (PCA) or factor analysis, and (ii) nonlinear algorithms, e.g., kernel methods, autoencoders (AE), decision trees, and random forests (see, for instance, Refs. 22 and 23 for introductory references on these classes of methods). In the first case, CVs are interpretable but often lack the necessary complexity to describe complicated biological phenomena. In the present study, we employ one method of the second type, viz., an autoencoder, following up on our previous work.^{24}

### C. CV identification using autoencoders

Artificial neural networks (NNs) mimic by design the function of the human brain, in that artificial neurons are made to send signals to one another. More precisely, a NN is composed of several layers, each of which contains a number of neurons. The input layer contains as many neurons as the dimension of the input data and the output layer neurons are meant to contain the information we wish to learn using the NN. The intermediate layers are used to increase the complexity and expressivity of the NN and aid in the learning process. A neuron in each layer is assigned a weight vector that connects it to the neurons of the next layer. The value in each neuron is then computed as a weighted combination of the values of the previous layer neurons, passed through a nonlinear differentiable transformation called activation function. The neural network is optimized by modifying the weights assigned to all neurons, so as to optimize a target distance between the NN’s predicted output (i.e., output layer) and the actual output.

Autoencoders (AEs)^{25} are a type of neural network designed for unsupervised learning tasks. The aim is usually to learn a new representation of the data, called an encoding. The AE is composed of two parts: The encoder learns the new representation and the decoder simultaneously learns to reconstruct the original data from this representation. The AE thus seeks to approximate the identity function. When the encoder is composed of one fully connected layer, which reduces the dimension, together with a linear activation function, its learned representation is essentially the same as that of a PCA projection of the same dimensionality;^{26,27} more precisely, the two models project on the same bottleneck space, but not using the same vectors. In general, however, AEs are used with nonlinear activation functions. This allows for nonlinear encoding functions and, thus, potentially better encoders than those restricted to stay within the smaller class of linear functions.

AEs can have different topologies depending on the learning task, the data size, dimensionality, etc. Below, we describe the general autoencoder topology used in this work. We denote by $X\u2286RD$ the data space and by $A\u2286Rd$ a lower dimensional space (*d* < *D*). The autoencoder can be represented by a mapping *f* = *f*_{dec}◦*f*_{enc}, where $fenc:X\u2192A$, $fdec:A\u2192X$, and ◦ is again the function composition operator: $fdec\u25e6fenc(x)=fdecfenc(x)$. The AEs we consider are symmetric in structure, fully connected, and contain 2*L* layers. Each hidden layer is of dimension *d*_{ℓ} = *d*_{2L−ℓ} for *ℓ* = 1, …, *L*, and the output layer is of dimension *d*_{2L} = *D* (by convention, the input layer does not count as a layer of the network). Each layer *ℓ* ∈ 1, …, 2*L* has an activation function *g*_{ℓ} and is connected to the previous layer by a projection matrix $W\u2113\u2208Rd\u2113\xd7d\u2113\u22121$, and a bias vector $b\u2113\u2208Rd\u2113$. There are thus $K=\u2211\u2113=1Ld\u2113(d\u2113\u22121+1)$ learnable real parameters denoted by $(p1,\u2026,pK)\u2208RK$. As the activation functions are predefined and do not change during learning, the autoencoder function is fully described by its parameters $(pk)k=1,\u2026,K$.

## II. Hsp90: AN IMPORTANT PHARMACOLOGICAL TARGET WITH MANY FACES

The heat shock protein 90 (Hsp90) is an ATP-dependent molecular chaperone that controls protein maturation, stability, and folding of over 100 key cellular growth-regulatory and signaling molecules.^{28} These molecules, i.e., Hsp90 clients, include 60% of the human kinome, transcription factors, and multiple mutated, chimeric, and overexpressed signaling proteins that promote cancer cell growth and survival.^{29–31} Such a wide variety of biologically critical clients in combination with its interaction with several co-chaperones makes Hsp90 a promising pharmacological target against diseases such as cancer, Alzheimer and other neurodegenerative diseases, diabetes, as well as viral and bacterial infections.^{32–34}

Hsp90 is a highly conserved enzyme that dimerizes and comprises an N-terminal ATP binding domain (NTD), a middle co-chaperone and client-binding domain (M-domain), and a C-terminal dimerization domain (CTD).^{35} Interestingly, the Hsp90-NTD possesses an unconventional ATP binding site with a structure named Bergerat fold.^{36,37} This ATP binding pocket is situated within the *α*-helices formed by residues 28–51, 85–97, and 123–130 of NTD of human Hsp90. The mechanism of action of Hsp90 is driven by ATP-influenced large-scale conformational changes during its chaperone cycle.^{38} These conformational changes are in fact transitions, or even more precisely transition paths, between an open inactive conformation when in apo form, i.e., in the absence of bound nucleotide, to a close active conformation. This exchange between open and closed conformations takes place at a timescale ranging from milliseconds to several minutes.^{39}

Despite the large and slow conformational changes between open and close states during chaperon activity, the individual Hsp90 domains remain largely stable. Most of the flexibility of Hsp90 is the result of rigid body movements centered at the linkers between the NTD and M-domain and between the MD and CTD. Interestingly, for these large-scale global movements to occur, significant local changes take place in the NTD during nucleotide binding and unbinding.^{40}

The most significant of these local changes occur in the so-called “ATP-lid” or “active site lid,” a helix-loop-helix segment adjacent to the ATP binding pocket.^{41} The lid and, in particular, its L2 loop (residues 104–114) demonstrate significant variability both in the numerous crystal structures as well as in molecular dynamics (MD) simulations of the Hsp90 NTD.^{42} Changes in the orientation or positioning, or in some cases partial or complete folding, are some of the L2 loop conformational differences related to the Hsp90 chaperon activity^{43} and nucleotide binding (e.g., unbound, ATP-bound, ADP-bound).^{40,44–46}

Despite the importance of the flexibility of this loop, its mechanistic origin has not been understood due to the large time and space scales involved.^{43} Insights into intrastate protein dynamics are key for better understanding the transitions between different Hsp90 states and their eventual targeting for drug design purposes.

### A. The ATP-lid and Hsp90 inhibition

Hsp90 is one of the most studied pharmacological targets with more than 13 800 publications during the last 40 years (source: PubMed) and more than 270 available structures in the Protein Data Bank (PDB). In 2022, a first Hsp90 inhibitor, pimitespib, was approved in Japan for patients with gastrointestinal stromal tumor. Most of the small molecules targeting Hsp90 bind to the ATP binding site in the NTD. By blocking ATP binding, these inhibitors prevent the release of the ATP-lid segment and the NTD dimerization, a critical initial step in the catalytic cycle of Hsp90 prior to its chaperoning activity.

Despite the undeniable importance of the ATP-lid that appears in different conformations in the available structures, it is not yet clear what is the driving force behind the transition between these different conformations. Interestingly, in the work of Colombo *et al.*, using MD simulations of the NTD a spontaneous transition was shown to occur between the apo, i.e., nucleotide free, and the holo, i.e., nucleotide bound, conformations.^{42} In other words, it is not the nucleotide binding/hydrolysis that determines Hsp90 conformations; instead, Hsp90 exists in conformational equilibrium between its different states and this equilibrium may or may not be shifted upon nucleotide or ligand binding.^{42,43,46–50}

### B. Data mining Hsp90 structural diversity

The Hsp90 NTD has been extensively studied and more than 270 times experimentally resolved. These structures, both with or without ligands, are available in the Protein Data Bank (PDB) and demonstrate some specific structural differences in the binding site L2 loop (residues 105–114).

We used this plethora of structures to identify potential native states of the Hsp90 NTD. First, once all available structures of the NTD Hsp90 were fetched from PDB (278 different conformations), they were aligned and necessary fixes were performed, i.e., numbering after sequence alignment, completion of partially resolved residues, number of chains per PDB, atom naming, tags, etc. The protein structures were prepared with hydrogens being added and protonation being assigned using the Schrödinger Protein Preparation Wizard.^{51} Then, the dihedral distribution of the L2 loop was used for clustering of structures. More precisely, for each conformation, the sine and cosine values of the dihedral angles Φ and Ψ for residues 105–114 were computed.

Six clusters were identified, indicating the existence of six key states (Fig. 1). These states can be differentiated by a loop/helix conformation formed by residues 105–114. States 1 and 2 exist in both apo/holo forms, while the rest of the states have been resolved in holo conformations only. A representative structure of each state was selected using as criterion the highest resolution. The exact PDB IDs are shown in Fig. 1.

For the remainder of the work, each state will be referred to by its associated number. It should also be noted that the corresponding ligands have been removed from the holo structures before any simulation. All MD simulations described and analyzed in this work are thus of the unbound NTD. It is also important to recall that the states were defined based on the loop of interest alone (residue 105–114, and are thus not necessarily directly representative of the states of the whole Hsp90 protein.

## III. CV LEARNING USING AN AUTOENCODER

Hsp90’s conformational cycle is a dynamic equilibrium between the open and closed states, passing through intermediate states, which are also accessible in the absence of nucleotide. It has been shown that the role of nucleotide is not to determine the conformation but to lower the energy barriers between the states.^{43,46,48–50} In Sec. II B, we demonstrated that the “known” Hsp90 conformational space can be clustered into six native states. As a next step, we aim at identifying potential transitions between these states through biased MD simulations using an autoencoder-learned collective variable space.

Most often, ML-constructed CVs are obtained via unsupervised learning (see however Refs. 52 and 53 for notable exceptions). This is more than justified by the fact that supervised learning models rely on the knowledge of a label set *y* associated with the dataset *X*. In the case of molecular dynamics, this label set assigns, for example, the index of a metastable state to each sample. Yet, the assumption that these states are known and distinguished makes supervised learning models non-applicable in many cases. Nevertheless, in cases where conformational states are known and the samples can indeed be assigned to states, it seems essential to include this information to the learning of a collective variable, as this would help recover a CV that distinguishes the different conformational states.

### A. Dataset generation

The dataset used for training of the autoencoder is composed of short MD simulations of each identified state (Fig. 1). For each of the six states, the protein structures were prepared in order to have the exact same number and naming of atoms (3258 atoms). Replicates of 20-ns MD trajectories were used as the training set and were generated with Gromacs v5.1.3.^{54} The Amber99-DISP all-atom force field^{55} and the TIP4P model^{56} were used to model the protein and water, respectively. All proteins were solvated in a cubic box with an 11 Å margin around the protein to ensure a sufficient minimum separation of the protein from its periodic images. Na^{+} counterions were randomly placed in the system to neutralize the total charge while a 0.15M of NaCl salt was also added to resemble the physiological conditions. One simulation per state representative structure was performed (PDB IDs reported in Fig. 1). Before the production runs, a minimization and a five-step restrained MD relaxation protocol was followed. First, all heavy atoms of the system were restrained using a force constant of 1000 kJ/mol nm^{2} for 100 ps. Then, only the protein atoms were restrained using a force constant of 1000 kJ/mol nm^{2} for 100 ps. Two more 100 ps restrained MD with the same force constant were performed with restrains only on backbone and then only on *C*_{α} atoms. Finally, a flat-bottomed restraint was applied on the *C*_{α} carbons, with a force constant of 500 kJ/mol nm^{2}, starting at a distance larger than 2 Å from the reference structure and for 400 ps. The *C*_{α} atom Root Mean Square Deviation (RMSD) during these last 400 ps of relaxation remained stable. The convergence of production simulations to a stable conformation was evaluated using the total *C*_{α} carbon RMSD (Fig. S1).

For each state, ten trajectories were independently sampled starting from the same configuration within that state and different initial velocities. We thus obtained 60 trajectories of 20 ns each. The final dataset consists of a concatenation of the short 20-ns trajectories sampling the six different conformational states, shown in Fig. 1. All resulting configurations were aligned to the same reference structure, which is the representative conformation of State 1 (PDB ID: 3T10). It should be noted that States 4 and 5 converge even after a very short simulation time to the same conformation of the L2 loop and have therefore been merged for the rest of the study into the same state, i.e., State 5. The C*α* carbon coordinates were kept as input features, making the input dimension *D*_{B} = 3 × 207 = 621 coordinates. The dataset totals to *N*_{B} = 240 000 points.

To ensure that the five states that have been identified using structural clustering are metastable even after the removal of ligands, we performed 200-ns unbiased MD starting from each of these states. RMSD and cluster centroid distance calculations show that the sampled simulations do not visit any of the other states but possibly explore apparent substates (Figs. S2 and S3).

### B. Autoencoder architecture

A symmetric AE architecture was used and only fully connected layers were considered. The number and size of hidden layers determine the complexity of the model and thus of the learned collective variables. Generally, when layers are added, more complex representations can be modeled by the AE. However, it should be taken into account that more layers also require a larger dataset for training and can lead to overfitting. Moreover, from a practical point of view, the purpose of the autoencoder collective variable is to run biased sampling. The run time of the biased sampling simulation dramatically increases with more complex CVs.

To examine whether the addition of a hidden layer improves the learned model, we considered two AE architectures, with the size of layers being chosen so as to gradually reduce the dimensionality from input to bottleneck:

*S*_{1}, one hidden layer between the input and bottleneck: Input (621) → Hidden 1 (100) → Bottleneck (*k*) → Hidden 2 (100) → Output (621).*S*_{2}, two hidden layers between the input and bottleneck: Input (621) → Hidden 1 (150) → Hidden 2 (40) → Bottleneck (*k*) → Hidden 3 (40) → Hidden 4 (150) → Output (621).

The numbers between brackets correspond to the size (i.e., number of neurons) of each layer. The parameter *k* is the dimensionality of the bottleneck layer (the final layer of the encoder, i.e., the CV dimension). We used different values of *k*, ranging from 1 to 10, to identify the optimal CV dimension. For each value of *k*, two AEs with structures *S*_{1} and *S*_{2} were trained.

The autoencoders were constructed and trained using the Keras library^{57} in Python. All autoencoders are trained on 75% of the dataset, leaving 25% for validation. The learning rate used is *η* = 10^{−4} with Adam optimization. A batch size of 1000 samples was used and the training ran for a maximum of 1000 epochs. Early stopping of the training is applied when the validation loss does not improve for 40 consecutive epochs to avoid overfitting.

To compare the two AEs, we plot the evolution of their training and validation losses throughout training (Fig. S4). It can be observed that the training of the AEs with structure *S*_{1} shows more stability, i.e., the evolution of the validation and training losses is approximately the same. Conversely, structure *S*_{2} autoencoders (apart from the *k* = 10 autoencoder) overfit after ∼100 epochs. Overfitting is, however, already handled by the early stopping procedure and is thus not an issue. For each *k*, the optimal structure *S*_{1} model, i.e., the model with the optimal validation loss, reaches approximately the same validation and training loss as the optimal *S*_{2} model. In particular, for *k* = 1, it can be observed that *S*_{2} seems to outperform *S*_{1} on the training loss, but the corresponding validation loss evolution shows that this actually corresponds to the *S*_{2} model overfitting. The only advantage of structure *S*_{2} is that the training finishes in fewer epochs. Nevertheless, since *S*_{2} represents larger models, one training epoch takes longer to complete, meaning that lower number of epochs does not necessarily translate to faster convergence in wall-clock time. More importantly, the most time-consuming part of our procedure is by several orders of magnitude the biased simulations, which run ∼20 times faster with an encoder CV from structure *S*_{1} than one from structure *S*_{2}. We thus choose structure *S*_{1} for the autoencoders used in this work.

### C. Choice of the CV dimensionality

The optimal dimensionality of the CV, i.e., AE bottleneck layer size, for a certain system is defined as the minimal dimension of variables that are enough to represent a maximal portion of the patterns and features of the conformational space. Numerically, this can be translated as a trade-off between the subspace dimensionality and the amount of data variance covered by that subspace. In the case of a PCA, for example, this means keeping the principal components (PCs) with the highest eigenvalues. More specifically, the optimal dimension is determined by some “elbow” in the scree plot (which corresponds to plotting the eigenvalues ranked in decreasing order). PCA therefore provides direct quantities to help determine the optimal dimension of a system: the eigenvalues. This is not the case for other models, such as the one used herein, i.e., autoencoders.

In order to choose the dimension of the bottleneck for our production calculations, we considered a set of ten autoencoders with bottleneck layer dimensions ranging from *k* = 1 to *k* = 10 that were trained on our dataset using the architecture *S*_{1} (see Sec. III B). The corresponding training curves (loss optimization and validation loss evolution) as well as the last obtained model and the model that achieved the best validation loss were kept.

To compare these ten models, we plot their corresponding loss evolution and compare their final losses (Fig. S5). Based on our results, two CVs would be able to provide enough information, while the optimal dimension of the AE bottleneck is 5 (Fig. S5). The loss values do not significantly decrease as the bottleneck size *k* increases, which means that increasing the CV dimensionality does not significantly improve the reconstructed output. The additional dimensions of the CV evidently learn directions that are of much lower variance compared to the initial one-dimensional bottleneck. A possible explanation is that these directions correspond to learning noise in the training data. However, because the validation curves are similar to the training curves, this hypothesis can be discarded. Additionally, plotting the one-dimensional bottleneck encoder CV over the training dataset shows that this direction alone is actually not able to differentiate between all the identified states of Hsp90. We therefore argue that despite their relatively small variance, the additional dimensions may still be of importance. Moreover, as our goal is to train the AE using a relatively small latent space, we select the dimensionality after which some fluctuations appear (possibly due to small differences in the loss function not showing up because of stochastic errors in the optimization procedure), i.e., *k*^{⋆} = 5.

The final autoencoder architecture used for the rest of our study is composed of four layers with the following sizes: Input (621) → Hidden 1 (100) → Bottleneck (*k* = 5) → Hidden 2 (100) → Output (621). The activation function used for all layers is hyperbolic tangent.

## IV. CHOOSING THE “OPTIMAL” COLLECTIVE VARIABLE FOR Hsp90

Once the autoencoder is trained, the encoder is used as the learned collective variable. In theory, the five-dimensional (5D) CV could be used directly. In practice however, the simultaneous biasing in five directions and estimation of the free energy over this five-dimensional space is prohibitively expensive from a computational viewpoint. ABF typically requires a low CV dimension of 1 to a maximum of 3. Our aim here is to reduce the CV to a two-dimensional (2D) one. We explore two approaches: selecting a two-dimensional CV out of the five-dimensional one (see Sec. IV A) and then comparing the quality of this selected CV to directly learning a two-dimensional CV (see Sec. IV B).

### A. Selecting a two-dimensional CV

To minimize the dimensions of the CV to be used for biasing, we assessed the efficiency of each component of the CV at describing and differentiating the five identified states of Hsp90, always based on the L2 loop conformations. First, we plotted the values of each of the five CV coordinates, which we refer to by *CV*_{i} for 1 ≤ *i* ≤ 5. For this, we separate our training dataset into the five conformational states to observe how each coordinate of the CV varies from one state to another (Fig. 2). The results indicate that *CV*_{1} cannot discriminate between the five identified states of Hsp90. Next, *CV*_{4} takes comparable values over all states, and therefore does not provide a clear separation between states either. The coordinates *CV*_{1} and *CV*_{4} are therefore eliminated, as they fail to differentiate significantly between the five states and are thus not expected to be helpful for driving transitions among these states.

Next, to discriminate between the remaining three directions, i.e., *CV*_{2}, *CV*_{3}, and *CV*_{5}, we perform hierarchical clustering over each pair of CVs and select the CV pair whose clustering best separates the five states. For this, we use the agglomerative clustering method, with Ward’s minimum variance criterion to merge clusters. The method starts with as many clusters as the number of datapoints in our dataset and successively merges clusters by minimizing the variance of the newly merged clusters. We performed agglomerative clustering over the five-dimensional CV space and over all three possible CV pairs: (*CV*_{2}; *CV*_{3}), (*CV*_{2}; *CV*_{5}), and (*CV*_{3}; *CV*_{5}). We then stop the hierarchical clustering at five clusters aiming at identifying the five Hsp90 L2 loop states identified previously. Interestingly, the five states are recovered both for the five-dimensional CV and the combination of coordinates (*CV*_{3}; *CV*_{5}) (Fig. 3). Moreover, the dendrograms corresponding to the clustering indicate that clusters L2 and L5, i.e., States 1 and 6, are the most similar compared to the remaining states (Fig. S6).

Based on this analysis, we can conclude that the combination of CV coordinates (*CV*_{3}; *CV*_{5}) is able to separate the five states with the highest accuracy, i.e., the lowest number of mislabeled points, almost equally well as the five-dimensional CV. In Fig. 4(a), we plot the CV coordinates (*CV*_{3}; *CV*_{5}) over the dataset of short unbiased trajectories started from each state (the samples from each state are colored using a different color). The plot shows that these two coordinates indeed differentiate between the five L2 loop conformational states, making it a good choice for running a free energy biasing procedure. For the remainder of this work, we refer to the coordinate pair (*CV*_{3}, *CV*_{5}) as the autoencoder CV but keep the indexing *CV*_{3} and *CV*_{5}.

### B. The effect of bottleneck dimensions

In light of the results of Sec. IV A, i.e., a two-dimensional CV is able to separate the five different L2 loop states of Hsp90, and the discussion in Sec. III C, one might argue that it might not be necessary to use a bottleneck dimension of *k*^{⋆} = 5 and make a selection of two coordinates; instead, a direct training of an AE with a bottleneck dimension of *k*^{⋆} = 2 could be sufficient. As an unsupervised model trained for reconstruction of the input, the autoencoder will have to learn some features that do not necessarily distinguish between the metastable states (as *CV*_{1} in Fig. 2 for instance) but may nonetheless be important for data reconstruction (e.g., features representing a large number of residues of the protein). This is especially true for large proteins whose various states are sometimes only distinguished by motions in relatively small functionally important regions. Training an autoencoder with a large bottleneck size *k*^{⋆} = 5 makes it possible to learn and separate features representing the actual motions of interest from the features representing high variance directions that are unrelated to these motions. Then, handpicking the *d* < *k* bottleneck dimensions of interest ensures more efficiency for biasing, compared to directly learning a CV using a *d*-dimensional bottleneck autoencoder. To illustrate our point, we compare in Fig. 4 the CV obtained from the selection of (*CV*_{3}; *CV*_{5}) coordinates against a two-dimensional bottleneck autoencoder. The first coordinate of the second CV fails to distinguish between any state, while the second coordinate does not make a clear separation between States 1 and 6. The CV learned by an AE with a two-dimensional bottleneck is therefore of lower quality than the one extracted from a five-dimensional bottleneck AE.

## V. EXPLORING Hsp90 TRANSITIONS USING AE-LEARNED CVs AND eABF

### A. Technical information

All the biased MD simulations were performed using OpenMM^{58} under its Python API. The starting conformation for each state is the last conformation after the position restraint MD. As previously discussed, the Amber99SB-ILDN forcefield and the TIP4P water model were used, with a cubic box of 12 Å buffer around the protein. Simulations were run in the NVT ensemble, with a Langevin integrator, a collision rate of 1 ps^{−1}, a time step of 2 fs, and temperature *T* = 300 K. The extended system adaptive biasing force algorithm implemented in PLUMED^{59} was used for biasing. Each coordinate of the CV was discretized into 50 bins. All other parameters of the ABF algorithm were kept to their default values in PLUMED.

MD trajectories generated in this work are initially analyzed to ensure that the system remains stable throughout the simulation. All RMSD computations were done using the mdtraj library^{60} in Python. Only the *C*_{α} carbons were used in the RMSD computations. For each trajectory, the RMSD was computed with respect to each of the five states, over the whole protein, and over the ten residues forming the L2 loop of interest (105–114). Moreover, the dihedral angles of the residues forming the L2 loop were calculated post-simulation using mdtraj. The dihedral analysis uses the initial dihedral clustering model that was used to define the states (Sec. II B). The clustering distances, i.e., the distance of each conformation to the centers of the identified clusters, was used on the unbiased trajectories to delimit the states/clusters from a dynamical (rather than crystal structure based) viewpoint. This makes it possible to detect transitions in the biased trajectories by computing the same distances. As an additional analysis step, the hydrogen bonds formed within the L2 loop were computed for each state using VMD,^{61} with threshold values 25° for the acceptor–donor-hydrogen angle and 0.35 nm for the donor–acceptor distance. More precisely, the short unbiased trajectories are used to compute the frequency of the hydrogen bonds for each state. These computations were used to determine a set of the hydrogen bonds specific to each state, making it possible to define an H-bond-based fingerprint for each state, which can then be used to detect (or rather confirm) transitions in subsequent biased simulations.

### B. Transition from State 1 to State 3

After selecting the best two-dimensional candidate CV for Hsp90, i.e., (*CV*_{3}; *CV*_{5}), we initiated a series of biased simulations starting from one state and tried to explore transitions to other states, either already identified (Fig. 1) or unexplored.

To showcase the ability of the autoencoder CV to facilitate the transitions between different states, we selected the example of a 100 ns biased MD simulation starting from State 1. Both coordinates of the autoencoder CV, i.e., biasing CV, span the whole range of the [−1, 1] space (Fig. S7). This indicates that there is indeed a transition from State 1 to another state. To identify the visited states, the RMSD of the biased trajectory is calculated with respect to the initial conformation (i.e., energy minimized, and position restrained conformation) of each of the five states, using only the *C*_{α} carbons of the L2 loop. As expected, the minimal RMSD is at the beginning of the simulation obtained with respect to State 1, i.e., the starting state. Interestingly, at ∼30 ns of biased MD and for 25 ns, the minimal RMSD shifts to the one with respect to State 3 (Fig. 5). After 55 ns, the RMSD is increasing and possibly another previously not known state was visited.

The clustering model trained to obtain the six original states (Sec. II B) was also used to characterize the potential transitions to different states. More precisely, the distance of each frame of the biased trajectory to each of the five cluster centers corresponding to States 1, 2, 3, 5, and 6 was calculated. Recall that distances to cluster 4 are not plotted as we have already discarded this state after merging it with State 5. As a point of comparison, the same distances are plotted for the unbiased dataset of trajectories started from each state (Fig. S6). The results obtained on the unbiased trajectories demonstrate that to label a conformation as State 3, its distance to the centroid of the corresponding cluster 3 should range between 2.8 and 4, and its distance to the remaining centroids should be higher than 4. The same applies for all other states except State 5, for which the distances are smaller. The results obtained on the biased trajectory show that the computed distances are compatible with a transition to State 3 between 30 and 55 ns, i.e., approximately the same time frame for which a transition was identified using the RMSD (Fig. 5). We also note a small time frame around 25 ns where the conformations are closest to the centroid of State 5 than to any other state. These sampled points are thus automatically classified as State 5 configurations. However, their distances to the centroid 5, ranging between 3 and 4 Angstrom, are actually higher than the typical distance observed in State 5 unbiased trajectories, and which ranges between 1 and 2. This indicates that these conformations probably do not belong to State 5 (or any of the other states determined in the beginning of this work). We note that these “pseudo-state 5” conformations were sampled just before the time when the system transitions to State 3 and may therefore belong to a transient state between State 1 and State 3. Importantly, this misclassification emphasizes the importance of looking into the clustering distances rather than simply trusting the cluster assignments.

Dihedral PCA was also employed to identify possible transitions. Initially, we applied dihedral PCA on the 278 crystal structures already used for the definition of states to reduce the 20 dihedrals of the L2 loop into two principal components (Fig. 6, top). The obtained two-dimensional space of the two principal components is able to discriminate the various crystal structures belonging to the different states and are colored accordingly. After applying this PCA projection to the biased trajectory, it can be inferred that what was previously assigned as State 5 (or pseudo-state 5) is most probably a new transient state between States 1 and 3 (Fig. 6, bottom).

To be able to further characterize the conformational differences between the different states, we defined a “fingerprint” of each state computed as the range of values taken by each dihedral angle in the L2 loop for each one of the states. The L2 loop comprises ten residues or 20 dihedral angles and thus the obtained fingerprint can be hard to analyze. To guide the selection of dihedral angles sufficient to separate the different states, we trained a standard logistic regression model with *ℓ*_{1}-norm regularization on the 278 crystal structures. This so-called LASSO regularization ensures sparsity in the trained model, so that only relevant and nonredundant features, in our case dihedral angles, are kept. The obtained model selected six dihedral angles (Fig. 7). Here, we seek to define States 1 and 3 through the values of their dihedral angles.

Figure 7 shows the ranges of these six dihedral angle values on States 1 and 3 as well as the values visited during the biased simulation of our example. The ranges of the dihedral angles sampled during the biased simulation can be seen as an interpolation between the typical values identified for States 1 and 3, inferring again that a transition between these states has most probably occurred. This is for example illustrated by the values taken by the dihedral angle Ψ_{112}, which is colored in brown in Fig. 7. The angle Ψ_{112} ranges between 90° and 190° for State 1, and between −80° and 0° for State 3; while its range during the biased simulation is [−70°, 200°], thus interpolating the values observed in both States 1 and 3.

To further characterize the different states, using the ten 20-ns trajectories of the initial dataset, we identified all the backbone hydrogen bonds for residues 100–120. A total of 70 hydrogen bonds appear in at least one of the five states and in at least one of the ten short trajectories. For each hydrogen bond *h* and each conformational state *i*, we determine a mean *m*_{h;i} and standard deviation *σ*_{h;i} of the hydrogen bond’s occurrence over the total sample of the specific state. Then, for each hydrogen bond *h*, we compute the overall mean *m*_{h} and standard deviation *σ*_{h} from the means *m*_{h;i}. To rule out hydrogen bonds that appear similarly often in all states or that appear rarely, we keep only hydrogen bonds with high enough values of *m*_{h} and *σ*_{h}, i.e., *m*_{h} > 4% and $\sigma h>mh4$. The final number of hydrogen bonds with high occurrence and variation among the states are 43. If we now keep only hydrogen bonds with high occurrence per state, i.e., *m*_{h;i} > 10% for at least one state, we have 19 hydrogen bonds left (Fig. S8). These high occurrence, high variance hydrogen bonds can be used to characterize each state. Focusing on States 1 and 3, there are three hydrogen bonds that appear only in State 1, viz., Thr115–Lys112, Leu107–Leu103, and Asn106–Asp102, and four hydrogen bonds that appear only in State 3, viz., L116–L112, Thr115–Ala111, Ile110–Leu107, and Ala117–Ser113 (highlighted in Fig. S9). In Fig. 8, we have included the two starting structures for States 1 and 3 along with the hydrogen bonds identified as stable and unique for each state.

To examine the transition from State 1 to State 3 during the 100-ns biased simulation, we monitor the abovementioned unique-per-state hydrogen bonds. Interestingly, three out of four hydrogen bonds that characterize State 3, viz., Lys116–Lys112, Ile110–Leu107, and Ala117–Ser113, although not present at the beginning of the simulation (Fig. 8), gradually appear and coexist at ∼48 ns and for 10 ns [Figs. S9(a)–S9(c); the fourth hydrogen bond, i.e., Thr115–Ala111, appears for less than 1 ns and is thus not reported]. On top of that, the three hydrogen bonds that exist only in State 1 and not in State 3 are broken during the biased simulation. Taken together, the formation of the three “State 3” hydrogen bonds and the breaking of the “State 1” hydrogen bonds indicate a potential transition to a State 3-like conformation in agreement with the previous analyses. The hydrogen bond analysis demonstrates that on top of more holistic structural elements of the protein, e.g., *C*_{α} RMSD (Fig. 5) or PCA (Fig. 6), subtle differences in the vicinity of the binding site and in particular on the ATP-lid, occurring during the binding of different ligands, can be captured during a short biased simulation in the absence of ligand. This is of particular interest since, using our Autoencoder-Adaptive Biasing Force (AE-ABF) protocol and starting from a state that exists both in apo and holo forms, i.e., State 1, Hsp90-NTD transitions to a state where the ATP-lid is more structured and exists most probably only in a holo form (based on the existing experimentally resolved structures).

### C. Other observed transitions

To ensure that the results presented in Sec. V B are not specific to the simulation at hand (ABF starting from State 1), additional biased simulations were run. In particular, we ran ABF using the same 2D-CV, i.e., (*CV*_{3}; *CV*_{5}), starting from States 1, 3, and 5 and extending the sampling time to 200 ns. The distance from the centroids representing each state and dihedral PCA confirm the transition from State 1 to State 3 and an eventual transition to State 5 [Figs. S10(a) and S10(b)]. Moreover, as shown in Figs. S10(c) and S10(d), starting from State 3, we observe an intermediate transition to State 6 before the system finally transitions to State 5 (there are indications that State 2 is also transiently visited). Finally, analysis of the biased simulation of State 5 indicates a transition to State 1 [Figs. S10(e) and S10(f)].

The abovementioned transitions between States 1, 3, and 5, and occasional visits to States 2 and 6, can provide information about the potential mechanistic effect of ligand binding. For example, the binding of geldanamycin,^{62} a natural product with Hsp90 inhibitory action, leads to an ATP-lid conformation of Hsp90 similar to the conformation seen in the isolated apo human Hsp90 NTD (State 1).^{63} This conformation on the other hand is different from the one observed upon nucleotide binding, where hydrophobic surfaces are accessible for NTD dimerization (State 2). Based on Fig. S10, no transitions between States 1 and 2 are observed and thus this supports further the argument that an inhibitor, geldanamycin in this case, may stabilize the NTD in an apo-like conformation, or allosterically shift the conformation of Hsp90 to a state that is incapable of client protein binding. In this context, the presented framework can be used to identify potential transitions (or lack thereof) between biologically relevant states.

## VI. DISCUSSION AND NEXT STEPS

Let us first emphasize that the methodology proposed in this work is very general and could be applied in principle to many other systems: Starting from known metastable configurations from the PDB, augmented with short molecular dynamics simulations to explore the associated metastable basins, we were able by using autoencoders to build collective variables that are sufficiently good to, first, distinguish between the metastable states, and, second, enable transitions between those states when used in a free energy adaptive biasing method. Metaphorically, we could say that, at least on the test case considered in this work, autoencoders were able to shed light on possible pathways between already lightened basins. Let us mention that we have also used autoencoders in an even less supervised setting: starting from only one metastable basin and progressively discovering other basins to refine the collective variables, see the Free-Energy Biasing and Iterative Learning with AutoEncoders (FEBILAE) method presented in Ref. 24. Autoencoders thus seem to be efficient tools to build relevant collective variables for molecular systems, in particular for biological applications. Of course, the method should be further tested to assess its performance, which may also depend on the machine learning and molecular set-up, including the autoencoder structure and training parameters, and the MD simulation parameters, in particular the choice of the force field.

Coming back more precisely to the results obtained on Hsp90, the obtained states are visually different from each other, and display some level of metastability as showcased by the RMSD evolution of unbiased simulations. The autoencoder trained on the dataset of short unbiased simulations allows the construction of a low dimensional CV able to clearly differentiate between the conformational states. Additionally, the results obtained on the 100 ns biased simulation showcase that this autoencoder CV can be used to efficiently sample transition paths among at least two states. Running biased simulations using more classical choices of CVs, such as some dihedral angles of the L2 loop, did not yield such transitions. Training the autoencoder on all five states was central to obtaining this CV. The entire analysis is in fact based on the definition of the five states, which was done using the clustering of the experimentally determined structures. The results are thus all guided by this classification choice. While we argue that our classification of the states is well defined, many other choices could have been applied, for example, by basing the clustering on the binding site itself rather than on the L2 loop adjacent to it.

The obtained states differ by the conformation of the L2 loop, originally due to their binding (or lack thereof) with different ligands. We chose in this work to simulate Hsp90 unbound, i.e., by removing the ligand from the bound states and applying restrained MD to stabilize the obtained systems. In Fig. S7, we report the *C*_{α} RMSD of 200 ns unbiased simulations of the different states. These states are shown to be stable as they do not achieve any transition between the identified states over the time frame of 200 ns. This indicates that these conformations do exist within the apo form of Hsp90. Importantly, this is in agreement with other studies of the NTD, which observe that the apo state visits various conformations of the NTD lid.^{42,64} While the stability of each of these states for the apo structure should be further examined using more MD simulations, we can conclude in the context of the present study that the five separate conformational states of the L2 loop can all be visited in the apo form, and transitions among these conformations have been achieved by biased sampling of the autoencoder CV. We emphasize again that biased sampling applied with more traditional CVs, i.e., a selection of the dihedral angles of the loop of interest, did not yield any transition.

To the best of our knowledge, enhanced sampling of the Hsp90 NTD alone in apo form has not been previously performed. Unbiased sampling of the apo NTD shows some structural motions of the L2 loop, but no clear transitions (e.g., between the helix conformation of State 5 and the loop conformations of other states) are achieved in the *μ*s timescale.^{64} Several studies report enhanced sampling simulations of the bound NTD (taken separately or within the whole dimer), where the collective variable used is often the distance between the NTD and the ligand.^{65–68} These simulations often aim at computing the binding affinity of the ligands to the NTD and consist of short dissociation trajectories to compare a wide array of inhibitors. Some of these studies do differentiate between conformations of the L2 loop of interest,^{69} by e.g., distinguishing between ligands that bind to the various conformations (loop or helix). However, the aim of these studies is not to drive transitions among these conformations, and structural motions of the L2 loop during the dissociation MD simulations are not reported/observed. In Ref. 70, Umbrella Sampling of the whole Hsp90 dimer, using the angle formed by the two monomers, is performed for the apo structure, where the calculated free energy surface shows the presence of two low energy states corresponding to the stretched and compact conformations, but the authors report no corresponding important motions in the NTD site itself.

A natural next step for the present study is to compute the free energy landscape of the two-dimensional autoencoder CV. Because the identified states can somewhat be linked to apo or holo conformations of Hsp90, calculating the free energy surface could provide insights into the ligand binding mechanism. Additionally, while we argue that our choice of state definition, based on the clustering of a large number of crystal structures, is adequate, it is not the only way to obtain the initial states. Free energy calculations can help redefine the conformational states of Hsp90 and possibly discover new states in the local minima of the energy landscape, which were not included in the present analysis.

Finally, another possibility is to forgo state definitions altogether and to use only one conformation of Hsp90 to build the learning dataset. Such unbiased simulations are expected to be trapped within the initial state or only visit conformations close to it, and so iterative algorithms such as FEBILAE^{24} or other methods^{71–74} could be used. Nevertheless, assessing the efficiency of such iterative protocols method on a large biological system like Hsp90 presents an interesting computational challenge.

## SUPPLEMENTARY MATERIAL

See the supplementary material for additional analyses not shown in the main text. In particular, we have included RMSD plots of the training set, training loss convergence, clustering information, the collective variable evolution, a hydrogen bond analysis, and a dihedral angles analysis of some additional biased simulations.

## ACKNOWLEDGMENTS

The work of T.L. and G.S. is partially funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant Agreement No. 810367). ZB was partially funded by the Association Nationale de la Recherche et de la Technologie (ANRT) contract 2018/0759.

## AUTHOR DECLARATIONS

### Conflict of Interest

Marc Bianciotto, Hervé Minoux, and Paraskevi Gkeka are Sanofi employees and own stock in Sanofi.

### Author Contributions

**Zineb Belkacemi**: Conceptualization (equal); Formal analysis (equal); Methodology (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal). **Marc Bianciotto**: Conceptualization (equal); Data curation (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal). **Hervé Minoux**: Conceptualization (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal). **Tony Lelièvre**: Conceptualization (equal); Data curation (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal). **Gabriel Stoltz**: Conceptualization (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal). **Paraskevi Gkeka**: Conceptualization (equal); Formal analysis (equal); Methodology (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available from Sanofi R&D and ENPC. Restrictions apply to the availability of these data, which were used under license for this study. Data are available from the authors upon reasonable request and with the permission of Sanofi R&D and ENPC.

## REFERENCES

*Free Energy Calculations: Theory and Applications in Chemistry and Biology*

*Free Energy Computations: A Mathematical Perspective*

*Numerical Mathematics and Advanced Applications 2011: Proceedings of Enumath 2011, the 9th European Conference on Numerical Mathematics and Advanced Applications, Leicester, September 2011*

*E. coli*Hsp90 and conformation changes upon ADP binding

^{19}F NMR spectroscopy