A method for calculating the free energy difference between two structurally defined conformational states of a chemical system is developed. A path is defined using a previously reported collective variable that interpolates between two or more conformations, and a restraint is introduced in order to keep the system close to the path. The evolution of the system along the path, which typically presents a high free energy barrier, is generated using enhanced sampling schemes. Although the formulation of the method in terms of a path is quite general, an important advance in this work is the demonstration that prior knowledge of the path is, in fact, not needed and that the free energy difference can be obtained using a simplified definition of the path collective variable that *only* involves the endpoints. We first validate this method on cyclohexane isomerization. The method is then tested for an extensive conformational change in a realistic molecular system by calculating the free energy difference between the *α*-helix and *β*-hairpin conformations of deca-alanine in solution. Finally, the method is applied to a biologically relevant system to calculate the free energy difference of an observed and a hypothetical conformation of an antigenic peptide bound to a major histocompatibility complex.

## I. INTRODUCTION

Molecular dynamics (MD) simulations are commonly used to calculate a key quantity in biomolecular chemistry, materials chemistry, and other areas in molecular science, namely, the free energy difference between distinct metastable conformational states of a system under given thermodynamic conditions. Such relative free energy calculations can predict, for example, the favored conformations of a peptide chain,^{1,2} the binding affinity of a drug candidate to a target protein,^{3–5} the most likely crystal structures of a polymorphic solid,^{6–9} or free-energy pathways of first-order phase transitions.^{10}

Generating free energy differences between the conformational states of a complex system requires visiting these states sufficiently frequently to obtain their relative populations in a Boltzmann distribution. However, if the free energy barrier between conformational states of a system is significantly greater than *k*_{B}*T*, where *k*_{B} is Boltzmann’s constant and *T* is the temperature, then barrier crossing events are rare and standard MD requires an inordinate amount of time to build up the required populations and, by extension, obtain converged free energy differences. The use of a “bias” to enhance the rate of free energy barrier crossing, while simultaneously attempting to obtain unbiased free energy differences, is the principle underlying several simulation methods for overcoming the sampling problem. Longstanding rare-event sampling methodologies include the blue moon ensemble^{11,12} and umbrella sampling^{13,14} approaches. More modern techniques include metadynamics,^{15} well-tempered metadynamics,^{16} the adaptive biasing force approach,^{17} local elevation umbrella sampling,^{18} adiabatic free energy dynamics (AFED)^{19–21} combined with machine learning techniques such as neural networks,^{22} and temperature-accelerated molecular dynamics (TAMD)^{23} combined with the single sweep method.^{24}

In the adiabatic free energy dynamics (AFED) method,^{19,20} as in most of the aforementioned techniques, a set of collective variables (CVs), which are functions of the atomic coordinates of the system capable of distinguishing the different conformational states of interest, is defined and a coordinate transformation is introduced to make these explicit if they are not explicit already. The CVs are then maintained at a high temperature *T*_{s} ≫ *T* and assigned high masses so as to effect an adiabatic decoupling of these CVs from the remainder of the system, which is maintained at the physical temperature *T*. The high temperature *T*_{s} enables the system to overcome free energy barriers in the space of the chosen CVs. The adiabatic decoupling condition leads to a simple relation between the sample histogram and the free energy surface, which yields a free energy that is correct for the physical temperature *T*. The related methods, temperature-accelerated molecular dynamics^{23}/driven adiabatic free energy dynamics^{21} (TAMD/dAFED), are versions of AFED in which high-temperature degrees of freedom, termed *coarse-grained variables*, are defined in an extended phase space and coupled harmonically to the CVs. In the unified free energy dynamics (UFED),^{25} which unified TAMD/dAFED with metadynamics,^{15} a Gaussian potential is periodically added at the current position of the coarse-grained variables, disfavoring regions already visited and further increasing the rate of free energy barrier crossing events. Note that UFED becomes equivalent to extended-Lagrangian metadynamics in the limit that *T*_{s} = *T*, and it becomes TAMD/dAFED in the limit of zero Gaussian biasing potential. It was shown that for each of the adiabatic free energy methods mentioned above, either the sample histogram or the average force (or a combination of both) can be used to reconstruct the free energy surface,^{26} and ensemble averages of any observable can be obtained using the appropriate reweighting scheme.^{27}

Biased MD for the calculation of a free energy difference of conformational states critically relies on the choice of CVs, as these must adequately distinguish the different conformational states of interest. These CVs can be any differentiable function of the system coordinates. Their complexity ranges from a single Cartesian coordinate, for example, to describe ion transport in a channel, to more elaborate expressions including angles, radii of gyration, numbers of contacts, the root mean square deviation (RMSD) with respect to a reference structure, or crystalline lattice order parameters.^{6,28,29} Recently, software has been developed^{30} that can perform several biased MD schemes with respect to complex pre-defined or user-defined CVs within various popular MD codes.

Several attempts have been made^{31–34} to define CVs that reduce the number of dimensions on which a large-scale molecular conformational change can be mapped^{1} or to learn useful CVs on the fly via manifold learning approaches.^{35} For an *N*-particle system with positions $r1,\u2026,rN\u2261r$, Branduardi *et al.* introduced a pair of orthogonal CVs *S*(**r**) and *Z*(**r**) which are defined relative to an ordered sequence of reference structures.^{31} If the sequence of reference structures is constructed as a “path” connecting two conformational states of interest, then *S*(**r**) is taken as a progress coordinate along the path and *Z*(**r**) is a deviation orthogonal to the path. A detailed sequence with many intermediate states can characterize a hypothetical minimum free energy path that can be optimized in an iterative procedure. In this work, however, we are interested in using the Branduardi *et al.* path variables with only two reference structures at the end points, which alleviates the need to construct a realistic transition path. Because there is no expectation that the path dictated by the functional form of *S*(**r**) and *Z*(**r**) is the minimum free energy path, transition states and free energy barriers are irrelevant. However, the free energy difference of the end states is path-independent and can be recovered if the barrier can be crossed. This two-structure, arbitrary path is similar in spirit to Refs. 32 and 33, where a path is specified (with different functions) in a two-dimensional space spanned by quantities measuring the similarity to each end-point structure (such as the fraction of common contacts). It also bears some similarity to the *λ*-dynamics schemes of Abrams *et al.*^{36} and of Wu *et al.*^{37} in which a path is designed to separate two thermodynamic states with a large barrier in the *λ*-parameter space, which allows solvation and binding free energies to be computed by enhanced sampling techniques. In our method, we utilize the two-state/endpoint version of the *S*(**r**) CV in combination with TAMD/dAFED or UFED to enhance sampling and generate free energy differences with high efficiency. In order to keep the system close to the transition path, we apply a harmonic restraint on the orthogonal coordinate, *Z*(**r**), whose free energy contribution is carefully taken into account, as described below. We call the corresponding methods “path-dAFED” and “path-UFED.”

The path-dAFED and path-UFED methodologies allow us to study dramatic conformational transitions with a very simple CV. In this work, we apply our method to calculate free energy differences in three example systems: (1) the two distinct chair forms of gas-phase cyclohexane, (2) the *α*-helix and *β*-hairpin structures of solvated deca-alanine, and (3) two conformations of a melanoma-specific peptide antigen presented by the major histocompatibility complex (MHC) molecule HLA-A2 and modeled in explicit solvent. These systems are described in more detail in the following paragraphs.

The first prediction of multiple cyclohexane conformations was made by Sachse in 1890.^{38} Theoretical and experimental measurements of relative populations of chairs, boats, and twist-boats for substituted and unsubstituted cyclohexane span the century that followed.^{39–42} The chair-chair interconversion of unsubstituted cyclohexane is a useful process to validate our method because it contains two free energy minima of equal depth and a free energy barrier greater than 10*k*_{B}*T*. Although we are not directly interested in the barrier *per se*, this value indicates that the interconversion is a rare event, requiring enhanced sampling in order to effect the interconversion.

In the second example, we chose the capped peptide deca-alanine, which is large enough to adopt both *α*-helix and *β*-hairpin structures. Its 112 atoms and 18 backbone torsion angles define a conformational space of very high dimension and complexity. Solvation in water enables the possibility of hydrogen bonding interactions between the peptide and solvent, which makes this conformational space even larger. Computational techniques that have been used to study the conformational free energy of deca-alanine include adaptive biasing force,^{17} multidimensional adaptive umbrella sampling,^{43} and deactivated morphing.^{44} Structural preferences of deca-alanine are dependent on the presence or absence of solvent^{43} as well as the choice of the force field and/or methodology.^{43,44}

The solvated deca-alanine system also allows comparison of our method and the deactivated morphing method proposed by Park *et al.*^{44} to calculate relative free energies of conformational states as follows: (1) a molecule’s atomic positions are gradually restrained to a first reference structure, (2) non-bonded and intramolecular interactions are deactivated, (3) atoms are moved to the positions required for the new structure, (4) interactions are reactivated, and (5) the restraints are gradually removed. This approach requires simulating 30 intermediate states for each of the *α*-helix and *β*-hairpin structures of deca-alanine due to the activation of position restraints. This method also requires four steps to deactivate the physical interactions. Using this method, Park *et al.* calculated that an ensemble of states within a 0.2 nm RMSD cutoff from the *α*-helix has a free energy that is 42 ± 4 kJ/mol lower than an equivalently defined ensemble for the *β*-hairpin in solution. With RMSD cutoffs of 0.05 nm, the helix state was found to be approximately 66.0 ± 4.2 kJ/mol lower in free energy than the hairpin.

In the third example, we apply the new method to a more complex and biologically relevant system. Selective activation of T cells relies on recognition by T-cell receptors (TCRs) of peptide (p) antigens, which are displayed by the major histocompatibility complex (MHC) at the surface of antigen-presenting cells. The TCR-pMHC complex has been a topic of considerable interest in immunology,^{45–48} in particular due to the combination of cross-reactivity and exquisite antigen specificity. Melan-A is a melanoma-specific antigen that is naturally expressed as a nonamer, AAGIGILTV (noted AAG below), or a decamer, EAAGIGILTV (noted EAA), both presented by the HLA-A2 MHC allele. The modest antigenicity of Melan-A, attributed to sub-optimal Ala anchor residues at positions 1 or 2, has prompted the design of engineered peptides with a Leu residue at these positions to improve MHC binding affinity and potentially enhance T-cell recognition.^{49} In crystallographic studies of pMHC molecules, the peptides AAG and ALG (where leucine substitutes for alanine at position 2) displayed an “extended” conformation in the MHC groove, while LAG (where the leucine-alanine substitution occurs at position 1) and the decameric peptides EAA and ELA adopted a “bulged” structure^{50} (see Fig. 7). Surprisingly, despite strong MHC binding of ALG, the leucine residue appears to lock the peptide in the extended conformation causing it to exhibit a significantly decreased immunogenicity compared to both AAG and the bulged peptides.^{51} Several explanations have been put forward to resolve this apparent disconnect between peptide conformation in the unliganded pMHC and immunogenicity. One possible explanation is related to pMHC dynamics: computational and nuclear-magnetic resonance evidence that ALG is markedly more flexible than AAG seems to imply an increased entropic penalty upon TCR binding.^{52} A tantalizing alternative explanation would be that AAG, lacking any strong anchor residues, can switch to a bulged conformation via an induced fit mechanism upon binding of TCRs that also recognize the bulged peptides. One X-ray structure showed that the TCR of a second-generation T-cell clone (called DMF5) binds to AAG in a modified extended state (although it binds ELA in its original bulged state),^{53} which proves that it is possible for AAG to partly adapt its conformation. Here, to probe whether a complete induced fit transition is thermodynamically possible for AAG, we use the path-UFED methodology to calculate the free energy difference between the “extended” and “bulged” conformations of the AAG/MHC complex.

The remainder of this paper is organized as follows. We first describe the methodology of the path CVs and enhanced sampling schemes as well as our approach to calculate the free energy contributions associated with the restraint. We then present results for the three systems mentioned above with a range of parameters and discuss the efficiency of our method. A discussion about unbiasing a free energy result obtained in the presence of restraints on atomic positions is presented in Appendix A. All computational details are given in Appendix B. Details on constructing the path CV for each system are given in Appendix C.

## II. METHODOLOGY

### A. Definition of the path collective variables

Consider an *N*-particle system in a volume *V* at temperature *T*. The particles have masses *m*_{1}, …, *m*_{N}, positions **r**_{1}, …, **r**_{N} ≡ **r**, where $r\u2208R3N$, and conjugate momenta **p**_{1}, …, **p**_{N} ≡ **p**, where $p\u2208R3N$. The classical Hamiltonian is then *H*(**p**, **r**) = *K*(**p**) + *V*(**r**), where *K*(**p**) and *V*(**r**) are the kinetic and potential energies, respectively.

For any sequence of *n* reference structures **r**^{(1)}, …, **r**^{(n)}, linking two points in configuration space, a path can be interpolated based on the following functions:^{31}

*S*(**r**) and *Z*(**r**) then define a pair of path based CVs. Here, *D*(**r**^{(i)}, **r**) is the distance between the instantaneous coordinates **r** of the dynamical system and those of the reference structure **r**^{(i)}, and *λ* is a parameter chosen so that *λD*(**r**^{(i)}, **r**^{(i+1)}) ≈ 2.3. If *λ* is chosen in this way, then for a system where *n* = 2 at an instant when *D*(**r**^{(1)}, **r**) = 0, the exponential terms in the summand of Eq. (1) will equal 1 and 0.1, respectively. In the definition of *S*(**r**), the *i* in the summand is included so that *S*(**r**) has different values at each of the different conformations **r**^{(i)} along the path. Specifically, as constructed, *S*(**r**) is approximately equal to *i* when the system is close to the *i*th structure.

In this work, we show that it is possible to use only the path endpoints as reference structures, i.e., *n* = 2 by defining *D*(⋅, ⋅) as the mean-square deviation,

Figure S4 of the supplementary material illustrates the construction of the *S*(**r**) and *Z*(**r**) CVs in the space of the mean-square displacements to the extended and bulged reference states for the Melan-A system, together with samples from equilibrium and path-UFED simulations (see below). Note that with the definitions above, *S*(**r**) is similar, but not equivalent, to the CVs from Ref. 32. Importantly, as will be shown, the use of endpoints is sufficient to generate a conformational change without the need to generate an entire path between the structures via some other sampling approach and, therefore, to obtain free energy differences between two structures with high efficiency using enhanced sampling methods.

### B. Enhanced MD sampling along the path

MD algorithms such as dAFED/TAMD^{21,23} and UFED^{25} can be used to enhance sampling along *S*(**r**) and calculate the corresponding potential of mean force (PMF). However, driving a complex system between *S*(**r**) = 1.0 and *S*(**r**) = 2.0 does not necessarily lead to sampling of the path characterized by *Z*(**r**) ≈ 0. Farther away from *Z*(**r**) = 0, for a given value of *S*(**r**), increasingly diverse structures are included that are not relevant to the end states of interest. For this reason, a restraining potential of the form $Vz(r)=12\kappa zZ2(r)$, which discourages the sampling of irrelevant structures far from the path connecting end states, is added to the physical potential *V*(**r**). The harmonic constant *κ*_{z} is a parameter of the simulation.

In the presence of an external restraint on one or more CVs, the dAFED/TAMD method can be derived in much the same way as the original method.^{21,23} We outline the main steps below. The effect of the *Z*-restraint on the PMF, *F*(*s*), will be accounted for in a post-processing step. We begin by expressing the restrained marginal probability distribution *P*_{r}(*s*), corresponding to the restrained Hamiltonian *H*_{r}(**p**, **r**) = *H*(**p**, **r**) + *V*_{z}(**r**),

By replacing the Dirac delta function *δ*(*S*(**r**) − *s*) by the limit of a Gaussian function,

and multiplying *P*_{r}(*s*) by a Gaussian integral over an additional variable *p*_{s}, we obtain

In this form, (*s*, *p*_{s}) can be viewed as conjugate dynamical variables for an extra degree of freedom with mass *m*_{s} evolving in an extended phase space.

To enhance sampling, *s* is maintained at a temperature *T*_{s} ≫ *T* and drives the physical system to regions of *S*(**r**) that are unlikely to be explored at temperature *T*. It is then necessary that the extended mass *m*_{s} be chosen such that *m*_{s} ≫ *m*_{i} in order to effect an adiabatic decoupling of the physical system from the high temperature extended system. Similar to prior studies,^{19,21,36} the adiabatic separation generates a distribution function of the form

where

In the limit of stiff coupling, *κ*_{s} → *∞*, the restrained free energy, *F*_{r}(*s*) ≑ −*k*_{B}*T* ln *Z*_{r}(*s*, *β*), can be calculated from the histogram observed in the TAMD/dAFED simulation according to

The restrained free energy can also be calculated from the average force exerted by the physical system on *s*, *i.e.*, as a potential of mean force by performing an integral of the form^{24–26}

where $\u22c5r,s\u2032$ represents an ensemble average at a fixed value of *s*′ in the restrained adiabatic simulation. In practice, in path-dAFED and path-UFED simulations, *S*(**r**) is not fixed at *s*′. Rather, *s*′ is sampled dynamically, and the adiabatic decoupling allows us simply to bin up the average value of $\kappa sS(r)\u2212s\u2032$ over the trajectory. The integral is then performed numerically or analytically by fitting the integrand to the coefficients in a basis-set expansion.^{24,25}

In Ref. 26, it was shown that the mean force approach to calculating a free energy surface converges more rapidly than the probability distribution approach in the cases considered. However, the main advantage of the mean force approach to calculating the free energy is that it does not depend on the actual distribution of *s* in the simulation. In the UFED approach of Chen *et al.*,^{25} this fact permits the use of a history-dependent bias potential of the form

where *h* and *σ* are the height and width of a Gaussian function, respectively, and *t*_{j} are time points prior to current time *t*. This bias potential, in addition to the elevated temperature of the extended variable, further facilitates overcoming high free energy barriers, akin to earlier methods such as local elevation,^{54} conformational flooding,^{55} or metadynamics.^{15}

In our framework, a conformational state of the *Z*-restrained molecular system is represented by an interval of *S*(**r**) values, $smin,smax$. These values can be determined via an NVT simulation in that state with no restraint on *Z*(**r**) or by an NVT simulation with some smaller restraint on *Z*(**r**) as described below in Subsection II C. Based on the PMF, Eq. (10) or Eq. (11), the restrained free energy of state *A* can be calculated as

with an analogous definition of $FrB$. The constant *C* contains the partition function as well as other constants needed to make the integral dimensionless and becomes irrelevant if we are interested in the path-restricted free energy difference between two states *A* and *B*, $\Delta FpathAB=FrB\u2212FrA$.

### C. Unbiasing with staged free energy perturbation

Given the restrained PMF *F*_{r}(*s*), two approaches are available to obtain the unbiased free energy difference, Δ*F*^{AB}, between conformational states *A* and *B*: the *PMF approach* and the *end-point approach*. In the PMF approach, *F*_{r}(*s*) is converted to the PMF of the unrestrained system, *F*(*s*), using the unbiasing scheme derived in Appendix A,

Here, $\u22c5r,s$ denotes an ensemble average in the presence of a *Z*-restraint, at a fixed value of *s*. In practice, the bias factor $e\beta VZ(r)$ in Eq. (14) is prone to large, sudden changes during a simulation due to singular excursions into regions of high *Z*(**r**), which then require a large amount of sampling in regions of lower *Z*(**r**) in order to converge. Such sampling issues are most likely to arise in the region of *S*(**r**)-space where the complexity of the space orthogonal to *S*(**r**) increases. Our preliminary simulations showed that in these regions, the PMF unbiased through Eq. (14) becomes unreliable (data not shown). Therefore, the *PMF approach* is not the best option to calculate Δ*F*^{AB}.

In the end-point approach, the unrestrained free energy of states *A* and *B*, *F*^{A} and *F*^{B}, are calculated from the corresponding restrained free energies, $FrA$ and $FrB$ [Eq. (13)], using free energy perturbation (FEP).^{56} In Ref. 27, we derived the expression for the ensemble average of any arbitrary observable in the adiabatic ensemble. Using this expression with the usual FEP observable, $e\beta VZ(r)$, we obtain the free energy contribution of the *Z*-restraint in state *A*, $\Delta FrA=FA\u2212FrA$,

Here we used the short-hand notation *∫*_{A}*ds* to refer to the integral over the range $s\u2208sAmin,sAmax$ that defines state *A* as in Eq. (13). In addition, we have introduced the state-restricted, biased ensemble average $\u22c5r,A$. Note that Eq. (15) can also be obtained directly by exponentiating Eq. (14) and integrating over *s*.

Instead of this one-way forward perturbation expression (from restrained to unrestrained ensembles), one can also evaluate $\Delta FrA$ via overlap sampling,^{57} a kind of average of the perturbation in both the forward and backward directions. For State *A*,

Equation (16) represents an FEP scheme where a single perturbation per end-state is required to remove the effect of the restraint on *Z*(**r**). This approach will be effective given sufficient overlap of the conformational space distributions in the restrained and unrestrained ensembles. In practice, however, this overlap is not always sufficient, and *m* intermediate states are added with gradually increasing harmonic constants 0 < ⋯ < *κ*_{i} < *κ*_{i+1} < ⋯ < *κ*_{z} and corresponding restraint potentials, $ViZ(r)\u2251\kappa iZ2(r)/2$, with *i* = 0, …, *m* such that *κ*_{0} = 0 and *κ*_{m+1} = *κ*_{z}. This method is referred to as “staged free energy perturbation.”^{3} After performing simulations at intermediate states, the free energy can be calculated as

Here, $\u22c5i,A$ represents the ensemble average with *Z*-restraint potential *V*_{i} in state *A*, *i.e.*, with $s\u2208sAmin,sAmax$.

In summary, we propose the following protocol: (1) run unbiased MD simulations at the end states, (2) define a restraint on *Z*(**r**) that, when used in restrained MD simulations, provides reasonable overlap with the unbiased MD simulation, (3) run path-dAFED or path-UFED simulations with the appropriate *Z*-restraint, (4) calculate biased end-point free energies using Eq. (13) with cutoffs *s*^{min} and *s*^{max} determined by the extent of the states observed in step (2), (5) unbias end-point free energies using the FEP scheme, Eq. (16), or the staged FEP scheme, Eq. (17), and (6) obtain the final value of Δ*F*^{AB} = *F*^{B} − *F*^{A} by following the thermodynamic cycle shown in Fig. 1,

## III. RESULTS

For the three systems that follow, equilibration and production simulation details are provided in Appendix B. Appendix C contains details for the construction of the path CVs, *S*(**r**) and *Z*(**r**). Where relevant, free energy differences associated with each segment of the staged FEP are provided in the supplementary material. Errors reported below are equal to one standard deviation as calculated by bootstrapping the simulation data^{58} using a procedure specific to correlated time series, in which blocks of data are resampled instead of using single observations.^{59}

### A. Cyclohexane

As a first example, we calculated the free energy difference between the two chair states of cyclohexane, which are depicted in Fig. 2 and for which the exact result is (intuitively) zero. Path-dAFED simulations were performed to enhance sampling along *S*(**r**) with *λ* = 996 nm^{−2} (no *Z*-restraint is necessary for this system). For this example, the effects of varying *m*_{s} and *κ*_{s} were studied. Using the same CV, an umbrella sampling calculation^{14} was performed for reference. For this we used 101 windows, all with a harmonic constant of *κ*_{s} = 1 × 10^{5} kJ/mol and 50 ns of simulation time, combined with the weighted histogram analysis method (WHAM).^{60,61}

Figure 3(a) shows the force-based PMFs [Eq. (11)] generated with various *κ*_{s} values in path-dAFED simulations with *T*_{s} = 5000 K and *m*_{s} = 2 × 10^{7} kJ ps^{2}/mol, as well as the reference PMF from umbrella sampling. The structures sampled around the minimum *s* = 1.5 are primarily twist-boats with some contribution from boat forms. The error reported in Fig. 3(b) is the time evolution of the *L*^{1}-norm error *ζ*,

where *N*_{b} is the number of points or bins, *s*_{i}, at which a function *F*(*s*_{i}, *t*) is evaluated at simulation time *t* and *F*_{ref}(*s*_{i}) is the reference PMF. Due to the definition of the CV *S*(**r**), the errors near *s* = 1 and *s* = 2 are large, so this *L*^{1}-norm is calculated using the interval 1.05 ≤ *s* ≤ 1.95. When *F*_{ref}(*s*) is defined as the final PMF estimated using the same method, we refer to the evolution of *ζ*(*t*) as *self-convergence*. For cyclohexane, *F*_{ref}(*s*) is the PMF computed from umbrella sampling. To compare convergence speeds, the self-convergence of the umbrella sampling PMF is also shown.

The low value of *κ*_{s} = 1 × 10^{4} kJ/mol yields free energy basins that are wider and with lower transition barriers than the other simulations, as shown in Fig. 3(a). With increasing *κ*_{s}, there is increasing agreement with the umbrella sampling result with regard to the transition barrier height and the steepness of the PMF near *s* = 1 and *s* = 2.

As an alternative to the PMF convergence measured with *ζ*(*t*), we can monitor the convergence of the state free energy difference $\Delta FpathAB\u2261\Delta FAB$ defined by Eq. (13) (no *Z*-restraint). Each state *A* or *B* is defined as the domain *s*_{0} − 0.02 ≤ *s* ≤ *s*_{0} + 0.02, where *s*_{0} is the location of the corresponding PMF minimum. Figure 3(d) shows similar converge times of approximately 100 ns. The converged values of Δ*F*^{AB} in Table I show agreement with the zero target value. These calculations suggest that the dAFED coupling constant *κ*_{s} must be chosen with some care in order to recover the correct PMF. In the low-*κ*_{s} case, we see clearly a regime in which the stiff coupling assumption in Eq. (7) is violated. While choosing larger values of *κ*_{s} increases the accuracy, increasing *κ*_{s} beyond 1 × 10^{5} kJ/mol is limited by the time step used to integrate the equations of motion. Other implementations with multiple time step integration, such as the reversible reference system propagator algorithm (r-RESPA)^{62} and isokinetic RESPA,^{63,64} would ameliorate this limitation. It is also important to note that the apparent effects of *κ*_{s} on the resultant PMF of a path-dAFED simulation have little influence on the position of the free energy minima, only on the width of free energy basins and the height of free energy barriers in the case of this symmetric PMF.

κ_{s} (kJ/mol)
. | $\Delta FpathAB$ (kJ/mol) . |
---|---|

1 × 10^{4} | 0.01(0.07) |

3 × 10^{4} | 0.08(0.09) |

1 × 10^{5} | 0.09(0.10) |

κ_{s} (kJ/mol)
. | $\Delta FpathAB$ (kJ/mol) . |
---|---|

1 × 10^{4} | 0.01(0.07) |

3 × 10^{4} | 0.08(0.09) |

1 × 10^{5} | 0.09(0.10) |

In Fig. 4(a), the force-based PMFs [Eq. (11)] generated by varying *m*_{s} in path-dAFED simulations with *T*_{s} = 5000 K and *κ*_{s} = 1 × 10^{5} kJ/mol are shown. Although using lower values for *m*_{s} leads to a small increase in the *L*^{1}-norm error in the calculation of the PMF, likely due to the nonadiabaticity of the system [Fig. 4(b)], this does not translate into a noticeable trend in the free energy difference of states [Fig. 4(d)], which appears to be limited to an interval of ±0.15 kJ/mol around zero. The final values of $\Delta FpathAB$ given in Table II for different values of *m*_{s} are indistinguishable within their error bars. It is evident that to accurately compute the free energy difference of states using path-dAFED methodology, adiabatic separation of the physical system and the extended system is not a strict requirement for the case of a symmetric PMF. This, importantly, is only possible with the mean force approach to computing a PMF, Eq. (19), as we previously noted in the context of UFED.^{25}

m_{s} (kJ ps^{2}/mol)
. | $\Delta FpathAB$ (kJ/mol) . |
---|---|

2 × 10^{4} | −0.14(0.11) |

2 × 10^{7} | 0.09(0.10) |

2 × 10^{9} | −0.09(0.15) |

m_{s} (kJ ps^{2}/mol)
. | $\Delta FpathAB$ (kJ/mol) . |
---|---|

2 × 10^{4} | −0.14(0.11) |

2 × 10^{7} | 0.09(0.10) |

2 × 10^{9} | −0.09(0.15) |

The smoothing parameter *λ* in the definition of the path variable *S*(**r**) was also studied. As shown in Eq. (1), *λ* is a constant similar to the inverse of the variance of a Gaussian distribution. In Fig. S1 of the supplementary material, it is shown that by modification of *λ* below the prescribed value where *λD*(**r**^{(i)}, **r**^{(i+1)}) ≈ 2.3, the two free energy minima move closer to each other and to the midpoint of the domain, *s* = 1.5. While the PMF does contract, the free energy difference of the chair states is insensitive to this change, within a range of ±0.20 kJ/mol. Additionally, the ability of the system to revisit the original chair state is not affected, as determined by the RMSD from the initial chair state during the path-dAFED trajectory (see Fig. S1 of the supplementary material). However, the free energy barrier height between states does decrease as *λ* decreases, with a barrier range of ≈5 kJ/mol between the largest and smallest values of *λ*. With values of *λ* that are larger than the prescribed value, the barrier region is undefined on *s*, so driving the system outside of the initial free energy minimum is impossible. From these studies, we can confirm the importance of choosing *λ* as recommended by Branduardi *et al.*^{31}

An additional cyclohexane path-dAFED simulation was conducted with a restraint on *Z*(**r**) with *κ*_{z} = 1 × 10^{2} kJ/mol nm^{4} as a first and simple test of the use of such a restraint. For this system, the unbiasing technique provided in Appendix A leads to a PMF that is very similar to those determined by unrestrained path-dAFED (see Fig. S2 of the supplementary material).

For the chair-chair interconversion process, the path-dAFED method generates the PMF and free energy difference of states with high accuracy relative to umbrella sampling despite high driving temperature *T*_{s} and high transition barriers. For a single simulation, the free energy difference of the two chair states stabilizes in approximately 100 ns of simulation time on a single trajectory, where umbrella sampling requires approximately 20 ns of combined sampling across all windows. While it is not surprising that path-dAFED does not offer a speed advantage over umbrella sampling for this simple toy example, the cyclohexane chair-chair isomerization process is valuable as a proof of concept. We note that in more complex systems with high free energy barriers in dimensions orthogonal to the path, the umbrella sampling approach can lead to sampling issues that are alleviated in the context of path-UFED, as discussed further below.

### B. Deca-alanine

Path-UFED simulations were performed to calculate the free energy difference, Δ*F*^{αβ}, between the *α*-helix and *β*-hairpin conformations of methyl-capped deca-alanine in explicit solvent. We emphasize that this is a challenging problem to approach with the path methodology of this paper, as this transition involves a large structural difference (RMSD = 0.62 nm) with reorganization of all H-bonds and rotation of all backbone dihedral angles. The *α*-helix and *β*-hairpin structures used as path endpoints (see Fig. 5) were obtained from the study of Park *et al.* so that our results could be directly compared to the 42 ± 4 kJ/mol free energy difference determined via the deactivated morphing method.^{44} These coordinates are from energy optimizations in a vacuum; during standard MD simulations, the solvated *β*-hairpin unfolds in under 200 ps and folds into the *α*-helix in less than 5 ns. Although the *β*-hairpin structure of deca-alanine is unstable without atomic restraints in explicit solvent, a corresponding end state can nonetheless be defined in the current method according to a cutoff in *s*-space, as described in Eq. (13). In the calculations, *Z*(**r**) was restrained with *κ*_{z} ranging from 1 × 10^{4} kJ/mol nm^{4} to 5 × 10^{4} kJ/mol nm^{4}. These values are high enough for the variable *S*(**r**) to define non-degenerate end states and low enough that the CV coupling forces could be accurately integrated with an MD time step of 0.5 fs.

In Fig. 6(a), the restrained PMFs, *F*_{r}(*s*), generated from the path-UFED simulations using Eq. (11), are shown. As *κ*_{z} increases, we observe a large increase in the free energy barrier between end states due to the fact that larger *κ*_{z} restraints confine the system closer to the *Z*(**r**) = 0 path. However, this mathematically defined path is not necessarily favorable for the molecular system in the intermediate region between end states. We note that the *Z*(**r**) = 0 path is not intended to mimic the physical transition path as in Ref. 31, but rather to enforce sampling of conformations close to the reference end-point structures for *S* = 1 and 2 and to restrict conformational diversity in intermediate regions. Figure 6(c) shows the time evolution of the *S*(**r**) CV in the beginning of the simulation. For the trajectories with large *κ*_{z} that have higher free energy barriers, we see that the simulations take longer to leave the initial state as the UFED adaptive potential must fill a deeper minimum on the PMF.

End-state free energies [Eq. (13)] calculated from the path-UFED simulations were unbiased to remove the free energy contributions due to the *Z*-restraints using the staged overlap sampling FEP scheme of Eq. (17). Preliminary calculations (data not shown) indicated that efficient unbiasing could be obtained with a single intermediate state between the *Z*-restrained path-UFED end-state and the unrestrained peptide in the NVT ensemble. This intermediate state represents an NVT ensemble with a *Z*-restraint slightly softer than that used in the path-UFED simulation. For consistency, we chose to apply the same procedure at all end-points, although this step of the method could be further optimized for better efficiency. The results of the unbiasing simulations and the exact parameters used in each case are shown in Tables S3 and S4 of the supplementary material.

The state free energy differences calculated along the path, $\Delta Fpath\alpha \beta $, and the unbiased state free energy differences, Δ*F*^{αβ}, calculated following Eq. (18) are shown in Table III for different *κ*_{z} values. Also shown are the effective RMSD cutoffs, which are a consequence of the cutoffs on *S*(**r**) used to define conformational states in Eqs. (13) and (17). In general, these cutoffs are smaller as *κ*_{z} increases due to the reduced conformational sampling allowed by the stronger *Z*-restraints. In the lowest *κ*_{z} simulations, the *β*-hairpin state has a larger average RMSD relative to the defined end-state than in any other path-UFED simulation (0.15 nm as compared to a maximum of 0.13 nm for other *κ*_{z} values). By contrast, the *α*-helix form has less conformational entropy, as in the simulations in the work of Park *et al.*,^{44} and the sampling of this end state is less variable for this range of *κ*_{z} values. As a result of the increased conformational entropy of the *β*-hairpin in the lowest *κ*_{z} simulation, the value $\Delta Fpath\alpha \beta $ is smaller than the other path-UFED simulations. We also note that increasing the value of *κ*_{z} restricts the overall conformational freedom of the system, as seen in Fig. S3 of the supplementary material, which shows the variance of deca-alanine dihedral angles during path-UFED simulations with different *κ*_{z} values. When the restraint on *Z*(**r**) is too large, motions that the system would naturally undergo during a conformational change could be restricted and thorough sampling of each free energy minimum becomes more unlikely. For intermediate values of *κ*_{z}, the *α*-helix conformation of deca-alanine is lower in free energy than the *β*-hairpin conformation by approximately 40-44 kJ/mol, in good agreement with the deactivated morphing result for comparable RMSD cutoffs of 0.2 nm, which yielded a Δ*F*^{αβ} of 42 ± 4 kJ/mol in Ref. 44.

. | κ_{z}
. | $\Delta Fpath\alpha \beta $ . | Unbiased ΔF^{αβ}
. | RMSD cutoff, . | RMSD cutoff, . |
---|---|---|---|---|---|

. | (kJ/mol nm^{4})
. | (kJ/mol) . | (kJ/mol) . | helix (nm) . | hairpin (nm) . |

This work | 1 × 10^{4} | 37.1(2.0) | 24.6(2.4) | 0.28 | 0.31 |

2 × 10^{4} | 53.1(4.5) | 43.5(4.7) | 0.26 | 0.25 | |

3 × 10^{4} | 55.6(5.6) | 43.7(5.9) | 0.22 | 0.24 | |

4 × 10^{4} | 51.2(6.8) | 42.9(6.9) | 0.19 | 0.21 | |

5 × 10^{4} | 49.4(4.0) | 39.7(4.1) | 0.22 | 0.21 | |

Park et al. | 42(4) | 0.2 | 0.2 |

. | κ_{z}
. | $\Delta Fpath\alpha \beta $ . | Unbiased ΔF^{αβ}
. | RMSD cutoff, . | RMSD cutoff, . |
---|---|---|---|---|---|

. | (kJ/mol nm^{4})
. | (kJ/mol) . | (kJ/mol) . | helix (nm) . | hairpin (nm) . |

This work | 1 × 10^{4} | 37.1(2.0) | 24.6(2.4) | 0.28 | 0.31 |

2 × 10^{4} | 53.1(4.5) | 43.5(4.7) | 0.26 | 0.25 | |

3 × 10^{4} | 55.6(5.6) | 43.7(5.9) | 0.22 | 0.24 | |

4 × 10^{4} | 51.2(6.8) | 42.9(6.9) | 0.19 | 0.21 | |

5 × 10^{4} | 49.4(4.0) | 39.7(4.1) | 0.22 | 0.21 | |

Park et al. | 42(4) | 0.2 | 0.2 |

The deactivated morphing approach requires 30 intermediate states for each of the two defined structures in order to generate the difference in free energy due to the activation of position restraints plus four steps to deactivate intramolecular interactions, with an additional 50 steps to morph the structures of the deactivated atoms. This amounts to a total of more than 110 independent simulations for an estimated combined duration exceeding 150 ns. The method of this paper requires one long path-UFED simulation and a handful of short simulations of the end states to unbias for the *Z*-restraint. As can be seen in Fig. 6(b), the state free energy difference reaches acceptable convergence after 150 ns, which makes our method competitive with that of Ref. 44 in terms of performance and very straightforward to implement. We note that we were able to produce this transition using only the conformational end-points, without specifying any specific path, which is quite remarkable given the complexity of the transition.

### C. Melan-A

Path-UFED simulations were performed to study the free energy difference Δ*F*^{EB} between the extended and bulged states of the Melan-A peptide bound to the MHC protein (see Fig. 7). *Z*(**r**) was restrained with *κ*_{z} ranging from 1 × 10^{5} kJ/mol nm^{4} to 3 × 10^{5} kJ/mol nm^{4}, with *κ*_{s} = 1 × 10^{6} kJ/mol and *λ* = 47.5 nm^{−2}. Figure 8 shows the restrained PMFs from these calculations. As expected, increasing the restraint on *Z*(**r**) increases the free energy barrier between states. For each path-UFED simulation, MD simulations of the extended and bulged states with a *Z*-restraint yielding favorable distribution overlap were run in order to determine cutoffs in *S*(**r**)-space, which in turn translate to RMSD cutoffs for each state, as shown in Table IV.

. | κ_{z}
. | $\Delta FpathEB$ . | Unrestrained . | RMSD cutoff, . | RMSD cutoff, . |
---|---|---|---|---|---|

. | (kJ/mol nm^{4})
. | (kJ/mol) . | ΔF^{EB}(kJ/mol)
. | extended (nm) . | bulged (nm) . |

UFED | 1 × 10^{5} | −19.4(3.0) | −14.4(3.5) | 0.24 | 0.20 |

2 × 10^{5} | −20.6(2.2) | −12.9(2.5) | 0.18 | 0.18 | |

3 × 10^{5} | −18.1(4.5) | −12.5(4.6) | 0.17 | 0.16 | |

NVT | 0.20 | 0.16 |

. | κ_{z}
. | $\Delta FpathEB$ . | Unrestrained . | RMSD cutoff, . | RMSD cutoff, . |
---|---|---|---|---|---|

. | (kJ/mol nm^{4})
. | (kJ/mol) . | ΔF^{EB}(kJ/mol)
. | extended (nm) . | bulged (nm) . |

UFED | 1 × 10^{5} | −19.4(3.0) | −14.4(3.5) | 0.24 | 0.20 |

2 × 10^{5} | −20.6(2.2) | −12.9(2.5) | 0.18 | 0.18 | |

3 × 10^{5} | −18.1(4.5) | −12.5(4.6) | 0.17 | 0.16 | |

NVT | 0.20 | 0.16 |

The same staged overlap sampling FEP Eq. (17) procedure as described above for deca-alanine was applied to calculate the unbiased Δ*F*^{EB}, which are shown in Table IV. Tables S1 and S2 of the supplementary material show the details of the unbiasing procedure in each case. Table IV also includes the maximum RMSD from reference bulged and extended structures observed during unrestrained MD simulations in each state, labeled “NVT.” The simulation with the most permissive effective RMSD cutoffs yields a bulged state which is lower in free energy than the extended state by Δ*F*^{EB} = −14.4 ± 3.5 kJ/mol. With a higher restraint on *Z*(**r**) and consequently more strict cutoffs on RMSD from reference structures, Δ*F*^{EB} becomes slightly less negative; however, each calculated free energy difference is within the standard deviations of all other free energy differences.

The prediction that the bulged conformation is preferable to the extended conformation in the Melan-A system with peptide AAG is not entirely expected as the system crystallizes in the extended state.^{50} However, every other peptide crystallized in the extended state in Melan-A has been non-immunogenic while every peptide crystallized in the bulged state in Melan-A has been immunogenic. The notion that AAG might actually favor the bulged state in solution at 300 K would resolve this puzzle without even invoking induced fit upon TCR binding. To explore thermodynamic factors that may contribute to the stabilization of the bulged state at 300 K, we calculated the difference in internal energy and entropy of the peptide using 200-ns NVT simulations in each state. The average internal energy works out to be 113 ± 10 kJ/mol lower in the bulged peptide (uncertainty estimated by bootstrapping). This is counterbalanced by an entropic advantage of 120 ± 29 kJ/mol (at 300 K) for the extended peptide calculated by the quasi-harmonic approximation^{65} (uncertainty estimated by dividing each 200 ns trajectory into 16 overlapping 50 ns segments). The fact that the extended state is entropically favored (as previously pointed out by others^{51}) seems to be at odds with the presence of the extended state in the (low temperature) crystal structure, whereas, according to our calculations, the bulged state is preferred at room temperature. We note, however, that this analysis includes only the peptide’s internal entropy and neglects the rest of the protein and the solvent. In particular, the number of water molecules trapped between the peptide and the MHC might be larger in the extended conformation, which would make this state entropically disfavored compared to the bulged state and thus more stable at low temperatures.

While the example of the Melan-A AAG peptide is useful to demonstrate the applicability of the path-UFED method to a system of biological interest, the usual rules of caution apply when attempting to compare MD simulation results to experimental data. For example, the force field might not faithfully reflect subtle equilibria between metastable molecular conformations. In particular, the CHARMM27 force field used here has been shown to over-stabilize helical conformations over more extended conformations such as *β*-strands,^{66,67} which may shift the equilibrium of AAG toward the bulged state. In addition, from a simulation perspective, our decision to focus on the MHC *α*1,2 domains and leave out the *α*3 domain and the *β*-microglobulin (see Fig. 7) may be an oversimplification. Finally, we note that the particular conditions for crystallization of the AAG/MHC complex, such as the presence of polyethylene glycol in the solution,^{50} may also have stabilized the unusual extended conformation in the crystal structure, which would explain the disconnect with simulations in pure water.

## IV. DISCUSSION AND CONCLUSIONS

A method for calculating the free energy difference of two structurally defined conformational states of a chemical system was developed. This method involves identifying one reference structure for each conformational state and driving the system along a mathematically defined path connecting these reference structures with an enhanced sampling method such as dAFED or UFED. Sampling is restricted to a tube around the path by imposing a restraint on displacements orthogonal to the path. The path CV methodology was first introduced in Ref. 31 with many intermediate reference structure to define the path. We note that when using only two reference structures as in the present study, the resulting path CVs take on a form similar in spirit to the pair of CVs introduced in Refs. 32 and 33 to drive molecular conformational changes, although these possess a different, more complex, functional form.

In the application of path-dAFED and path-UFED, there are several important parameters to optimize. First, *λ*, involved in the definition of the path CVs themselves, has some flexibility, although a simple and effective prescription is available and was employed in this paper. Second, the *κ*_{s} and *m*_{s} parameters defining, respectively, the strength of the harmonic potential coupling the physical system to an extended phase-space variable *s* and the fictitious mass of *s* are specific to the dAFED/UFED methodology. The coupling constant *κ*_{s} should be chosen as tight as possible while preserving accurate numerical integration with the chosen time step. The mass *m*_{s} should be chosen large enough to ensure effective adiabatic decoupling between extended and physical systems, while keeping the average diffusion velocity fast enough for extensive sampling during the planned simulation time. In the case of UFED, the hill size and deposition rate must be chosen as well,^{25} keeping in mind that these choices are much less sensitive than in standard metadynamics (e.g., large hills can be used), since the PMF is reconstructed using the force.

A final parameter, *κ*_{z}, is more consequential. The variable *Z*(**r**) represents an orthogonal “distance” from the path defined by *Z*(**r**) = 0 and *κ*_{z} is the strength of a harmonic potential which restrains *Z*(**r**) to be near zero. In a rigid, simple system such as cyclohexane, driving *S*(**r**) is sufficient and no *Z*(**r**)-restraint is needed. However, for more complex systems, defining the path with only two end states yields high degeneracy for intermediate values of *S*(**r**), which leads to poor convergence properties. Applying a restraint on *Z*(**r**) restricts the system to a narrow tube around the path and to physically meaningful basins around the reference structures close to *S*(**r**) = 1 and *S*(**r**) = 2. The flip side is that increasing the strength of this restraint increases the free energy barrier separating end states as the system is forced to follow the arbitrary, mathematically defined path, *Z*(**r**) = 0, instead of its lowest free energy path. With a very high barrier, barrier crossing requires aggressive enhanced sampling and the free energy calculated is prone to error. The use of a restraint on *Z*(**r**) also introduces the need to unbias for the effect of the restraint on the computed free energy difference, which is itself an important source of error. When the restraint on *Z*(**r**) is neither too weak nor too strong, end states represent only conformations of interest and enhanced sampling along *S*(**r**) can proceed along a physically reasonable path connecting the end states.

With the minimal input of two reference structures and a few parameters, path-dAFED/UFED simulations can efficiently generate free energy differences between very different conformational states in a straightforward manner. The philosophy of the method relies on the fact that sampling the minimum free energy path (or the bundle of paths possible for a spontaneous transition) is not necessary to calculate end-point free energy differences. Instead it is preferable to sample one well-defined artificially enforced path, even if it has a higher free energy barrier. We note that methods such as umbrella sampling^{14} or the single sweep method,^{24} where sampling is performed locally in separate windows, would not be appropriate in this framework because neighboring windows would be likely to sample regions that are completely disconnected in the multi-dimensional conformational space. Instead, the fact that the trajectories generated by path-dAFED or path-UFED effectively connect the end states ensures connectivity of the transition path and the meaningfulness of the calculated free energies. Note that path-dAFED or path-UFED could be used to find increasingly optimized paths between two states by adding intermediate reference conformations following an iterative procedure as in Ref. 31. Alternatively, path-dAFED or path-UFED can be the final stage after a realistic path is found by, for instance, the nudged elastic band method.^{68} We note that other methods relying on nonphysical paths, such as deactivated morphing,^{44} cannot be easily extended for path discovery.

In this report, we demonstrated the effectiveness of the path-dAFED/UFED method in three examples. For the cyclohexane chair-chair interconversion process, path-dAFED generates the PMF and free energy difference between states with high accuracy despite a high driving temperature *T*_{s} and a high transition barrier. For this simple system, the observed consistency of both methods represents a convincing proof of concept for path-dAFED.

For deca-alanine, path-UFED simulations of the *α*-helix to *β*-hairpin transition followed by staged FEP simulations to unbias for the restraint on *Z*(**r**) agrees well with the previously published computational results using deactivated morphing^{44} and similar cutoffs to define conformational states, as seen in Table III. The performance of both methods, converging in about 150 ns, is comparable. Note that while deactivated morphing required more than 110 different simulations with complex alchemical decoupling steps, path-UFED required only one path sampling simulation plus four end-point simulations to unbias the *Z*-restraint.

From path-UFED simulations of Melan-A AAG peptide bound to HLA-A2, we find that the hypothetical bulged state appears more stable at room temperature than the experimentally crystallized extended state. Simple estimations of the peptide’s conformational entropy in both states indicate that this tendency could not be reversed by temperature alone. This disagreement with experiment could be due to simplifying assumptions in the model of HLA-A2 for this paper or the choice of the CHARMM27 force field. However, it is also possible that the extended state was artificially stabilized by the particular crystallization conditions used to obtain the X-ray structure and that the preferred conformation at room temperature is indeed the bulged one, as is the case for all other known antigenic variants of the Melan-A peptide.

In summary, we believe that the endpoint-based path-dAFED/UFED approach presented here offers an efficient and promising approach to generate free energy differences between two conformational states and allows for the possibility of subsequent path discovery. In future work, we will further explore such problems as oligopeptide binding in different receptor sites and the ranking of different binding poses of small molecule inhibitors in enzyme active sites. We will also combine the path-dAFED approaches with isokinetic resonance-free multiple time stepping algorithms,^{63,69,70} which will allow conformational free energy calculations to be performed with very large time steps.^{64}

## SUPPLEMENTARY MATERIAL

See supplementary material for the following additional figures, tables, and discussions: (1) Dependence on the free energy profiles *F*(*s*) of cyclohexane on the *λ* parameter [cf. Eq. (1)] (Fig. S1). (2) Comparison of free energy profiles *F*(*s*) of cyclohexane with and without a *Z*-restraint (Fig. S2). (3) Examination of the dihedral angle fluctuations of each alanine residue in a UFED simulation of the alanine decamer (Fig. S3). (4) A demonstration that overlap between reference structures and the ideal [*Z*(**r**) = 0] path need not be perfect using the Melan-A system as an example (Sec. S5). (5) A discussion of the forward and backward *Z*-restraint unbiasing terms (Fig. S4). Two tables (S1 and S2) reporting the forward and backward contributions to free energy differences in the deca-alanine and Melan-A systems, respectively. (6) Two tables (S2 and S4) reporting the restrained and unrestrained contributions as a function of *κ*_{z} to free energy differences in the deca-alanine and Melan-A systems.

## ACKNOWLEDGMENTS

M.E.T. acknowledges support from the National Science Foundation (NSF), Award No. CHE-1565980. M.A.C. acknowledges support from the Swiss National Science Foundation fellowship PA00P2_129092 for the initial phase of this work. L.V.-M. acknowledges support from the Army Research Office (ARO), Award No. W911NF-13-1-0387. E.S. acknowledges support from the New York University Materials Research Science and Engineering Center (MRSEC) program of the NSF, Award No. DMR-1420073.

### APPENDIX A: UNBIASING FREE ENERGY PROFILES

A biased probability distribution function can be related to the probability distribution function in the absence of a biasing potential. Following a published unbiasing procedure,^{71} we begin with the definition of a joint probability distribution of two variables, specifically the path variables *S*(**r**) and *Z*(**r**),

We need to express a relationship between the unbiased probability, *P*(*s*), and the restrained probability, *P*_{r}(*s*). Multiplying the numerator and denominator by

and

we obtain

Of the three factors, the first is equal to the restrained distribution *P*_{r}(*s*), the second is a constant, and the third can be expressed as a biased ensemble average by inserting a factor $exp{\u2212\beta Vz(r)}exp{\beta Vz(r)}$ into the numerator to yield

which is an expression for a restrained ensemble average of $e\beta Vz(r)$ for fixed *s*, $e\beta Vz(r)r,s$. This leads to expressions for the unrestrained probability *P*(*s*) and unrestrained free energy *F*(*s*),

### APPENDIX B: SIMULATION DETAILS

#### 1. Cyclohexane

A single molecule composed of six methylene united atoms was used to study gas-phase dynamics. An initial chair conformation and Gromos 43a5^{72} topology were obtained from the PRODRG server.^{73} All Coulomb and van der Waals forces were calculated with no cutoff. With no bond constraints, the time step employed was 1.0 fs. All cyclohexane simulations were run in the PINY_MD software^{74} using a modified version of the PLUMED plugin^{30} to implement dAFED and UFED dynamics with *s* and *z* CVs.

#### 2. Deca-alanine

System equilibration prior to enhanced sampling MD proceeded as follows. The optimized capped *α*-helix structure provided by Park *et al.* was solvated in 1697 TIP3P water molecules. Following energy minimization by steepest descent, the system was heated to 300 K in 150 ps and held at 300 K for 150 ps, while all atoms were restrained using harmonic constants of 1000 kJ/mol nm^{2}. The system was held at 300 K using the Nosé-Hoover chain algorithm.^{75} Restraints on peptide atoms were relaxed to 300 kJ/mol nm^{2}, for 50 ps, and to 100 kJ/mol nm^{2}, for another 50 ps, before being completely released. After 500 ps of simulation at constant pressure, we switched to NVT simulations using the average volume of the constant pressure phase. All stages of equilibration used a time step of 2.0 fs and bond lengths were constrained to their equilibrium values using the LINCS (LInear Constraint Solver) algorithm.^{76,77}

To equilibrate the *β*-hairpin structure for *Z*(**r**)-restrained MD, the above procedure was followed, although the position restraints were not removed. NPT and NVT equilibration phases were run with harmonic constants of 100 kJ/mol nm^{2}. This was necessary to prevent rapid unfolding of the *β*-hairpin structure provided by Park *et al.*

UFED simulations were carried out with a coupling constant *κ*_{s} = 1 × 10^{6} kJ/mol. The adaptive bias potential was built by depositing Gaussians of height 1.0 kJ/mol and standard deviation *σ* = 0.01 every 2500 time steps. For each UFED simulation, two equilibrium MD simulations—one for the helix and one for the hairpin—with an equivalent *κ*_{z} were run in order to determine cutoffs in *S*(**r**)-space, which in turn could be translated to an equivalent RMSD cutoff for each state. All simulations were run in the GROMACS (GRonigen MAchine for Chemical Simulations) software^{78} using a modified version of the PLUMED plugin^{30} to implement dAFED and UFED dynamics. PMFs and convergence plots were constructed using in-house Matlab scripts as described previously.^{26}

#### 3. Melan-A

From the PDB entry 2GUO,^{50} the crystallized form of AAG/HLA-A2 in the extended conformation, we extracted the coordinates for the peptide and the peptide-binding *α*_{1} and *α*_{2} domains of HLA-A2 (amino acids 1-182). In order to protonate the seven histidines of HLA-A2, the PROPKA server was employed to analyze local pK_{a} values and hydrogen-bonding networks.^{79–82} This complex was solvated in 15 987 TIP3P water molecules, with one chloride ion for overall charge neutrality. All stages of equilibration used a time step of 2.0 fs and the LINCS bond constraint algorithm.^{76,77} Following energy minimization by steepest descent, three copies of the system with random initial velocities were heated to 300 K in 150 ps and held at 300 K for 150 ps, while all atoms were restrained using harmonic constants of 1000 kJ/mol nm^{2}. Restraints were relaxed to 300 kJ/mol nm^{2} for 50 ps and for 100 kJ/mol nm^{2} for another 50 ps. After this step, the position restraints were removed from all atoms except for C_{α} atoms which comprise the *α*-helices of HLA-A2. These restraints remained in place to prevent any slow HLA-A2 motions, which is justified by the observation that the crystal structures of extended and bulged systems show very little variation in the shape of HLA-A2. After 500 ps of simulation at constant pressure, we switched to NVT simulations using the average volume of the constant pressure phase for another 500 ps of equilibration.

UFED simulations were performed with a coupling constant *κ*_{s} = 1 × 10^{6} kJ/mol. A repulsive potential was built by depositing Gaussians of height 1.0 kJ/mol and *σ* = 0.1 every 2500 time steps. The extended phase space dAFED/UFED variable was held at *T*_{s} = 10 000 K with *m*_{s} = 1 × 10^{6} kJ ps^{2}/mol. Simulations were performed on three independent trajectories for faster convergence of the PMF.^{83,84}

### APPENDIX C: CONSTRUCTION OF COLLECTIVE VARIABLES $S(r)$ AND $Z(r)$

For the three systems considered in this study, the CVs *S*(**r**) and *Z*(**r**) were constructed with two reference structures **r**_{1} and **r**_{2} defining the end-point conformational states of interest.

#### 1. Cyclohexane

A canonical chair structure of six methylene pseudoatoms was used to define structure **r**_{1}. Structure **r**_{2} was generated by reflecting the non-coplanar methylene pseudoatoms through the plane. To calculate the mean square deviation *D*(**r**^{(1)}, **r**^{(2)}), Eq. (4), required to define *S*(**r**) and *Z*(**r**), all six methylene pseudoatoms were used for alignment and displacement.

#### 2. Deca-alanine

We are grateful to Park *et al.* for providing the *α*-helix and *β*-hairpin deca-alanine reference structures. We note that in their work,^{44} these structures were the result of gas-phase optimizations. Structure **r**_{1} was defined as the heavy atoms of the capped *β*-hairpin and structure **r**_{2} was defined as the same atoms from the capped *α*-helix. These structures are shown in Fig. 5. All atoms in these structures were chosen for both displacement and alignment for calculation of mean square deviation.

#### 3. Melan-A

Structures **r**_{1} and **r**_{2} were defined using the heavy atoms of the AAG peptide, as well as the side chains of HLA-A2 that are close to AAG in both the 2GUO and the 2GT9 crystal structures. Positions of atoms in the **r**_{1} structure were extracted from PDB #2GT9,^{50} where the leading glutamic acid in the EAA decamer was removed to construct a hypothetical bulged AAG nonamer. Positions of atoms in the **r**_{2} structure were extracted from PDB #2GUO,^{50} the extended conformation. The heavy atoms of the peptide and of several MHC side chains within 3.0 Å of the peptide were chosen for the calculation of mean square deviation. These atoms, together with the MHC C_{α} atoms mentioned in Appendix B 3, were chosen for alignment.