Replica-exchange molecular dynamics (REMD) and their variants have been widely used in simulations of the biomolecular structure and dynamics. Replica exchange with solute tempering (REST) is one of the methods where temperature of a pre-defined solute molecule is exchanged between replicas, while solvent temperatures in all the replicas are kept constant. REST greatly reduces the number of replicas compared to the temperature REMD, while replicas at low temperatures are often trapped under their conditions, interfering with the conformational sampling. Here, we introduce a new scheme of REST, referred to as generalized REST (gREST), where the solute region is defined as a part of a molecule or a part of the potential energy terms, such as the dihedral-angle energy term or Lennard-Jones energy term. We applied this new method to folding simulations of a β-hairpin (16 residues) and a Trp-cage (20 residues) in explicit water. The protein dihedral-angle energy term is chosen as the solute region in the simulations. gREST reduces the number of replicas necessary for good random walks in the solute-temperature space and covers a wider conformational space compared to the conventional REST2. Considering the general applicability, gREST should become a promising tool for the simulations of protein folding, conformational dynamics, and an in silico drug design.

Atomic structures of proteins and other biomolecules have been determined by using X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy. For some biomolecules, multiple atomic structures that represent their intermediate states1 in the reaction cycles are determined, suggesting the existence of intrinsic fluctuations and large-scale conformational changes. Considering the flexible nature of proteins, structural biology has been collaborated with computer simulations like all-atom molecular dynamics (MD) simulations. However, a large time scale gap between actual biomolecular motions and MD simulations is an unresolved issue, when standard workstations or general-purpose supercomputers are used. Special-purpose supercomputers, Anton/Anton2,2,3 have extended the simulation lengths up to 100 μs or longer, although the system sizes available with Anton/Anton2 are limited to about 100/700 thousands of atoms, respectively.

Extensive conformational searches are essential for many biological applications such as protein folding, protein-ligand binding, and protein-protein associations. In protein-folding simulations, the transition-state structures and the denatured structures are also required to explore the free-energy landscapes, which are often depicted as folding funnels.4–6 In the simulations of protein-ligand binding and protein-protein association, predictions of accurate binding poses are necessary. Many other conformational states, in addition to the binding poses, should be sampled both for analyzing the molecular mechanisms and for evaluating the binding affinities quantitatively. For this purpose, various free-energy calculation methods such as the free-energy perturbation (FEP)7 or umbrella sampling8 are often combined with enhanced conformational sampling schemes discussed below.

Over the twenty years, many methods were proposed to overcome the limitation of conformational sampling with MD simulation. Representative methods are replica-exchange methods,9–11 simulated tempering,12 integrated tempering sampling (ITS),13,14 accelerated MD,15,16 self-guided MD,17 self-guided Langevin dynamics,18,19 parallel cascade selection MD (PaCS MD),20 and metadynamics.21 Temperature replica-exchange MD (T-REMD) has been one of the most often used methods for many biological problems, such as protein folding and unfolding, aggregation, and conformational changes.22–25 T-REMD is theoretically straightforward and robust. However, they demand a large number of replicas to enhance conformational sampling of large biomolecular systems. Therefore, implicit models of solvent and membranes have been often incorporated with T-REMD.26–29 

The replica exchange with solute tempering (REST)30 method and its variants31–34 were developed for reducing the number of replicas required in T-REMD. In REST, a target system is divided into the solute and solvent regions. Only the temperature of the solute region is virtually increased by using the modified Hamiltonian, while the temperature of the solvent region is kept intact. This modification significantly reduces the number of replicas in REST compared to T-REMD. REST2, the most frequently used variant of REST,32,33,35 is now available in several MD packages, such as NAMD,36 Gromacs,32,33 and Desmond.37 There are several reports on the successful applications of REST2 simulations for conformational sampling of biomolecular systems: the relaxation of the lipid-bilayer system,38 the binding of a small peptide to the membrane,39 and so on. Furthermore, the REST scheme has been combined with replica-exchange umbrella sampling (REUS)10 or free-energy perturbation (FEP). The combined approaches are named REST/REUS or REST/FEP, which are particularly useful for protein-ligand binding.40–43 

REST and its variants have a known problem originated from a large temperature gap between “hot” solute and “cold” solvent regions. As reported by Huang et al., trajectories in the original REST are often trapped at a certain stable solute structure in the simulations.44 This problem remains unsolved in REST2.35 Introducing a large number of solvent water molecules into the solute region can suppress this issue, while computational costs increase considerably.44 

In this study, we introduce a flexible framework for the choice of the solute region in the REST2 scheme, which is referred to as the generalized REST (gREST). A part of a solute molecule as well as a part of the potential energy terms is selected as the solute region. These extensions are useful when local motions of the selected region are of interest in the simulation. The rational choice of the solute region in the framework effectively limits the energy space covered in REST and reduces the number of required replicas. We examine the conformational sampling efficiency of gREST in folding simulations of a β-hairpin from the C-terminal region of the B1 domain of protein G45 and Trp-cage (TC5b)46 in explicit solvent water. Interestingly, gREST can resolve the inherent issue in the original REST and sample wider conformational spaces of these proteins. We also provide the formulation of a multistate Bennett acceptance ratio (MBAR)47 to utilize trajectories of all the replicas in the REST simulations. By using MBAR, we compared the free-energy landscapes of the β-hairpin and Trp-cage for quantitative comparisons of sampling efficiencies. gREST is able to sample not only the native structure but also other compact non-native structures in solution.

The number of replicas required in T-REMD is proportional to N1/2, where N is the number of atoms in the target system. This indicates that a huge computational resource is required for T-REMD if the system size becomes large. REST was developed by Berne et al.30 to reduce the number of required replicas in the simulation. The fundamental idea of REST is the exclusion of solvent-solvent interaction energy from the replica-exchange probability by modifying the Hamiltonian of the system. After the development of the original REST, several variants of REST with different modified Hamiltonian have been proposed.31–33 The most commonly used REST (REST2) defines the potential energy of replica a having a solute temperature index of m as32,33,35

(1)

where Euu, Euv, and Evv are the solute-solute, solute-solvent, and solvent-solvent interaction energies, respectively. β0 is the inverse temperature of the simulation system (or the inverse of the solvent temperature), βm is the inverse of the solute temperature Tm for solute temperature index m, and Xa is the atomic coordinate of the target molecular system in replica a. Note that the simulation temperature, T0, in REST2 is common for all the replicas, and βm or Tm is a mere scaling coefficient for the solute-solute and solute-solvent interaction energies. Using this definition, the exchange probability between replicas a and b, whose solute temperature indexes are m and n, respectively, is expressed as

(2)

Since the solvent-solvent interaction energy, Evv, disappears in Eq. (2), the required number of replicas is no longer proportional to N1/2. Hereafter, we refer to the scheme (REST2) as the conventional REST.

We introduce a new REST framework in which the solute region is chosen more flexibly. In this method, which we call “generalized REST (gREST),” the solute region is defined not only as a whole solute molecule but also as a part of a solute molecule. We can also divide the potential energy function into the “solute” interactions and the “solvent” interactions. Figure 1 shows an example where only the dihedral-angle energy terms of a target molecule are selected as the “solute” region. The other potential energy terms and the rest of molecules are treated as a “solvent” in gREST. The potential energy of replica a having a solute temperature index of m in gREST is defined as

(3)

where the solute-solvent interaction energy, EuvXa, is expressed as the summation of individual solute-solvent interaction energies: EuvXa=iEuv,iXa. li is the maximum number of particles involved in the ith solute-solvent interaction [2 for bonds/Lennard-Jones (LJ)/Coulomb, 3 for angles, 4 for dihedrals or improper torsions, and 8 for CMAP in CHARMM], and ki is the number of particles that are involved in the “solute” region of the ith solute-solvent interaction. ki is always 1 for LJ and Coulomb interactions. When a part of a solute molecule is selected as the solute region, ki for the rest of energy terms can change between 1 and li − 1. Several examples of the selection of the solute region are found in the supplementary material (method SA). In gREST, Eq. (2) is modified to

(4)

Equation (4) implies several key features of gREST. First, when only the dihedral energy terms of a target molecule are selected as a “solute” (like the example in Fig. 1), there is no solute-solvent interaction. Second, the conventional REST is considered as a special case of gREST because all the ki/lis are 1/2.

FIG. 1.

REST and gREST schemes. (Left) The conventional REST (REST2) scheme with particle-based partitioning, enclosed particles (red) are solutes and the others are solvents (blue). Interactions striding over the regions are solute-solvent interactions (green). (Right) One of the gREST implementations, where only the dihedral angle energy term in the system is chosen as the “solute”; there are no solute-solvent interaction terms. gREST uses energy term-based partitioning as well as particle-based partitioning.

FIG. 1.

REST and gREST schemes. (Left) The conventional REST (REST2) scheme with particle-based partitioning, enclosed particles (red) are solutes and the others are solvents (blue). Interactions striding over the regions are solute-solvent interactions (green). (Right) One of the gREST implementations, where only the dihedral angle energy term in the system is chosen as the “solute”; there are no solute-solvent interaction terms. gREST uses energy term-based partitioning as well as particle-based partitioning.

Close modal

If we decompose the potential energy into the atomic interaction energy, Ei, Eq. (3) can be rewritten as

(5)

It is because the solute-solute and solvent-solvent interactions satisfy the following relations: ki = li and ki = 0, respectively. gREST is identical to T-REMD, if all the particles are selected as the solute region, like other REST schemes. Considering the simple modified potential energy function [Eq. (5)] and a flexible choice of the solute region, we named the method “generalized” REST (gREST) scheme. gREST was implemented in the GENESIS software package.48,49

A multistate Bennett acceptance ratio (MBAR)47 was employed to calculate free-energy differences from the REST, gREST, and two-dimensional (2D)-gREST/REUS simulations. The relative free energies of replicas are calculated by solving the following equation iteratively:

(6)

where irep, jrep, and krep are the solute temperature indexes [the same as m or n in Eqs. (1)–(5)]. f^irep is the free energy for irepth solute temperature, and EirepREST is the modified potential energy used in REST or gREST [Eq. (1) or (3)]. Nrep is the number of replicas, Nstep is the number of snapshots in a trajectory (assumed to be the same in all the replicas), and Xnsjrep is the nsth coordinate of the jrepth solute temperature index (assuming that the coordinates are reordered according to the solute temperature index). EirepRESTXnsjrep is evaluated using the trajectories after the simulation. The weight factors for the unperturbed state, W0, are calculated using the free energy values in Eq. (6) as

(7)

where c is the normalization constant and E0 is the potential energy in the unperturbed state. Histogram counts of a physical quantity, A, at a certain value Aa in the unperturbed state, H̃0Aa, are obtained by the following equation:

(8)

Q equals one only if the snapshot, X, has a value of A between AaΔA/2 and Aa+ΔA/2. ΔA is a bin width. The free-energy map at 300 K was directly calculated from this histogram using REST and gREST simulation trajectories.

We applied gREST to folding simulations of a β-hairpin from the B1 domain of protein G (residue indexes 41-56)45 and Trp-cage (TC5b)46 in explicit water. The fragment of protein G has a sequence of GEWTYDDATKTFTVTE, and it adopts a β-hairpin conformation in water. TC5b is an artificially designed 20-residue protein and has a stable folded structure. Both proteins have been often used as benchmark systems in computational studies.50–56 Huang et al. reported that replica exchanges were significantly suppressed in the folded state of Trp-cage in the original REST simulation.44 

An NMR structure of a mutated β-hairpin [protein data bank (PDB) id: 1LE3]57 was taken from the protein data bank (PDB), and the mutated three tryptophan residues (Trp5, Trp12, and Trp14 residues in the sequence) were replaced with the wild-type residues (Tyr, Phe, and Val, respectively). The N- and C-termini were capped with the acetyl and N-methyl groups, respectively. The capped β-hairpin was then solvated by water molecules, and three sodium cations were added to neutralize the system. The total system consists of 10 222 atoms. CHARMM3634,58,59 and TIP3P models were employed for the β-hairpin and water molecules, respectively. For the Trp-cage system, an NMR structure of TC5b (PDB id: 1L2Y)46 was chosen. The N- and C-termini were also capped with the acetyl and N-methyl groups, respectively. English and Garcia showed that the terminal caps significantly increase the stability of the folded state.53 The capped TC5b was then solvated by water molecules, and a chloride anion was added to neutralize the system. The total system consists of 11 138 atoms. AMBER ff14SB60 and TIP3P models were employed for proteins and water molecules, respectively.

The same equilibration procedures were applied to the β-hairpin and Trp-cage. First, energy minimizations for 5000 steps were performed. After 1-ns NVT equilibrations at 300 K, 10-ns NPT equilibrations at 300 K and 1 atm were performed. All the subsequent MD calculations were performed with NVT conditions at 300 K. Fully extended structures were obtained from a 10 ns simulation at 600 K. To cool down the whole systems, MD simulations at 300 K were performed for 1 ns. Water molecules and bonds involving a hydrogen atom were constrained with SETTLE61 and SHAKE62 algorithms, respectively. The simulation time step was set to 2 fs. A cutoff length of 9.0 Å with van der Waals force switching from 8.5 to 9.0 Å was employed in simulations of the β-hairpin system. A cutoff length of 8.0 Å without any smoothing functions was employed for the Trp-cage system. All the MD calculations in this study were performed with a development version of GENESIS SPDYN.48,49

For the conventional REST simulation, we employed 10 replicas, where Tms were 300, 322, 345, 368, 394, 423, 455, 491, 529, and 572 K, according to the previous study on the Trp-cage by Wang et al.35 On the other hand, we used 5 replicas with temperatures 300, 340, 390, 450, and 520 K in gREST. The same sets of effective temperatures were used for the β-hairpin and Trp-cage simulations. The replica-exchange probabilities were between 15% and 40% in these simulations. gREST simulations of the β-hairpin adopted all the dihedral angle and CMAP energy terms as the “solute” region. On the other hand, gREST simulations of Trp-cage (using the AMBER force field) employed all the dihedral angle energy terms as the solute region. They are referred to as gREST(dihed+CMAP) and gREST(dihed), respectively, hereafter. Before the production, 30 ns MD simulation for each replica was conducted to equilibrate the systems without any replica exchanges. We then performed REST and gREST(dihed+CMAP) simulations of the β-hairpin system for 1000 ns per replica. For the Trp-cage system, REST and gREST(dihed) simulations were performed for 2500 ns and 3000 ns per each replica, respectively. Exchange of replicas was tried every 2 ps.

We also performed 2D-gREST/REUS simulations of Trp-cage, where all the dihedral terms were selected as the “solute” region, and the radius of gyration (Rg) calculated for all the heavy atoms in Trp-cage was selected as a reaction coordinate in REUS. The initial structure was the same as the REST and gREST simulations above. We employed 20 replicas in total, where 5 replicas for gREST with the same set of solute temperatures and 4 replicas for Rg harmonic restraint windows in REUS were used. The force constant for the restraint was 1.5 (kcal/mol)/Å2, and the centers of Rg harmonic restraints for the four windows were 7.0, 8.5, 10.0, and 11.5 Å. After 30-ns equilibration for each replica, 2D-gREST(dihed)/REUS(Rg) was carried out for 1000 ns per replica. For comparison, conventional MD (5000 ns) and T-REMD (64 replicas, 1000 ns per each replica) simulations were also performed for the Trp-cage system. The detailed parameters and simulation setup for the T-REMD were given in the supplementary material (method SB).

Principal component analysis (PCA)63,64 sometimes requires a higher dimensional space to distinguish different structures, and independent component analysis (ICA)65,66 is suitable to analyze slow motions of proteins using a time correlation function. Instead of those methods, we utilize non-metric multidimensional scaling (NMDS)67 for investigating structure ensembles obtained from each simulation. First, distances between all the pair of structures are computed and are sorted in ascending order. Actual distance values are then discarded, and thus this method is called non-metric. Second, data points representing structure ensembles are first placed randomly onto an arbitrary low dimension space and then moved around to minimize the mismatch in the ascending distance order. The mismatch is called “stress,” which is a measure to show the accuracy of mapping. The advantage of NMDS in the analysis of a structural ensemble is that different structures are distinguished even in a very low dimensional space. The structure ensemble analysis based on NMDS is easily done with the MASS library68 of the R programming language.69 

We extracted 5000 structures from the lowest Tm trajectory of each of REST and gREST. The 10 000 structures in total were mapped onto a two-dimensional plane (xy) using NMDS. Protein-heavy-atom root mean square deviation (RMSD) between structures was used as the distance, which is also employed in the clustering algorithm by Daura et al.70 NMDS calculations were performed four times with different random seed values. Among those results, a result with the least stress value, 0.10, was adopted.

We performed folding simulations of the β-hairpin from the B1 domain of protein G starting from its extended structure both with the conventional REST (10 replicas) and gREST(dihed+CMAP) (5 replicas). Figure 2 shows the time series of protein-heavy-atom root mean square deviations (RMSDs) from the native state. In REST, one replica (ID 9) successfully found the native β-hairpin structure within 300 ns, while the structure was unfolded after 800 ns. The rest of the replicas did not sample the native structure regardless of the large conformational fluctuations. In gREST, replica ID 1 sampled the native structure within 650 ns, and the folded structure was kept until the end of the simulation. In the supplementary material, we also show the time series of RMSD for all the replicas (Fig. S1 of the supplementary material).

FIG. 2.

Time series of protein all heavy atom RMSD from the native state along β-hairpin trajectories. Trajectories of successfully folded replicas [ID 1 of gREST(dihed+CMAP), ID 9 of REST] and one of the other replicas which was not folded within 1000 ns are shown. The initial (left) and native (right) structures are also shown in the top of this figure. Plots for the other replicas are available in Fig. S1 of the supplementary material.

FIG. 2.

Time series of protein all heavy atom RMSD from the native state along β-hairpin trajectories. Trajectories of successfully folded replicas [ID 1 of gREST(dihed+CMAP), ID 9 of REST] and one of the other replicas which was not folded within 1000 ns are shown. The initial (left) and native (right) structures are also shown in the top of this figure. Plots for the other replicas are available in Fig. S1 of the supplementary material.

Close modal

Folding simulations of Trp-cage from its extended structures were also performed with the conventional REST and gREST(dihed). Figure 3 shows the time series of the RMSDs from the native state for the selected replicas within the first 1500 ns. Full length data for all the replicas in REST (10 replicas, 2500 ns) and gREST(dihed) (5 replicas, 3000 ns) are available in the supplementary material (Fig. S2). In REST, three replicas including replica ID 9 in Fig. 3 found the native structure, while in gREST, the native structures were obtained in four replicas except for replica ID 1 (see Fig. S2). It should be noted that the structures with small RMSD in the ID 4 trajectory of gREST (200-1200 ns) were not identical to the native state. This state is one of the native-like states, as shown in Fig. 3 (top, middle), which has a slightly different orientation of the N-terminal helix and is more flexible than the native structure. In the replica ID 4, the native structure appeared after 1300 ns and was kept stable until the end of the simulation.

FIG. 3.

Time series of protein heavy atom RMSD from the native state along Trp-cage trajectories. Trajectories of successfully folded replicas [IDs 4 and 5 of gREST(dihed), ID 9 of REST] and one of the other replicas which was not folded within 1500 ns are shown. The initial (left) and native (right) structures are also shown in the top of this figure. The center snapshot is a misfolded state appeared in an ID 4 trajectory of dihedral gREST. Only the first 1500 ns part is shown in this figure. The whole data are available in Fig. S4 of the supplementary material.

FIG. 3.

Time series of protein heavy atom RMSD from the native state along Trp-cage trajectories. Trajectories of successfully folded replicas [IDs 4 and 5 of gREST(dihed), ID 9 of REST] and one of the other replicas which was not folded within 1500 ns are shown. The initial (left) and native (right) structures are also shown in the top of this figure. The center snapshot is a misfolded state appeared in an ID 4 trajectory of dihedral gREST. Only the first 1500 ns part is shown in this figure. The whole data are available in Fig. S4 of the supplementary material.

Close modal

We also performed a 5000 ns conventional MD simulation of Trp-cage at 300 K from the extended unfolded state. No folded structures were obtained during the conventional MD simulation [Fig. S3(a) of the supplementary material]. This suggests that REST simulations can successfully accelerate the folding of small proteins in water. The conventional REST found out the native state faster than gREST, probably because twice as many replicas as gREST were used in the conventional REST. The scaling coefficient for the solute-solvent interaction energy, which was absent in gREST(dihed), would also contribute to the faster folding. In gREST(dihed) and gREST(dihed+CMAP) with just 5 replicas, the native structures of the β-hairpin and Trp-cage were obtained in multiple replicas within a reasonable time, suggesting the usefulness of gREST in protein-folding simulations. We also performed a 1500 ns conventional REST simulation of Trp-cage with 5 replicas to compare the efficiency against gREST(dihed). However, folded structures were not obtained in this simulation [Fig. S3(b) of the supplementary material]. This result also supports the advantage of accelerating only the dihedral motions in the folding of small water-soluble proteins.

The time series of the solute temperature indexes of the selected replicas of Trp-cage simulations are shown in Fig. 4. The selected replicas are the same as those in Fig. 3: replica IDs 5 and 9 for REST and replica IDs 3, 4, and 5 for gREST. Random walks in the solute temperature space are generally required to achieve efficient samplings in REST simulations. gREST(dihed) trajectories show desirable random walks in the solute temperature space, which are almost independent of the protein structures. In REST, the situation is apparently different from gREST. In the replica ID 9 of REST, the solute temperature index was trapped in the low Tm domain once the native state was found. This behavior is also observed in the β-hairpin simulation (Fig. S4 of the supplementary material). These results are basically the same as the previously reported result for the original REST.44 It would eventually result in a poor performance in conformational search, although REST showed good performance in finding the native state. In the β-hairpin simulation, the random walk in the solute temperature space for replica ID 9 was recovered around 800 ns, presumably due to a relatively low intrinsic stability of the folded state (population of the β-hairpin conformation is up to 40% in water).45 

FIG. 4.

Time series of a solute temperature index (m) of the Trp-cage simulation with gREST(dihed) and conventional REST, where only the first 1500 ns region is shown. The solute temperature index 1 corresponds to the lowest solute temperature, Tm, of 300 K in both of the two cases. The replica IDs are the same as those in Fig. 3. gREST(dihed) and REST employed 5 and 10 replicas, respectively.

FIG. 4.

Time series of a solute temperature index (m) of the Trp-cage simulation with gREST(dihed) and conventional REST, where only the first 1500 ns region is shown. The solute temperature index 1 corresponds to the lowest solute temperature, Tm, of 300 K in both of the two cases. The replica IDs are the same as those in Fig. 3. gREST(dihed) and REST employed 5 and 10 replicas, respectively.

Close modal

To understand the origin of this problem, we calculated the energy components, Euu, Euv, and Evv, along the trajectories (Fig. 5). There is no solute-solvent interaction, Euv, in gREST(dihed). There were steep drops in Euu of the REST trajectories, which coincided with the folding events. The first drop in Euu occurred when the first helix was formed, while the second drop at 600 ns was synchronized with the formation of a stable salt-bridge between Asp9 and Arg16. In the case of the β-hairpin, the same trend is observed in Fig. S5 of the supplementary material due to the formation of backbone hydrogen bonds. The drastic changes in Euu likely decouple the replicas with stable protein conformations from the others, although Euv shows the opposite changes. By contrast, Euu in gREST(dihed) or gREST(dihed+CMAP) showed moderate changes, fluctuating around the averaged values regardless of the folding events in Trp-cage or the β-hairpin. These facts, high replica exchange rates and moderate changes of Euu, in the folding trajectories of gREST suggest that a reasonable choice of the solute energy terms could solve the “hot protein and cold solvent problem.” Previously, it was proposed to include several solvent molecules into the solute region of REST to solve the problem.44 In contrast to the previous approach, gREST is computationally more efficient since it does not need to increase the number of replicas.

FIG. 5.

Time series of unperturbed energies [ignoring scaling coefficients of REST and gREST(dihed)] of Trp-cage. The same trajectories as in Figs. 3 and 4 are also used here. Euu, Euv, and Evv are solute-solute, solute-solvent, and solvent-solvent interaction energies, respectively. Etotal is the sum of Euu, Euv, and Evv. Note that gREST(dihed) in the current setup does not have solute-solvent interactions. Solute-solute energies in REST and gREST(dihed) simulations are significantly different since the latter contains some additional terms (such as protein-water Coulombic interactions).

FIG. 5.

Time series of unperturbed energies [ignoring scaling coefficients of REST and gREST(dihed)] of Trp-cage. The same trajectories as in Figs. 3 and 4 are also used here. Euu, Euv, and Evv are solute-solute, solute-solvent, and solvent-solvent interaction energies, respectively. Etotal is the sum of Euu, Euv, and Evv. Note that gREST(dihed) in the current setup does not have solute-solvent interactions. Solute-solute energies in REST and gREST(dihed) simulations are significantly different since the latter contains some additional terms (such as protein-water Coulombic interactions).

Close modal

Instead of sufficient random walks in the solute temperature space in gREST(dihed) and gREST(dihed+CMAP), once the proteins were folded in several replicas, they were rarely unfolded during the current computational time. We consider that this is not due to the “hot protein and cold solvent problem” but because the highest solute temperature may not be able to enhance global conformational changes perfectly. The choice of the highest solute temperature is a key parameter of gREST simulations as well as the original REST. We have to carefully determine the parameter depending on the purpose of each application.

To assess the conformational sampling efficiency of REST and gREST, free-energy maps of Trp-cage at 300 K are compared (Fig. 6). The maps were computed from the snapshots at 300 K only and from all of the trajectories using the multistate Bennett acceptance ratio (MBAR). The trajectories after the first folding events were used for the analysis: 1000-2500 ns and 1500-3000 ns for REST and gREST(dihed), respectively. To evaluate the quality of the free-energy maps, we also performed 2D-gREST(dihed)/REUS(Rg) and T-REMD simulations as references. For 2D-gREST(dihed)/REUS(Rg) and T-REMD, trajectories from 700 to 1000 ns were used to calculate the free-energy maps. The free-energy maps are spanned with two structural parameters, RMSD from the native structure and Rg of all the protein heavy atoms. Time-series data of RMSD, Rg, and replica parameter indexes are available in the supplementary material (Figs. S6 and S7).

FIG. 6.

Free-energy landscape of Trp-cage from REST [(a), (d), (g), and (j)], gREST(dihed) [(b), (e), (h), and (k)], and 2D-gREST(dihed)/REUS(Rg) simulations [(c), (f), (i), and (l)], where the maps except for (a) and (b) were calculated with MBAR. [(a)–(c)] RMSD/Rg two-dimensional free-energy maps from the lowest Tm (300 K) replica(s), [(d)–(f)] RMSD/Rg two-dimensional free-energy maps using all the replicas, [(g)–(i)] one-dimensional free-energy curves along RMSD, [(j)–(l)] one-dimensional free-energy curves along Rg. The (red) circled state indicates one of the major minima in gREST(dihed) simulations.

FIG. 6.

Free-energy landscape of Trp-cage from REST [(a), (d), (g), and (j)], gREST(dihed) [(b), (e), (h), and (k)], and 2D-gREST(dihed)/REUS(Rg) simulations [(c), (f), (i), and (l)], where the maps except for (a) and (b) were calculated with MBAR. [(a)–(c)] RMSD/Rg two-dimensional free-energy maps from the lowest Tm (300 K) replica(s), [(d)–(f)] RMSD/Rg two-dimensional free-energy maps using all the replicas, [(g)–(i)] one-dimensional free-energy curves along RMSD, [(j)–(l)] one-dimensional free-energy curves along Rg. The (red) circled state indicates one of the major minima in gREST(dihed) simulations.

Close modal

The 2D free-energy maps from the snapshots at 300 K agree with those from MBAR except for REST [Fig. 6(a)]. In the simulation of REST, sampling of the unfolded structures at 300 K seems to be insufficient. MBAR could cover the conformational spaces at higher temperatures, improving the quality of free-energy maps. Thus, the overall trends in the 2D free-energy maps between REST, gREST(dihed), and gREST(dihed)/REUS(Rg) are similar. The native structure of Trp-cage is shown in the map in the region of (RMSD, Rg) ≈ (2, 7) for the three simulations (Fig. 6). If we look at the map more carefully, a notable difference was found in the region of (RMSD, Rg) ≈ (5, 7.5): gREST(dihed) and 2D-gREST(dihed)/REUS(Rg) have a stable local-energy minimum, whereas REST does not. This minimum represents the existence of non-native compact structures of Trp-cage in solution. This difference can be seen more clearly in the 1D free-energy map for RMSD [Figs. 6(g)–6(i)]. gREST(dihed) and 2D-gREST(dihed)/REUS(Rg) sampled almost the same conformational spaces, which involve the native, compact non-native, and extended unfolded structures. Uncertainties of these free-energy maps and curves are discussed in the supplementary material (see discussion SC and Figs. S8–10).

We also calculated the free-energy map by using T-REMD. Unfortunately, convergence of such an extensive free-energy map could not be achieved only by 1000 ns of simulations. The limited number of replica-exchange trials in 1000 ns was not enough to traverse a wide range of the temperature space in T-REMD. The computational procedure and the free-energy map of this T-REMD simulation are available in the supplementary material (method SB and Fig. S11, respectively).

We thus further analyzed the structural ensemble by using the nonmetric multi-dimensional scaling (NMDS) method. The ensemble mapped on a 2D-plane (hereafter, we call it xy-plane) is shown in Fig. 7. Circled (red) structures in this figure correspond to the (red) circled state in Fig. 6. The gREST ensemble covered both native and non-native structures as expected from the results shown in Fig. 6. The non-native structures include the native-like, non-native compact, and unfolded structures. On the other hand, the REST ensemble was considerably occupied by the native and native-like structures. There were not many samples in the unfolded states (which correspond to x > 4 in the plot). Instead, there was a small cluster around (x, y) ≈ (2, 6), which corresponds to a small minimum around (RMSD, Rg) ≈ (7, 7) in Figs. 6(a) and 6(d). Structures in this cluster have a salt bridge between Asp9 and Arg16, as shown in the upper snapshot structure in Fig. 7. In gREST, structures marked with (red) a circle do not have any salt bridges within the protein. In REST, the contribution from Euu to the replica exchange probability is the largest according to Eqs. (2) and (4), suggesting that replicas with the lowest solute temperature (300 K) prefer protein structures with lower values of Euu. Since the Coulombic interaction is the dominant interaction in Euu in the conventional REST, protein structures stabilized with the salt bridge are likely found at the replicas at 300 K. By contrast, the current gREST(dihed) simulation did not include the Coulombic interaction in Euu and was free from this issue.

FIG. 7.

Structure map from the lowest Tm replica of REST and gREST(dihed) simulations obtained by NMDS (non-metric multidimensional scaling). Protein-heavy-atom RMSD between two structures is used as the distance. The final NMDS stress value was 0.10. (a) The result of NMDS; all 10 000 structures were mapped onto a two-dimensional (xy) plane. Filled circles (red) and triangles (blue) represent REST and gREST(dihed) snapshots, respectively. (b) The map for REST snapshots extracted from (a). (c) The map for gREST(dihed) snapshots extracted from (a). In (b) and (c), filled circles (red), open triangles (orange), filled triangles (yellow), and open circles (brown) represent snapshots with RMSD from the folded structure smaller than 3.0 Å, 3.0-5.0 Å, and 5.0-7.0 Å and larger than 7.0 Å, respectively. Representative structures are also shown in the figure, where the parenthesized numbers indicate the position in the mapped space. Two (red) circled structures belong to the red-circled state in Fig. 6.

FIG. 7.

Structure map from the lowest Tm replica of REST and gREST(dihed) simulations obtained by NMDS (non-metric multidimensional scaling). Protein-heavy-atom RMSD between two structures is used as the distance. The final NMDS stress value was 0.10. (a) The result of NMDS; all 10 000 structures were mapped onto a two-dimensional (xy) plane. Filled circles (red) and triangles (blue) represent REST and gREST(dihed) snapshots, respectively. (b) The map for REST snapshots extracted from (a). (c) The map for gREST(dihed) snapshots extracted from (a). In (b) and (c), filled circles (red), open triangles (orange), filled triangles (yellow), and open circles (brown) represent snapshots with RMSD from the folded structure smaller than 3.0 Å, 3.0-5.0 Å, and 5.0-7.0 Å and larger than 7.0 Å, respectively. Representative structures are also shown in the figure, where the parenthesized numbers indicate the position in the mapped space. Two (red) circled structures belong to the red-circled state in Fig. 6.

Close modal

Trp-cage has been a major target system of the classical MD simulation studies.50,51,53,55,56,71–81 Many implicit solvent simulations were performed to simulate the folding process of this small protein and to develop the force-field for the implicit solvent model. T-REMD was employed in some of those studies to achieve vast conformational search.80,81 In the explicit solvent system, T-REMD had been used to improve sampling efficiency. The utilization, however, was rather limited due to the large computational cost. Moreover, the extent of the conformational search was also limited since simulations in individual replicas were not long enough due to the large computational cost.75,79 Recently, Garcia et al. performed long folding simulations of Trp-cage from an unfolded state using T-REMD.53,76 One of their studies53 simulated the charged- or capped-end Trp-cage in water using T-REMD with 40 replicas for 4000 ns or 1000 ns per replica, respectively. They found that the terminal capping significantly facilitates the folding of the TC5b Trp-cage. We also employed this terminal capping in REST/gREST simulations but the folding time scale estimated from our results, more than 800 ns, was slower than that from Garcia’s study, 400 ns. This can be attributed to the differences in the force-field, number of replicas, and protein-solvent interactions. In the same study, they observed some stable structures in the non-native region. A REMD study by Zhou et al. in an explicit solvent system with a different force field also reported the existence of minimum in the unfolded state.55 Our free-energy map by gREST(dihed) and 2D-gREST(dihed)/REUS(Rg) simulations shown in Fig. 6 is consistent with these previous studies.

We have developed a new conformational search method gREST (generalized replica exchange with solute tempering) which allows us to select important energy terms related to structural changes as the solute region. gREST could significantly reduce the number of replicas. We tested gREST, where dihedral (and CMAP) terms were considered as the “solute” region, on the folding simulation of the β-hairpin of protein-G and Trp-cage systems. gREST successfully found the native structures in comparable rate as the conventional REST (REST2) simulation with much smaller computational costs. Moreover, flexible solute selections introduced in gREST allows us to avoid the solute temperature space trapping problem known for the conventional REST. Conformational search efficiency is also improved both in terms of the convergence rates and the accuracy of free-energy maps of the Trp-cage folding landscapes. The combined REST/REUS40 or REST/FEP42 approaches would improve their efficiency further by using gREST. The efficiency of gREST heavily depends on the choice of the “solute” region and some other parameters, like the highest solute temperature in replicas. For individual applications, parameters should be chosen carefully to maximize the conformational sampling efficiency by gREST.

See supplementary material for gREST examples, plots about β-hairpin systems, a full length record about Trp-cage simulations, the T-REMD simulation procedure, and the free-energy map by T-REMD.

Computational resources were provided by HOKUSAI GreatWave in RIKEN Advanced Center for Computing and Communication and K computer in RIKEN Advanced Institute for Computational Science through the HPCI System Research project (Project Nos. hp160022 and ra000009). This research has been funded by strategic programs for innovation research, “Computational life science and application in drug discovery and medical development,” “Novel measurement techniques for visualizing live protein molecules at work” (Grant No. 26119006), JST CREST on “Structural Life Science and Advanced Core Technologies for Innovative Life Science Research” (Grant No. JPMJCR13M3), RIKEN pioneering projects on “Dynamic Structural biology,” “Integrated Lipidology,” and “iTHES” (to Y.S.).

1.
U. K.
Genick
,
G. E. O.
Borgstahl
,
K.
Ng
,
Z.
Ren
,
C.
Pradervand
,
P. M.
Burke
,
V.
Šrajer
,
T.-Y.
Teng
,
W.
Schildkamp
,
D. E.
McRee
,
K.
Moffat
, and
E. D.
Getzoff
,
Science
275
(
5305
),
1471
1475
(
1997
).
2.
R. O.
Dror
,
C.
Young
, and
D. E.
Shaw
, in
Encyclopedia of Parallel Computing
, edited by
D.
Padua
(
Springer US
,
Boston, MA
,
2011
), pp.
60
71
.
3.
D. E.
Shaw
,
J. P.
Grossman
,
J. A.
Bank
,
B.
Batson
,
J. A.
Butts
,
J. C.
Chao
,
M. M.
Deneroff
,
R. O.
Dror
,
A.
Even
,
C. H.
Fenton
,
A.
Forte
,
J.
Gagliardo
,
G.
Gill
,
B.
Greskamp
,
C. R.
Ho
,
D. J.
Ierardi
,
L.
Iserovich
,
J. S.
Kuskin
,
R. H.
Larson
,
T.
Layman
,
L. S.
Lee
,
A. K.
Lerer
,
C.
Li
,
D.
Killebrew
,
K. M.
Mackenzie
,
S. Y. H.
Mok
,
M. A.
Moraes
,
R.
Mueller
,
L. J.
Nociolo
,
J. L.
Peticolas
,
T.
Quan
,
D.
Ramot
,
J. K.
Salmon
,
D. P.
Scarpazza
,
U. B.
Schafer
,
N.
Siddique
,
C. W.
Snyder
,
J.
Spengler
,
P. T. P.
Tang
,
M.
Theobald
,
H.
Toma
,
B.
Towles
,
B.
Vitale
,
S. C.
Wang
, and
C.
Young
, presented at the
SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
,
2014
.
4.
P. E.
Leopold
,
M.
Montal
, and
J. N.
Onuchic
,
Proc. Natl. Acad. Sci. U. S. A.
89
(
18
),
8721
8725
(
1992
).
5.
J. D.
Bryngelson
,
J. N.
Onuchic
,
N. D.
Socci
, and
P. G.
Wolynes
,
Proteins: Struct., Funct., Bioinf.
21
(
3
),
167
195
(
1995
).
6.
J. N.
Onuchic
,
P. G.
Wolynes
,
Z.
Luthey-Schulten
, and
N. D.
Socci
,
Proc. Natl. Acad. Sci. U. S. A.
92
(
8
),
3626
3630
(
1995
).
7.
R. W.
Zwanzig
,
J. Chem. Phys.
22
(
8
),
1420
1426
(
1954
).
8.
G. M.
Torrie
and
J. P.
Valleau
,
J. Comput. Phys.
23
(
2
),
187
199
(
1977
).
9.
Y.
Sugita
and
Y.
Okamoto
,
Chem. Phys. Lett.
314
(
1-2
),
141
151
(
1999
).
10.
Y.
Sugita
,
A.
Kitao
, and
Y.
Okamoto
,
J. Chem. Phys.
113
(
15
),
6042
6051
(
2000
).
11.
K.
Hukushima
and
K.
Nemoto
,
J. Phys. Soc. Jpn.
65
(
6
),
1604
1608
(
1996
).
12.
E.
Marinari
and
G.
Parisi
,
Europhys. Lett.
19
(
6
),
451
(
1992
).
13.
Y. Q.
Gao
,
J. Chem. Phys.
128
(
6
),
064105
(
2008
).
14.
L.
Yang
and
Y.
Qin Gao
,
J. Chem. Phys.
131
(
21
),
214109
(
2009
).
15.
D.
Hamelberg
,
J.
Mongan
, and
J. A.
McCammon
,
J. Chem. Phys.
120
(
24
),
11919
11929
(
2004
).
16.
Y.
Miao
,
V. A.
Feher
, and
J. A.
McCammon
,
J. Chem. Theory Comput.
11
(
8
),
3584
3595
(
2015
).
17.
X.
Wu
and
S.
Wang
,
J. Phys. Chem. B
102
(
37
),
7238
7250
(
1998
).
18.
X.
Wu
and
B. R.
Brooks
,
Chem. Phys. Lett.
381
(
3-4
),
512
518
(
2003
).
19.
X.
Wu
,
A.
Damjanovic
, and
B. R.
Brooks
,
Adv. Chem. Phys.
150
,
255
326
(
2012
).
20.
R.
Harada
and
A.
Kitao
,
J. Chem. Phys.
139
(
3
),
035103
(
2013
).
21.
A.
Laio
and
M.
Parrinello
,
Proc. Natl. Acad. Sci. U. S. A.
99
(
20
),
12562
12566
(
2002
).
22.
Y.
Sugita
and
Y.
Okamoto
,
Biophys. J.
88
(
5
),
3180
3190
(
2005
).
23.
C.
Toyoshima
,
M.
Asahi
,
Y.
Sugita
,
R.
Khanna
,
T.
Tsuda
, and
D. H.
MacLennan
,
Proc. Natl. Acad. Sci. U. S. A.
100
(
2
),
467
472
(
2003
).
24.
Y.
Komuro
,
N.
Miyashita
,
T.
Mori
,
E.
Muneyuki
,
T.
Saitoh
,
D.
Kohda
, and
Y.
Sugita
,
J. Phys. Chem. B
117
(
10
),
2864
2871
(
2013
).
25.
T.
Mori
,
N.
Miyashita
,
W.
Im
,
M.
Feig
, and
Y.
Sugita
,
Biochim. Biophys. Acta, Biomembr.
1858
(
7 Pt B
),
1635
1651
(
2016
).
26.
W.
Im
,
M.
Feig
, and
C. L.
Brooks
,
Biophys. J.
85
(
5
),
2900
2918
(
2003
).
27.
M. S.
Lee
,
F. R.
Salsbury
, and
C. L.
Brooks
,
J. Chem. Phys.
116
(
24
),
10606
10614
(
2002
).
28.
N.
Miyashita
,
J. E.
Straub
,
D.
Thirumalai
, and
Y.
Sugita
,
J. Am. Chem. Soc.
131
(
10
),
3438
3439
(
2009
).
29.
D.
Ganguly
and
J.
Chen
,
J. Am. Chem. Soc.
131
(
14
),
5214
5223
(
2009
).
30.
P.
Liu
,
B.
Kim
,
R. A.
Friesner
, and
B. J.
Berne
,
Proc. Natl. Acad. Sci. U. S. A.
102
(
39
),
13749
13754
(
2005
).
31.
C.
Camilloni
,
D.
Provasi
,
G.
Tiana
, and
R. A.
Broglia
,
Proteins: Struct., Funct., Bioinf.
71
(
4
),
1647
1654
(
2008
).
32.
T.
Terakawa
,
T.
Kameda
, and
S.
Takada
,
J. Comput. Chem.
32
(
7
),
1228
1234
(
2011
).
33.
S. L. C.
Moors
,
S.
Michielssens
, and
A.
Ceulemans
,
J. Chem. Theory Comput.
7
(
1
),
231
237
(
2011
).
34.
R. B.
Best
,
X.
Zhu
,
J.
Shim
,
P. E. M.
Lopes
,
J.
Mittal
,
M.
Feig
, and
A. D.
MacKerell
,
J. Chem. Theory Comput.
8
(
9
),
3257
3273
(
2012
).
35.
L.
Wang
,
R. A.
Friesner
, and
B. J.
Berne
,
J. Phys. Chem. B
115
(
30
),
9431
9438
(
2011
).
36.
S.
Jo
and
W.
Jiang
,
Comput. Phys. Commun.
197
,
304
311
(
2015
).
37.
K. J.
Bowers
,
E.
Chow
,
H.
Xu
,
R. O.
Dror
,
M. P.
Eastwood
,
B. A.
Gregersen
,
J. L.
Klepeis
,
I.
Kolossvary
,
M. A.
Moraes
,
F. D.
Sacerdoti
,
J. K.
Salmon
,
Y.
Shan
, and
D. E.
Shaw
, in
Proceedings of the 2006 ACM/IEEE Conference on Supercomputing
(
ACM
,
Tampa, Florida
,
2006
), p.
84
.
38.
K.
Huang
and
A. E.
Garcia
,
J. Chem. Theory Comput.
10
(
10
),
4264
4272
(
2014
).
39.
A. K.
Smith
,
C.
Lockhart
, and
D. K.
Klimov
,
J. Chem. Theory Comput.
12
(
10
),
5201
5214
(
2016
).
40.
H.
Kokubo
,
T.
Tanaka
, and
Y.
Okamoto
,
J. Comput. Chem.
34
(
30
),
2601
2614
(
2013
).
41.
L.
Wang
,
Y.
Deng
,
J. L.
Knight
,
Y.
Wu
,
B.
Kim
,
W.
Sherman
,
J. C.
Shelley
,
T.
Lin
, and
R.
Abel
,
J. Chem. Theory Comput.
9
(
2
),
1282
1293
(
2013
).
42.
L.
Wang
,
B. J.
Berne
, and
R. A.
Friesner
,
Proc. Natl. Acad. Sci. U. S. A.
109
(
6
),
1937
1942
(
2012
).
43.
L.
Wang
,
Y.
Wu
,
Y.
Deng
,
B.
Kim
,
L.
Pierce
,
G.
Krilov
,
D.
Lupyan
,
S.
Robinson
,
M. K.
Dahlgren
,
J.
Greenwood
,
D. L.
Romero
,
C.
Masse
,
J. L.
Knight
,
T.
Steinbrecher
,
T.
Beuming
,
W.
Damm
,
E.
Harder
,
W.
Sherman
,
M.
Brewer
,
R.
Wester
,
M.
Murcko
,
L.
Frye
,
R.
Farid
,
T.
Lin
,
D. L.
Mobley
,
W. L.
Jorgensen
,
B. J.
Berne
,
R. A.
Friesner
, and
R.
Abel
,
J. Am. Chem. Soc.
137
(
7
),
2695
2703
(
2015
).
44.
X.
Huang
,
M.
Hagen
,
B.
Kim
,
R. A.
Friesner
,
R.
Zhou
, and
B. J.
Berne
,
J. Phys. Chem. B
111
(
19
),
5405
5410
(
2007
).
45.
F. J.
Blanco
,
G.
Rivas
, and
L.
Serrano
,
Nat. Struct. Mol. Biol.
1
(
9
),
584
590
(
1994
).
46.
J. W.
Neidigh
,
R. M.
Fesinmeyer
, and
N. H.
Andersen
,
Nat. Struct. Biol.
9
(
6
),
425
430
(
2002
).
47.
M. R.
Shirts
and
J. D.
Chodera
,
J. Chem. Phys.
129
(
12
),
124105
(
2008
).
48.
J.
Jung
,
T.
Mori
,
C.
Kobayashi
,
Y.
Matsunaga
,
T.
Yoda
,
M.
Feig
, and
Y.
Sugita
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
5
(
4
),
310
323
(
2015
).
49.
C.
Kobayashi
,
J.
Jung
,
Y.
Matsunaga
,
T.
Mori
,
T.
Ando
,
K.
Tamura
,
M.
Kamiya
, and
Y.
Sugita
,
J. Comput. Chem.
38
(
25
),
2193
2206
(
2017
).
50.
R. M.
Abaskharon
,
R. M.
Culik
,
G. A.
Woolley
, and
F.
Gai
,
J. Phys. Chem. Lett.
6
(
3
),
521
526
(
2015
).
51.
A.
Bille
,
B.
Linse
,
S.
Mohanty
, and
A.
Irback
,
J. Chem. Phys.
143
(
17
),
175102
(
2015
).
52.
A.
Byrne
,
D. V.
Williams
,
B.
Barua
,
S. J.
Hagen
,
B. L.
Kier
, and
N. H.
Andersen
,
Biochemistry
53
(
38
),
6011
6021
(
2014
).
53.
C. A.
English
and
A. E.
Garcia
,
J. Phys. Chem. B
119
(
25
),
7874
7881
(
2015
).
54.
R.
Zhou
,
Proc. Natl. Acad. Sci. U. S. A.
100
(
23
),
13280
13285
(
2003
).
55.
C. Y.
Zhou
,
F.
Jiang
, and
Y. D.
Wu
,
J. Chem. Theory Comput.
11
(
11
),
5473
5480
(
2015
).
56.
A.
Linhananta
,
J.
Boer
, and
I.
MacKay
,
J. Chem. Phys.
122
(
11
),
114901
(
2005
).
57.
A. G.
Cochran
,
N. J.
Skelton
, and
M. A.
Starovasnik
,
Proc. Natl. Acad. Sci. U. S. A.
98
(
10
),
5578
5583
(
2001
).
58.
A. D.
MacKerell
,
M.
Feig
, and
C. L.
Brooks
,
J. Am. Chem. Soc.
126
(
3
),
698
699
(
2004
).
59.
A. D.
MacKerell
,
D.
Bashford
,
M.
Bellott
,
R. L.
Dunbrack
,
J. D.
Evanseck
,
M. J.
Field
,
S.
Fischer
,
J.
Gao
,
H.
Guo
,
S.
Ha
,
D.
Joseph-McCarthy
,
L.
Kuchnir
,
K.
Kuczera
,
F. T. K.
Lau
,
C.
Mattos
,
S.
Michnick
,
T.
Ngo
,
D. T.
Nguyen
,
B.
Prodhom
,
W. E.
Reiher
,
B.
Roux
,
M.
Schlenkrich
,
J. C.
Smith
,
R.
Stote
,
J.
Straub
,
M.
Watanabe
,
J.
Wiórkiewicz-Kuczera
,
D.
Yin
, and
M.
Karplus
,
J. Phys. Chem. B
102
,
3586
3616
(
1998
).
60.
J. A.
Maier
,
C.
Martinez
,
K.
Kasavajhala
,
L.
Wickstrom
,
K. E.
Hauser
, and
C.
Simmerling
,
J. Chem. Theory Comput.
11
(
8
),
3696
3713
(
2015
).
61.
S.
Miyamoto
and
P. A.
Kollman
,
J. Comput. Chem.
13
(
8
),
952
962
(
1992
).
62.
J.-P.
Ryckaert
,
G.
Ciccotti
, and
H. J. C.
Berendsen
,
J. Comput. Phys.
23
(
3
),
327
341
(
1977
).
63.
S.
Hayward
,
A.
Kitao
, and
N.
,
Proteins: Struct., Funct., Bioinf.
23
(
2
),
177
186
(
1995
).
64.
A.
Kitao
and
N.
Go
,
Curr. Opin. Struct. Biol.
9
(
2
),
164
169
(
1999
).
65.
L.
Molgedey
and
H. G.
Schuster
,
Phys. Rev. Lett.
72
(
23
),
3634
3637
(
1994
).
66.
Y.
Naritomi
and
S.
Fuchigami
,
J. Chem. Phys.
134
(
6
),
065101
(
2011
).
67.
J. B.
Kruskal
,
Psychometrika
29
(
2
),
115
129
(
1964
).
68.
W. N.
Venables
and
B. D.
Ripley
,
Modern Applied Statistics with S
, 4th ed. (
Springer
,
New York
,
2002
).
69.
R Core Team
, R: A Language and Environment for Statistical Computing, https://www.R-project.org/,
2017
.
70.
X.
Daura
,
K.
Gademann
,
B.
Jaun
,
D.
Seebach
,
W. F.
van Gunsteren
, and
A. E.
Mark
,
Angew. Chem., Int. Ed.
38
(
1-2
),
236
240
(
1999
).
71.
C.
Simmerling
,
B.
Strockbine
, and
A. E.
Roitberg
,
J. Am. Chem. Soc.
124
(
38
),
11258
11259
(
2002
).
72.
R.
Geney
,
M.
Layten
,
R.
Gomperts
,
V.
Hornak
, and
C.
Simmerling
,
J. Chem. Theory Comput.
2
(
1
),
115
127
(
2006
).
73.
H.
Nguyen
,
D. R.
Roe
, and
C.
Simmerling
,
J. Chem. Theory Comput.
9
(
4
),
2020
2034
(
2013
).
74.
H.
Nguyen
,
J.
Maier
,
H.
Huang
,
V.
Perrone
, and
C.
Simmerling
,
J. Am. Chem. Soc.
136
(
40
),
13959
13962
(
2014
).
75.
D.
Paschek
,
H.
Nymeyer
, and
A. E.
Garcia
,
J. Struct. Biol.
157
(
3
),
524
533
(
2007
).
76.
D.
Paschek
,
R.
Day
, and
A. E.
Garcia
,
Phys. Chem. Chem. Phys.
13
(
44
),
19840
19847
(
2011
).
77.
S.
Kitazawa
,
M. J.
Fossat
,
S. A.
McCallum
,
A. E.
Garcia
, and
C. A.
Royer
,
J. Phys. Chem. B
121
(
6
),
1258
1267
(
2017
).
78.
S.
Chowdhury
,
M. C.
Lee
,
G.
Xiong
, and
Y.
Duan
,
J. Mol. Biol.
327
(
3
),
711
717
(
2003
).
79.
J.
Juraszek
and
P. G.
Bolhuis
,
Proc. Natl. Acad. Sci. U. S. A.
103
(
43
),
15859
15864
(
2006
).
80.
J. W.
Pitera
and
W.
Swope
,
Proc. Natl. Acad. Sci. U. S. A.
100
(
13
),
7587
7592
(
2003
).
81.
W.
Zheng
,
E.
Gallicchio
,
N.
Deng
,
M.
Andrec
, and
R. M.
Levy
,
J. Phys. Chem. B
115
(
6
),
1512
1523
(
2011
).

Supplementary Material