By analogy with single-molecule pulling experiments, we present a computational framework to obtain free energy differences between complex solvation states. To illustrate our approach, we focus on the calculation of solvation free energies (SFEs). However, the method can be readily extended to cases involving more complex solutes and solvation conditions as well as to the calculation of binding free energies. The main idea is to drag the solute across the simulation box where atomistic and ideal gas representations of the solvent coexist at constant temperature and chemical potential. At finite pulling speeds, the resulting work allows one to extract SFEs via nonequilibrium relations, whereas at *infinitely* slow pulling speeds, this process becomes equivalent to the thermodynamic integration method. Results for small molecules well agree with literature data and pave the way to systematic studies of arbitrarily large and complex molecules.

Solvation free energies (SFEs), the difference in free energy resulting from considering the solute molecule in a solvent and in the gas phase at a given temperature and pressure, are fundamental to understand various processes in biology, chemistry, and pharmaceutical sciences in view of computer-aided drug design. SFEs are often calculated using free energy methods that join solvated and gas phase states via a careful modulation of physical interactions.^{1–13} This procedure is rather involved and strongly system dependent.

Here, we compute SFEs in terms of the work necessary to pull the molecule from a region within the simulation box containing the solvent (solvated state) to another region containing an ideal gas representation of the solvent (gas phase). A uniform solvent number density is enforced throughout the simulation box. Hence, molecules can freely diffuse while changing their resolution, from interacting to ideal gas, *on the fly* which consequently guarantees a constant solvent chemical potential.

In particular, let us consider a system of *N* atoms, where a single solute molecule *σ* is solvated in *N*_{s} solvent molecules of type *α*. To concurrently simulate the fully atomistic and ideal gas representation of the solvent, we apply the adaptive resolution framework,^{14–16} particularly the Hamiltonian adaptive resolution H-AdResS method.^{17–20} There, a Hamiltonian of the system includes a switching field *λ*(**r**) that defines regions within the simulation box where intermolecular interactions are either fully atomistic [*λ*(**r**) = 1] or nonexistent [*λ*(**r**) = 0]. A smooth interpolation between the values 0 and 1, and thus a change in molecular resolution, takes place in between these two regions. The Hamiltonian of the system has the form^{21}

with (*r*, *p*) being positions and momenta and $K=\u2211i=1Npi2/2mi$ the total kinetic energy of the system. In this particular case, we write the potential energy as

where the labels *intra* and *inter* refer to intra- and intermolecular interactions, respectively. The vector **R** is the position of the center of mass of a given molecule, and the switching field *λ*(**R**), which determines the molecule’s identity, takes the value 1 if the molecule is in the atomistic region (AT), 0 is in the ideal gas region (IG), and a smooth interpolation between these values is defined in the hybrid (HY) region (see Fig. 1). It is apparent that the potential given by Eq. (2) can be generalized to cases where multiple solute molecules are present or the solvent corresponds to a molecular mixture. Moreover, the choice of reference state, in this case the gas phase, is rather arbitrary and can be calibrated depending on the problem at hand.

The Δ*H* terms are free energy compensations (FECs) introduced in the Hamiltonian to eliminate nonphysical contributions to the forces due to the gradients of *λ*(**r**) and to guarantee a flat density profile across the simulation box. For all cases considered, there are only two FEC contributions, one for all the solvent molecules, Δ*H*_{α}, and the other for the solute molecule, Δ*H*_{σ}. The FEC of the solvent has been identified with the excess (or residual in the case of mixtures) chemical potential^{19} and can be easily computed using an iterative *on-the-fly* procedure^{22} (see Sec. I of the supplementary material). In this formalism, we write the following time-dependent Hamiltonian:

with

a time-dependent harmonic restraint, with constant *κ*, applied on the *x*-coordinate of the atom *i* that belongs to the solute molecule *σ*, *x*_{i∈σ}(*t*), whose initial position is denoted by $xi\u2208\sigma 0$. This potential drives the solute molecule across the simulation box to sample different solvation conditions between two extreme cases: a fully solvated solute and its gas phase (see Fig. 1). From the Hamiltonian (3), we can compute the *thermodynamic* work that corresponds to the change in energy of the system,^{23}

with $v$ being the pulling rate. Finally, the relation between the work defined in Eq. (5) and the free energy difference between two equilibrium solvation states can be established by the Jarzynski equality (JE),^{24–26}

with *β* = 1/*k*_{B}*T*. Shortly after its formulation, the JE found immediate application in various single-molecule pulling experiments.^{27–30} One interesting limiting case of the JE corresponds to the infinitely slow pulling limit where the system equilibrates for every value of the switching field *λ*(**r**). In this case, from the JE (6) and the definition of work (5), we obtain an expression for Δ*F* completely analogous to the thermodynamic integration method,^{25,31}

where the brackets ⟨⋯⟩_{λ} indicate the ensemble average for a given value of the field *λ*(**r**) and the *δ* indicates a functional derivative. The terms *R*_{a} = *r*_{AT} + *d*_{HY} and *R*_{b} = *r*_{AT} with *r*_{AT} and *d*_{HY} being the linear dimensions of the AT and HY regions, respectively. From the Hamiltonian (3), one obtains

This quantity vanishes upon integration.^{21} For the solvent molecules *α*, this result verifies that the excess chemical potential is zero. For the solute *σ*, we have two possibilities. Δ*H*_{σ}(*λ*(**R**_{σ})) ≠ 0 implies Δ*F* = 0 and Δ*H*_{σ}(*λ*(**R**_{σ})) ≡ 0 implies that, in the infinitely slow pulling limit, the SFE can be obtained as

To investigate this, we consider a single component system. The solute is defined as one of the system’s molecules such that Δ*H*_{α} = Δ*H*_{σ}. We calculate Δ*F* by using Eq. (7) when Δ*H*_{σ} ≠ 0 and Eq. (8) when Δ*H*_{σ} = 0. With this aim, we perform umbrella sampling^{32,33} simulations of pure SPC/E^{34} water (see Secs. I and II of the supplementary material). In practice, we select a water molecule inside the AT region (schematically indicated in Fig. 1) and restrain the *x*-coordinate of its oxygen atom using the harmonic potential in Eq. (4) with zero pulling rate and spring constant *κ* = 209.2 kJ/mol/Å^{2}. First, we apply the FEC to all the molecules in the system including the selected water molecule [Δ*H*_{σ} ≠ 0 in Eq. (2)]. This solute molecule is moved sequentially by $\Delta xoxygen0=0.2\u2009$ and sampled for 20 ps to construct the biased probability distribution of *x*_{oxygen}. We perform 20 uncorrelated simulation runs to calculate the solvation free energy profile and shift it with respect to the IG region. Averages and standard deviations (error bars) are reported in Fig. 2 (blue symbols) which show, as expected and within error bars, a flat free energy profile across the simulation box (Δ*F* = 0).

Second, we repeat the same umbrella calculation as described before but without applying the FEC to the *solute* water molecule [Δ*H*_{σ} = 0 in Eq. (2)]. Results are also reported in Fig. 2 (red symbols), where a flat energy profile is apparent in both the AT and IG regions and the gap between the two amounts for the difference in free energy of −29.45 ± 2.63 kJ/mol.^{35} This result agrees within error bars (which are comparable to the ones obtained with state-of-the-art free energy methods) with the excess chemical potential as reported in the literature (−29.52 ± 0.03 kJ/mol^{36} and −29.18 ± 0.16 kJ/mol^{19}), and it validates our approach since the solvation free energy of a water molecule in water is the excess chemical potential.

For the second part of this study, we focus on the calculation of the SFE of one urea molecule in water (in the following, we assume Δ*H*_{σ} = Δ*H*_{urea} = 0). In this case, we apply JE for different pulling rates, including the *infinitely* slow pulling case given by Eq. (8). We simulate urea and water molecules using the OPLS^{37} and SPC/E^{34} force fields, respectively (see the computational details in Sec. I of the supplementary material). In this case, the carbon atom of the urea molecule is restrained and pulled across the simulation box using the harmonic potential in Eq. (4) with zero pulling rate and spring constant *κ* = 209.2 kJ/mol/Å^{2}. The resulting free energy profile is shown in Fig. 3 (blue symbols), and we use it as the benchmark for the next part of the investigation.

We now proceed to perform the steered molecular dynamics of the solute molecule to bring it from the AT to the IG region, and we use the JE to compute the SFE. The initial conditions for all pulling experiments, slow and fast, are extracted from different equilibrated configurations at constant temperature. Thus, the initial positions of the restrained molecule are distributed with Boltzmann weight $exp(\u2212\kappa (x\u2212x0)2/2kBT)$ whose width (standard deviation of the distribution) is given by the inverse of the restraining stiffness, *κ*^{−1}. We apply the time-dependent restraining potential in Eq. (4) with *κ* = 209.2 kJ/mol/Å^{2} which is high enough to neglect, within error bars, the contribution of such a potential in the calculation of the free energy. Thus, we effectively calculate the difference in free energy for the unconstrained system.^{38} In practice, to make use of the JE (6), it is possible, on the one hand, to express ⟨e^{−βW}⟩ in terms of a cumulant expansion,^{39}

with ⟨*W*^{2}⟩ − ⟨*W*⟩^{2} being the mean-squared variation of *W*. On the other hand, it is also possible to directly use the exponential average, i.e.,

In both cases, we use Eq. (5) to compute the applied work *W* on the system. The initial position of the solute is set at $xi\u2208\sigma 0=20\u2009\xc5$, and the simulation total time *τ* is set such that the final equilibrium position $xi\u2208\sigma 0+v\tau $ is 60 Å. We pull the restrained atom in the solute molecule at two constant speeds: $v$ = 10 Å/ns and $v$ = 50 Å/ns (the latter being comparable to the thermal velocity of urea at room temperature, see Sec. IV of the supplementary material). We generate 60 (200 in the case of $v$ = 50 Å/ns) uncorrelated trajectories, and we group them into six (20) blocks of ten trajectories. The statistical analysis (first and second orders of the cumulants as well as the exponential average) is carried out for each block. In Fig. 3(a), the result of computing ⟨*W*⟩ for a pulling rate of $v$ = 10 Å/ns is, as expected, larger than Δ*F* [4 kJ/mol (black symbols)]. Nevertheless, we use the JE by either adding the contribution of higher order cumulants as in Eq. (9) (red symbols) or evaluating the exponential average as in Eq. (10) (green symbols). Both estimates increase the accuracy with respect to the benchmark free energy curve. Taking this into account, we report the solvation free energy of one urea molecule as obtained from the exponential average as −52.9 ± 3.6 kJ/mol in good agreement with the result reported in Ref. 40 of the excess chemical potential of aqueous urea (−55.13 kJ/mol) at a mole fraction of 7.657 × 10^{−4}. We show that $v$ = 10 Å/ns [Fig. 3(a)] is an optimum pulling rate (see Sec. V of the supplementary material) and use this value to comment on the efficiency of the method. Since error bars are systematically smaller for the JE than for umbrella sampling results, we report in Sec. VI of the supplementary material, free energy profiles with reduced size of the statistical sample. These results indicate that to obtain comparable uncertainty, it is required to run approximately the same number of molecular dynamics steps in both cases. When calculating free energies, optimized equilibrium methods usually outperform nonequilibrium methods.^{31,41,42} However, our particular result might be due to the simplicity of the system under consideration and/or the choice of equilibrium calculation which, in this context, serves only an illustrative purpose and could be further optimized.

The amount of nonreversible work increases upon increasing the pulling rate to 50 Å/ns, as presented in Fig. 3(b). The first order cumulant ⟨*W*⟩ (black symbols) shows a difference of approximately 30 kJ/mol with respect to the benchmark free energy (blue symbols). In this case, the second order cumulant (red symbols) is more accurate with respect to the umbrella sampling calculation than the logarithmic estimator in Eq. (10) (green symbols). However, the uncertainty in the second order cumulant calculation is larger because of the limited sampling.^{38,39} The error in the exponential average is lower, i.e., with smaller fluctuations, but still with a considerable mean deviation of approximately 20 kJ/mol from the reference value, that evidences the well-documented bias of this particular estimator.^{38,39}

Previously, we have only considered the process to bring the molecule from the AT to the IG region, but it is also possible to compute Δ*F* by performing the reverse process. In this case, Crooks fluctuation theorem (CFT) combines the information obtained from examining *forward* (F) and *reverse* (R) pulling events,^{43}

where *W* is the work performed on the system resulting from F and R pulling and *β* = 1/*k*_{B}*T*. Similarly to the JE, the CFT has also been investigated in various experimental studies.^{44–46} In this case, the difference in free energy Δ*F* is obtained by evaluating the ratio of the F and R work distributions *P*_{F}(+*βW*)/*P*_{R}(−*βW*). We identify F with pulling the solute from the AT to the IG region and R with the inverse process. To use the CFT, we perform a number of R processes equal to the number of F processes at a pulling rate of $v$ = 50 Å/ns. Free energy profiles for R events are shown in Fig. 4 where a consistent behavior with respect to the F processes is apparent and rules out the presence of hysteresis in the system. For this case as well, the best estimate with respect to the umbrella sampling profile is given by the second order cumulant expansion (red symbols). Finally, we compute the work distributions *P*(±*βW*)_{F/R} for both R and F processes (Fig. 5). We identify the intersection of the distributions, i.e., the point at which *W* = Δ*F* [Fig. 5 (inset)], and report Δ*F* = 49.67 kJ/mol. The results obtained by using the JE and the CFT agree reasonably well. However, more statistics and data from different pulling rates need to be collected in order to convincingly calculate Δ*F* from the CFT.

We have introduced a framework to compute SFEs via the work resulting from dragging the solute between solvation states where nonequilibrium work relations allow one to compute free energy differences. The method can be extended to cases including more complex solutes, multiple solvation conditions, calculation of absolute binding affinities, and relative binding free energies. Moreover, large solute molecules can be studied because (i) pulling from the AT region depletes the solvent; however, this can be compensated by defining a sufficiently large IG region at minimal computational expenses and (ii) pulling from the IG region gives results in line with the JE. Finally, when equilibrating intermediate states or finding an optimal path between solvation conditions become unfeasible, this pulling protocol provides a sound and straightforward alternative to current SFE methods.

The supplementary material includes the computational details common to all simulations as well as the specifics of the umbrella sampling calculations. In addition, there is a short discussion on the correction needed for an appropriate comparison between results obtained with DSF and conventional Ewald summation for electrostatics. To estimate the pulling speed for the nonequilibrium calculations, it is convenient to compute the mean-squared displacement of a urea molecule in aqueous solution. Moreover, to find the optimal pulling rate among the values considered in this study, we also present the standard deviation of the corresponding work distributions. Finally, we add a section where we compare the efficiency of umbrella sampling and nonequilibrium calculations.

The authors thank Nancy C. Forero-Martinez and Arghya Dutta for critical reading and insightful comments. The authors gratefully acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—Project No. 233630050—TRR 146. R.C.-H. gratefully acknowledges the Alexander von Humboldt Foundation for financial support. This work was supported by the European Research Council under the European Union’s Seventh Framework Programme (No. FP7/2007-2013)/ERC Grant Agreement No. 340906-MOLPROCOMP.

## REFERENCES

The choice of the DSF model for electrostatic interactions implies the use of a correction discussed in Sec. III of the supplementary material.