We present a family of alchemical perturbation potentials that enable the calculation of hydration free energies of small- to medium-sized molecules in a single concerted alchemical coupling step instead of the commonly used sequence of two distinct coupling steps for Lennard-Jones and electrostatic interactions. The perturbation potentials we employ are non-linear functions of the solute–solvent interaction energy designed to focus sampling near entropic bottlenecks along the alchemical pathway. We present a general framework to optimize the parameters of alchemical perturbation potentials of this kind. The optimization procedure is based on the *λ*-function formalism and the maximum-likelihood parameter estimation procedure we developed earlier to avoid the occurrence of multi-modal distributions of the coupling energy along the alchemical path. A novel soft-core function applied to the overall solute–solvent interaction energy rather than individual interatomic pair potentials critical for this result is also presented. Because it does not require modifications of core force and energy routines, the soft-core formulation can be easily deployed in molecular dynamics simulation codes. We illustrate the method by applying it to the estimation of the hydration free energy in water droplets of compounds of varying size and complexity. In each case, we show that convergence of the hydration free energy is achieved rapidly. This work paves the way for the ongoing development of more streamlined algorithms to estimate free energies of molecular binding with explicit solvation.

## I. INTRODUCTION

The hydration free energy of a compound is defined as the reversible work for transferring one molecule of the compound from the gas phase to the water phase.^{1} The hydration free energy is an important characteristic of a substance. For instance, it is one of the determinant factors of water solubility in drug formulations^{2} and of the binding affinity of an inhibitor toward a protein receptor.^{3} Hydration free energies are most commonly derived from Henry’s constant measurements.^{4}

Free energies of hydration can also be estimated computationally.^{5} Most commonly, this is accomplished by molecular simulations of alchemical transformations in which solute–solvent interactions are progressively turned on. The nature of the alchemical transformation is critical to obtaining reliable results. A single direct alchemical process in which the coupling between the solute and the solvent is increased in a simple linear fashion has been found to be problematic for anything other than the smallest solutes (monatomic atoms and ions, water, methane, and similar).^{6} One issue is the singularity of the derivative of the alchemical potential with respect to the parameter *λ* near the decoupled state.^{7} Simple approaches of this kind can also be found to converge slowly due to bottlenecks along the alchemical path that are caused by poorly sampled conformational equilibria.^{8}

These issues have been the subject of intense study. This collective effort has resulted in a set of recommended best practices for alchemical calculations commonly employed by the computational chemistry community that are presently implemented in popular molecular simulation software packages.^{5,9,10} For example, it is generally recommended to split the coupling of the solute and the solvent into two phases, the first in which the volume-exclusion core repulsion and dispersion interactions are turned on, followed by a second phase in which electrostatic interactions are turned on. Separation-shifted soft-core pair potentials^{11} are commonly introduced to avoid end-point singularities, especially when volume-exclusion core repulsion terms are introduced. In addition, it is often necessary for large solutes to introduce one small group of atoms at a time or to resort to bond-growing processes in which solute atoms are pushed out from a central point rather than directly created in the solvent.^{12}

While successful, these strategies add layers of complexity to hydration free energy calculations.^{13} Splitting the coupling process into two calculation phases requires specifying two alchemical schedules that are often very different from each other. This practice also relies on simulating a nonphysical intermediate state consisting of the uncharged solute in water, which may have very different conformational propensities than the actual solute. The unphysical uncharged solute may, for example, undergo hydrophobic collapse and revert slowly to the physical extended hydrated state.

Software implementations of soft-core potential functions are cumbersome and difficult to maintain, and they require the optimization of specific parameters for each interaction potential type.^{13} Conventional separation-shifted scaled soft-core pair potentials^{11} also lead to non-linear and non-algebraic forms of the alchemical potential that require a cumbersome energy re-scoring process for the estimation of free energy changes with thermodynamic reweighting.^{14,15}

Thus, it would be beneficial to develop more streamlined alchemical protocols that (i) compute the hydration free energy in one continuous transformation process, (ii) do not require modifications of the standard form of pair potential functions, and (iii) can be parameterized using a linear progression of the charging parameter *λ*, while (iv) preserving or improving the rate of equilibration and convergence of free energy estimates relative to mainstream protocols. Here, we present a method with potentially all of these characteristics, while being applicable to a range of alchemical transformations, including those used in structure-based drug design,^{16} which is our primary focus.^{17,18}

The approach presented here is based on the work we carried out recently to optimize binding free energy calculations in the context of implicit solvation.^{19,20} The previous work yielded a family of optimized alchemical potential energy functions that modify all interactions in one concerted step without requiring customized soft-core pair potentials. The effort also resulted in the development of a general framework, based on generalized non-Boltzmann sampling theory^{21} and maximum likelihood estimation,^{22} for the optimization of the parameters of the alchemical potentials to accelerate conformational mixing and convergence.

In the present work, we demonstrate that the same approach applies to the estimation of the hydration free energies of small- to medium-sized molecules using an explicit representation of the solvent. As in our previous work,^{20} the approach is based on the observation that the slow convergence of free energies is due to rare conformational transitions caused by entropic bottlenecks akin to first-order phase transitions^{23} and that these can be circumvented by an appropriate choice of the alchemical perturbation energy function.

This work is organized as follows: We first review the theoretical and computational protocol developed in the previous work. Then, we apply it to a model system of hydration in which four diverse solutes of a wide range of sizes and hydrophobic characters (ethanol, alanine dipeptide, 1-naphthol, and 3,4-diphenyltoluene) are transferred from the gas phase to a water droplet (Figs. 1 and 2). We show that the optimized protocols yield results that are consistent with more conventional approaches when they both reach convergence. However, for larger solutes, only the optimized protocol can achieve rapid and reliable convergence. These promising outcomes pave the way for a novel generation of more streamlined methodologies for alchemical transformations in condensed phases.

## II. THEORY AND METHODS

### A. Alchemical transformations for the estimation of hydration free energies

Alchemical transformations are based on an alchemical potential energy function that interpolates from the potential energy function *U*_{0} of the starting state to that of the final state *U*_{1} as the progress parameter *λ* goes, conventionally, from 0 to 1. The most straightforward approach is a linear interpolating function of the following form:

where *x* represents the degrees of freedom of the system, and we have introduced the perturbation function,

which in the case of hydration corresponds to the solute–solvent interaction energy.

While straightforward, the linear alchemical energy function (1) leads to instabilities and poor convergence, especially when volume-exclusion terms are introduced.^{13} To address this issue, we consider the family of alchemical potential energy functions of the following form:

where *u*_{sc}(*x*), defined below by Eq. (7), is a soft core-modified solute–solvent interaction energy and *W*_{λ}(*u*_{sc}) is an alchemical perturbation function defined such that *W*_{0}(*u*_{sc}) = 0 and *W*_{1}(*u*_{sc}) = *u*_{sc} at *λ* = 0 and *λ* = 1, respectively. We will consider, in this work, the linear function

and the generalized softplus function

where the parameters *λ*_{2}, *λ*_{1}, *α*, *u*_{0}, and *w*_{0} are functions of *λ*. The derivative of the softplus function is the generalized logistic function (Fermi’s function),

The softplus perturbation function’s parameters are optimized using the procedure described in Ref. 20 and briefly described later in this section.

Note that the perturbation functions *W*_{λ}(*u*) that we are considering are algebraic functions of only *λ* and the solute–solvent interaction energy *u* itself does not depend on *λ*. The function of a potential function form of these alchemical potentials is reminiscent of density functional descriptions of solutions^{24,25} and makes them unlike alchemical energy functions with soft-core pair potentials whose solute–solvent interaction energy function also depends on *λ* implicitly (and non-algebraically) through the *λ*-dependent scaled distance function.^{26} The lack of *λ*-dependence in the solute–solvent interaction energy is a critical assumption of the alchemical theory employed here. This feature also simplifies the process of the calculation of the free energy change by thermodynamic reweighting,^{15,27} since it does not require calling the molecular mechanics engine to recalculate the perturbation energy of sampled configurations at each *λ*-state.

In this work, we employ the following definition of the soft-core solute–solvent interaction energy:^{20}

and

where *u*_{0} is an energy cutoff below which the potential is unchanged (*u*_{0} = 0 in this work), *u*_{max} > 0 is the maximum allowed value of the soft-core solute–solvent interaction energy, *z* = 1 + 2*y*/*a* + 2(*y*/*a*)^{2}, and *a* is an adjustable dimensionless exponent (here, *u*_{max} = 50 kcal/mol and *a* = 1/16). With these definitions, *u*_{sc}(*u*) is a *C*(2)-smooth one-to-one map from the original solute–solvent interaction energy to the soft-core interaction energy. For favorable solute–solvent interaction energies (*u* < *u*_{0}), the solute–solvent interaction energy of the soft-core *u*_{sc} is that of the original *u*. However, for unfavorable solute–solvent interaction energies (*u* > *u*_{0}), such as when two atoms clash as they approach each other, the soft-core interaction energy *u*_{sc} grows less rapidly than *u*, and it eventually reaches a maximum plateau, unlike the solute–solvent interaction energy that grows indefinitely.

We stress that in this soft-core methodology, there are no soft-core modifications of pair potentials. The soft-core function is applied to the overall solute–solvent interaction energy *u* evaluated with the standard form of the Coulomb and Lennard-Jones interatomic potentials without soft-core modifications. Moreover, the soft-core transformation has a negligible effect on the end points of the alchemical path and on the accuracy of the free energy estimate. In the dehydrated state, the solute–solvent interaction energy is zero, regardless of its definition (raw or soft-core). In the hydrated state, the solute–solvent interaction energy is always negative (see Fig. 3 at *λ* = 1, for example) and is not affected by the soft-core transformation. To avoid potential instabilities, we ensure that large and negative interaction energies between unlike charges at close distances, which may occur at nearly decoupled states, are offset by corresponding Lennard-Jones repulsive terms (see Sec. II C). No occurrences of spurious favorable binding energies have been found for nearly decoupled states (see Fig. 3 at *λ* = 0, for example).

To obtain the hydration free energy, a set of samples of the soft-core solute–solvent interaction energies, *u*_{sc}(*i*), are collected during molecular dynamics simulations performed at a sequence of *λ* values between 0 and 1. The free energy profile as a function of *λ*, Δ*G*(*λ*), is obtained by multi-state reweighting^{15} using the unbinned weighted histogram analysis method (UWHAM) method.^{27} The hydration free energy $\Delta Gh*$ in the Ben-Naim standard state^{28} is, by definition, the value of free energy profile at *λ* = 1. In this notation, $\Delta Gh*=\Delta G(1)$.

### B. Analytical theory of alchemical transformations

The hydration process is analyzed and optimized using the theory we recently developed for alchemical potential energy functions of the form in Eq. (3).^{22} Briefly, following the potential distribution theorem,^{29} here, we consider *p*_{0}(*u*_{sc}), the probability density of the soft-core solute–solvent interaction energy *u*_{sc} in the ensemble in which solute and solvent are uncoupled (*λ* = 0). All other quantities of the alchemical transformation can be obtained from *p*_{0}(*u*_{sc}).^{30,31} In particular, given *p*_{0}(*u*_{sc}), the probability density for the binding energy *u*_{sc} for the state with perturbation potential *W*_{λ}(*u*_{sc}) is

where *β* = 1/*k*_{B}*T*,

is the excess component of the equilibrium constant for binding and

is the corresponding free energy profile. Note that for a linear perturbation potential, *W*_{λ}(*u*_{sc}) = *λu*_{sc}, Eqs. (10) and (11) state that the free energy profile is related to the double-sided Laplace transform of *p*_{λ}(*u*_{sc}).

An analytical description of *p*_{0}(*u*_{sc}), and thus of all the quantities derived from it, is available. Briefly, the theory is based on the assumption that the statistics of the solute–solvent interaction energy *u* in the decoupled state is the convolution of two processes: one that is described by the sum of many “soft” background solute–solvent interactions, which follows central limit statistics, and another process that describes “hard” atomic collisions, which follows max statistics (see Ref. 22 for the full derivation) The probability density *p*_{0}(*u*) is further expressed as the superposition of probability densities of a small number of modes,

where *c*_{i} are adjustable statistical weights that sum to 1 and *p*_{0i}(*u*) is the probability density that corresponds to mode *i*. The probability density for mode *i*, *p*_{0i}(*u*), is described analytically as

where $g(u;\u016b,\sigma )$ is the normalized Gaussian density function of mean $\u016b$, the standard deviation is *σ*, and

where *H*(*u*) is the Heaviside function, $x=u/\epsilon +\u0169/\epsilon +1$, and $xC=\u0169/\epsilon +1$ (see Ref. 22 and Appendix A of Ref. 20 for the complete derivation) The model for each mode *i* depends on a number of adjustable parameters that correspond to the following physical quantities:^{22}

*c*_{i}: statistical weight of mode*i*;*p*_{bi}: probability that no atomic clashes occur while in mode*i*;$\u016bbi$: the average background interaction energy of mode

*i*;*σ*_{bi}: the standard deviation of background interaction energy of mode*i*;*n*_{li}: the effective number of statistical uncorrelated atoms of the solute in mode*i*;*ε*_{i}: the effective*ε*parameter of a hypothetical Lennard-Jones interaction energy potential that describes the solute–solvent interaction energy in mode*i*;$\u0169i$: the solute–solvent interaction energy value above which the collisional energy is not zero in mode

*i*.

Together with the weights *c*_{i}, the parameters above are varied using a maximum likelihood criterion to fit the distributions of the soft-core solute–solvent interaction energy obtained from numerical simulations.^{22} The distribution of the soft-core solute–solvent interaction energy *p*_{0}(*u*_{sc}) is obtained from *p*_{0}(*u*) using the standard formula for the change of random variable.^{20}

In this work, we use the analytical theory above and the results of trial alchemical simulations to optimize the parameters of the softplus alchemical perturbation potential in Eq. (5).^{20} The procedure consists of running trial alchemical calculations using the linear alchemical potential *W*_{λ}(*u*_{sc}) = *λu*_{sc} to obtain the set of samples of the soft-core binding energies as a function of *λ*. The samples obtained from the trial simulation are then used to derive optimized parameters for the analytical model for *p*_{0}(*u*_{sc}) [Eqs. (12)–(14)] using a maximum likelihood approach.^{32} We developed an application based on Tensorflow^{33} to conduct the maximum likelihood optimization (https://github.com/egallicc/femodel-tf-optimizer).

The analytical representation of *p*_{0}(*u*_{sc}) is then analytically differentiated with respect to *u*_{sc} to obtain the *λ*-function defined as^{20}

The minima and maxima of *p*_{λ}(*u*_{sc}) occur when the *λ*-function and *∂W*_{λ}(*u*_{sc})/*∂u*_{sc} intersect,^{20}

Hence, the *λ*-function can be used as a guide to design alchemical potentials that avoid the occurrence of multi-modal distributions that are difficult to converge. The linear alchemical potential *W*_{λ}(*u*_{sc}) = *λu*_{sc} leads to *∂W*_{λ}(*u*_{sc})/*∂u*_{sc} = *λ*, which is a constant, and in many cases, it intersects the *λ*-function at multiple points (see Fig. 4). To avoid multi-modal distributions, here, we use the softplus alchemical potential and we vary the parameters *λ*_{2}, *λ*_{1}, *α*, *u*_{0}, and *w*_{0} as a function of *λ* such that the derivative of the softplus function in Eq. (6) intersects *λ*_{0}(*u*_{sc}) at a single point at each *λ* or, when this is not easily achievable, such that it at least intersects it at nearby points. As discussed in Ref. 20, this procedure removes or reduces the severity of entropic sampling bottlenecks during the alchemical coupling process and enhances conformational sampling efficiency in order to achieve convergence of the binding free energy estimates.

### C. Computational details

In this work, we employ a hydration model in which solutes are transferred from vacuum to near the center of mass of a water droplet measuring at around 27 Å in diameter and that is composed of 357 TIP3P water molecules (Fig. 1). The droplet is confined in a spherical region defined by a flat-bottom harmonic restraining potential centered at the origin and acting on each TIP3P water oxygen atom. The flat-bottom potential tolerance was set to 24 Å, which results in a confinement region of approximately twice the droplet volume and a force constant of 5 kcal/(mol Å^{2}) beyond this tolerance. This simplified hydration model is selected here to illustrate and assess the proposed methodology using our existing Single Decoupling Method (SDM) plugin in OpenMM,^{20,34} which was originally designed for binding free energy calculations with implicit solvation. The droplet model adopted here also avoids potential issues not relevant to this work that concern the correctness of implementations of alchemical processes with periodic boundary conditions and long-range electrostatics.^{35,36}

The solutes (Fig. 2) were prepared with Maestro (Schrödinger, LLC), and the GAFF/AM1-BCC force field parameters were assigned using the Antechamber program.^{37} Polar hydrogen atoms with zero Lennard-Jones parameters were assigned *σ*_{LJ} = 0.1 Å and *ϵ*_{LJ} = 10^{−4} kcal/mol to avoid infinitely attractive interactions in nearly uncoupled states between unlike charges at small distances. For the systems considered here, the change in potential energy due to the introduction of these small Lennard-Jones parameters is below single floating point precision. Single decoupling alchemical calculations were prepared using the SDM workflow (https://github.com/egallicc/openmm_sdm_workflow.git) using 16 *λ* states. MD calculations employed the OpenMM^{34} MD engine and the SDM integrator plugins (https://github.com/rajatkrpal/openmm_sdm_plugin.git) using the OpenCL platform. The ASyncRE software,^{38} customized for OpenMM and SDM (https://github.com/egallicc/async_re-openmm.git), was used for the Hamiltonian Replica Exchange in *λ* space with a uniform *λ* schedule between 0 and 1. The *λ*-dependent parameters that were used with the softplus potential are listed in Tables II–V. Molecular dynamics runs were conducted for a minimum of 5 ns per replica with a 1 fs time step at 300 K such that *λ* values were exchanged approximately every 5 ps. A Langevin thermostat at 300 K with a relaxation time constant of 20 ps was used. Binding energy samples and trajectory frames were recorded every 5 ps. The calculations were performed on the XSEDE Comet graphics processing unit (GPU) HPC cluster at the San Diego Supercomputing Center. Simulation input files and running instructions are available at https://github.com/sheenam1509/water-droplet-hydration.

The calculation of the alchemical potential and its derivatives [Eq. (3)] requires two evaluations of the energy and forces at each time step, once for the hydrated system and again for the dehydrated state after translating the solute from the droplet into the vacuum to compute the binding energy. Thus, the method is a factor of two slower than conventional molecular dynamics. The hybridization and processing of the alchemical forces are O(N) procedures implemented directly on the GPU and have a negligible impact on the performance. For the droplet systems, we obtain an approximate throughput of 150 ns/day with a 1 fs time step on an NVIDIA P100 GPU of the Comet cluster.

To accelerate conformational sampling, in this work, we employ the Hamiltonian Replica Exchange algorithm^{39–42} in alchemical space^{30,43,44} using the asynchronous implementation of Replica Exchange (ASyncRE)^{38} with the Gibbs Independence Sampling algorithm for state reassignments.^{20}

Binding free energy estimates and their corresponding statistical error estimates have been obtained by multi-state reweighting using the UWHAM method.^{27}

## III. RESULTS

### A. Probability distributions of the solute–solvent interaction energy with the linear alchemical potential

As discussed in Sec. II, the free energy profile is determined by the probability distributions of the perturbation energy along the alchemical path. The probability distributions *p*_{λ}(*u*_{sc}) for the selected values of *λ* that were obtained from the concerted alchemical hydration with the linear alchemical potential are shown in Fig. 3 (dots). At small *λ* values, the distributions are clustered around large and unfavorable values of the solute–solvent interaction energy, reflecting the occurrence of frequent atomic clashes when the solute and solvent are weakly coupled. Conversely, at values of *λ* near 1, the solute and solvent are more strongly coupled, and the solute–solvent interaction energies tend to be favorable. As *λ* increases; however, the distributions do not uniformly move toward lower values of *u*_{sc}. Instead, they become bimodal at intermediate values of *λ* with a trough near *u*_{sc} = 0 kcal/mol separating the weakly decoupled and strongly coupled states. The conversion from weakly coupled to strongly coupled behavior occurs by the transfer of population between these two limiting states, rather than by the formation of states with intermediate values of the interaction energy. This behavior is the opposite of what is expected for linear response behavior.^{45} Rather, it is the hallmark of the occurrence of a strong phase transition.^{20,23}

The transition from the weakly coupled to the strongly coupled states occurs gradually in the case of the concerted alchemical linear process for ethanol, the smallest solute considered. For this solute, the weakly coupled and strongly coupled modes partially overlap at *λ* = 0.4 (Fig. 3, ethanol, green dots). However, the transition occurs sharply for all of the other solutes. In the case of 1-naphthol, for example, the system transitions from being weakly coupled to strongly coupled in the small span of *λ* from 0.333 to 0.4 (Fig. 3, 1-naphthol, yellow and green dots). Similar biphasic behavior occurs for alanine dipeptide and especially for 3,4-diphenyltoluene. Evidently, there is a critical *λ* value for each solute when the weakly coupled and strongly coupled modes are equally probable. However, because interconversions from one state to the other are extremely rare, it would be very difficult to pinpoint the equilibrium value accurately (see Fig. 6). Indeed, the equilibration analysis described later (Table VI and Fig. 7) shows that with the exception of ethanol and perhaps alanine dipeptide, the sequence of distributions in Fig. 3 obtained with the linear alchemical potential is unlikely to be converged.

The small number of transitions between uncoupled (dehydrated) and coupled (hydrated) states is due to the small probability of visiting the so-called no man’s land of interaction energies of low probability between the two states when applying the concerted alchemical transformation. For 3,4-diphenyltoluene, for example, the probability for observing a configuration with zero solute–solvent interaction energy is immeasurably small (Fig. 3, 3,4-diphenyltoluene). For the less extreme examples of alanine dipeptide and 1-naphthol, it is possible to observe distributions with mixtures of weakly coupled and strongly coupled states with very small, yet observable density in the no man’s land region. Even for these solutes, however, hydration and dehydration events are rarely observed in the simulations with the linear alchemical potential (Table VI and Fig. 6).

Due to the lack of a sufficient number of transitions, it is not feasible to accurately estimate the relative equilibrium populations of hydrated and dehydrated states of the bulky solutes at any of the *λ*-states. In turn, because the hydration free energy depends on the relative populations of coupled and uncoupled states as a function of *λ*,^{22} it is expected (and largely confirmed, see Fig. 7) that with the linear alchemical potential, the hydration free energy estimates for the solutes, with the exception of ethanol, are likely to be substantially biased by finite sampling.

Collectively, these data show that conventional linear alchemical interpolation schemes are not generally suitable for the implementation of reliable concerted alchemical protocols in which Lennard-Jones and electrostatic interactions are removed simultaneously.

### B. Probability distributions of the solute–solvent interaction energy with the softplus alchemical potential

The data shown in Fig. 3 confirm that the observed probability distributions of the soft-core solute–solvent interaction energies, which were obtained from the concerted alchemical calculations (dots in Fig. 3), are accurately reproduced by the analytical model for *p*_{0}(*u*_{sc})^{20,22} that was parameterized to each dataset (continuous lines in Fig. 3). The parameters of the model that were optimized by the maximum likelihood method are listed in Table I. The model reproduces the distributions’ positions and their variations as a function of *λ*, including the transition points from dehydrated to hydrated states.

. | Weight . | p_{b}
. | $\u016bb$ (kcal/mol) . | $\sigma ba$ . | ε (kcal/mol)
. | $\u0169a$ . | n_{l}
. |
---|---|---|---|---|---|---|---|

Ethanol | |||||||

Mode 1 | 3.41 × 10^{−3} | 1.13 × 10^{−6} | −7.30 | 2.68 | 5.83 | −0.07 | 4.08 |

Mode 2 | 2.09 × 10^{−1} | 9.40 × 10^{−7} | −3.70 | 2.70 | 28.6 | 1.90 | 4.41 |

Mode 3 | 6.11 × 10^{−1} | 2.27 × 10^{−9} | 5.83 | 1.02 | 22.7 | 22.7 | 9.84 |

Alanine dipeptide | |||||||

Mode 1 | 1.16 × 10^{−10} | 5.33 × 10^{−8} | −16.32 | 4.32 | 4.38 | 1.28 | 5.20 |

Mode 2 | 2.03 × 10^{−7} | 1.62 × 10^{−8} | −14.31 | 3.88 | 3.5 | 10.8 | 5.93 |

Mode 3 | 1.00 | 0 | 103.89 | 4.72 | 33.6 | 83.0 | 19.07 |

1-Naphthol | |||||||

Mode 1 | 2.92 × 10^{−8} | 7.90 × 10^{−8} | −14.86 | 3.36 | 4.24 | 1.85 | 4.85 |

Mode 2 | 9.92 × 10^{−7} | 7.68 × 10^{−8} | −12.13 | 3.32 | 6.84 | 11.8 | 5.06 |

Mode 3 | 1.00 | 0 | 198 | 3 | 9.50 | 250 | 19.1 |

3,4-Diphenyltoluene | |||||||

Mode 1 | 1.06 × 10^{−15} | 4.54 × 10^{−5} | −14.7 | 3.80 | 3.66 | 13.8 | 3.14 |

Mode 2 | 1.00 | 0 | 172 | 4.00 | 5.82 | 367 | 49.2 |

. | Weight . | p_{b}
. | $\u016bb$ (kcal/mol) . | $\sigma ba$ . | ε (kcal/mol)
. | $\u0169a$ . | n_{l}
. |
---|---|---|---|---|---|---|---|

Ethanol | |||||||

Mode 1 | 3.41 × 10^{−3} | 1.13 × 10^{−6} | −7.30 | 2.68 | 5.83 | −0.07 | 4.08 |

Mode 2 | 2.09 × 10^{−1} | 9.40 × 10^{−7} | −3.70 | 2.70 | 28.6 | 1.90 | 4.41 |

Mode 3 | 6.11 × 10^{−1} | 2.27 × 10^{−9} | 5.83 | 1.02 | 22.7 | 22.7 | 9.84 |

Alanine dipeptide | |||||||

Mode 1 | 1.16 × 10^{−10} | 5.33 × 10^{−8} | −16.32 | 4.32 | 4.38 | 1.28 | 5.20 |

Mode 2 | 2.03 × 10^{−7} | 1.62 × 10^{−8} | −14.31 | 3.88 | 3.5 | 10.8 | 5.93 |

Mode 3 | 1.00 | 0 | 103.89 | 4.72 | 33.6 | 83.0 | 19.07 |

1-Naphthol | |||||||

Mode 1 | 2.92 × 10^{−8} | 7.90 × 10^{−8} | −14.86 | 3.36 | 4.24 | 1.85 | 4.85 |

Mode 2 | 9.92 × 10^{−7} | 7.68 × 10^{−8} | −12.13 | 3.32 | 6.84 | 11.8 | 5.06 |

Mode 3 | 1.00 | 0 | 198 | 3 | 9.50 | 250 | 19.1 |

3,4-Diphenyltoluene | |||||||

Mode 1 | 1.06 × 10^{−15} | 4.54 × 10^{−5} | −14.7 | 3.80 | 3.66 | 13.8 | 3.14 |

Mode 2 | 1.00 | 0 | 172 | 4.00 | 5.82 | 367 | 49.2 |

The parameterized analytical functions for *p*_{0}(*u*_{sc}) are differentiated analytically with respect to *u*_{sc} to obtain the corresponding *λ*-functions *λ*_{0}(*u*_{sc}) [Eq. (15)]^{20} shown in Fig. 4 (black curves). These functions are used to predict, graphically, the location of the maxima and minima of the probability distributions of the soft-core solute–solvent interaction energy as *λ* is varied. The procedure consists of finding the intersections between the *λ*-function and, in the case of a linear perturbation potential, the horizontal line drawn at the level of the desired value of *λ* [see Eq. (16)].

Indeed, the predictions from the *λ*-functions in Fig. 4 are quantitatively consistent with the observations in Fig. 3. In each case, the distributions near *λ* = 0 are expected to have one mode at large values of *u*_{sc}. As *λ* increases (that is, as the horizontal line shifts up), a back-bending region^{23} of the *λ*-function is encountered in which three intersections typically occur. The first and the last correspond to the maxima of the distributions, while the middle corresponds to a minimum. More complex patterns can arise when there are multiple back-bending regions. In the case of ethanol, the distributions are predicted to be unimodal at favorable values of the solute–solvent interaction energy above approximately *λ* = 0.4 (Fig. 4, ethanol). For all the other solutes, the back-bending of the *λ*-function is predicted to be so extreme in order to prevent, in principle, unimodal distributions with the linear potential function even for values of *λ* close to 1. While the *λ*-functions can be used to identify the locations of maxima and minima of the distributions, they alone do not predict the relative populations of competing modes. Thus, for example, 1-naphthol at *λ* = 1 is overwhelmingly more likely to form favorable interactions with the solvent even though a second, immeasurably small, mode is predicted to exist at unfavorable interaction energies.

The softplus perturbation potential can be tuned to reduce the perturbation energy gap across the hydration/dehydration transition and avoid hard-to-converge bimodal distributions.^{20} As shown in Fig. 4 for ethanol, the parameters of the softplus potential can be tuned to avoid multiple intersections with the *λ*-function, thereby avoiding the occurrence of bimodal distributions (see Fig. 5). When this is not feasible, as for alanine dipeptide, 1-naphthol, and 3,4-diphenyltoluene in Fig. 4, the softplus potential can still be tuned to reduce the energy gap between competing modes and to favor one mode over another.

The alchemical simulations with the softplus potentials, which consist of the schedules (Tables II–V) that were designed based on the predicted *λ*-functions (Fig. 4), result in the elimination, or at least the reduction, of solute–solvent interaction energy gaps along the alchemical pathway (Fig. 5). Unlike the concerted simulations with the linear alchemical potential (Fig. 3), simulations with the softplus alchemical potential yield a gradual shift of the probability distributions of the solute–solvent interaction energy as the solute is coupled to the solvent (Fig. 5). The distributions with the softplus potential are also generally unimodal, which reflect either lack of multiple stable conformational macrostates or the gradual shift from one macrostate to another as *λ* is varied. The softplus potential effect is not as significant for ethanol, which does not exhibit a sharp hydration transition with the linear alchemical potential (Fig. 3). However, in the case of 1-naphthol, the softplus potential has a very substantial effect. In this case, an interaction energy gap of more than 30 kcal/mol (Fig. 3) is virtually eliminated (Fig. 5), thereby facilitating transitions between the weakly coupled and strongly coupled states on opposite sides of the gap (see Fig. 6). The hydration calculations for alanine dipeptide and 3,4-diphenyltoluene also significantly benefit from using the softplus alchemical potential, even though the no man’s land with respect to the solute–solvent interaction energy is not eliminated (see Fig. 5) in these cases.

λ
. | λ_{1}
. | λ_{2}
. | α (kcal/mol)^{−1}
. | u_{0} (kcal/mol)
. | w_{0} (kcal/mol)
. |
---|---|---|---|---|---|

0.000 | 0.000 | 0.000 | 0.400 | 10.000 | 0.000 |

0.067 | 0.000 | 0.044 | 0.400 | 10.000 | −0.521 |

0.133 | 0.000 | 0.089 | 0.400 | 10.000 | −1.043 |

0.200 | 0.000 | 0.133 | 0.400 | 10.000 | −1.564 |

0.267 | 0.000 | 0.178 | 0.400 | 10.000 | −2.086 |

0.333 | 0.000 | 0.400 | 0.400 | 8.889 | −4.249 |

0.400 | 0.000 | 0.400 | 0.400 | 6.667 | −3.360 |

0.467 | 0.000 | 0.400 | 0.400 | 4.444 | −2.471 |

0.533 | 0.000 | 0.400 | 0.400 | 2.222 | −1.582 |

0.600 | 0.000 | 0.400 | 0.400 | 0.000 | −0.693 |

0.667 | 0.167 | 0.400 | 0.400 | 0.000 | −0.404 |

0.733 | 0.333 | 0.400 | 0.400 | 0.000 | −0.116 |

0.800 | 0.500 | 0.500 | 0.400 | 0.000 | 0.000 |

0.867 | 0.667 | 0.667 | 0.400 | 0.000 | 0.000 |

0.933 | 0.833 | 0.833 | 0.400 | 0.000 | 0.000 |

1.000 | 1.000 | 1.000 | 0.400 | 0.000 | 0.000 |

λ
. | λ_{1}
. | λ_{2}
. | α (kcal/mol)^{−1}
. | u_{0} (kcal/mol)
. | w_{0} (kcal/mol)
. |
---|---|---|---|---|---|

0.000 | 0.000 | 0.000 | 0.400 | 10.000 | 0.000 |

0.067 | 0.000 | 0.044 | 0.400 | 10.000 | −0.521 |

0.133 | 0.000 | 0.089 | 0.400 | 10.000 | −1.043 |

0.200 | 0.000 | 0.133 | 0.400 | 10.000 | −1.564 |

0.267 | 0.000 | 0.178 | 0.400 | 10.000 | −2.086 |

0.333 | 0.000 | 0.400 | 0.400 | 8.889 | −4.249 |

0.400 | 0.000 | 0.400 | 0.400 | 6.667 | −3.360 |

0.467 | 0.000 | 0.400 | 0.400 | 4.444 | −2.471 |

0.533 | 0.000 | 0.400 | 0.400 | 2.222 | −1.582 |

0.600 | 0.000 | 0.400 | 0.400 | 0.000 | −0.693 |

0.667 | 0.167 | 0.400 | 0.400 | 0.000 | −0.404 |

0.733 | 0.333 | 0.400 | 0.400 | 0.000 | −0.116 |

0.800 | 0.500 | 0.500 | 0.400 | 0.000 | 0.000 |

0.867 | 0.667 | 0.667 | 0.400 | 0.000 | 0.000 |

0.933 | 0.833 | 0.833 | 0.400 | 0.000 | 0.000 |

1.000 | 1.000 | 1.000 | 0.400 | 0.000 | 0.000 |

λ
. | λ_{1}
. | λ_{2}
. | α (kcal/mol)^{−1}
. | u_{0} (kcal/mol)
. | w_{0} (kcal/mol)
. |
---|---|---|---|---|---|

0.000 | 0.000 | 0.000 | 0.400 | 3.000 | 0.000 |

0.067 | 0.000 | 0.138 | 0.400 | 3.000 | −0.778 |

0.133 | 0.000 | 0.258 | 0.400 | 3.000 | −1.557 |

0.200 | 0.000 | 0.395 | 0.400 | 3.000 | −2.335 |

0.267 | 0.000 | 0.544 | 0.400 | 2.200 | −3.113 |

0.333 | 0.000 | 0.651 | 0.400 | 2.200 | −3.892 |

0.400 | 0.000 | 0.759 | 0.400 | 2.200 | −5.837 |

0.467 | 0.000 | 0.867 | 0.400 | 1.100 | −2.947 |

0.533 | 0.067 | 0.867 | 0.400 | 0.000 | −1.387 |

0.600 | 0.200 | 0.867 | 0.400 | 0.000 | −1.156 |

0.667 | 0.333 | 0.867 | 0.400 | 0.000 | −0.925 |

0.733 | 0.467 | 0.867 | 0.400 | 0.000 | −0.694 |

0.800 | 0.600 | 0.867 | 0.400 | 0.000 | −0.463 |

0.867 | 0.723 | 0.867 | 0.400 | 0.000 | −0.232 |

0.933 | 0.867 | 0.867 | 0.400 | 0.000 | 0.000 |

1.000 | 1.000 | 1.000 | 0.400 | 0.000 | 0.000 |

λ
. | λ_{1}
. | λ_{2}
. | α (kcal/mol)^{−1}
. | u_{0} (kcal/mol)
. | w_{0} (kcal/mol)
. |
---|---|---|---|---|---|

0.000 | 0.000 | 0.000 | 0.400 | 3.000 | 0.000 |

0.067 | 0.000 | 0.138 | 0.400 | 3.000 | −0.778 |

0.133 | 0.000 | 0.258 | 0.400 | 3.000 | −1.557 |

0.200 | 0.000 | 0.395 | 0.400 | 3.000 | −2.335 |

0.267 | 0.000 | 0.544 | 0.400 | 2.200 | −3.113 |

0.333 | 0.000 | 0.651 | 0.400 | 2.200 | −3.892 |

0.400 | 0.000 | 0.759 | 0.400 | 2.200 | −5.837 |

0.467 | 0.000 | 0.867 | 0.400 | 1.100 | −2.947 |

0.533 | 0.067 | 0.867 | 0.400 | 0.000 | −1.387 |

0.600 | 0.200 | 0.867 | 0.400 | 0.000 | −1.156 |

0.667 | 0.333 | 0.867 | 0.400 | 0.000 | −0.925 |

0.733 | 0.467 | 0.867 | 0.400 | 0.000 | −0.694 |

0.800 | 0.600 | 0.867 | 0.400 | 0.000 | −0.463 |

0.867 | 0.723 | 0.867 | 0.400 | 0.000 | −0.232 |

0.933 | 0.867 | 0.867 | 0.400 | 0.000 | 0.000 |

1.000 | 1.000 | 1.000 | 0.400 | 0.000 | 0.000 |

λ
. | λ_{1}
. | λ_{2}
. | α (kcal/mol)^{−1}
. | u_{0} (kcal/mol)
. | w_{0} (kcal/mol)
. |
---|---|---|---|---|---|

0.000 | 0.000 | 0.000 | 0.400 | 5.000 | 0.000 |

0.067 | 0.000 | 0.116 | 0.400 | 5.000 | −0.778 |

0.133 | 0.000 | 0.231 | 0.400 | 5.000 | −1.557 |

0.200 | 0.000 | 0.347 | 0.400 | 5.000 | −2.335 |

0.267 | 0.000 | 0.462 | 0.400 | 5.000 | −3.113 |

0.333 | 0.000 | 0.578 | 0.400 | 5.000 | −3.892 |

0.400 | 0.000 | 0.867 | 0.400 | 5.000 | −5.837 |

0.467 | 0.000 | 0.867 | 0.400 | 1.667 | −2.947 |

0.533 | 0.067 | 0.867 | 0.400 | 0.000 | −1.387 |

0.600 | 0.200 | 0.867 | 0.400 | 0.000 | −1.156 |

0.667 | 0.333 | 0.867 | 0.400 | 0.000 | −0.925 |

0.733 | 0.467 | 0.867 | 0.400 | 0.000 | −0.694 |

0.800 | 0.600 | 0.867 | 0.400 | 0.000 | −0.463 |

0.867 | 0.733 | 0.867 | 0.400 | 0.000 | −0.232 |

0.933 | 0.867 | 0.867 | 0.400 | 0.000 | 0.000 |

1.000 | 1.000 | 1.000 | 0.400 | 0.000 | 0.000 |

λ
. | λ_{1}
. | λ_{2}
. | α (kcal/mol)^{−1}
. | u_{0} (kcal/mol)
. | w_{0} (kcal/mol)
. |
---|---|---|---|---|---|

0.000 | 0.000 | 0.000 | 0.400 | 5.000 | 0.000 |

0.067 | 0.000 | 0.116 | 0.400 | 5.000 | −0.778 |

0.133 | 0.000 | 0.231 | 0.400 | 5.000 | −1.557 |

0.200 | 0.000 | 0.347 | 0.400 | 5.000 | −2.335 |

0.267 | 0.000 | 0.462 | 0.400 | 5.000 | −3.113 |

0.333 | 0.000 | 0.578 | 0.400 | 5.000 | −3.892 |

0.400 | 0.000 | 0.867 | 0.400 | 5.000 | −5.837 |

0.467 | 0.000 | 0.867 | 0.400 | 1.667 | −2.947 |

0.533 | 0.067 | 0.867 | 0.400 | 0.000 | −1.387 |

0.600 | 0.200 | 0.867 | 0.400 | 0.000 | −1.156 |

0.667 | 0.333 | 0.867 | 0.400 | 0.000 | −0.925 |

0.733 | 0.467 | 0.867 | 0.400 | 0.000 | −0.694 |

0.800 | 0.600 | 0.867 | 0.400 | 0.000 | −0.463 |

0.867 | 0.733 | 0.867 | 0.400 | 0.000 | −0.232 |

0.933 | 0.867 | 0.867 | 0.400 | 0.000 | 0.000 |

1.000 | 1.000 | 1.000 | 0.400 | 0.000 | 0.000 |

λ
. | λ_{1}
. | λ_{2}
. | α (kcal/mol)^{−1}
. | u_{0} (kcal/mol)
. | w_{0} (kcal/mol)
. |
---|---|---|---|---|---|

0.000 | 0.000 | 0.000 | 0.400 | 4.800 | 0.000 |

0.067 | 0.000 | 0.131 | 0.400 | 4.800 | −0.212 |

0.133 | 0.000 | 0.262 | 0.400 | 4.800 | −1.272 |

0.200 | 0.000 | 0.392 | 0.400 | 4.800 | −1.908 |

0.267 | 0.000 | 0.524 | 0.400 | 4.800 | −2.120 |

0.333 | 0.000 | 0.655 | 0.400 | 4.800 | −2.544 |

0.400 | 0.000 | 0.786 | 0.400 | 4.400 | −3.392 |

0.467 | 0.000 | 0.917 | 0.400 | 4.400 | −3.180 |

0.533 | 0.050 | 0.917 | 0.400 | 4.400 | −2.968 |

0.600 | 0.156 | 0.917 | 0.400 | 4.400 | −2.756 |

0.667 | 0.308 | 0.917 | 0.400 | 4.000 | −2.332 |

0.733 | 0.460 | 0.917 | 0.400 | 4.000 | −1.908 |

0.800 | 0.612 | 0.917 | 0.400 | 0.000 | −1.484 |

0.867 | 0.764 | 0.917 | 0.400 | 0.000 | −1.060 |

0.933 | 0.917 | 0.917 | 0.400 | 0.000 | 0.000 |

1.000 | 1.000 | 1.000 | 0.400 | 0.000 | 0.000 |

λ
. | λ_{1}
. | λ_{2}
. | α (kcal/mol)^{−1}
. | u_{0} (kcal/mol)
. | w_{0} (kcal/mol)
. |
---|---|---|---|---|---|

0.000 | 0.000 | 0.000 | 0.400 | 4.800 | 0.000 |

0.067 | 0.000 | 0.131 | 0.400 | 4.800 | −0.212 |

0.133 | 0.000 | 0.262 | 0.400 | 4.800 | −1.272 |

0.200 | 0.000 | 0.392 | 0.400 | 4.800 | −1.908 |

0.267 | 0.000 | 0.524 | 0.400 | 4.800 | −2.120 |

0.333 | 0.000 | 0.655 | 0.400 | 4.800 | −2.544 |

0.400 | 0.000 | 0.786 | 0.400 | 4.400 | −3.392 |

0.467 | 0.000 | 0.917 | 0.400 | 4.400 | −3.180 |

0.533 | 0.050 | 0.917 | 0.400 | 4.400 | −2.968 |

0.600 | 0.156 | 0.917 | 0.400 | 4.400 | −2.756 |

0.667 | 0.308 | 0.917 | 0.400 | 4.000 | −2.332 |

0.733 | 0.460 | 0.917 | 0.400 | 4.000 | −1.908 |

0.800 | 0.612 | 0.917 | 0.400 | 0.000 | −1.484 |

0.867 | 0.764 | 0.917 | 0.400 | 0.000 | −1.060 |

0.933 | 0.917 | 0.917 | 0.400 | 0.000 | 0.000 |

1.000 | 1.000 | 1.000 | 0.400 | 0.000 | 0.000 |

### C. Analysis of replica exchange efficiency

Hamiltonian replica exchange efficiency has been monitored here in terms of the extent of diffusion of replicas in the solute–solvent interaction energy space. In particular, we monitored the rate of transitions between coupled states with favorable solute–solvent interaction energies and uncoupled states with large and unfavorable interaction energy. The time trajectories of the soft-core solute–solvent interaction energies that were sampled by the replicas are shown in Fig. 6 for each alchemical hydration simulation. The number of transitions from uncoupled to coupled states (hydration) and vice versa (dehydration) is presented in Table VI.

Protocol . | $\Delta Gh\u25cb$ (kcal/mol) . | t_{eq} (ns)
. | n_{hydr}
. | n_{dehydr}
. |
---|---|---|---|---|

Ethanol^{a} | ||||

Linear | −2.23 ± 0.09 | 0.00 | 96 | 96 |

Softplus | −2.12 ± 0.11 | 0.63 | 76 | 77 |

Alanine dipeptide^{b} | ||||

Linear | −8.06 ± 0.14 | 0.13 | 0 | 2 |

Softplus | −8.25 ± 0.21 | 0.00 | 45 | 43 |

1-Naphthol^{c} | ||||

Linear | −4.53 ± 0.18 | 3.50 | 6 | 7 |

Softplus | −3.66 ± 0.14 | 0.00 | 34 | 32 |

3,4-Diphenyltoluene^{d} | ||||

Linear | −0.33 ± 0.18 | 1.38 | 0 | 2 |

Softplus | 3.51 ± 0.26 | 3.38 | 15 | 14 |

Protocol . | $\Delta Gh\u25cb$ (kcal/mol) . | t_{eq} (ns)
. | n_{hydr}
. | n_{dehydr}
. |
---|---|---|---|---|

Ethanol^{a} | ||||

Linear | −2.23 ± 0.09 | 0.00 | 96 | 96 |

Softplus | −2.12 ± 0.11 | 0.63 | 76 | 77 |

Alanine dipeptide^{b} | ||||

Linear | −8.06 ± 0.14 | 0.13 | 0 | 2 |

Softplus | −8.25 ± 0.21 | 0.00 | 45 | 43 |

1-Naphthol^{c} | ||||

Linear | −4.53 ± 0.18 | 3.50 | 6 | 7 |

Softplus | −3.66 ± 0.14 | 0.00 | 34 | 32 |

3,4-Diphenyltoluene^{d} | ||||

Linear | −0.33 ± 0.18 | 1.38 | 0 | 2 |

Softplus | 3.51 ± 0.26 | 3.38 | 15 | 14 |

^{a}

*u*_{lower} = −10, *u*_{upper} = 25 kcal/mol.

^{b}

*u*_{lower} = −35, *u*_{upper} = 25 kcal/mol.

^{c}

*u*_{lower} = −20, *u*_{upper} = 25 kcal/mol.

^{d}

*u*_{lower} = −30, *u*_{upper} = 30 kcal/mol.

The data indicate that ethanol undergoes frequent hydration/dehydration transitions with either the linear and softplus perturbation potentials. However, the bulkier solutes (alanine dipeptide, 1-naphthol, and 3,4-diphenyltoluene) undergo very few hydration/dehydration transitions with the linear alchemical potential. This behavior is due to the wide interaction energy gap between the distributions of the solute–solvent interaction energies for the coupled and uncoupled states (Fig. 3). The bulky nature of these solutes makes it very improbable for a solute molecule to find a suitable configuration in the solvent in order to make favorable interactions with the solvent when nearly decoupled from it. Conversely, these solutes are unlikely to overcome the energetic penalty necessary to become decoupled from the solvent upon making some favorable interactions. The coupled and decoupled states of the bulky solutes are essentially separated by a first-order phase transition that is entropically frustrated in the hydration direction and energetically frustrated in the dehydration direction.^{23}

On the other hand, the optimized softplus perturbation potential is very effective at circumventing the phase transition. As shown in Fig. 6 and Table VI, many more hydration/dehydration transitions are observed with the softplus potential than with the linear potential. A higher number of transitions make it possible to more rapidly equilibrate and to converge the hydration free energy estimates for the bulkier solutes without additional computational expense (see Table VI and Fig. 7).

### D. Equilibration of the hydration free energy estimates

The computed hydration free energy estimates are plotted in Fig. 7 as a function of the amount of data discarded from the start of the simulation (the equilibration time). These plots, referred to as reverse cumulative equilibration profiles,^{19,46,47} are used to determine the time after which the time series of data generated by the simulation becomes stationary and not biased by the starting conformation of the system. It is not obvious to pinpoint such a time because while the accuracy of the binding free energy presumably improves as more non-equilibrated samples are discarded, the precision of the estimate (indicated by the error bars in Fig. 7) decreases as more initial samples are discarded. Here, we take the approach of choosing the hydration free energy estimate as the value that corresponds to the shortest equilibration time, which gives a value statistically indistinguishable from those at all subsequent equilibration times.^{19}

Based on this criterion, we conclude that the hydration free energy estimate for ethanol equilibrates almost immediately after the start of the simulation with either the linear or the softplus alchemical potentials. The hydration free energy estimates for ethanol with the linear and softplus potentials are in statistical agreement (Table VI). Given the very different nature of the alchemical paths in the two simulations, we conclude that equilibration and convergence of the hydration free energies have been achieved in this case. This positive outcome is consistent with the high rate of hydration and dehydration transitions observed (Fig. 6 and Table VI) for ethanol with either the linear or softplus potential. The result for ethanol confirms the validity of the concerted alchemical protocol and the correctness of the simulation algorithms in producing a canonical ensemble of conformations with either potential.

A similar analysis for the bulkier solutes (alanine dipeptide, 1-naphthol, and 3,4-diphenyltoluene) shows that the equilibration of the concerted free energy protocol is achieved rapidly when using the optimized softplus potential (Fig. 7, second column). Based on the relatively large numbers of hydration/dehydration transitions observed for these solutes, the hydration free energy estimates are considered to be well converged. 3,4-diphenyltoluene, the largest and most hydrophobic solute, is confirmed as a stringent test case for one-step concerted alchemical protocols.^{13} Despite the relatively few hydration/dehydration transitions observed even after optimization of the softplus potential, the approach employed here appears to yield a converged and reliable hydration free energy estimate for this compound. After taking into account long-range van der Waals interactions, which are expected to contribute as much as 3 kcal/mol of additional solute–solvent interaction energy, the hydration free energy obtained here for 3,4-diphenyltoluene in a droplet is consistent with earlier calculations with periodic boundary conditions.^{13} To our knowledge, the measured hydration free energy for this compound has never been reported.

In contrast to the softplus potential, the concerted hydration alchemical protocol with the linear potential fails to reach reliable convergence for the bulkier solutes. This is indicated by the very few hydration/dehydration transitions (Table VI) and the cumulative equilibration profiles (Fig. 7, first column—alanine dipeptide, 1-naphthol, and 3,4-diphenyltoluene). For alanine dipeptide, the rapid equilibration of the hydration free energy estimate with the linear potential is likely circumstantial as it hinges on only two dehydration transitions early in the replica exchange trajectories (Fig. 6, alanine dipeptide—left panel). With the linear perturbation potential, the hydration free energy of 1-naphthol appears to equilibrate after 3 ns at a value of −4.53 ± 0.18 kcal/mol, which is statistically inconsistent with the less biased estimate of −3.66 ± 0.14 obtained with the softplus potential (Table VI and Fig. 7, 1-naphthol). The concerted alchemical hydration of 3,4-diphenyltoluene is a good example of the potential convergence pitfalls of free energy calculations. With the linear potential (Fig. 7, 3,4-diphenyltoluene), the hydration free energy appears to equilibrate after ∼1.3 ns at a very stable plateau value, which may be mistaken as a converged estimate. In actuality, the value of the plateau is not converged as confirmed by the lack of hydration/dehydration transition events after 1.3 ns of simulation per replica (Fig. 6, 3,4-diphenyltoluene—left panel).

The data collected for these solutes strongly support the finding that commonly used linear alchemical perturbation potentials are unsuitable for obtaining converged free energy estimates with a concerted alchemical process. In contrast, the optimized softplus potentials can enable rapid equilibration and convergence by promoting hydration/dehydration transitions.

## IV. DISCUSSION

Alchemical hydration free energy calculations are widely employed to predict the water solubilities of substances,^{4,48} to test force field and free energy protocols,^{49–51} and to estimate absolute and relative binding free energies of molecular complexes.^{3,52} While alternatives have been proposed,^{13,53} the guidelines that are generally accepted to avoid slow convergence and numerical instabilities involve splitting the alchemical hydration process into two steps. In the first step, volume-exclusion and dispersion solute–solvent interactions are turned on, followed by a second step in which electrostatic interactions are established.^{9} In addition, especially during the volume-exclusion phase, it is customary to employ soft-core interatomic pair potentials.^{12,26,54}

These free energy practices, which are widely implemented in popular molecular simulation packages,^{34,55–58} are generally successful and robust.^{59} Nevertheless, it would be beneficial to explore concerted alternatives with improved equilibration and convergence characteristics. The software implementations of free energy methods can also be cumbersome and challenging to integrate and maintain alongside other molecular simulation algorithms. König *et al.*^{54} recently discussed the limitations of conventional non-linear and non-algebraic forms of soft-core pair potentials and proposed a family of alchemical hybrid perturbation potentials that addresses some of the challenges. A recent study by Lee *et al.*^{13} discussed the downsides of multi-step free energy methods and called for a streamlined concerted approach that would be more easily integrated with an extended ensemble and self-adjusting conformational sampling algorithms, as well as non-equilibrium approaches.^{60–62} Lee *et al.*^{13} proposed a family of soft-core pair potentials and non-linear alchemical hybrid perturbations that facilitate the calculation of hydration free energies in a concerted fashion.

As shown in this work, concerted alchemical protocols present unique computational challenges that have likely discouraged progress in earlier attempts. We have shown, for example, that conventional linear alchemical interpolating potentials can introduce severe entropic bottlenecks that preclude the rapid equilibration between the hydrated and dehydrated states of the solute, which result in strongly biased free energy estimates.

In a recent study,^{20} we identified pseudo-first-order phase transitions along the alchemical path as the critical and fundamental obstacles to the application of concerted alchemical processes for molecular binding. In the same study, we addressed the issue by applying Hamiltonian-shaping techniques inspired by non-Boltzmann sampling methodologies developed by Straub and co-workers^{23,63} for the study of temperature-driven first-order phase transitions. Using this approach, we were able to identify alchemical paths that avoid or soften biphasic behavior in the context of binding with an implicit description of the solvent.^{20} In the present study, we demonstrate that these similar techniques can also be applied to the estimation of hydration free energies with explicit solvation. Critical to this approach is a new family of alchemical perturbation functions presented here that is potentially of broader applicability. We also illustrate a theoretical framework and a graphical procedure to optimize them in order to systematically improve conformational sampling and the rate of equilibration and convergence of free energy estimates.

In this work, we considered the hydration of solutes into a water droplet, a simplified system that is a first, yet representative validation of the method. The droplet model is likely not optimal for the accurate estimation of gas phase hydration free energies. It was selected mainly for expediency in order to validate the theory without extensive modifications of our Single Decoupling Method (SDM) software that was originally developed alongside an implicit description of the solvent.^{20} In SDM, the perturbation from the coupled state to the uncoupled state of the molecular complex is achieved by displacing the ligand out of the receptor binding pocket and into the surrounding implicit solvent bulk. The same approach would not work for decoupling the solute molecule from an explicit solvent system with periodic boundary conditions because a periodic boundary system does not have an outer region where the solute can be placed in. A software implementation of our method for hydration free energy estimation with periodic boundary conditions would require the implementation of solute–solvent decoupling through the tuning of force field parameters. While OpenMM supports this procedure,^{64} our immediate focus is to build upon the results of this work to develop concerted binding free energy estimation protocols with explicit solvation and periodic boundary conditions for which the displacement approach does not suffer similar limitations. This work is ongoing and will be reported in upcoming publications.

The theoretical, algorithmic, and numerical simulation strategies presented here and the promising results illustrated here indicate a possible roadmap to streamline the implementation of alchemical protocols in molecular simulation packages. The first suggestion is to replace soft-core pair potentials with, as done here, a soft-core function applied to the solute–solvent interaction energy [e.g., using Eqs. (7) and (8)]. This approach would greatly simplify the implementation of alchemical protocols in molecular dynamics packages by making it unnecessary to modify the core energy and force subroutines. While the method used here requires the re-processing of the forces, this is cleanly and efficiently accomplished in our implementation by a *O*(*N*) loop coded in a separate small subroutine of the molecular dynamics integrator. The family of alchemical potentials introduced here also simplifies the thermodynamic reweighting procedure with MBAR or UWHAM^{15,27} for free energy estimation. Alchemical implementations using the molecular dynamics engine with soft-core pair potentials require the re-evaluation of each sample’s perturbation energy for each *λ* state on the alchemical path.^{14} This cumbersome process is completely bypassed in our implementation. The alchemical potential introduced here is a one-to-one function of the solute–solvent interaction energy, which can be saved together with each trajectory frame and manipulated algebraically with minimal effort, using Eq. (3), to yield the perturbation energies at any of the alchemical states.

The second suggestion emerging from this study is the use of a non-linear alchemical perturbation that, like the softplus potential we propose [Eq. (5)], judiciously warps the alchemical path in critical regions to avoid phase transitions while simultaneously retaining a linear character away from the problematic regions. We found that adequately designed softplus alchemical schedules can significantly enhance the rate of equilibration and convergence of the free energy estimates. In this work, we employed the *λ*-function formalism^{20} [Eqs. (15) and (16)] to optimize the alchemical schedules. We adjusted the softplus functions graphically to avoid multiple intersections between the derivative of the softplus function and the *λ*-functions (Fig. 4). The predictions based on the *λ*-functions were found to be rather robust. They immediately resulted in alchemical simulations with mostly unimodal and smooth-varying distributions of the perturbation energy without further trail and error. We envision that this method can be automated to implement adaptive optimization algorithms that adjust the alchemical schedule during the simulations. We also envision that the *λ*-function formalism can be useful for developing and assessing the next generation of alchemical algebraic perturbation functions^{13,54} to yield more robust free energy estimation tools.

## V. CONCLUSIONS

We employ a statistical analytical theory of molecular association^{22} and a single-decoupling alchemical method for binding free energy estimation with implicit solvation^{20,23} to develop a concerted alchemical protocol for the calculation of hydration free energies of small- to medium-sized molecules with explicit solvation. The concerted alchemical hydration protocol introduced here involves a single alchemical transformation rather than the more commonly employed pair of distinct decoupling transformations for electrostatic and steric/dispersion interactions. The approach and its benefits are illustrated for model systems involving the hydration of four diverse solutes in a water droplet. This proof of principle study paves the way for a new generation of streamlined concerted free energy estimation algorithms in condensed phases.

## AUTHORS’ CONTRIBUTIONS

S.K. and S.A. authors contributed equally to this work.

## ACKNOWLEDGMENTS

We acknowledge support from the National Science Foundation (NSF CAREER Grant No. 1750511 to E.G.). Molecular simulations were conducted on the Comet GPU supercomputer cluster at the San Diego Supercomputing Center supported by NSF XSEDE Award No. TG-MCB150001.

## DATA AVAILABILITY

The software and the input files used to generate the data are openly available in Github at https://github.com/egallicc/openmm_sdm_workflow, https://github.com/rajatkrpal/openmm_sdm_plugin, and https://github.com/egallicc/async_re-openmm, Refs. 65–67. The molecular dynamics trajectories that support the findings of this study are also available from the corresponding author upon reasonable request.