We present our blind prediction of the toluene–water partition coefficients in the context of the SAMPL9 challenge. For the calculation of the solvation free energies in water, toluene, and 1-octanol, we used an efficient MD-based nonequilibrium alchemical technique relying on the GAFF2 non-polarizable force field. The method is based on the fast-growth of an initially decoupled solute. Canonical sampling of the associated end-state is efficiently obtained by performing a Hamiltonian replica exchange simulation of the gas-phase solute molecule alone, combined with equilibrium configurations of the solvent. Before submitting the prediction, a pre-assessment of the method and of the force field was made by comparing with the known experimental counterpart the calculated octanol–water partition coefficients using different set of atomic charges. The analysis allowed to optimize our blind prediction for the toluene–water partition coefficients, providing at the same time valid clues for improving the performance and reliability of the non-polarizable force field in free energy calculations of drug-receptor systems.

## I. INTRODUCTION

The Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) is a well-established international initiative for advancing the computational techniques in drug design.^{1–5} New experimental data, such as host–guest dissociation free energies, hydration free energies, acid–base dissociation constants, or partition coefficients, are undisclosed to participants until the prediction submission deadline so that the true predictive power of methods can be assessed.

In modern methodologies based on atomistic molecular dynamics (MD) simulations,^{6–11} SAMPL challenges are aimed to a two-fold scope, i.e., that of testing the accuracy of the physical model or “force field,”^{12–20} often leading to systematic errors, and that of assessing the reliability of a computational procedure or protocol, likely producing random errors. In the latest SAMPL9 challenge,^{21} participants were required to predict the toluene/water partition coefficients, LogP_{tw}, for a series a compounds with disparate flexibility or molecular weight and coarsely spanning a significant portion of the “drug-like” chemical space, including moieties such as carboxyl, carbonyl, sulfonic, oxydryl, amino, amide, halogen, phenyl, hetero-cyclic, and alkyl (see Fig. 1). Given that the drug–receptor binding affinity is implicitly driven by the difference of two solvation free energies, namely, that of the drug when bound to the receptor and that of the drug in bulk solvent,^{22–24} the capability of accurately predicting partition coefficients, strictly related to solute transfer free energy between the two solvents, is an essential requirement for applying any methodology to the challenging case of drug design.

Partition coefficients are generally computed using very efficient knowledge-based (KB) empirical techniques, such as XlogP3,^{25–27} or using *ab initio* (QM) approaches mixed with trained parametric models for the polarization continuum, as in the Consortium for Small Scale Modeliing (COSMO)^{28} or the solvation model based on density (SMD)^{29} implicit solvation models. In SAMPL challenges, these approaches generally outperform^{30,31} much more costly MD methodologies, based on the *explicit* representation of the solvent to account for micro-solvation phenomena and largely relying on the so-called alchemical protocol.^{6,7,32} However, when the KB methodology or QM-based techniques are applied to the calculation of host–guest or drug–receptor binding affinities, their performances are significantly degraded.^{3,5} On the contrary, the reliability of the MD-based approaches is much less affected when passing from LogP predictions to drug–receptor or host–guest binding affinities calculations,^{33–36} systematically resulting, in this last case, as the best performing methods.^{3,5} Apparently, as demonstrated in SAMPL challenges, QM or KB methods are presently unable to effectively cope with the heterogeneity of the solute environment in the calculation of the solvation free energy of a host-bound solute. Hence, testing an MD-based approach on a LogP blind prediction involving bulky and flexible compounds with diverse chemical groups is important since a successful protocol in this context has a good chance to produce reliable predictions when transferred to the more challenging case of drug–receptor binding affinities.

Most of the MD submissions in SAMPL challenges are based on the well-established alchemical approach, whereby the free energy between the two end-states (solvated solute and gas-phase solute) is recovered by setting up a so-called *λ* stratification of *n* intermediate non-physical states where the interaction between the solute molecule and the explicit solvent molecules is progressively decoupled via the alchemical parameter *λ*.^{6,37} The *n* intermediate states at the fixed coupling parameter *λ* are chosen so that the potential energy distributions of contiguous *λ* states have a significant overlap,^{6} allowing the calculation of the solvation free energy as a sum of *n* − 1 individual free energy contributions along the *λ*-stratification by way of free energy perturbation^{38} (FEP). FEP is an *equilibrium* technique requiring canonical sampling on the end-states and on each of the intermediate *λ* states. For flexible compounds with many rotatable bonds and high torsional barriers, canonical sampling can be problematic especially when dealing with solute molecules that are characterized by a complex conformational landscape, requiring long simulations whose convergence depends in an unknown way on the level of the *λ* coupling parameter.^{39–41} In such cases, it may be necessary to combine the *λ*-stratification approach with costly enhanced sampling techniques, such as Hamiltonian Replica Exchange^{41,42} (HREM).

In Refs. 33 and 34, we have recently analyzed and discussed the reliability of unidirectional or bidirectional nonequilibrium techniques (NE) in the calculation of solvation free energies. In the alchemical NE approach, the end-states are connected, rather by a stratification of *equilibrium* states, by a swarm of fast NE independent trajectories (initiated from canonically sampled end states) where the solute–solvent coupling parameter is continuously varied. The solvation free energy is recovered from the NE work data, exploiting the Crooks theorem^{43} via the Bennett acceptance ratio^{44,45} (BAR) if the NE transitions are performed in both senses or using the Jarzynski theorem^{46} or the Gaussian assumption^{47,48} if the NE transformations are conducted in one sense. Unlike in FEP, in the NE variant of the alchemical approach, canonical sampling is required *only at the end states*. This is a huge advantage with respect to FEP, especially if the NE transformations are performed only in the direction of the fast growth (NE-FG), i.e., starting from an end-state where the solute is completely decoupled from the solvent. In this case, canonical sampling can be reliably and cheaply obtained by performing an enhanced sampling of the gas-phase solute molecule alone, combined with equilibrium configurations of the solvent.

In this paper, we present our blind predictions on LogP_{tw} for the 16 compounds shown in Fig. 1 proposed in the latest SAMPL9 challenge. Given the expected confidence interval for the NE-FG method,^{33,34} the availability of experimental LogP_{ow}^{49} provided the opportunity to attempt to disentangle the systematic error due to the force field (mostly affected by that due to the fixed-charge representation^{50}) from the methodological error, hence improving our blind prediction for the SAMPL9 LogP_{tw}. Our submission had a good ranking among all submitted blind predictions (mostly based on empirical or QM methodologies), revealing as the best-performing and, at the same time, most efficient approach among the MD-based submissions.

## II. THEORETICAL ASPECTS

*canonical*sampling of the initial end-state, ending in a

*nonequilibrium*sampling final end-state. In the annihilation direction, one needs to prepare a canonical sampling of the solvated compound, a task that for flexible and bulky solutes, such as those of Fig. 1, may require the usage of HREM on the full system. In the growth sense, the initial equilibrium states can be prepared by sampling the conformations of the isolated (gas-phase) molecule using HREM and then by combining such gas-phase configurations with equilibrium configurations of the pure solvent. This approach has the advantage to apply the HREM to a single molecule system with a very limited computational cost, while a

*single*ordinary MD simulation can be used for the sampling of the solvent configurations to be combined with the HREM sampling of any given solute. The fast-growth procedure for the calculation of the LogP is illustrated in Fig. 2. For each NE-FG trajectory in a given solvent, the externally driven

*λ*parameter does on the system the work $W=\u222bo\tau \u2202H(\lambda )\u2202\lambda \lambda \u0307dt$, where

*τ*is the duration of the NE trajectories. The corresponding solvation free energy Δ

*G*

_{s}is then computed from the work histogram values using

*T*is the temperature and

*R*is the gas constant. Equation (1) is valid and

*exact*if and only if the work distribution is normal and can be straightforwardly proved using the Crooks theorem.

^{47,51}If the distribution is non-Gaussian, Eq. (2) must be used. The latter is based on the Jarzynski exponential average plus the bias correction

*B*

_{J}(

*n*,

*σ*) due the finite size of the work sample.

^{52}Whenever normality of the work distribution can be reliably assumed, Eq. (1) provides an

*unbiased*estimates of the solvation energy and should hence be preferred to the Jarzynsky-based estimate [Eq. (2)]. For work samples with

*n*≥ 100, normality can be ruled with 95% confidence if the quantity

*A*

^{2}of the Anderson Darling (AD) test is larger than 0.752.

^{53,54}A probability exceeding 50% from the

*p*-value

^{55}associated with AD that the distribution is normal is obtained when

*A*

^{2}≤ 0.34. Given a

*n*value work histogram with variance

*σ*

^{2}, the bias

*B*

_{J}in Eq. (2) can be estimated by sampling

*n*values from a normal distribution $N$ with the same variance and zero mean and by comparing the exact value for the associated free energy, $\Delta GN=\u2212\beta \sigma 2/2$, with that obtained from the exponential average on the normally distributed

*n*points with

*σ*

^{2}variance, $\Delta GJ(n,\sigma )=\u2212RT\u2061ln(1n\u2211i\u2208Ne\u2212\beta Wi)$, yielding the bias as $Bj(n,\sigma )=\Delta GN\u2212\Delta GJ(n,\sigma )$. In the assumption that the initial sampling of the fast-growth process is canonical (see Fig. 2), the parameters affecting accuracy and precision of the method are the number of NE-FG trajectories and the time duration

*τ*of the NE-FG trajectories, related to the resolution of the work distribution entering the solvation free energy. In the contour plots of Fig. 3, we show the convergence of the solvation free energy for the SM02 molecule in water (left) and the SM13 molecule in octanol [computed with Eq. (1) or Eq. (2) depending on the character of

*P*(

*W*)] as a function of

*τ*and of the number of sampled work values.

The work data were taken from Ref. 33. The green arrow on the z-scale indicate the reference value computed with BAR using 480 trajectories lasting 2.6 ns. The rectangle marked with a green grid corresponds to a plateau where the solvation free energy estimate is basically constant and coincident with the BAR reference value. We can see that NE-FG estimates of then solvation free energies for these two molecules are already close to the reference value for *n* = 100 and *τ* = 450 ps for SM02 in water and for *n* = 100 and *τ* = 600 ps SM13 in octanol.

_{ab}is computed from the difference of the solvation free energies of the solute in the two solvents

*a*,

*b*as

## III. METHODS

The starting structures for the solute was prepared with OpenBabel, by converting the SMILES code to a PDB file, using the –gen3D option.^{56} OpenBabel creates first a non-optimal structure using fragment templates followed by a steepest geometry optimization with the MMFF96 force field.^{57} Fixed atomic charges in our blind submission are readily computed on the OpenBabel-generated structure at the popular AM1-BCC level with the prospect of an easy and automatable implementation of our NE-FG protocol for LogP and host–guest predictions. We used the GAFF2^{17} force field for the compounds of Fig. 1 and for the 1-octanol and toluene solvent molecules. The force field assignment and AM1-BCC charges were obtained from the web tool “PrimaDORAC”^{19} (www1.chim.unifi.it/orac/primadorac). Given the sensitivity^{50} to atomic charges of the solvation free energies, we tested two sets of atomic charges for the solutes, namely, the AM1-BCC charges^{58} from PrimaDORAC and the Electrostatic Potential charges (ESP) computed at the HF/6-31Gd level according to the Merz–Kollman scheme with dipole restraint.^{59} For water, we used the OPC3 model.^{60}

We must warn that the choice of the initial 3D structure for *fixed* atomic charge determination may have a non-negligible impact on the finally accuracy. In this respect, iterative refitting or adaptive parameterization of the atomic charges based on the gas-phase and bulk conformational sampling may be of help in the selection of the best conformation-dependent fixed atomic charges, as shown in a recent SAMPL challenge study.^{61} However, part of the emphasis on the present sample LogP challenge was on the benefit/cost ratio. Participants were, in fact, required to provide details on the computational resources used for their predictions. Smart adaptive methodologies for optimization of 3D-dependent atomic charges, such as those described in Ref. 61, are expensive and system dependent and, hence, are not suited for low-cost automatable screening protocol in drug design.

HREM of the gas-phase of the solute in the NVT ensemble^{62} at 300 K was done by scaling only the potential energy of the isolated molecule with a minimum scaling factor of 0.1, corresponding to a “temperature” of 3000 K using eight replicas with scale factor progression^{63} given by *c*_{m} = *c*^{(m−1)/8}, *m* = 1…8. HREM simulations lasted 8 ns, collecting 96 solute conformations on the target state (T = 300 K).

Pure solvents were equilibrated in the constant temperature constant pressure ensemble (NPT) ensemble for 1 ns with *P* = 1 atm and *T* = 300 K, imposed via a Parrinello–Rahman Langrangian with isotropic stress tensor^{64} and Nosé thermostats coupled to the translational and internal degrees of freedom of the system. 96 configurations (to be combined with the decoupled 96 configuration of the solute) were collected in the following 1 ns of production run. Solvents were simulated in cubic boxes under periodic boundary conditions (PBCs) with 1728 for water, 343 for toluene, and 216 for 1-octanol, yielding a mean side-length of 37.43, 39.38, and 38.62 Å for water, toluene, and 1-octanol, respectively. The calculated densities for 1-octanol and toluene were 0.81 g/cm^{3} (experimental value 0.82 g/cm^{3}) and 0.86 g/cm^{3} (experimental value 0.87 g/cm^{3}), respectively. The density of OPC3 water was 0.992 compared to the experimental value of 0.997.

The (*n* = 96) NE-FG simulations, starting from the canonical sampling obtained by combining the HREM gas-phase configurations of the solute molecules with that of the pure solvent, were launched in the NPT ensemble. The *λ* alchemical coupling parameter was switched on in *τ* = 450 ps for toluene and water and in 600 ps for 1-octanol, according to a common protocol whereby the Lennard–Jones interactions were first turned on using a soft-core regularization^{65} in 300 ps for water and toluene and in 390 ps for 1-octanol, followed by the electrostatic interactions in the last 150 ps for water and toluene, and in the last 210 ps for 1-octanol. The choice of *n* and *τ* for the parameters in the NE-FG runs was dictated by the results on past SAMPL challenges on drug-like molecule, as shown in Fig. 3.

In all simulations, the equations of motion were integrated using a multiple time-step r-RESPA scheme^{64,66} imposing constraints only on the X–H bonds with X being any heavy atom. The long range cut-off for Lennard–Jones interactions was set to 13 Å in all cases. Long range electrostatic were treated under PBC using the smooth particle mesh Ewald method,^{67} with an *α* parameter of 0.38 Å^{−1}, a grid spacing in the direct lattice of about 1 Å, and a fourth order B-spline interpolation for the gridded charge array.

The NE-FG calculations were run on the CRESCO6 cluster^{68} using the ORAC program.^{69} For the calculation of a solvation energy, the associated parallel job engaged 576 cores (96 MPI process each using six OpenMP threads) lasting about 1.5 and 2 wall-time clock hours for water/toluene and 1-octanol, respectively.

## IV. RESULTS AND DISCUSSION

The SAMPL9 challenge calls for the prediction of the toluene/water partition coefficients of the compounds of Fig. 1 for which no experimental results were available. Given the sensitivity of solvation free energies to the selected atomic charges in non-polarizable force fields,^{50} before submitting our prediction, we made a pre-assessment of the NE-FG performance with the GAFF2 atomic types force field on the LogP_{ow} coefficients (for which experimental data are available from the PUBCHEM database^{49}) using two sets of atomic charges, namely, (i) the AM1-BCC charges, consisting in the Mulliken charges computed using the semiempirical AM1 method^{70} plus a bond-charge correction tabulated in Ref. 58, and (ii) the ESP charges computed at the HF/6-31Gd level. Both these approaches are widely adopted in free energy calculations based on fixed-charge force fields

In Fig. 4, we report the mean unsigned (MUC) and signed charge (MSC) difference between the AM1-BCC and HF/6-31Gd charges averaged on all the atoms in the SAMPL9 compounds belonging to a given atomic GAFF2 type. Atoms belonging to the same atomic type are characterized by a common chemical environment and are hence expected to bear similar charges. Hence, when MSC is positive, it implies that the HF/6-31Gd average charge for that atomic type is systematically lower than the corresponding AM1-BCC charge. On the other hand, where the difference was random within a given atomic type, we should see MSC close to zero and MUC of the order of the mean unsigned deviation between the two sets of charges. In some cases, MUC and MSC are significant $(>0.2e)$ and almost coincident, indicating that the HF/6-31G charges are systematically more *negative* with respect to the AM1-BCC counterpart. This occurs, in particular, for the nitrogen atoms in aromatic systems, in amino and amide groups, bearing, in general, a much more pronounced negative charge when ESP charges are evaluated at the HF/6-31Gd level. For the carbonyl carbon, MUC and MSC are similar in magnitude but with opposite sign MSC, implying the HF/6-31Gd charge is systematically more positive with respect to the AM1-BCC charge. In general, we may say that when using the HF/6-31Gd, we observe a higher charge separation within the molecule. We may therefore expect that the solvation free energy is affected by these differences in the two sets of charges in manner that depends on the polarity of the solvent.

In Tables S1 and S2 of the supplementary material, we report the detailed results for the NE-FG-computed solvation free energy estimates in the three solvents for the compounds of Fig. 1 using the AM1-BCC atomic charges and the HF/6-31Gd ESP charges, respectively. The reported estimates in Tables S1 and S2 are based on Eq. (1) if *A*^{2} < 0.34, implying a *p*-value >50% (i.e., the null hypothesis referring to the normal distribution is true with a probability exceeding 50%), and on Eq. (2) otherwise. The errors (95% confidence interval) have been computed by bootstrap with resampling from the 96 work data.

In Fig. 5, we show the differences in the solvation energy with the two sets of charges in the three solvents. We may say, in general, that the use of the HF/6-31Gd charges produces more negative solvation free energies in all solvents. Differences tend to be higher in water with respect to less polar solvents, such as 1-octanol of toluene. More in detail, the free energy gain when switching form AM1-BCC to HF/6-31Gd charges significantly increases in water (by up to 7 kcal/mol in compound 8) in all cases except for compound 9 and 16 where the solvation free energy computed with the HF/6-31Gd charges is similar to that evaluated using the AM1-BCC charges. This result is consistent with the fact that, in general, charge separation is higher in the HF/61Gd charges with respect to the AM1-BCC charges, and the free energy gains are expected to increase with increasing polarity of the solvent, hence preventing a compensating effect in the calculated LogP between a non-polar solvent and water when switching between the two sets of atomic charges.

The reliability of NE-FG calculated LogP_{ow} can be assessed in Fig. 6 where we report the correlation plots between experimental and NE-FG-calculated LogP_{ow} [computed from the solvation energies of Tables S1 and S2 by way of Eq. (3)] with the different charge sets. In one case (plot on the top left in Fig. 6), we used the AM1-BCC charges of the solute in both water and 1-octanol. In a second case (top central plot), we used the HF/6-31Gd ESP charge in both water and 1-octanol. Finally, in the third case (right top plot), we used the HF/6-31Gd ESP charges for water and the AM1-BCC charges for 1-octanol. The usage of two different set of charges in the two solvents is justified by the fact that charge separation on a molecule is a function of the polarity of the solvent so that the large HF/6-31Gd charges should be appropriate for water, while the smaller AM1-BCC charges should be used in the much less polar 1-octanol solvent. Correlation metrics for the plots shown in Fig. 6 is collected in Table I.

Charge set . | R
. | a
. | b
. | MAE . | MSE . | τ
. |
---|---|---|---|---|---|---|

AM1-BCC | 0.83 | 1.22 | 2.30 | 2.97 | −2.92 | 0.62 |

HF/6-31Gd | 0.88 | 1.38 | 0.88 | 1.98 | −1.95 | 0.60 |

Mix | 0.76 | 1.41 | −0.33 | 1.63 | −0.81 | 0.55 |

xLogp3 | 0.89 | 0.79 | 0.85 | 0.57 | −0.27 | 0.62 |

Charge set . | R
. | a
. | b
. | MAE . | MSE . | τ
. |
---|---|---|---|---|---|---|

AM1-BCC | 0.83 | 1.22 | 2.30 | 2.97 | −2.92 | 0.62 |

HF/6-31Gd | 0.88 | 1.38 | 0.88 | 1.98 | −1.95 | 0.60 |

Mix | 0.76 | 1.41 | −0.33 | 1.63 | −0.81 | 0.55 |

xLogp3 | 0.89 | 0.79 | 0.85 | 0.57 | −0.27 | 0.62 |

We can see that while correlation, as expressed by the Pearson and rank Kendall coefficients, is good and comparable to that of the reliable xLogP3 estimate in all cases, the mean absolute error (MAE) and MSE are found to critically depend on the selected charge set, highlighting the sensitivity of the solvation energies and LogP to electrostatic interactions. In this regard, we note that both MAE and MSE improve significantly when switching from AM1-BCC to HF/6-31Gd charges with correlation still being very strong. The slope and intercept of the best fitting line, on the other hand, are still far from their corresponding ideal value of 1 and 0, values that are more closely approached by the xLogP3 predictions. The best MAE and MSE were obtained when we used the mixed approach, i.e., HF/6-31Gd charges in water and the AM1-BCC charges in 1-octanol.

In the bottom plots of Fig. 6, we show the experimental and corresponding NE-FG-calculated LogP_{ow} for each of the 16 compounds of Fig. 1. For the AM1-BCC charge set in both solvents (bottom left plot in Fig. 6), all LogP_{ow} of the nitrogen containing compounds are overestimated except for compounds 10 and 12, which are in good agreement with experiment. Compound 10 does not contain nitrogen, while compound 12 has one amide nitrogen. Switching to the HF/6-31Gd charges in both solvents (bottom central plot in Fig. 6), overestimation is tamed and the calculated LogP_{ow} of compound 6 (besides 10 and 12) is now in excellent agreement with the experimental counterpart.

Finally, when using HF/6-31Gd charges for the solute in water and AM1-BCC charges in 1-octanol, the agreement with experiment becomes satisfactory for all compounds except for four clear outliers, namely, compounds 3, 5, 9, and 16, where still we have a significant overestimation of LogP_{ow}, comparable to that observed in the two previously discussed charge sets. On the other hand, the decisive improvement of LogP_{ow} of compounds containing one or more sp_{2}/sp_{3} nitrogen atoms in cycles (1, 4, 5, 7, 11, 13, 14, 15, and 16) with respect to LogP_{ow} computed with the AM1-BCC set leads us to believe that AM1-BCC charges on nitrogen containing cycles do not, in general, reflect the real electron density in hetero-cycles in water, possibly due to an effect of the nitrogen lone pair^{71} and/or polarization of the solvent. Such systematic overestimation of LogP_{ow} when using AM1-BCC charges in both solvents for solutes with nitrogen containing cycles (SM41-SM46) was also detected in the SAMPL7 batch of compounds in our NES-1 (GAFF2/OPC3)G submission as well as in other FEP-based alchemical submissions.^{31}

The outliers in the mixed charge set, 3, 5, 9, and 16 all share a tertiary amino group where the GAFF2 atomic type of the nitrogen is n3 (see Fig. 4). Compound 16 has five nitrogen atoms in cycles and a chlorine atom where the role of the positive charge on the extra center^{72} due the so-called “sigma-hole” (not included in our model for compound 16) may be, in part, responsible for the observed discrepancy. It is worth mentioning that substantial overestimation of MD-based LogP_{ow} with the GAFF2 parameterization for tertiary amine solutes with n3 type was also detected in Ref. 50.

The mixed approach can be considered as a sort of zero-order polarizable model, where we use a stronger charge separation (see Fig. 4) and, hence, higher electric moments in the solvent with a high dielectric constant where the mean field induction effects are supposed to be more important. Such an approach has, however, drawbacks since we still have four clear outliers (see Fig. 6) and *R* and *τ* correlation coefficients inferior to those observed using the same charge set in both solvents (see Table I). While most of the solvation free energy in non-polar solvents is recovered when switching on the Lennard-Jones potential^{36} and the electrostatic contribution is minor, in water, the recharging stage produces by far the major contribution to the free energy.^{36} If the solute atomic charges do not reflect with sufficient accuracy the electron density in liquid water, large errors in LogP may arise. We can hence infer that most of the observed discrepancies in the mixed approach are probably due to a non-optimal charge parameterization for the solute *in water*, in particular, for the four outliers when using the mixed charge set. On the overall, the analysis of LogP_{ow} data reported in Fig. 6 appears to confirm^{31,50} that the parameterization of electrostatic interactions in the popular GAFF2 force field for tertiary amino compounds or for solute bearing sp_{2} or sp_{3} nitrogen containing cycles should be carefully revised. Based on these results, we decided to submit our prediction for the unknown LogP_{tw}, exploiting the knowledge of experimental LogP_{ow} and using the solvation energies computed in the less polar solvents toluene and 1-octanol with the AM1-BCC charges set as reported in Table S1 of the supplementary material. The LogP_{tw} toluene–water partition coefficients in our submission have been computed as LogP_{tw} = LogP_{to} + LogP_{ow}(Exp).

Results for this prediction (named logP_NE-FG on the SAMPL9 GitHhub site^{21}) are collected in Fig. 7, where, on the left, we report the experimental and calculated LogP_{tw} with 95% confidence interval for the SAMPL9 compounds and, on the right, the corresponding correlation plot. Experimental and NE-FG-calculated values are well correlated, yielding a Pearson coefficient (*R*) of 0.83, a Kendall rank coefficient (*τ*) of 0.58, a mean absolute error (MAE) of 1.12 LogP units, and a Lin concordance coefficient (CCC) of 0.82.

The latter metrics combines^{73} in a single coefficient a measure of the accuracy, as assessed by the MAE, and of precision in ranking as assessed by the correlation coefficient *R*. Our blind prediction ranks fifth for the MAE and sixth for the CCC among the 20 submitted predictions in the challenge. Compound 8 (Glyburide), in our case as well as in most of the MD-based predictions,^{74} is a clear outlier, with a discrepancy between experimental and calculated value by more than four LogP units. In parentheses in the correlation plots of Fig. 7, we have reported the values of *R*, *τ*, MAE, and CCC computed by discarding the outlier glyburide, with *R* hitting 0.90 and with a MAE below 1 LogP unit.

It should be stressed that the deviations from the experimental values in our blind prediction mirror the discrepancies between the calculated and experimental transfer free energy from 1-octanol to toluene (evaluated from the knowledge of experimental LogP_{ow} and LogP_{tw}; see the caption of Fig. S1 in the supplementary material). In nine cases, these differences (see Fig. S1 of the supplementary material) are below 1 kcal/mol, and in only one case (compound 8), they exceed 2 kcal/mol. Glyburide (8) is a challenging test for unidirectional NE-FG calculations. Such a compound is the bulkiest (61 atoms) among all SAMPL9 molecules, bearing amide, halogen, sulfonic, phenyl, oxo, and amino moieties and a rotatable bond count of eight. The main source of the error for compound 8 is methodological, as it can be assessed by the unusually large confidence intervals found for the solvation energy in 1-octanol and in water (see Table S1 of the supplementary material). The statistical uncertainty of the solvation energy is strictly related to the variance of the fast-growth work distribution. The latter, in turn, is directly proportional to the dissipation of the NE unidirectional process. Hence, the NE growth in 1-octanol of the conformationally complex and bulky Glyburide is a strongly dissipative process because of the many disparate metastable conformations that the solute can exhibit in solution. In the case of 1-octanol, such large dissipation is due to the fact that the solvent is slow in rearranging the solvation shell, while the solute–solvent interaction is being switched on. In other words, 1-octanol as a solvent experiences a larger resistance to movement or diffusion of neighboring portions relative to one another especially for large solutes, hence exhibiting a much larger viscosity with respect to toluene.

In water, whose viscosity is only marginally larger than that of toluene, the large dissipation is likely due to a combined effect of the conformational disorder and the polar nature of many glyburide moieties engaging in a variable number of strong H-bonds upon recharging, hence widening the work distribution. Very likely, the ≃4, 3.5 kcal/mol wide growth work distribution of compound 8 in 1-octanol and in water, respectively, is given by a mixtures of normal distributions due to disparate gas-phase starting conformations.^{75} The number of work values (96) is too low to afford a reliable estimate of the coefficients of a mixture of Gaussian via, e.g., the expectation-maximization algorithm,^{76,77} and the bias of the Jarzynski unidirectional estimate (see the section titled “Theoretical Aspect”), in this case, is overestimated, leading to a corresponding strong overestimation of the solvation energy in 1-octanol, negatively affecting the Log_{to} estimate used in our blind prediction.

## V. COMPARATIVE ANALYSIS OF THE LOGP SAMPL9 SUBMISSIONS

Performances of all SAMPL9/LogP submissions are reported in Table S3 of the supplementary material. As expected, the best-performing methods are low-cost knowledge-based empirical approaches. Most of the QM-based approaches had an excellent correlation as measured by *R* and *τ*, but they show an accuracy (measured by the MAE) somewhat below expectations compared to previous SAMPL results,^{30,31,78,79} probably due to the not sufficiently optimized parameterization of the implicit model for the unusual toluene solvent. A full analysis including correlation plots for all submissions and histograms of deviations for all compounds can be found in Ref. 74

Here, we are interested in comparing MD-based submissions by assessing accuracy and precision in ranking (i.e., MAE and *R*), with an eye on the benefit/cost ratio, an implicit requirement in this challenge, given that participants were requested to detail the central processing unit (CPU)/graphics processing unit (GPU) cost of their protocol in the submission template. We recall that MD-based approaches are systematically found among the best-performing methods when dealing with drug–receptor or host–guest complexes. In Table II, we succinctly report the main methodological aspects of the eight MD-based submissions, along with their performances. Submissions are ranked according to the Lin concordance correlation coefficient, CCC. CCC ranges from −1 and 1 and cannot exceed the Pearson correlation coefficient *R*. CCC effectively^{73} measures the inter-rater reliability among two independent observables, combining the *R* metrics regarding ranking and the MAE outcome sensing the accuracy.

Subm. . | Method . | FF . | Charges . | Ranked . | t_{sim}/ns
. | CCC . | MAE . | R
. |
---|---|---|---|---|---|---|---|---|

VoltzLab | FEP/alchemy/EE | OpenFF-2.0 | AM1-BCC | T | ≃3 × 10^{4} | 0.82 | 1.26 | 0.90 |

NE-FG | NE/alchemy | GAFF2 | AM1-BCC | T | 90 | 0.82 | 1.12 | 0.83 |

Beckstein–Iorga | FEP/alchemy | GAFF/TIP3P | AM1-BCC | F | ≃250 | 0.75 | 1.50 | 0.79 |

Beckstein–Iorga | FEP/alchemy | OPLS/M24 | mol2ff | T | ≃250 | 0.73 | 1.72 | 0.88 |

Beckstein–Iorga | FEP/alchemy | OPLS/TIP4P | CM1A | F | ≃250 | 0.72 | 1.88 | 0.91 |

Sprick | FEP/alchemy/HREM | GAFF/TIP3P | IpolQ-Mod | T | 240 | 0.47 | 3.03 | 0.66 |

Oxford | E-S/MCC | GAFF2/TIP3P | AM1-BCC | T | 1200 | 0.42 | 1.78 | 0.44 |

MD (Patel) | FEP/alchemy | CGenFF/TIP4P | cGenFF | F | 216 | 0.21 | 2.63 | 0.25 |

Subm. . | Method . | FF . | Charges . | Ranked . | t_{sim}/ns
. | CCC . | MAE . | R
. |
---|---|---|---|---|---|---|---|---|

VoltzLab | FEP/alchemy/EE | OpenFF-2.0 | AM1-BCC | T | ≃3 × 10^{4} | 0.82 | 1.26 | 0.90 |

NE-FG | NE/alchemy | GAFF2 | AM1-BCC | T | 90 | 0.82 | 1.12 | 0.83 |

Beckstein–Iorga | FEP/alchemy | GAFF/TIP3P | AM1-BCC | F | ≃250 | 0.75 | 1.50 | 0.79 |

Beckstein–Iorga | FEP/alchemy | OPLS/M24 | mol2ff | T | ≃250 | 0.73 | 1.72 | 0.88 |

Beckstein–Iorga | FEP/alchemy | OPLS/TIP4P | CM1A | F | ≃250 | 0.72 | 1.88 | 0.91 |

Sprick | FEP/alchemy/HREM | GAFF/TIP3P | IpolQ-Mod | T | 240 | 0.47 | 3.03 | 0.66 |

Oxford | E-S/MCC | GAFF2/TIP3P | AM1-BCC | T | 1200 | 0.42 | 1.78 | 0.44 |

MD (Patel) | FEP/alchemy | CGenFF/TIP4P | cGenFF | F | 216 | 0.21 | 2.63 | 0.25 |

Among the eight submissions reported in Table II, six are done using the standard alchemical FEP method, based on *λ*-stratification. Two of these (VoletzLab and Sprick) are implemented as *λ*-hopping techniques.^{80,81} Our blind prediction is the only one produced using the nonequilibrium alchemical methodology. Finally, the Oxford submission relies on the end-state determination of the energy and entropy with the latter computed according to the so-called multiscale cell correlation.^{82} As to the force fields, all are based on the fixed-charge approach, with GAFF or GAFF2 used in four cases, the Optimized Potentials for Liquid Simulations (OPLS)^{18} in two cases and CHARMM/CGenFF^{83} and OpenFF^{84} in one case. The cost reported in Table II is measured in invested simulation time (in ns) on a per-solute basis, deduced from the information reported in the methodological sections of the submissions csv files.^{85} Such a measure is independent of the power of the MD engine and of the hardware configuration. The best-performing methods according to CCC are the NE-FG and VoeltzLab. The latter, combining standard FEP with the so-called extended ensemble^{81} *λ*-hopping approach, required, for a single LogP determination, a total simulation time in the order of the tens of microseconds, i.e., a cost ≃300 times higher than that of our NE-FG method.

In Fig. 8, we compare our prediction, based on the GAFF2 force field and AM1-BCC charge set, with the FEP predictions relying on the AM1-BCC parameterization for the solutes. As it can be seen, the three predictions sets, NE-FG, VoeltzLab, and Beckstein–Iorga, are strongly correlated one to the other with mutual *R* exceeding in all cases 0.85. This mutual correlation and the similar agreement with the experimental data occur despite the usage of three different force fields for the intramolecular and intermolecular Lennard–Jones interactions, highlighting the prominent role played by the fixed-charge modeling for the electrostatic contribution to the solvation free energies.

Figure 8 appears to confirm that the discrepancy observed for glyburide (compound 8) in our case is methodological due to the wide and non-Gaussian nature of the work distribution in the 1-octanol solvent used in our prediction (see Table S1 of the supplementary material), as the VoeltzLab and Beckstein–Iorga submissions predict a similar value for compound 8 much closer to the experiment. On the other hand, the error provided by the NE-FG method via work bootstrapping, essentially proportional to the width of the work distribution, appears to be credible also in the case of compound 8, where the upper bound of the confidence interval is not far from the experimental value. The LogP SAMPL9 challenge has once again shown that the error determination is clearly a weak point in FEP calculations based on the equilibrium simulations of the intermediate *λ* states. Mean reported FEP errors^{74,85} range from 0.05 (Beckstein–Iorga) to 0.43 (Patel) kcal/mol, with no apparent relation with the invested simulation time. The errors are computed in most cases by block averaging or bootstrapping in the intermediate *λ* states and summing in quadrature the resulting errors and in one case (Patel) by running the whole FEP calculation in triplicate.

## VI. CONCLUSION

We have presented our blind prediction for the toluene–water partition coefficients in the context of the SAMPL9 challenge using a very efficient MD-based nonequilibrium alchemical technique relying on the popular GAFF2 non-polarizable force field for the solutes. A pre-assessment was made based on the knowledge of the experimental octanol–water partition coefficients for all 16 compounds included in the challenge. The calculation of the solvation free energies in the various solvents were done using different approaches for the determination of the atomic charges on the SAMPL9 compounds. In particular, we used in our MD-based calculations of the solvation free energies the standard AM1-BCC atomic charges and the ESP atomic charges computed at the HF/6-31Gd level of theory. The comparison of calculated and experimental LogP_{ow} with various combination of the charge sets revealed that the HF/6-31Gd charges, yielding, in general, a stronger charge separation between atoms with disparate electronegativity, appear to be, in most cases, more appropriate for the water solvent. The analysis allowed to identify some important critical issues in the GAFF2 parameterization connected to the nitrogen atoms in cycles or in tertiary amine, providing a valid clue for improving the performance and reliability of non-polarizable force field in free energy calculations of drug–receptor systems. Based on the outcome of the pre-assessment, we therefore decided to use only the calculated solvation free energies of the non-polar toluene and 1-octanol solvents with the AM1-BCC charge set, exploiting the knowledge of the experimental value of the octanol–water partition coefficients to arrive at our blind submission for the SAMPL9 challenge. Our submission, based on nonequilibrium alchemy, was the best-performing and most efficient MD-based approach among the eight MD-based submissions.

## SUPPLEMENTARY MATERIAL

See the supplementary material for solvation free energies in water, 1-octanol, and toluene using AM1-BCC and ESP charges. Compact analysis of all ranked and not ranked LogP challenge submissions. LogPto correlation plot.

## ACKNOWLEDGMENTS

The computing resources and the related technical support used for this work have been provided by the CRESCO/ENEAGRID High Performance Computing infrastructure^{68} and its staff. The CRESCO/ENEAGRID High Performance Computing infrastructure is funded by ENEA, the Italian National Agency for New Technologies, Energy and Sustainable Economic Development, and by Italian and European research programs (see www.cresco.enea.it for information). The authors thank MIUR-Italy (“Progetto Dipartimenti di Eccellenza 2023–2027” allocated to Department of Chemistry “Ugo Schiff”) and the National Recovery and Resilience Plan, Mission 4 Component 2 - Investment 1.4 - NATIONAL CENTER FOR HPC, BIG DATA AND QUANTUM COMPUTING - funded by the European Union - NextGenerationEU - CUP B83C22002830001. We finally appreciate the National Institutes of Health for its support of the SAMPL project via R01GM124270 to David L. Mobley (UC Irvine).

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Piero Procacci**: Conceptualization (lead); Data curation (lead); Formal analysis (lead); Funding acquisition (equal); Project administration (lead); Software (equal); Writing – original draft (lead); Writing – review & editing (lead). **Guido Guarnieri**: Funding acquisition (equal); Project administration (supporting); Software (equal); Writing – original draft (supporting); Writing – review & editing (supporting).

## DATA AVAILABILITY

PDB trajectory files, raw work data, and force field parameter files are available at the general-purpose open-access repository https://zenodo.org/record/7645899. A complete analysis of the submissions of the LogP SAMPL9 challenge can be found at the GitHub site https://procacci.github.io/LogP-SAMPL9/.

The ORAC program (v6.1) is available for download under the General Public License at the website http://www1.chim.unifi.it/orac/.

Third-party software xLogP3 (v3.2.2) can be downloaded for free for academic users from the website http://www.sioc-ccbg.ac.cn.

## REFERENCES

*Free Energy Methods in Drug Discovery—Introduction*

GAFF and GAFF2 are public domain force fields and are part of the AmberTools distribution, available for download at https://amber.org internet address (accessed January 2022). According to the AMBER development team, the improved version of GAFF, GAFF2, is an ongoing poject aimed at “reproducing both the high quality interaction energies and key liquid properties such as density, heat of vaporization and hydration free energy.” GAFF2 is expected “to be an even more successful general purpose force field and that GAFF2-based scoring functions will significantly improve the successful rate of virtual screenings.”

_{a}, and log D predictions from the SAMPL7 blind challenge

*via*alchemical simulations: let’s get honest about sampling, once more

*Theory and use of the EM algorithm*