Machine learning (ML) potentials are a powerful tool in molecular modeling, enabling ab initio accuracy at a comparably small computational cost. Nevertheless, all-atom simulations employing best-performing graph neural network architectures are still too expensive for applications requiring extensive sampling, such as free energy computations. Implicit solvent models could provide the necessary speedup due to the reduced degrees of freedom and faster dynamics. Here, we introduce a Solvation Free Energy Path Reweighting (ReSolv) framework to parameterize an implicit solvent ML potential for small organic molecules that accurately predicts the hydration free energy, an essential parameter in drug design and pollutant modeling. By learning on a combination of experimental hydration free energy data and ab initio data of molecules in vacuum, ReSolv bypasses the need for intractable ab initio data of molecules in an explicit bulk solvent and does not have to resort to less accurate data-generating models. On the FreeSolv dataset, ReSolv achieves a mean absolute error close to the average experimental uncertainty, significantly outperforming standard explicit solvent force fields. Compared to an explicit solvent ML potential, ReSolv offers a computational speedup of four orders of magnitude and attains closer agreement with experiments. The presented framework paves the way for deep molecular models that are more accurate yet computationally more cost-effective than classical atomistic models.

Solvation free energy, and notably hydration free energy, is generally recognized as a fundamental thermodynamic quantity in computational chemistry. Defined as the work done when transferring a molecule from the gas phase into solution (water in the case of hydration free energy), it enables the computation of several key physicochemical properties of molecules, such as solubility, partition coefficients, activity coefficients, and binding free energies in solution.1,2 These properties are of great importance to the pharmaceutical, environmental, and materials sciences,3–9 prompting the organization of the blind prediction SAMPL challenges10–12 with hydration free energy as one of the main targets. In addition, Mobley and Guthrie compiled and curated the FreeSolv database of experimentally measured hydration free energies for small neutral molecules in water.13,14

A wide spectrum of methods is available to calculate the solvation free energy, ranging from traditional approaches such as continuum solvation models15,16 to recent machine learning (ML) algorithms17–26 and their combinations.27–29 Alchemical methods based on Molecular Dynamics (MD) simulations14,30,31 are typically considered highly accurate but computationally expensive.32,33 However, both the fidelity and the efficiency strongly depend on the explicitly treated degrees of freedom and the employed potential energy model.

In implicit solvent models, the solvent molecules (e.g., water molecules) are not explicitly present in the system (as in explicit solvent models); instead, the interactions are modified to account for the solvent effects.34 The number of degrees of freedom is thereby greatly reduced, resulting in large computational gains. Classical implicit models treat the solvent as a continuum medium with specific dielectric and interfacial properties. The typical approach decouples the electrostatic (polar) and nonpolar interactions. The former can be approximated by solving the Poisson–Boltzmann equation or further simplified with the popular generalized Born model, while the latter is most often estimated via the solvent-accessible surface area.34,35 While recent advances, such as using ML to predict the generalized Born radii,36 have increased the accuracy of these models, they are still in considerable disagreement with experimental data and explicit solvent models.26,37,38 In particular, the solvation free energy root mean square error (RMSE) is ∼3.6 kcal/mol for both the Poisson–Boltzmann and generalized Born models. By specifically optimizing the nonpolar interactions using hydration free energy data, the test set RMSE can be decreased to 1.68 kcal/mol.39 

Higher accuracy can be achieved through ab initio methods based on quantum chemical calculations, albeit at a higher computational cost. The COSMO family40,41 is commonly utilized, with COSMO-RS (Conductor-like Screening Model for Real Solvents) being particularly notable for property prediction. In particular, extensions thereof have achieved high-fidelity predictions of solvation free energies (MAE = 0.52 kcal/mol).42

An alternative to the computational hurdle imposed by ab initio methods, without significantly compromising predictive accuracy, is offered by the many-body, flexible potential energy surfaces that characterize ML potentials. During the past decade, ML potentials have been deployed to derive atomistic43 and coarse-grained44 models based on training data provided by high-fidelity simulations,45 experiments,46 or both.47 The derivation of empirical force fields from experimental data has been extensively studied (see, for example, Refs. 48–51), and this approach naturally extends to ML potentials. In the context of implicit water models, the capacity of deep ML potentials is large enough to compensate for the removal of solvent degrees of freedom. The predicted solute properties, such as the conformational landscape of proteins, match those obtained from the reference explicit water model.26,52–58 However, these studies used classical atomistic force fields as the data-generating model, inherently limiting the attainable accuracy of the resulting ML potential. As shown previously, and also in this work, classical atomistic models such as the General Amber Force Field (GAFF) and the CHARMM General Force Field (CGenFF) systematically overestimate hydration free energies.30 Modifying the Lennard-Jones parameters yields improved agreement with experiments but can, at the same time, negatively affect other properties.30

Meanwhile, more accurate ab initio training data are prohibitively expensive for solutes in explicit bulk water. Consequently, frequently used Density Functional Theory (DFT) training databases such as QM9,59 ANI,60 and QM7-X61 contain small organic molecules in vacuum or, at best, samples with a few randomly placed water molecules around the solutes.62,63 Moreover, implicit solvent models are coarse-grained models. When trained with the common force-matching approach,64 coarse-grained ML models require much more data than their atomistic counterparts.65 The underlying reason is the surjective atomistic-to-coarse-grained mapping, resulting in noisy force labels. As an illustration, the parameterization of coarse-grained models for alanine dipeptide in implicit water required a dataset with 10⁶ configurations,45,66 a number easily obtainable with classical force fields but vastly out of reach for ab initio calculations.

In this work, we present a Solvation Free Energy Path Reweighting (ReSolv) approach to parameterize an ML potential for small organic molecules in an implicit aqueous solvent. The above-mentioned difficulties are circumvented with a two-stage training procedure, utilizing first the DFT database of molecules in vacuum and then the experimental hydration free energy database. The second stage entails non-trivial training since the hydration free energy is not a direct output of an ML model but instead involves molecular simulations. Constructing the free energy integration path along the ML model training process and utilizing the Zwanzig reweighting scheme enables us to perform efficient training that avoids differentiating through the molecular simulations. The ReSolv model predicts hydration free energies more accurately than the classical explicit solvent models despite being an implicit solvent (i.e., coarse-grained) model. In addition, ReSolv’s predictions are not systematically biased and are more robust for molecules with large negative hydration free energies. We also investigate error correlations between different modeling approaches and point out several potentially erroneous data points that should be reconsidered in future database curation.

The training methodology of ReSolv consists of two consecutive stages, as shown in Fig. 1. In the first stage, we parameterize the ML potential for molecules in vacuum. The model takes as input a configurational state of the molecule S and predicts the potential energy, i.e., Uvac = U(S; θvac). The forces on the atoms are computed as the negative derivative of the potential with respect to the atoms’ position vectors. In training, we aim to adjust the parameters of the model θvac such that the predictions match the corresponding energies UDFT and forces FDFT in the ab initio database, i.e., by minimizing the loss function in Eq. (5).
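As a minimal illustration (not the authors' implementation), this force evaluation via automatic differentiation can be sketched in JAX as follows; the toy harmonic energy_fn merely stands in for the NequIP model, and all names are placeholders:

```python
import jax
import jax.numpy as jnp

# Toy potential standing in for the NequIP model U(S; theta_vac):
def energy_fn(positions, params):
    # harmonic well; returns a scalar potential energy
    return 0.5 * params["k"] * jnp.sum(positions ** 2)

# Forces as the negative gradient of the energy w.r.t. atomic positions
force_fn = lambda positions, params: -jax.grad(energy_fn, argnums=0)(positions, params)

positions = jnp.ones((5, 3))   # five atoms in 3D
params = {"k": 2.0}
print(force_fn(positions, params).shape)   # (5, 3)
```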

FIG. 1.

Solvation free energy path reweighting (ReSolv). The green color indicates ReSolv’s stage 1, where we train an ML potential Uvac for molecules in vacuum based on the ab initio dataset containing configurations S and the corresponding energies UDFT and forces FDFT. The blue color represents ReSolv’s stage 2, where we train an ML potential Usol for molecules in an implicit solvent by initializing the parameters with θvac and perturbing them toward θsol where the free energy difference between Uvac and Usol equals the experimental solvation free energy ΔAexp. The red color depicts the parameter update procedure involving trajectory reweighting, Free Energy Perturbation (FEP), and Bennett acceptance ratio (BAR) methods. See the main text for more details.

In the second stage, we keep the Uvac model fixed and train the ML potential Usol = U(S; θsol), parameterizing molecular interactions in an implicit solvent. The parameters θsol are optimized such that the free energy difference ΔA between the potentials Uvac and Usol reproduces the experimental solvation free energy ΔAexp. For the sake of simplicity, we assume here that there is only one molecule in the training database. The loss function is then given by
$$ L = \left(\Delta A - \Delta A^{\mathrm{exp}}\right)^{2}. \qquad (1) $$
Training on experimental data is not as straightforward as training in the first stage because ΔA is not an output of the ML model but is instead evaluated from an MD simulation driven by an ML potential. We employ a variation of the Differential Trajectory Reweighting (DiffTRe) method46 that avoids exploding gradients and reduces the computational and memory requirements compared to the alternative of computing gradients via backpropagation through the MD simulation. Before continuing, we first give a brief summary of the DiffTRe method.
Consider the task of matching a time-independent observable O, e.g., by postulating the loss function L = (⟨O⟩θ − Oexp)², where ⟨·⟩θ denotes the ensemble average with respect to the canonical distribution using an ML potential Uθ = U(S; θ). DiffTRe leverages thermodynamic perturbation theory,67 stating that ⟨O⟩θ can be estimated from N decorrelated states {Sj}θ̂ generated by a reference potential Uθ̂ = U(S; θ̂), that is,
$$ \langle O \rangle_{\theta} \approx \sum_{j=1}^{N} w_j \, O(S_j), \qquad w_j = \frac{\exp\!\left[-\beta \left( U_{\theta}(S_j) - U_{\hat{\theta}}(S_j) \right)\right]}{\sum_{k=1}^{N} \exp\!\left[-\beta \left( U_{\theta}(S_k) - U_{\hat{\theta}}(S_k) \right)\right]}, \qquad (2) $$
where β = 1/(kBT), in which kB is Boltzmann’s constant and T is the temperature. Due to limited sampling, the estimation should only be used if the states generated by the reference potential Uθ̂ are statistically close to states that would have been sampled from the potential Uθ. The distribution overlap is captured with the effective sample size68 given by
$$ N_{\mathrm{eff}} = \frac{1}{N} \exp\!\left( -\sum_{j=1}^{N} w_j \ln w_j \right). \qquad (3) $$
The DiffTRe training, therefore, works as follows. First, an MD simulation is performed with the reference potential, generating {Sj}θ̂. Then, at each update step, if θ is determined to be sufficiently close to θ̂, i.e., Neff ≥ N̄eff for a fixed effective sample size threshold N̄eff, the trajectory is reused. Otherwise, Uθ is set as the new reference potential, and a new reference trajectory is generated. Most importantly, in both cases, Eq. (2) provides a differentiable relation between ⟨O⟩θ and the model’s parameters θ, thereby enabling the computation of ∂L/∂θ required for gradient-based optimization.
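The following JAX sketch summarizes one such reweighting evaluation under Eqs. (2) and (3); array layouts and function names are ours, and normalizing Neff by the number of samples N is an assumption consistent with the threshold N̄eff = 0.9 quoted later:

```python
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def reweighted_estimate(observables, u_theta, u_ref, beta):
    """Zwanzig-reweighted ensemble average [Eq. (2)] and effective sample size
    [Eq. (3)] from N states sampled with the reference potential.
    observables, u_theta, u_ref are (N,) arrays of O(S_j), U_theta(S_j), and
    U_theta_hat(S_j), respectively."""
    log_w = -beta * (u_theta - u_ref)          # unnormalized log-weights
    log_w = log_w - logsumexp(log_w)           # normalize so that sum_j w_j = 1
    w = jnp.exp(log_w)
    obs_avg = jnp.sum(w * observables)         # <O>_theta estimate
    n_eff = jnp.exp(-jnp.sum(w * log_w)) / w.shape[0]   # fraction of N (assumed convention)
    return obs_avg, n_eff
```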

For the present task of learning the solvation free energy, DiffTRe could be directly employed as presented above. Nevertheless, the free energy difference between the potentials Uvac and Usol is typically estimated by constructing a free energy integration path, e.g., with a new potential energy function U(λ) = (1 − λ)Uvac + λUsol, and estimating the free energy differences between discrete steps of λ. This approach would require as many simulations as there are λ steps. In addition, intermediate steps would involve an MD simulation using a linear combination of two ML potentials, rendering the update step computationally expensive.

Here, we instead use the DiffTRe learning process itself as a free energy integration path. We initialize the parameters with the pre-trained θvac parameters and iteratively perturb them during training toward the desired θsol parameters (see Fig. 1). At each update step, we compute the free energy difference between the new and previous potentials using a hybrid of the free energy perturbation67 and the Bennett Acceptance Ratio (BAR) methods.69 The free energy differences are accumulated along the training and, at the end of the training, yield the total free energy difference ΔA between Uvac and Usol.

Explicitly, let us consider the update step from θi to θi+1. At this point, due to reweighting, the reference trajectory may have been generated at an earlier step i − k with the ML potential U(S; θi−k), where k ≤ i. Since this is the reference trajectory, θ̂ = θi−k. As above, we denote the states generated by the reference potential with {Sj}θ̂. The corresponding free energy difference is ΔA = ΔA_{θ0→θi−k}, which was already computed at step i − k. First, we compute Neff [Eq. (3)]. If the reweighting criterion is satisfied, the trajectory is reused. In the opposite case, we generate a new reference trajectory with the current ML potential U(S; θi); thus, θ̂ = θi. We also update ΔA = ΔA + ΔA^{BAR}_{θi−k→θi}, where ΔA^{BAR}_{θi−k→θi} denotes the free energy difference between steps i − k and i estimated with the BAR method.69 For both cases, we then compute the free energy difference between the potentials U(S; θ̂) and U(S; θi) using the differentiable free energy perturbation relation,67
$$ \Delta A_{\hat{\theta} \rightarrow \theta_i} = -\frac{1}{\beta} \ln\!\left[ \frac{1}{N} \sum_{j=1}^{N} \exp\!\left( -\beta \left( U(S_j; \theta_i) - U(S_j; \hat{\theta}) \right) \right) \right], \qquad (4) $$
where the summation runs over the states generated by the reference potential. Finally, we update ΔA = ΔA + ΔA_{θ̂→θi} and evaluate the loss [Eq. (1)]. Crucially, Eq. (4) provides a differentiable relation between ΔA and θi, enabling the computation of ∂L/∂θi. Note that if the reference trajectory was regenerated, then θ̂ = θi and Eq. (4) reduces to zero, but its gradient with respect to θi is generally non-zero.
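A compact sketch of the differentiable part of this update step, under our assumptions about the data layout (the trajectory regeneration and BAR accumulation are left out), could read:

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def fep_delta_a(params_i, ref_states, ref_energies, u_fn, beta):
    """Differentiable free energy perturbation estimate of Eq. (4) between the
    reference potential (energies U(S_j; theta_hat) precomputed in ref_energies)
    and the current potential u_fn(S, params_i), over the reference states S_j."""
    u_i = jax.vmap(lambda s: u_fn(s, params_i))(ref_states)
    n = u_i.shape[0]
    return -(1.0 / beta) * (logsumexp(-beta * (u_i - ref_energies)) - jnp.log(n))

def resolv_loss(params_i, delta_a_acc, ref_states, ref_energies, u_fn, beta, delta_a_exp):
    """Loss of Eq. (1) at one update step: accumulated free energy (a constant)
    plus the differentiable FEP contribution of the current step."""
    delta_a = delta_a_acc + fep_delta_a(params_i, ref_states, ref_energies, u_fn, beta)
    return (delta_a - delta_a_exp) ** 2

# Gradient with respect to the current parameters, as needed for the optimizer update:
loss_grad_fn = jax.grad(resolv_loss)
```

Because delta_a_acc enters the loss as a constant, only the FEP term of the current step contributes to the gradient, which is exactly the property exploited above.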

Both vacuum and solvent ML potentials are based on the GNN architecture NequIP70 as implemented in JAX MD.71 Computations were performed with double precision on NVIDIA A100 80GB GPUs. The architectural hyperparameters of NequIP are listed in Table S2 of the supplementary material.

In the first stage, the ML potential parameters are adjusted via backpropagation so that the predicted energies and forces match the target values. The corresponding loss function is
$$ L = \frac{\gamma_U}{N_{\mathrm{bs}}} \sum_{i=1}^{N_{\mathrm{bs}}} \left( U_i - U_i^{\mathrm{DFT}} \right)^{2} + \frac{\gamma_F}{3 N_{\mathrm{bs}}} \sum_{i=1}^{N_{\mathrm{bs}}} \frac{1}{N_{i,a}} \sum_{j=1}^{N_{i,a}} \sum_{k=1}^{3} \left( F_{ijk} - F_{ijk}^{\mathrm{DFT}} \right)^{2}, \qquad (5) $$
where γU and γF are weighting factors, Nbs is the batch size, Ni,a is the number of atoms of molecule i, and k iterates over the x-, y-, and z-dimensions. Ui is the energy of the ith molecule in the batch, and Fijk is the force in direction k of atom j from the ith molecule in the batch. The subscript DFT denotes the ab initio data target. We normalize the target energies and forces. The energies are shifted by the mean energy of the training data and scaled with the average root mean square force of the training data. The forces are scaled with the average root mean square force of the training data. Furthermore, we do not employ learnable scaling or shifting and set the per-atom scaling to 1 and the per-atom shift to 0. To pick the best model, we employ early stopping on the validation dataset. For the numerical optimization hyperparameters, see Table S3 of the supplementary material.
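A schematic version of this energy and force matching loss (the normalization and γ values are placeholders, not the settings of Table S3) might look like:

```python
import jax.numpy as jnp

def stage1_loss(pred_energies, pred_forces, dft_energies, dft_forces, n_atoms,
                gamma_u=1.0, gamma_f=1.0):
    """Energy/force matching loss in the spirit of Eq. (5). pred_forces and
    dft_forces are lists of (N_i,a, 3) arrays, one per molecule in the batch;
    n_atoms holds the per-molecule atom counts N_i,a."""
    n_bs = len(pred_energies)
    energy_term = jnp.mean((jnp.asarray(pred_energies) - jnp.asarray(dft_energies)) ** 2)
    force_term = sum(jnp.sum((fp - fd) ** 2) / (3.0 * n)
                     for fp, fd, n in zip(pred_forces, dft_forces, n_atoms)) / n_bs
    return gamma_u * energy_term + gamma_f * force_term
```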

In the second stage, we employ RDKit72 to generate the 3D structure of a molecule based on the SMILES string provided by the FreeSolv database. Next, we perform an energy minimization of the structures with the MMFF94 force field.73–77 The obtained configurations are used to run a 300 ps initial equilibration simulation followed by a 200 ps production run for each molecule. All simulations are performed in the NVT ensemble and numerically integrated with the velocity Verlet scheme using a 1 fs time step. To match the conditions of the experimental data, the temperature is kept at 298.15 K with a Langevin thermostat with the damping factor set to 1/ps. We check the stability of our simulations by verifying that the last configuration of each molecule can be represented by a single graph and that no atoms have diverged from the molecule. In addition, to see whether our models yield physically reasonable trajectories, we compare the bonds, angles, and dihedrals of the Usol trajectories for three randomly selected molecules to those obtained after an energy minimization with the UFF using RDKit72 and with openFF-2.2.1 using openff-toolkit78 and openMM;79 see Figs. S9 and S10 of the supplementary material, respectively. In training, new simulations are initialized with the last configuration of the previous simulation and consist of a 50 ps equilibration and a 200 ps production run. We sample every 5 ps during the production run, i.e., 40 samples per trajectory. The reweighting effective sample size threshold is fixed to N̄eff = 0.9. The numerical optimization hyperparameters are reported in Table S3 of the supplementary material.
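The structure generation step can be illustrated with standard RDKit calls; the random seed and return format below are our choices:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def initial_configuration(smiles):
    """Generate a 3D starting structure from a SMILES string and relax it with
    the MMFF94 force field, mirroring the preprocessing described above."""
    mol = Chem.MolFromSmiles(smiles)
    mol = Chem.AddHs(mol)                      # explicit hydrogens for 3D embedding
    AllChem.EmbedMolecule(mol, randomSeed=0)   # generate 3D coordinates
    AllChem.MMFFOptimizeMolecule(mol)          # MMFF94 energy minimization
    return mol.GetConformer().GetPositions()   # (n_atoms, 3) coordinates in Angstrom

positions = initial_configuration("CCO")       # e.g., ethanol
```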

The reported hydration free energies are computed with the BAR method using only the end states, i.e., vacuum and water states. As previously reported, intermediate states are not necessary for implicit solvent models due to the sufficient overlap between the vacuum and solvated ensembles of the solute.33 We first run 300 ps of equilibration and sample every 5 ps in the subsequent 200 ps production run. We perform convergence tests (Fig. S8 of the supplementary material) for 12 randomly selected molecules in the test set to ensure sufficient sampling.
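For reference, a minimal end-state BAR estimator consistent with this procedure could be implemented as follows; the unit conventions and bracketing interval are assumptions of this sketch, not the chemtrain implementation:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit   # logistic function 1/(1 + exp(-x))

def bar_free_energy(du_vac, du_sol, beta):
    """Bennett acceptance ratio estimate of the free energy difference between
    the two end states. du_vac holds U_sol - U_vac evaluated on configurations
    sampled with U_vac; du_sol holds the same quantity on configurations
    sampled with U_sol."""
    n0, n1 = len(du_vac), len(du_sol)

    def self_consistency(c):
        # Fermi function f(x) = 1/(1 + exp(x)) = expit(-x)
        return np.sum(expit(-beta * (du_vac - c))) - np.sum(expit(beta * (du_sol - c)))

    c = brentq(self_consistency, -100.0, 100.0)   # assumes the root lies in this range
    return c - np.log(n1 / n0) / beta
```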

In this work, we employ ReSolv to learn the hydration free energy of small organic molecules, which, by construction, also yields an ML potential for molecules in implicit water. The architecture of the ML potential is based on NequIP,70 a data-efficient E(3)-equivariant graph neural network. For stage 1 of training, we use the QM7-X dataset,61 providing the target DFT energies and forces. The dataset consists of 4.2 × 10⁶ small organic molecule samples, including equilibrium states, structural isomers, structural stereoisomers, and off-equilibrium configurations. The considered molecules are composed of the heavy atom set {C, N, O, S, Cl}. They have up to seven heavy atoms or four to twenty-three atoms in total, including hydrogen. We randomly split the dataset into train (89.8%), validation (10.0%), and test (0.2%) sets. The ReSolv vacuum model (Uvac) achieves a test set mean absolute error (MAE) within literature benchmarks80,81 (Table I). The errors are close to the QM7-X dataset error, i.e., the DFT calculations were computed at the PBE0+MBD level with a precision of 10⁻³ eV and 10⁻⁴ eV/Å for the energies and forces, respectively.

TABLE I.

Top: Mean absolute error (MAE) of energy and force predictions computed on 10 100 random test samples in the QM7-X dataset. Results marked with an asterisk (*) were computed by Duval et al.80 on a different test set. Bottom: The MAE and root mean square error (RMSE) of hydration free energy predictions of different models for the same test set molecules in the FreeSolv database. The best-performing models are highlighted in bold.

QM7-X test dataset MAE
Model            Energy (eV/atom)    Force (eV/Å)
ReSolv vacuum    0.005               0.039
SchNet*          0.042               0.056
SpookyNet*       0.011               0.015
FAENET*          0.011               0.018
FreeSolv test dataset
Model      MAE (kcal/mol)    RMSE (kcal/mol)
ReSolv     0.63              0.96
Amber      1.02              1.39
CHARMM     1.05              1.79

For stage 2 of training, we utilize the FreeSolv database.14 It contains hydration free energies of 643 small neutral molecules, representing drug-like compounds. To stay consistent with the heavy atom types of the QM7-X dataset, i.e., {C, N, O, S, Cl}, we extract the corresponding subset of 559 molecules. We also excluded 17 molecules for which the ReSolv vacuum model yielded unstable simulations and an additional five molecules that were unstable during training. We split this subgroup into train and test sets of roughly 70% and 30%, which amounts to 375 and 162 molecules, respectively. When dividing the data, we kept the occurrences of the various heavy atom combinations consistent between the two sets, i.e., the training set contains roughly 70% of the molecules with heavy atom combinations N, N–C, N–C–O, etc. This choice ensures that the various heavy atom combinations are represented during both training and testing. In addition, we ensured that the test set consists exclusively of molecules with chemical functional groups that appear in the training set, i.e., any molecule that is the lone representative of a functional group formed part of the training set. The functional group classification was taken from the FreeSolv database.13 Polyfunctional molecules were classified according to the first listed functional group, as in Ref. 31. Apart from satisfying these two conditions, the molecules are split randomly.
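A simplified sketch of the heavy-atom-stratified split (omitting the functional-group constraint) is given below; all names are illustrative:

```python
import random
from collections import defaultdict

def stratified_split(molecules, key_fn, train_frac=0.7, seed=0):
    """Split molecules roughly 70/30 while keeping each heavy-atom combination
    (returned by key_fn) proportionally represented in both sets."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for mol in molecules:
        groups[key_fn(mol)].append(mol)
    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        n_train = round(train_frac * len(members))
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return train, test
```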

ReSolv’s test set errors are reported in Table I and compared with the explicit solvent classical atomistic models Amber (GAFF) and CHARMM (CGenFF). We compute the errors for the same test set molecules to enable a direct comparison. Data for Amber are provided in the FreeSolv database,13 while for CHARMM, the values are taken from Ref. 31. In the CHARMM error computation, two molecules (Mobley IDs 6359135 and 2146331) are excluded because their data are not provided in Ref. 31, as the force field parameters could not be generated. ReSolv’s MAE is close to the average uncertainty of the FreeSolv database (0.6 kcal/mol). Moreover, ReSolv outperforms the explicit solvent classical atomistic models by a large margin. The Amber and CHARMM models display a similar MAE/RMSE for the molecules in the training set (Fig. S1 of the supplementary material), confirming that the test set is representative of the entire considered database. In addition, we trained the ReSolv model using a random splitting of the data and achieved similar performance; the obtained MAE and RMSE are 0.65 and 0.91 kcal/mol, respectively. The corresponding errors for the classical force fields remain substantially higher (Fig. S2 of the supplementary material). The comparison with the Amber and CHARMM models is somewhat unfair, given that the two models were not explicitly parameterized to reproduce hydration free energy data. Nevertheless, reparameterizations and corrections in this direction have been attempted previously, and the resulting models still performed worse than ReSolv. For example, Boulanger et al. rescaled the Amber (GAFF) molecule–water van der Waals dispersion interaction to better reproduce the hydration free energy and reported an MAE of 0.79 kcal/mol for the final optimized model.30 In another example, Scheen et al. trained an ML model to correct the Amber (GAFF) force field predictions.28 The hybrid Amber/ML approach achieved an MAE of 0.76 kcal/mol on the SAMPL4 test data containing 47 samples of the FreeSolv database.

Next, we perform an in-depth error analysis. The parity plot [Fig. 2(a)] reveals systematic errors in the Amber and CHARMM predictions, with most points lying above the diagonal line. To further demonstrate this point, we plot the error distributions relative to the experimental values [Fig. 2(b)]. The distributions are skewed to the left for the classical force fields, indicating that the majority of predictions are overestimated. The Amber model, for example, overestimates the hydration free energies on average by more than 1 kcal/mol, as already reported by Boulanger et al.30 Conversely, no systematic errors are found for the ReSolv model, as evidenced by the symmetric distribution about zero. These findings are reflected in the percentage of predictions within the experimental uncertainty [Fig. 2(c)]. The ReSolv model scores the highest, followed by CHARMM, with Amber in last place, even though Amber exhibits a lower MAE and RMSE than the CHARMM model.

FIG. 2.

Prediction performance. The implicit solvent ReSolv model (blue) is referenced against the explicit solvent classical atomistic models Amber (GAFF; red) and CHARMM (CGenFF; green). The results are shown for the test dataset. (a) Parity plot with the error bars denoting the experimental uncertainty and the gray dotted-dashed line indicating the perfect prediction. (b) Error probability distribution relative to the experimental values. The distributions are fitted with the Gaussian kernel density estimator. (c) The number of predictions with errors lower than the experimental uncertainty. The gray background histogram depicts the distribution of experimental uncertainty. The total percentage of predictions within the experimental uncertainty is 57%, 31%, and 46% for the ReSolv, Amber, and CHARMM models, respectively. (d) Mean absolute error (MAE) increases with decreasing experimental hydration free energy. The red solid line denotes the median, the red dashed line denotes the mean, the box ranges are from the first to the third quartile, and the whiskers correspond to the 1.5× interquartile range.

The parity plot [Fig. 2(a)] also highlights FreeSolv’s uneven distribution with respect to the hydration free energy value. There are only a few molecules with a large negative hydration free energy. The two largest (Mobley ID 9534740 and D-mannitol) are particularly problematic for the Amber and CHARMM models. These molecules, discussed further in Sec. III C, are part of a general trend: the MAE increases with decreasing hydration free energy [Fig. 2(d)] for all models. However, the ReSolv model is the most robust in this respect. See Figs. S3 and S1 of the supplementary material for point-wise results and training dataset results, respectively. For the Amber and CHARMM models, a possible explanation could be an inaccurate partial charge assignment. As shown by Jämbeck et al.,82 the hydration free energy is very sensitive to the choice of charge computation method and can differ by several kcal/mol depending on the method used. We found that the absolute sum of the Gasteiger partial charges per atom is highly correlated with the hydration free energy, as is the polar surface area of the molecules (Fig. S4 of the supplementary material). Indeed, keeping in mind the net neutrality of the molecules, the absolute Gasteiger partial charge per atom is sensitive to local charge gradients contributed by polar fragments within the molecule, hence its correlation with the polar surface area. The computed Pearson correlation coefficient is r = −0.71, which is comparable to the correlation with the related polar surface area (r = −0.74) but significantly larger than the correlation with other molecular properties such as the dipole moment (r = −0.42) or the volume of the solute (r = −0.18). In particular, we find that the absolute sum of the charges increases with decreasing hydration free energy. An accurate partial charge assignment is, therefore, particularly important for molecules with large negative hydration free energies, potentially explaining why the predictions of the classical force fields tend to worsen with decreasing hydration free energy. With this insight, we depict in Fig. S5 of the supplementary material the difference between the Usol and Uvac atomic contributions to the potential energy for a fixed configuration and find a correspondence with the Gasteiger partial charges on the heavy atoms. The energy differences highlight that the changes are predominant around polar atoms, in agreement with what we would expect based on the above correlations and supporting that the learning of Usol is physically informed.
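For completeness, the two descriptors can be computed with RDKit as sketched below; the exact per-atom normalization is our reading of the text:

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors

def polarity_descriptors(smiles):
    """Absolute sum of Gasteiger partial charges per atom and the topological
    polar surface area of a molecule."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.ComputeGasteigerCharges(mol)
    charges = np.array([a.GetDoubleProp("_GasteigerCharge") for a in mol.GetAtoms()])
    return np.abs(charges).sum() / mol.GetNumAtoms(), Descriptors.TPSA(mol)

# Pearson correlation with experimental hydration free energies, given
# hypothetical arrays `smiles_list` and `dg_hyd_exp`:
# q = np.array([polarity_descriptors(s)[0] for s in smiles_list])
# r = np.corrcoef(q, dg_hyd_exp)[0, 1]
```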

In Fig. 3, the test set MAE is broken down by various molecular properties. First, we examine the error with respect to chemical functional groups. The chemical space covered by the FreeSolv database is quite extensive given the relatively small size of the database.13,14,31 ReSolv demonstrates a uniform error across the chemical functionalities, which is, to some extent, expected given the training set construction. However, a low MAE is also found for the functional groups rarely appearing in the training set, i.e., for the “other” group, where we merge functional groups with sparse occurrence in the train and test sets. In contrast, the Amber and CHARMM models exhibit larger fluctuations in MAE, with some functional groups (e.g., primary alcohol for Amber or primary amine for CHARMM) displaying notably higher errors. Note that the same conclusion can be drawn for the molecules in the training set (Fig. S6 of the supplementary material). Next, we consider the models’ robustness with respect to the size of the molecule. From 14 heavy atoms onward, an increased MAE can be seen for all models, most notably for the Amber model. Finally, in terms of heavy atom types, all three models display larger errors for molecules with four heavy atom types. However, we attribute this result mainly to the small sample size containing problematic molecules, i.e., the MAE is computed on only four molecules, with the nitralin molecule (discussed in Sec. III C) contributing most to the error. The absence of a similar trend for the molecules in the training set (Fig. S6 of the supplementary material) further supports our claim.

FIG. 3.

Error analysis. The box plots of the mean absolute error (MAE) with respect to (a) chemical functional groups, (b) number of heavy atoms, and (c) number of heavy atom types are shown for the test set molecules. The red solid line denotes the median, the red dashed line denotes the mean, the box ranges are from the first to the third quartile, and the whiskers correspond to the 1.5× interquartile range. The gray bar plots depict the number of corresponding samples in the training set. We compare three different models: ReSolv (blue), Amber (GAFF; red), and CHARMM (CGenFF; green). In panel (a), the “other” group corresponds to the remaining functional groups with less than three samples in the test set.

The results thus far demonstrate ReSolv’s ability to accurately predict the hydration free energies of molecules with chemical functional groups seen during training. Given the typically limited availability of training samples, it is especially desirable for ML potentials to extrapolate effectively into unseen chemical space. To assess ReSolv’s chemical generalizability, we employ the same training procedure and dataset prescribed in Sec. III A, but with all samples of a given functional group (including multifunctional molecules) removed from the training set, and compare the performance to the case in which the group is seen during training.

We conducted two such trainings, excluding samples with either primary amine or primary alcohol functionalities. These functional groups were selected due to their challenging nature for CHARMM and Amber, respectively [see Fig. 3(a)]. With the test sets identical, we found that the errors associated with the unseen functional groups were highly consistent with those observed when the same groups were seen in training; see Fig. 4. This consistency demonstrates ReSolv’s robustness and capacity to predict well even for unencountered regions in chemical space.

FIG. 4.

Extrapolation to unseen functional groups. Box plots of the mean absolute error (MAE) with respect to chemical functional groups, comparing four different models: ReSolv with primary amines unseen during training (dark blue); ReSolv with primary alcohols unseen during training (light blue); Amber (GAFF; red); and CHARMM (CGenFF; green). The “other” group corresponds to the remaining functional groups with less than three samples in the test set. The red solid line denotes the median, the red dashed line denotes the mean, the box ranges are from the first to the third quartile, and the whiskers correspond to the 1.5× interquartile range.

We noticed several outliers in the parity plot [Fig. 2(a)] for which the predictions are largely off for all three investigated models. Error correlation between different modeling approaches can be used to identify possible erroneous data points. Previous outlier analysis stimulated corrections of some experimental data points that are part of the FreeSolv database.83 For example, the initially provided experimental value for D-mannitol was −27.79 kcal/mol, which was later corrected to −23.62 kcal/mol. Nevertheless, doubts about the validity of the experimental value remained as the COSMO-RS prediction deviated by 6 kcal/mol.83 As a result, some recent studies excluded D-mannitol from the test set.17,18 Interestingly, ReSolv’s prediction of D-mannitol’s hydration free energy is −27.42 kcal/mol, in excellent agreement with the original experimental value.

We compute the Pearson correlation coefficient of signed and absolute errors to investigate the error correlation between the ReSolv, Amber, and CHARMM predictions (Fig. S7 of the supplementary material). Overall, we find positive correlations between the models, with the largest correlation of 0.6 between the Amber and CHARMM models. In Fig. 5, we show point-wise error correlations and mark the molecules with large and correlated errors (exact values are given in Table S1 of the supplementary material). The nitralin and pirimor molecules particularly stand out, with all models’ predictions deviating from the experimental values by more than 2.5 kcal/mol. These two molecules also have a large experimental uncertainty (1.93 kcal/mol), confirming that error correlation analysis is a useful approach for identifying inaccurate data.

FIG. 5.

Correlation of errors in kcal/mol between the ReSolv, Amber (GAFF), and CHARMM (CGenFF) models for the test dataset. The gray dotted-dashed line indicates the perfect correlation. Eight molecules with absolute errors greater than 1.7 kcal/mol simultaneously on the x- and y-axes are marked with distinct symbols as shown in the legend. In Table S1 of the supplementary material, we list the mean errors across all models and the experimental uncertainties for these eight molecules.

Finally, we compare ReSolv’s performance to explicit solvent ML potentials. Using the same ML potential architecture and implementation and a simulation setup common in the literature, the measured computational cost differs by four orders of magnitude (see the supplementary material). ReSolv’s speedup is due to two factors: (i) faster execution of MD steps due to the reduced number of particles and (ii) a reduced number of required MD steps due to enhanced sampling.

Surprisingly, ReSolv’s substantial computational gains do not come at the cost of accuracy. On the contrary, a comparison with a recent study84 on hydration free energy predictions for six selected molecules in the FreeSolv database suggests that ReSolv’s predictions are in better agreement with the experimental results (Fig. 6). More precisely, the ReSolv model achieves better accuracy for five out of six molecules, including ethane, which is part of our test dataset; the other five molecules were included in the training dataset.

FIG. 6.

Comparison with an explicit solvent ML potential. Hydration free energy predictions for six molecules in the FreeSolv database recently tested by Moore et al.84 with an explicit solvent ML potential. Apart from caffeine, ReSolv’s predictions are closer to the experimental results. This includes ethane, which is part of our test dataset.

This work presents the Solvation Free Energy Path Reweighting (ReSolv) framework to efficiently learn an ML-based potential energy surface in an implicit solvent by utilizing experimental solvation free energy data and reweighting during training. Since the framework was showcased with the QM7-X and FreeSolv datasets, the trained ReSolv model enables accurate hydration free energy predictions for small organic compounds, which can be directly utilized in, for example, de novo drug design. However, the spanned chemical space of molecules could be enlarged with other datasets. In addition, the same methodology can be used to obtain an ML potential for implicit solvents other than water. For instance, extensively available octanol–water partition data could be used to derive a model in an implicit octanol environment. Lipophilicity, which correlates with the oil–water partition coefficient, is also a critical parameter in the pharmaceutical industry, as drug candidates must be sufficiently lipophilic to penetrate the lipid core of membranes but not so lipophilic that they remain there.85 A transfer learning strategy would likely accelerate the learning of other solvents. It is reasonable to assume that similar solvents modify the interactions similarly. For example, the knowledge gained by learning an implicit water model could benefit models for other polar solvents, such as ethanol. This setting could be particularly useful for solvents for which solvation free energy data are scarce.

For the FreeSolv database, ReSolv’s MAE is close to the experimental uncertainty. Compared to the classical force field models, ReSolv demonstrates better accuracy for molecules with large negative hydration free energies. In water, the molecular polar surface area strongly correlates with negative solvation free energy values. This improvement, therefore, may indicate that ReSolv implicitly captures local charge distributions in its energy predictions more effectively than the classical force fields. However, further work is necessary to clarify this point. The obtained accuracy is also similar to that of previously reported structure–property models, which employ ML algorithms to directly predict the hydration free energy from physics-inspired fingerprints8,17,18,29 or the molecular structure.9,19,20,22–26 The latter employ graph-based neural network architectures and typically perform better than the former, with reported MAEs in the range of 0.58–0.76 kcal/mol and RMSEs in the range of 0.82–1.23 kcal/mol. These results indicate that further improvements will likely require an enlarged and improved experimental database rather than a better modeling approach or neural network architecture. In line with this conclusion is a recent study21 achieving a notably lower MAE of 0.42 kcal/mol, where a graph neural network was first pre-trained on a large dataset and later fine-tuned on the FreeSolv database, i.e., by exploiting a transfer learning approach. ReSolv showcases good generalization across the functional-group space but decreased generalization to larger molecules, which should be considered when planning future experiments. In addition, we found error correlations between our models and classical atomistic models that were not trained on the FreeSolv database. Molecules with large and correlated errors, such as nitralin and pirimor, call for a reevaluation of the experimental data. These points should be excluded when training future models to avoid detrimental effects.

In a broader context, ReSolv could be a first step toward a general implicit solvent ML potential with better accuracy and efficiency than classical atomistic models. Solvation free energy encodes solute–solvent interactions, which in turn govern many biomolecular processes, including folding, aggregation, and ligand binding. In vivo, these processes occur in an aqueous solution, rendering the hydration free energy a general benchmark for classical force field validation,1,10,86 fine-tuning,2,87 comparison,88,89 and calibration.90,91 Nevertheless, before using the ReSolv model in general simulations, the model needs to be tested and potentially additionally trained on other available experimental data, which we leave for future studies. As for other implicit solvent ML potentials, using a prior potential (i.e., a fixed simple potential) is likely also necessary to achieve stable simulations of large macromolecules.65 Concerning computational efficiency, ReSolv and ML-based implicit solvent models in general provide a speedup compared to classical explicit solvent force fields due to the reduced degrees of freedom and the associated accelerated dynamics,92,93 even though evaluating an ML potential requires many more operations than an empirical potential.94 Further gains could be obtained with constrained dynamics or a coarse-grained representation of the solute molecules, enabling larger integration time steps.

See the supplementary material for hyperparameters and additional results including performance and error analysis on the train dataset, absolute error vs experimental hydration free energy, correlation of molecular properties with experimental hydration free energy, potential energy difference vs Gasteiger charges, Pearson correlation of errors, and convergence of the hydration free energy computation.

This work was funded by the ERC StG under Grant No. 101077842—SupraModel. The authors thank Paul Fuchs for implementing the BAR method into the chemtrain library https://github.com/tummfm/chemtrain and for insightful discussions on theory and implementation of the ReSolv method.

The authors have no conflicts to disclose.

S.R. and A.F.B. contributed equally to this work. S.R. and A.F.B. implemented and applied the ReSolv method and conducted simulations and postprocessing. All authors planned the study, analyzed and interpreted the results, and wrote the paper.

Sebastien Röcken: Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Anton F. Burnet: Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Julija Zavadlav: Conceptualization (lead); Funding acquisition (lead); Project administration (lead); Resources (lead); Supervision (lead); Writing – original draft (equal); Writing – review & editing (lead).

The data that support the findings of this study are available within the article and its supplementary material and will be made openly available at https://github.com/tummfm/ReSolv upon acceptance of this paper. The QM7-X dataset by Hoja et al.61 is available at https://doi.org/10.5281/zenodo.4288677. The FreeSolv dataset by Mobley and Guthrie13 is available at https://escholarship.org/uc/item/6sd403pz. The code will be made publicly available at https://github.com/tummfm/ReSolv.git upon acceptance of this paper.

1.
M. J.
Fossat
,
X.
Zeng
, and
R. V.
Pappu
, “
Uncovering differences in hydration free energies and structures for model compound mimics of charged side chains of amino acids
,”
J. Phys. Chem. B
125
,
4148
4161
(
2021
).
2.
P. S.
Nerenberg
,
B.
Jo
,
C.
So
,
A.
Tripathy
, and
T.
Head-Gordon
, “
Optimizing solute–water van der Waals interactions to reproduce solvation free energies
,”
J. Phys. Chem. B
116
,
4524
4534
(
2012
).
3.
E. L.
Ratkova
,
D. S.
Palmer
, and
M. V.
Fedorov
, “
Solvation thermodynamics of organic molecules by the molecular integral equation theory: Approaching chemical accuracy
,”
Chem. Rev.
115
,
6312
6356
(
2015
).
4.
S.
Boobier
,
D. R.
Hose
,
A. J.
Blacker
, and
B. N.
Nguyen
, “
Machine learning with physicochemical relationships: Solubility prediction in organic solvents and water
,”
Nat. Commun.
11
,
5753
(
2020
).
5.
T.
Deng
and
G.-Z.
Jia
, “
Prediction of aqueous solubility of compounds based on neural network
,”
Mol. Phys.
118
,
e1600754
(
2020
).
6.
R.
Zubatyuk
,
J. S.
Smith
,
J.
Leszczynski
, and
O.
Isayev
, “
Accurate and transferable multitask prediction of chemical properties with an atoms-in-molecules neural network
,”
Sci. Adv.
5
,
eaav6490
(
2019
).
7.
C.
Rauer
and
T.
Bereau
, “
Hydration free energies from kernel-based machine learning: Compound-database bias
,”
J. Chem. Phys.
153
,
014101
(
2020
).
8.
S. T.
Hutchinson
and
R.
Kobayashi
, “
Solvent-specific featurization for predicting free energies of solvation through machine learning
,”
J. Chem. Inf. Model.
59
,
1338
1346
(
2019
).
9.
H.
Lim
and
Y.
Jung
, “
Delfos: Deep learning model for prediction of solvation free energies in generic organic solvents
,”
Chem. Sci.
10
,
8306
8315
(
2019
).
10.
A.
Nicholls
,
D. L.
Mobley
,
J. P.
Guthrie
,
J. D.
Chodera
,
C. I.
Bayly
,
M. D.
Cooper
, and
V. S.
Pande
, “
Predicting small-molecule solvation free energies: An informal blind test for computational chemistry
,”
J. Med. Chem.
51
,
769
779
(
2008
).
11.
M.
Geballe
,
A.
Skillman
,
A.
Nicholls
,
J.
Guthrie
, and
P.
Taylor
, “
The SAMPL2 blind prediction challenge: Introduction and overview
,”
J. Comput.-Aided Mol. Des.
24
,
259
279
(
2010
).
12.
J. P.
Guthrie
, “
SAMPL4, a blind challenge for computational solvation free energies: The compounds considered
,”
J. Comput.-Aided Mol. Des.
28
,
151
168
(
2014
).
13.
D. L.
Mobley
and
J. P.
Guthrie
, “
FreeSolv: A database of experimental and calculated hydration free energies, with input files
,”
J. Comput.-Aided Mol. Des.
28
,
711
720
(
2014
).
14.
G.
Duarte Ramos Matos
,
D. Y.
Kyu
,
H. H.
Loeffler
,
J. D.
Chodera
,
M. R.
Shirts
, and
D. L.
Mobley
, “
Approaches for calculating solvation free energies and enthalpies demonstrated with an update of the FreeSolv database
,”
J. Chem. Eng. Data
62
,
1559
1569
(
2017
).
15.
A.
Klamt
and
M.
Diedenhofen
, “
Blind prediction test of free energies of hydration with COSMO-RS
,”
J. Comput.-Aided Mol. Des.
24
,
357
360
(
2010
).
16.
S.
Ehlert
,
M.
Stahn
,
S.
Spicher
, and
S.
Grimme
, “
Robust and efficient implicit solvation model for fast semiempirical methods
,”
J. Chem. Theory Comput.
17
,
4250
4261
(
2021
).
17.
S.
Riniker
, “
Molecular dynamics fingerprints (MDFP): Machine learning from md data to predict free-energy differences
,”
J. Chem. Inf. Model.
57
,
726
741
(
2017
).
18.
Z.
Zhang
,
D.
Peng
,
L.
Liu
,
L.
Shen
, and
W.
Fang
, “
Machine learning prediction of hydration free energy with physically inspired descriptors
,”
J. Phys. Chem. Lett.
14
,
1877
1884
(
2023
).
19.
H.
Lim
and
Y.
Jung
, “
MLSolvA: Solvation free energy prediction from pairwise atomistic interactions by machine learning
,”
J. Cheminf.
13
,
56
(
2021
).
20.
K.
Low
,
M.
Coote
, and
E.
Izgorodina
, “
Explainable solvation free energy prediction combining graph neural networks with chemical intuition
,”
J. Chem. Inf. Model.
62
,
5457
5470
(
2022
).
21.
D.
Zhang
,
S.
Xia
, and
Y.
Zhang
, “
Accurate prediction of aqueous free solvation energies using 3D atomic feature-based graph neural network with transfer learning
,”
J. Chem. Inf. Model.
62
,
1840
1848
(
2022
).
22.
Z.
Wu
,
B.
Ramsundar
,
E.
Feinberg
,
J.
Gomes
,
C.
Geniesse
,
A.
Pappu
,
K.
Leswing
, and
V.
Pande
, “
MoleculeNet: A benchmark for molecular machine learning
,”
Chem. Sci.
9
,
513
530
(
2018
).
23.
K.
Yang
,
K.
Swanson
,
W.
Jin
,
C.
Coley
,
P.
Eiden
,
H.
Gao
,
A.
Guzman-Perez
,
T.
Hopper
,
B.
Kelley
,
M.
Mathea
,
A.
Palmer
,
V.
Settels
,
T.
Jaakkola
,
K.
Jensen
, and
R.
Barzilay
, “
Analyzing learned molecular representations for property prediction
,”
J. Chem. Inf. Model.
59
,
3370
3388
(
2019
).
24.
H.
Cho
and
I.
Choi
, “
Enhanced deep-learning prediction of molecular properties via augmentation of bond topology
,”
ChemMedChem
14
,
1604
1609
(
2019
).
25.
Y.
Pathak
,
S.
Mehta
, and
U.
Priyakumar
, “
Learning atomic interactions through solvation free energy prediction using graph neural networks
,”
J. Chem. Inf. Model.
61
,
689
698
(
2021
).
26.
D.
Chen
,
K.
Gao
,
D. D.
Nguyen
,
X.
Chen
,
Y.
Jiang
,
G.-W.
Wei
, and
F.
Pan
, “
Algebraic graph-assisted bidirectional transformers for molecular property prediction
,”
Nat. Commun.
12
,
3521
(
2021
).
27.
A.
Alibakhshi
and
B.
Hartke
, “
Improved prediction of solvation free energies by machine-learning polarizable continuum solvation model
,”
Nat. Commun.
12
,
3584
(
2021
).
28.
J.
Scheen
,
W.
Wu
,
A.
Mey
,
P.
Tosco
,
M.
Mackey
, and
J.
Michel
, “
Hybrid alchemical free energy/machine-learning methodology for the computation of hydration free energies
,”
J. Chem. Inf. Model.
60
,
5331
5339
(
2020
).
29.
J.
Weinreich
,
N.
Browning
, and
O. A.
von Lilienfeld
, “
Machine learning of free energies in chemical compound space using ensemble representations: Reaching experimental uncertainty for solvation
,”
J. Chem. Phys.
154
,
134113
(
2021
).
30.
E.
Boulanger
,
L.
Huang
,
C.
Rupakheti
,
A. D.
MacKerell
, Jr.
, and
B.
Roux
, “
Optimized Lennard-Jones parameters for druglike small molecules
,”
J. Chem. Theory Comput.
14
,
3121
3131
(
2018
).
31.
J.
Karwounopoulos
,
Å.
Kaupang
,
M.
Wieder
, and
S.
Boresch
, “
Calculations of absolute solvation free energies with Transformato – Application to the FreeSolv database using the CGenFF force field
,”
J. Chem. Theory Comput.
19
,
5988
5998
(
2023
).
32.
S.
Luukkonen
,
L.
Belloni
,
D.
Borgis
, and
M.
Levesque
, “
Predicting hydration free energies of the FreeSolv database of drug-like molecules with molecular density functional theory
,”
J. Chem. Inf. Model.
60
,
3558
3565
(
2020
).
33.
J.
Zhang
,
H.
Zhang
,
T.
Wu
,
Q.
Wang
, and
D.
van der Spoel
, “
Comparison of implicit and explicit solvent models for the calculation of solvation free energy in organic solvents
,”
J. Chem. Theory Comput.
13
,
1034
1043
(
2017
).
34.
A.
Onufriev
, “
Implicit solvent models in molecular dynamics simulations: A brief overview
,”
Annu. Rep. Comput. Chem.
4
,
125
137
(
2008
).
35.
B.
Roux
and
T.
Simonson
, “
Implicit solvent models
,”
Biophys. Chem.
78
,
1
20
(
1999
).
36.
S. S. M.
Mahmoud
,
G.
Esposito
,
G.
Serra
, and
F.
Fogolari
, “
Generalized born radii computation using linear models and neural networks
,”
Bioinformatics
36
,
1757
1764
(
2020
).
37.
R.
Zhou
and
B. J.
Berne
, “
Can a continuum solvent model reproduce the free energy landscape of a β-hairpin folding in water?
,”
Proc. Natl. Acad. Sci. U. S. A.
99
,
12777
12782
(
2002
).
38.
A.
Cumberworth
,
J. M.
Bui
, and
J.
Gsponer
, “
Free energies of solvation in the context of protein folding: Implications for implicit and explicit solvent models
,”
J. Comput. Chem.
37
,
629
640
(
2016
).
39.
M.
Brieg
,
J.
Setzler
,
S.
Albert
, and
W.
Wenzel
, “
Generalized born implicit solvent models for small molecule hydration free energies
,”
Phys. Chem. Chem. Phys.
19
,
1677
1685
(
2017
).
40.
A.
Klamt
and
G.
Schüürmann
, “
COSMO: A new approach to dielectric screening in solvents with explicit expressions for the screening energy and its gradient
,”
J. Chem. Soc., Perkin Trans.
2
,
799
805
(
1993
).
41. A. Klamt and F. Eckert, "COSMO-RS: A novel and efficient method for the a priori prediction of thermophysical data of liquids," Fluid Phase Equilib. 172, 43–72 (2000).
42. A. Klamt and M. Diedenhofen, "Calculation of solvation free energies with DCOSMO-RS," J. Phys. Chem. A 119, 5439–5445 (2015).
43. P. Friederich, F. Häse, J. Proppe, and A. Aspuru-Guzik, "Machine-learned potentials for next-generation matter simulations," Nat. Mater. 20, 750–761 (2021).
44. P. Gkeka, G. Stoltz, A. Barati Farimani, Z. Belkacemi, M. Ceriotti, J. Chodera, A. Dinner, A. Ferguson, J. Maillet, H. Minoux, C. Peter, F. Pietrucci, A. Silveira, A. Tkatchenko, Z. Trstanova, R. Wiewiora, and T. Lelièvre, "Machine learning force fields and coarse-grained variables in molecular dynamics: Application to materials and biological systems," J. Chem. Theory Comput. 16, 4757–4775 (2020).
45. S. Thaler, M. Stupp, and J. Zavadlav, "Deep coarse-grained potentials via relative entropy minimization," J. Chem. Phys. 157, 244103 (2022).
46. S. Thaler and J. Zavadlav, "Learning neural network potentials from experimental data via differentiable trajectory reweighting," Nat. Commun. 12, 6884 (2021).
47. S. Röcken and J. Zavadlav, "Accurate machine learning force fields via experimental and simulation data fusion," npj Comput. Mater. 10, 69 (2024).
48. X. Ding, "Optimizing force fields with experimental data using ensemble reweighting and potential contrasting," J. Phys. Chem. B 128, 6760–6769 (2024).
49. M. D. Polêto and J. A. Lemkul, "Integration of experimental data and use of automated fitting methods in developing protein force fields," Commun. Chem. 5, 38 (2022).
50. A. Cesari, S. Bottaro, K. Lindorff-Larsen, P. Banás, J. Šponer, and G. Bussi, "Fitting corrections to an RNA force field using experimental data," J. Chem. Theory Comput. 15, 3425–3431 (2019).
51. T. Fröhlking, M. Bernetti, N. Calonaci, and G. Bussi, "Toward empirical force fields that match experimental observables," J. Chem. Phys. 152, 230902 (2020).
52. P. R. Vlachas, J. Zavadlav, M. Praprotnik, and P. Koumoutsakos, "Accelerated simulations of molecular systems through learning of effective dynamics," J. Chem. Theory Comput. 18, 538–549 (2021).
53. A. Coste, E. Slejko, J. Zavadlav, and M. Praprotnik, "Developing an implicit solvation machine learning model for molecular simulations of ionic media," J. Chem. Theory Comput. 20, 411–420 (2023).
54. J. Wang, N. Charron, B. Husic, S. Olsson, F. Noé, and C. Clementi, "Multi-body effects in a coarse-grained protein force field," J. Chem. Phys. 154, 164113 (2021).
55. W. Wang, S. Axelrod, and R. Gómez-Bombarelli, "Differentiable molecular simulations for control and learning," in ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations, 2019, https://openreview.net/forum?id=YDNzrQRsu.
56. B. E. Husic, N. E. Charron, D. Lemm, J. Wang, A. Pérez, M. Majewski, A. Krämer, Y. Chen, S. Olsson, G. de Fabritiis, F. Noé, and C. Clementi, "Coarse graining molecular dynamics with graph neural networks," J. Chem. Phys. 153, 194101 (2020).
57. W. Wang and R. Gómez-Bombarelli, "Coarse-graining auto-encoders for molecular dynamics," npj Comput. Mater. 5, 125 (2019).
58. S. Yao, R. Van, X. Pan, J. Park, Y. Mao, J. Pu, Y. Mei, and Y. Shao, "Machine learning based implicit solvent model for aqueous-solution alanine dipeptide molecular dynamics simulations," RSC Adv. 13, 4565–4577 (2023).
59. R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld, "Quantum chemistry structures and properties of 134 kilo molecules," Sci. Data 1, 140022 (2014).
60. J. S. Smith, O. Isayev, and A. E. Roitberg, "ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules," Sci. Data 4, 170193 (2017).
61. J. Hoja, L. Medrano Sandonas, B. G. Ernst, A. Vazquez-Mayagoitia, R. A. DiStasio, Jr., and A. Tkatchenko, "QM7-X, A comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules," Sci. Data 8, 43 (2021).
62. O. T. Unke and M. Meuwly, "PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges," J. Chem. Theory Comput. 15, 3678–3693 (2019).
63. Y. Basdogan, M. C. Groenenboom, E. Henderson, S. De, S. B. Rempe, and J. A. Keith, "Machine learning-guided approach for studying solvation environments," J. Chem. Theory Comput. 16, 633–642 (2019).
64. W. G. Noid, J.-W. Chu, G. S. Ayton, V. Krishna, S. Izvekov, G. A. Voth, A. Das, and H. C. Andersen, "The multiscale coarse-graining method. I. A rigorous bridge between atomistic and coarse-grained models," J. Chem. Phys. 128, 244114 (2008).
65. A. Durumeric, N. Charron, C. Templeton, F. Musil, K. Bonneau, A. Pasos-Trejo, Y. Chen, A. Kelkar, F. Noé, and C. Clementi, "Machine learned coarse-grained protein force-fields: Are we there yet?," Curr. Opin. Struct. Biol. 79, 102533 (2023).
66. Y. Chen, A. Krämer, N. Charron, B. Husic, C. Clementi, and F. Noé, "Machine learning implicit solvation for molecular dynamics," J. Chem. Phys. 155, 084101 (2021).
67. R. W. Zwanzig, "High-temperature equation of state by a perturbation method. I. Nonpolar gases," J. Chem. Phys. 22, 1420–1426 (1954).
68. S. P. Carmichael and M. S. Shell, "A new multiscale algorithm and its application to coarse-grained peptide models for self-assembly," J. Phys. Chem. B 116, 8383–8393 (2012).
69. C. H. Bennett, "Efficient estimation of free energy differences from Monte Carlo data," J. Comput. Phys. 22, 245–268 (1976).
70. S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky, "E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials," Nat. Commun. 13, 2453 (2022).
71. S. S. Schoenholz and E. D. Cubuk, "JAX, M.D. A framework for differentiable physics," in Advances in Neural Information Processing Systems (Curran Associates, Inc., 2020), Vol. 33.
72. See https://www.rdkit.org for RDKit: Open-source cheminformatics.
73. T. A. Halgren, "Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94," J. Comput. Chem. 17, 490–519 (1996).
74. T. A. Halgren, "Merck molecular force field. II. MMFF94 van der Waals and electrostatic parameters for intermolecular interactions," J. Comput. Chem. 17, 520–552 (1996).
75. T. A. Halgren, "Merck molecular force field. III. Molecular geometries and vibrational frequencies for MMFF94," J. Comput. Chem. 17, 553–586 (1996).
76. T. A. Halgren and R. B. Nachbar, "Merck molecular force field. IV. Conformational energies and geometries for MMFF94," J. Comput. Chem. 17, 587–615 (1996).
77. T. A. Halgren, "Merck molecular force field. V. Extension of MMFF94 using experimental data, additional computational data, and empirical rules," J. Comput. Chem. 17, 616–641 (1996).
78. J. Wagner, M. Thompson, D. L. Mobley, J. Chodera, C. Bannan, A. Rizzi, trevorgokey, D. L. Dotson, J. A. Mitchell, jaimergp, Camila, P. Behara, C. Bayly, JoshHorton, L. Wang, I. Pulido, PEFrankel, V. Lim, S. Sasmal, SimonBoothroyd, A. Dalke, E. Holz, B. Westbrook, D. Smith, J. Horton, L.-P. Wang, R. Gowers, and Z. Zhao (2024). "openforcefield/openff-toolkit: 0.16.5 Minor feature and bugfix release," Zenodo, version 0.16.5, https://doi.org/10.5281/zenodo.13886754.
79. P. Eastman, R. Galvelis, R. P. Peláez, C. R. A. Abreu, S. E. Farr, E. Gallicchio, A. Gorenko, M. M. Henry, F. Hu, J. Huang et al., "OpenMM 8: Molecular dynamics simulation with machine learning potentials," J. Phys. Chem. B 128, 109–116 (2023).
80. A. A. Duval, V. Schmidt, A. Hernández-García, S. Miret, F. D. Malliaros, Y. Bengio, and D. Rolnick, "FAENet: Frame averaging equivariant GNN for materials modeling," in International Conference on Machine Learning (PMLR, 2023), pp. 9013–9033.
81. O. T. Unke, S. Chmiela, M. Gastegger, K. T. Schütt, H. E. Sauceda, and K.-R. Müller, "SpookyNet: Learning force fields with electronic degrees of freedom and nonlocal effects," Nat. Commun. 12, 7273 (2021).
82. J. P. Jämbeck, F. Mocci, A. P. Lyubartsev, and A. Laaksonen, "Partial atomic charges and their impact on the free energy of solvation," J. Comput. Chem. 34, 187–197 (2013).
83. J. Reinisch and A. Klamt, "Prediction of free energies of hydration with COSMO-RS on the SAMPL4 data set," J. Comput.-Aided Mol. Des. 28, 169–173 (2014).
84. J. H. Moore, D. J. Cole, and G. Csanyi, "Computing hydration free energies of small molecules with first principles accuracy," arXiv:2405.18171 (2024).
85. H. Van de Waterbeemd, H. Lennernäs, and P. Artursson, Drug Bioavailability: Estimation of Solubility, Permeability, Absorption and Bioavailability, Methods and Principles in Medicinal Chemistry Vol. 18 (Wiley-VCH, Weinheim, Germany, 2004).
86. G. Duarte Ramos Matos, G. Calabro, and D. L. Mobley, "Infinite dilution activity coefficients as constraints for force field parametrization and method development," J. Chem. Theory Comput. 15, 3066–3074 (2019).
87. J. Huang, S. Rauscher, G. Nawrocki, T. Ran, M. Feig, B. L. De Groot, H. Grubmüller, and A. D. MacKerell, "CHARMM36m: An improved force field for folded and intrinsically disordered proteins," Nat. Methods 14, 71–73 (2017).
88. S. Kashefolgheta, M. P. Oliveira, S. R. Rieder, B. A. Horta, W. E. Acree, Jr., and P. H. Hünenberger, "Evaluating classical force fields against experimental cross-solvation free energies," J. Chem. Theory Comput. 16, 7556–7580 (2020).
89. M. R. Shirts, J. W. Pitera, W. C. Swope, and V. S. Pande, "Extremely precise free energy calculations of amino acid side chain analogs: Comparison of common molecular mechanics force fields for proteins," J. Chem. Phys. 119, 5740–5761 (2003).
90. C. Oostenbrink, A. Villa, A. E. Mark, and W. F. Van Gunsteren, "A biomolecular force field based on the free enthalpy of hydration and solvation: The GROMOS force-field parameter sets 53A5 and 53A6," J. Comput. Chem. 25, 1656–1676 (2004).
91. S. Kashefolgheta and A. Vila Verde, "Developing force fields when experimental data is sparse: AMBER/GAFF-compatible parameters for inorganic and alkyl oxoanions," Phys. Chem. Chem. Phys. 19, 20593–20607 (2017).
92. M. K. Meinel and F. Müller-Plathe, "Loss of molecular roughness upon coarse-graining predicts the artificially accelerated mobility of coarse-grained molecular simulation models," J. Chem. Theory Comput. 16, 1411–1419 (2020).
93. P. Katzberger and S. Riniker, "A general graph neural network based implicit solvation model for organic molecules in water," Chem. Sci. 15, 10794 (2024).
94. N. E. Charron, F. Musil, A. Guljas, Y. Chen, K. Bonneau, A. S. Pasos-Trejo, J. Venturin, D. Gusew, I. Zaporozhets, A. Krämer et al., "Navigating protein landscapes with a machine-learned transferable coarse-grained model," arXiv:2310.18278 (2023).