In this work, a systematic protocol is proposed to automatically parametrize the non-polar part of implicit solvent models with polar and non-polar components. The proposed protocol utilizes either the classical Poisson model or the Kohn-Sham density functional theory based polarizable Poisson model for modeling polar solvation free energies. Four sets of radius parameters are combined with four sets of charge force fields to arrive at a total of 16 different parametrizations for the polar component. For the non-polar component, either the standard model of surface area, molecular volume, and van der Waals interactions or a model with atomic surface areas and molecular volume is employed. To automatically parametrize a non-polar model, we develop scoring and ranking algorithms to classify solute molecules. The their non-polar parametrization is obtained based on the assumption that similar molecules have similar parametrizations. A large database with 668 experimental data is collected and employed to validate the proposed protocol. The lowest leave-one-out root mean square (RMS) error for the database is 1.33 kcal/mol. Additionally, five subsets of the database, i.e., SAMPL0-SAMPL4, are employed to further demonstrate that the proposed protocol. The optimal RMS errors are 0.93, 2.82, 1.90, 0.78, and 1.03 kcal/mol, respectively, for SAMPL0, SAMPL1, SAMPL2, SAMPL3, and SAMPL4 test sets. The corresponding RMS errors for the polarizable Poisson model with the Amber Bondi radii are 0.93, 2.89, 1.90, 1.16, and 1.07 kcal/mol, respectively.

Solvation, in which separated solvent and solute molecules are combined to form a solution, is an elementary physical process in nature that provides a foundation for more complicated chemical and biological processes, such as ion channel permeation, protein-ligand binding, electron transfer, signal transduction, DNA specification, transcription, post-transcription modification, gene expression, and protein synthesis. Therefore, the understanding of solvation is a prerequisite for the quantitative study of other more complex chemical and biological processes and is of paramount importance to chemistry, physics, and biology.14,67,78 The most basic and reliable experimental observation of solvation is the solvation free energy, which measures the free energy change in the solvation process. As a result, one of the most important tasks in the solvation modeling and computation is the prediction of solvation free energies, which has recently drawn much attention in computational biophysics and chemistry.14,15,46,67,78,88

A large variety of solvation models has been developed in the past few decades for solvation analysis and solvation free energy prediction. In general, these models can be classified into either knowledge based models or physics based models. Knowledge based models usually utilize a large amount of available data of solvation free energies to train statistical models for the solvation free energy prediction. One example of this type of models makes use of solvent accessible surface areas, solvent excluded surface areas, or calibrated atomic surface areas for solvation free energy predictions.85 Knowledge based models can provide relatively accurate predictions provided that enough experimental data are available. On the other hand, physics based solvation models are based on fundamental laws of physics. These models can be further classified into three major classes. One of them is explicit solvent models in which both solvent and solute molecules are described at the atomic or electronic level. This class of models is accurate in general, but is usually computationally expensive.45,49,72 Another class of models is integral equation based solvation theories,36,65,66 in which the solute molecule is still modeled at the atomic level, while the solvent is modeled by statistical mechanics, such as liquid density functional theory, Ornstein-Zernike equation with hypernetted-chain, Percus-Yevick equation, or other specified closure. Integral equation based solvation models reduce the number of degrees of freedom dramatically compared to explicit solvent models. A major feature of integral equation approaches is that these methods are able to provide a good approximation of solvent microstructures near the solvent-solute interface.28 The other class of the physics based solvation models is implicit solvent models,1,7,12,22,27,61,64,71,89 in which the solute molecule can be modeled at either the atomic or the quantum mechanical level, while the solvent is described as a dielectric continuum.

Compared to other methods, implicit solvent models have advantages of involving a small number of degrees of freedom, highly accurate modeling of strong and long-range electrostatic interactions that often dominate solvation phenomena, and convenient treatments of solvent-solute electronic polarization effects.12 In implicit solvent models, the solvation free energy is typically divided into polar and non-polar components. The polar solvation free energy is due to electrostatic interactions and can be modeled by a number of approaches, including Born dielectric sphere model, Kirkwood model,43 generalized Born (GB) model,2,16,19,30,31,41 polarizable continuum model (PCM),18,39,81 and Poisson-Boltzmann (PB) theory.29,37,62,70,73 Among these approaches, Born dielectric sphere model which treats the solute molecule as the dielectric sphere with a centrally located atomic charge is the simplest one. The Kirkwood model gives analytical expressions for electrostatic potentials in a spherical solute-solvent system with multiple charges inside the solute molecule.43 The GB model2,16,19,30,31,41 is able to deal with arbitrarily shaped molecules and account for the impact of each point charge by its effective distance (i.e., the Born radius) from the solvent-solute interface.13,80 The GB model is relatively faster, while depending on other methods, such as the PB theory, for its parametrization. The PCM describes solute electrons by using quantum mechanics so that solvation induced polarizations can be accounted for in an iterative procedure. A few different versions of PCM models have been reported, including dielectric PCM, integral equation formalism PCM, and the variational PCM models.18,39,81 PCM is relatively computationally expensive due to its quantum mechanical charge calculation while its accuracy is bounded by the PB type of treatment of electrostatic potential as well as the quality of solvent-solute interfaces. The PB theory can be derived from more fundamental Maxwell’s equations.27,37,62,70,73 It is more accurate than the GB model and computationally more efficient than the PCM. The PB can be easily coupled with the quantum mechanics for a more accurate description of the solute charge density.9,86,87 It has become one of the most popular solvation models due to its relatively low computational cost and high modeling accuracy for biomolecules.

The non-polar solvation component has been modeled by a number of terms. A popular approach, based on the scaled-particle theory (SPT) for non-polar solutes in aqueous solutions,63,76 is to use a solvent-accessible surface area (SASA) term.53,79 It was shown that a solvent-accessible volume (SAV) term is relevant in large length scale regimes.38,51 Recent studies indicate that SASA based solvation models may not describe van der Waals (vdW) interactions near solvent-solute interface.11,22,23,82 A combination of surface area, surface enclosed volume, and vdW potential has been shown to provide accurate non-polar solvation predictions.10,83

The polar and non-polar components are typically decoupled in traditional implicit solvent models. Recently, new efforts have been made to couple polar and non-polar components in the literature.7,17,89 Differential geometry based solvation model makes use of the differential geometry theory of surfaces to dynamically couple polar and nonploar models by surface evolution, which is driven by free energy optimization.7–9,89 Coupled with an optimization procedure, this model was shown to deliver some of the best non-polar solvent free energy analysis.10 By applying constrained optimization to non-polar parameter selection, this model provides state-of-the-art solvation free energy fitting and cross validation results for a large number of solute molecules.83 

It is important to validate solvation models by experimental data. SAMPLx blind solvation prediction project, aimed at testing protein ligand binding models, also provides benchmark tests of the performance of the solvation models. Many different solvation approaches have been tested by the SAMPLx challenging molecules.21,25,26,34,35,55,56,58–60,68,69 Implicit solvent models are among the most competitive approaches in SAMPLx tests.

The objective of the present work is to develop an automatic protocol to parametrize implicit solvent models for the blind solvation free energy prediction. In our approach, we utilize PB theories for computing electrostatic solvation free energies. Both a point charge based PB model and a Kohn-Sham density functional theory (KSDFT) based polarizable PB model are developed in the present work. For the KSDFT based PB model, an iterative process is developed to take care of the solvent polarization and solute response.9 In our non-polar models, we employ either a collection of surface area, molecular volume and van der Waals interactions, or a combination of atomic surface areas and molecular volume. To determine parameters in our non-polar models, we assume that similar molecules admit the same set of non-polar parameters. In this work, we develop two parametrization algorithms. One of them is based on a functional group scoring, and the other is based on a nearest neighbor search. Both algorithms utilize machine learning methods to optimize parameters. A database contains the solvation free energies in aqueous solvent for 668 solute molecules which is collected from the literature4,57 to validate the proposed parametrization protocol.

The rest of this paper is organized as follows. In Section II, we present our solvation models. Both polar and non-polar models are discussed. Sections II B and II C present two non-polar parametrization algorithms, namely, functional group scoring and nearest neighbor approaches. An algorithm for unified blind solvation free energy prediction is developed in Section II D. The blind prediction of a large number of solvation free energies is carried out in Section III. The leave-one-out test of 668 molecules is employed to validate the proposed protocol. Finally, the accuracy and robustness of the proposed protocol is demonstrated by five SAMPL test sets. This paper ends with a conclusion.

As an implicit approach, our solvation model has polar and non-polar components,

Δ G = Δ G p + G np ,
(1)

where ΔG is the total solvation free energy, ΔGp is the polar or electrostatic solvation free energy, and Gnp is the non-polar solvation free energy. The related polar and non-polar solvation models are described in Secs. II A 1 and II A 2.

1. Polar solvation models

For polar solvation modeling, we utilize either the standard Poisson model or a KSDFT based Poisson model (i.e., a polarizable Poisson model) for polar solvation free energy calculation. These two approaches are described below.

a. The Poisson model.

The Poisson model is given by Refs. 27 and 73,

( ϵ ( r ) ϕ ( r ) ) = i = 1 N m q i δ ( r r i ) , r Ω , ϕ s ( r ) ϕ m ( r ) = 0 , r Γ , ϵ s ϕ s ( r ) n ϵ m ϕ m ( r ) n = 0 , r Γ , ϕ ( r ) = 1 4 π i = 1 N m q i ϵ s | r r i | , r Ω ,
(2)

where ϵ(r) is the discontinuous dielectric constant and is chosen as ϵm = 1 in the solute domain Ωm and at ϵs = 80 in the solvent domain Ωs. Here ϕ is the electrostatic potential, Nm is the total number of partial charges in the solute, and qi is the ith charge at the position of ri. Additionally, ϕs and ϕm are limiting values of the potential from solvent and solute domains, respectively. Here n is the outer norm vector at point r of the interface Γ. Finally, Ω and ∂Ω are computational domain and domain boundary, respectively.

We use the solvent excluded surface (SES) in the present work and it is generated by using our in-house Eulerian solvent excluded surface (ESES) software.50 ESES automatically embeds the SES into the Cartesian mesh, which is convenient for the finite difference type of Poisson solvers. The matched interface and boundary based PB (MIBPB) software package6,27,84 is used to solve Eq. (2) with appropriate charge force fields for {qi} and radius sets in SESs.

The polar solvation free energy is then calculated as

Δ G p = 1 2 i = 1 N m q ( r i ) ( ϕ ( r i ) ϕ home ( r i ) ) ,
(3)

where ϕhome(ri) is the solution of the Poisson equation with uniform dielectric constant ϵm over the whole domain Ω.

b. The KSDFT based polarizable Poisson model.

Instead of using a given partial charge force field, the KSDFT based polarizable Poisson model utilizes a DFT approach to compute the charge distribution in the solvent environment. As such, this approach partially takes into account for solvent polarization and solute response in an iterative procedure. The combination of the Poisson equation with SES and KSDFT has been reported in the literature.86,87 Recently, we have proposed a differential geometry based polarizable PB model that further allows solvent-solute interface to be optimized in the Poisson-Kohn-Sham (PKS) formalism.9 In the present work, we adopt SES in our PKS software for the sake of an accurate estimation of atomic surface areas via our ESES software package.50 Consequently, our MIBPB software is also employed for solving the Poisson equation with sharp interfaces.6,27,84

In this approach, the Poisson model is9 

( ϵ ( r ) ϕ ( r ) ) = ρ total ( r ) , r Ω , ϕ s ( r ) ϕ m ( r ) = 0 , r Γ , ϵ s ϕ s ( r ) n ϵ m ϕ m ( r ) n = 0 , r Γ , ϕ ( r ) = 1 4 π Ω ρ total ( r ̃ ) ϵ s | r r ̃ | d r ̃ , r Ω ,
(4)

where ρtotal(r) = qn(r) − qnn(r) is the charge distribution in the solute domain, with q being the unit charge of an electron, n(r) being the electron density, and n n ( r ) = I Z I δ ( r R I ) being the nucleus density, in which ZI and RI are the atomic number and position vector of nucleus I, respectively.

The Poisson equation (4) is coupled to the following generalized KS equation:9 

ħ 2 2 m 2 + U eff ψ i = E i ψ i ,
(5)

where m is electron mass and ħ2 = h/2π with h being the Planck constant. Here ψi and Ei are the ith KS orbital and ith eigenvalue, respectively, and U eff q ϕ RF + U eff 0 is the effective potential of the generalized Kohn-Sham equation with ϕRF = ϕϕ0 being the reaction field potential of the solute in the solvent environment, in which ϕ and ϕ0 are the electrostatic potentials generated by the solute in the solvent and solute environments, respectively. Finally, U eff 0 is the effective Kohn-Sham potential in the vacuum environment. Note that the solution of Eq. (5) gives rise to the electron density n ( r ) = i | ψ i | 2 , which, in turn, updates the charge distribution for Eq. (4). A self-consistent iterative process has been developed to solve Eqs. (4) and (5) to their convergence9 starting from solving the KSDFT in vacuum as an initial guess to the coupled system.

For KSDFT calculations, we employ the SIESTA solver,75 which uses pseudopotential to eliminate the complicated effects of core electrons. In all calculations, the default double-ζ plus single polarization (DZP) basis is used. The MeshCutOff is set as 125 Rydberg and local density approximation (LDA) is used to approximate the exchange correlation potential. The solution method is set to be “diagon.”

In this formalism, the polar solvation free energy is computed by

Δ G p = 1 2 Ω m q ( r ) ϕ RF ( r ) d r .
(6)

2. Non-polar solvation models

For the non-polar solvation component, we use either vdW term based model or an atomic surface area based model.

a. A vdW term based non-polar model.

In our earlier work, the non-polar solvation free energy was modeled by7–9,89

G np = γ A + p V + + ρ 0 Ω s U vdW d r ,
(7)

where γ and A are, respectively, surface tension and area of the solute molecule. Additionally, p and V are, respectively, surface enclosed volume and hydrodynamic pressure difference. Finally, ρ0 is the solvent bulk density, and UvdW is the vdW interaction potential, i.e., the Lennard-Jones (LJ) potential, the detailed implementation of vdW interaction potential can be found in the work,7,83

U vdW ( r ) = α N i α ε α σ α + σ s | r r i | 12 2 σ α + σ s | r r i | 6 ,
(8)

where N is the total number of atomic types in a given solute molecule and σs is the probe radius. Here, εα and σα are the vdW potential well depth and vdW radius of the αth type of atoms, respectively, and ri is position of the ith atom in the α type of atoms.

In Eq. (7) the integration is over solvent domain Ωs. This non-polar solvation free energy model was shown to offer excellent predictions of experimental data for various non-polar molecules.10 In our recent work, we have demonstrated superb results for a large number of polar and non-polar molecules when the non-polar model in Eq. (7) is combined with a PB model for the polar solvation component.83 

In this non-polar solvation free energy model, the parameters γ, p, σs, and εα need to be learned from a training set. However, one can set ρ0 = 1 because its effect can be absorbed to εα.

The LJ potential offers a physical model for dealing with vdW interactions near the solvent-solute interface. However, a drawback of this non-polar term is that the LJ potential energy is nonlinear with respect to the probe radius (σs), which gives rise to a difficulty for a linear optimization process. Note that the non-polar energy is linear with respect to pressure and surface tension parameters.

b. An atomic surface area based nonp-polar model.

Mathematically, for a small probe radius, the vdW term for a given atom is proportional to the atomic surface area. Therefore, atomic surface area approach used by Kollman and co-workers85 should have a similar modeling effect as that of the LJ potential. Based on this observation, we propose the following non-polar solvation energy model:

Δ G np = α = 1 N γ α Area α + p V ol = α = 1 N i α γ α Area i α + p V ol ,
(9)

where N is the total number of atomic types in a given solute molecule. Here, γα and Areaα are the surface tension and surface area of the αth type of atoms, respectively, and Area i α is the surface area of the ith atom in the α type of atoms. In this non-polar solvation free energy model, the parameters γα and p need to be learned from a training set. Both the LJ and the atomic-surface-area based non-polar solvation models are validated in the present work.

In the above non-polar solvation models, the solvent excluded surface area, atomic solvent excluded surface areas, and molecular enclosed volume are calculated by our in-house software ESES.50 

1. Functional group modeling

In our non-polar solvation model, we assume that each functional group of molecules admits the same set of optimal parameters. Similar approach has been successfully used in the literature,74 including ours.10,83 In this work, we further incorporate machine learning type of methods for the non-polar solvation free energy prediction. To this end, the whole dataset, which contains 668 molecules, is partitioned into training sets and testing sets. In the leave-one-out test case, in each step we only leave one molecule out as the test set, and all the other molecules are regarded as the training set. In a specific leave-SAMPLx-out study, all the SAMPLx molecules are left as the testing set, while the remaining molecules are treated as the training set. Furthermore, the training set in this scenario is also divided into two parts: (i) the mono-functional group molecules, in which molecules of a specific functional group are used to train a set of parameters for all similar molecules and (ii) the poly-functional groups molecules in which scoring weights are used to weight the contributions of parameters of various involved functional groups.

Table I lists twenty functional groups used in the training set. We denote {T1, T2, …, Tn} as a given set of n molecules with a specific functional group from the above 20 groups. For a molecule indexed j, the solvation free energy calculated from the polar solvation free energy model, atomic surface areas and molecular volume are listed as

Δ G j p , Area j 1 , Area j 2 , , Area j N , V ol ,
(10)

where j = 1, 2, …, n.

In our approach, the solvation free energy is modeled as the sum of the polar and non-polar parts, thus for the jth molecule, the corresponding modeled solvation free energy can be expressed as

Δ G j = Δ G j p + α = 1 N γ α Area α + p V ol G j ( P ) ,
(11)

where P≐{γ1, γ2, …, γN, p} is the set of parameters to be optimized.

For a given set of molecules with the same mono-func tional group, we denoted Δ G ( P ) G 1 ( P ) , G 2 ( P ) , , G n ( P ) , and the associated experimental solvation free energy is denoted as Δ G Exp Δ G 1 Exp , Δ G 2 Exp , , Δ G n Exp . Then the optimal parameter set can be obtained by solving the following Tikhonov regularized least square problem (also known as ridge regression) which has a closed form solution:

argmin P Δ G ( P ) Δ G Exp 2 + λ P 2 ,
(12)

where ‖∗‖2 is the L2 norm of the quantity ∗. Here λ is the regularization parameter chosen to be a large number, such as 100, in the present work to ensure the dominance of the first term and avoid over-fitting through controlling the magnitude of ‖P2.

TABLE I.

Functional groups and corresponding number of molecules used in the classification.

Group Number Group Number Group Number Group Number
Alkynyl  Alkenyl  38  Aldehyde group  11  Nitrile group 
Carboxyl  Ester group  34  Ketone  23  Amino  35 
Nitro  Alcoholic hydroxy  33  Phenolic hydroxyl  16  Ether  22 
Alkane  38  Aromatics  33  Nitrogen heterocyclic  19  Chlorinated hydrocarbon  53 
Nitrate  Amid  Thiol  Thioether 
Group Number Group Number Group Number Group Number
Alkynyl  Alkenyl  38  Aldehyde group  11  Nitrile group 
Carboxyl  Ester group  34  Ketone  23  Amino  35 
Nitro  Alcoholic hydroxy  33  Phenolic hydroxyl  16  Ether  22 
Alkane  38  Aromatics  33  Nitrogen heterocyclic  19  Chlorinated hydrocarbon  53 
Nitrate  Amid  Thiol  Thioether 

2. Functional group scoring

Most molecules in the SAMPL blind test sets involve poly-functional groups. In this case, we further employ the poly-functional group molecules in the training set for training the optimal relative scoring weights between different functional groups. According to the relative scoring weights, the scoring weights between all the functional groups can be obtained through a simple normalization procedure. We denote T ̃ 1 , T ̃ 2 , , T ̃ n as a given set of poly-functional group molecules that has the same functional groups in the training set. The associated optimized parameter sets are P1, P2, …, Pm, where m is the number of functional groups in this set, and each Pi, i = 1, 2, …, m is learned through solving the regularized optimization problem given by Eq. (12). For the jth molecule in this poly-functional group set, we model its solvation free energy as

Δ G ̄ j ( ω ) = i = 1 m ω i Δ G j ( P i ) ,
(13)

where ω 1 i = 1 m ω i = 1 , with ωi being the scoring weight of ith functional group.

The relative scoring weights for the m functional groups associated to this set of poly-functional groups molecules can be learned via solving the following constraint optimization problem,

argmin ω Δ G ̃ ( ω ) Δ G ̃ Exp 2 ,
(14)

subject to

ω 1 = 1
(15)

and

ω i 0 , i = 1 , 2 , , m ,
(16)

where Δ G ̃ ( ω ) and Δ G ̃ Exp represent, respectively, the predicted and experimental solvation free energies for this group of poly-functional group molecules. Since both the optimization object given by Eq. (14) and the constraint conditions in Eqs. (15) and (16) are convex with respect to the scoring weights ω, the above constrained optimization can be easily solved via a convex optimization solver in the CVX software package.32,33

In the rest of this section, we provide an example to illustrate the procedure of functional group based approach for solvation free energy prediction. As shown in Fig. 1, the target molecule 2-chlorosyringaldehyde (Pubchem ID: 53479) contains four different functional groups. We need to find scoring weights for these functional groups. Note that in the above molecular searching scheme, pairwisely, the relative weights for each two functional groups can be determined by solving the constrained optimization problem in Eqs. (14) and (15) for molecules in the two corresponding functional groups. According to the pairwise relative weights, the functional group scoring in the target molecule can be achieved.

FIG. 1.

An illustration of the functional group scoring method for the prediction of the solvation free energy for molecule 2-chlorosyringaldehyde (Pubchem ID: 53479), which contains four functional groups: aldehyde group, phenolic hydroxyl, ether, and chlorinated hydrocarbon. We first compute relative weights between phenolic hydroxyl and chlorinated hydrocarbon; phenolic hydroxyl and ether; phenolic hydroxyl and ester group; and ester group and aldehyde group. Then relative weights are combined to generate the full set of weights ω1, ω2, ω3, and ω4 for solvation free energy prediction.

FIG. 1.

An illustration of the functional group scoring method for the prediction of the solvation free energy for molecule 2-chlorosyringaldehyde (Pubchem ID: 53479), which contains four functional groups: aldehyde group, phenolic hydroxyl, ether, and chlorinated hydrocarbon. We first compute relative weights between phenolic hydroxyl and chlorinated hydrocarbon; phenolic hydroxyl and ether; phenolic hydroxyl and ester group; and ester group and aldehyde group. Then relative weights are combined to generate the full set of weights ω1, ω2, ω3, and ω4 for solvation free energy prediction.

Close modal

The above functional group based model is tested to be able to provide very accurate solvation free energy prediction provided that for each test molecule, parameters for all of the involved mono-functional groups can be determined, and a set of molecules with the same ploy-functional groups can be found in the training set as well. However, for some complex molecules that contain functional groups beyond those listed in Table I, this method fails. For instance, considering a SAMPL1 test molecule, thifensulfuron, as shown in Fig. 2, it has a very complex structure and contains functional groups cannot be found in Table I. We therefore propose the following ranking algorithm.

FIG. 2.

Thifensulfuron (Pubchem ID: 91729).

FIG. 2.

Thifensulfuron (Pubchem ID: 91729).

Close modal

1. Atomic feature based molecular ranking

To deal with general and complex molecules, we propose another approach to rank molecules and develop a nearest neighbor approach for non-polar solvation free energy prediction. Our molecular ranking is based on atomic features. Our basic assumption is that if two solute molecules have similar atomic features, their chemical properties are also similar. Therefore, these molecules should share the same parameter set for the non-polar solvation free energy model. This approach is further described below.

Our method is based on the ansatz that molecules chemical properties are mainly determined by atomic features, including structural features and atomic electrostatic features. Among them, atomic structural features include atomic type, atom hybridization state, and bonding information. Atomic electrostatic features include atomic charge, atomic dipole, atomic quadrupole, and atomic electrostatic solvation free energy. Atomic features are selected based the criteria that a given atomic feature is retained if it can effectively discriminate the previously mentioned mono-functional groups listed in Table I. Table II lists all the atomic features that are used for molecule ranking.

TABLE II.

Atomic features used for ranking molecules.

Feature name Feature name
Number of atoms  Number of heavy atoms 
Number of hydrogen atoms  Number of single bonds 
Number of double bonds  Number of triple bonds 
Number of aromatic bonds  Number of each type of atoms 
Number of sp1 carbon atoms  Number of sp2 carbon atoms 
Number of sp3 carbon atoms  Number of sp1 nitrogen atoms 
Number of sp2 nitrogen atoms  Number of sp3 nitrogen atoms 
Number of sp2 oxygen atoms  Number of sp3 oxygen atoms 
Number of sp2 sulfur atoms  Number of sp3 sulfur atoms 
Maximum of atomic reaction field energy  Minimum of atomic reaction field energy 
Maximum reaction field energy of each type of atoms  Minimum reaction field energy of each type of atoms 
Maximum of atomic reaction field energy  Average reaction field energy of each type of atoms 
Total absolute charge  Total charge of each type of atoms 
Total absolute charge of each type of atoms  Maximum charge of each type of atoms 
Minimum charge of each type of atoms  The variation of each type of atomic charges 
Maximum of atomic dipole  Maximum of each atomic dipole 
Minimum of each atomic dipole  Variation of atomic dipole 
Maximum quadrupole  Maximum quadrupole of each type of atoms 
Variation of atomic quadrupole  Variation of each type of atom’s quadrupole 
Feature name Feature name
Number of atoms  Number of heavy atoms 
Number of hydrogen atoms  Number of single bonds 
Number of double bonds  Number of triple bonds 
Number of aromatic bonds  Number of each type of atoms 
Number of sp1 carbon atoms  Number of sp2 carbon atoms 
Number of sp3 carbon atoms  Number of sp1 nitrogen atoms 
Number of sp2 nitrogen atoms  Number of sp3 nitrogen atoms 
Number of sp2 oxygen atoms  Number of sp3 oxygen atoms 
Number of sp2 sulfur atoms  Number of sp3 sulfur atoms 
Maximum of atomic reaction field energy  Minimum of atomic reaction field energy 
Maximum reaction field energy of each type of atoms  Minimum reaction field energy of each type of atoms 
Maximum of atomic reaction field energy  Average reaction field energy of each type of atoms 
Total absolute charge  Total charge of each type of atoms 
Total absolute charge of each type of atoms  Maximum charge of each type of atoms 
Minimum charge of each type of atoms  The variation of each type of atomic charges 
Maximum of atomic dipole  Maximum of each atomic dipole 
Minimum of each atomic dipole  Variation of atomic dipole 
Maximum quadrupole  Maximum quadrupole of each type of atoms 
Variation of atomic quadrupole  Variation of each type of atom’s quadrupole 

We also construct atomic features by using the statistics variables, i.e., average, maximum, minimum, and variation of quantities in Table II. Finally, we rule out redundant features, where redundancy is due to the fact that some atomic features are 100% correlated with each other. In this case, only one of these highly correlated features is retained.

Atomic features are calculated by following methods.

  • Molecular structural information is parsed by the Open Babel software.54 

  • Atomic charge, atomic dipole, and atomic quadrupole are obtained via the distributed multipole analysis (DMA) method,77 in which the charge density is originally computed by the density function theory with B3LYP and 6-31G basis in the Gaussian quantum chemistry software.3,20,47

  • Atomic electrostatic solvation energy is calculated by our in-house MIBPB software.6,27,84

With the above selected atomic features, intuitively, we measure the similarity of molecules by the Pearson correlation coefficient of atomic feature vectors. Specifically, we denote the features of two molecules as vector X≐{x1, x2, …, xk} and vector Y≐{y1, y2, …, yk}, then their similarity is measured by

C X Y = | i = 1 k ( x i x ̄ ) ( y i y ̄ ) | i = 1 k ( x i x ̄ ) 2 i = 1 k ( y i y ̄ ) 2 ,

where x ̄ 1 k i = 1 k x i and y ̄ 1 k i = 1 k y i , and k is the dimension of the feature space. The higher the correlation between two atomic feature vectors indicates the more similarity between molecules.

2. Nearest neighbor solvation free energy prediction

By using the above nearest neighbor based nearest molecule searching procedure, for a given molecule, we obtain the similarity ranking of the remaining molecules with this specified one, as well as correlations between molecules. In the nearest neighbor prediction of the non-polar solvation free energy, we learn the parameters in the non-polar solvation free energy expression, Eq. (9), by a given number of nearest molecules found in the training set. The number of the nearest molecules used is based on the following principles:

  • All the molecules whose correlation with a given molecule is greater than 0.99 are used for the parameter learning for this given molecule;

  • If the above criterion yields less than 5 molecules, we use 5 nearest molecules for the parameter learning. Here the 5-molecule nearest neighbor method is found to provide the best leave-one-out prediction.

Models for the total solvation free energy are given in Eq. (1). We utilize a unified protocol for the prediction of solvation free energy. First, for the polar solvation free energy ΔGp, we select either the Poisson model with a given point charge force field or the polarizable Poisson model. Second, for the calculation of the non-polar solvation free energy, we note that in general, the functional group scoring approach can deliver better blind predictions than the nearest neighbor approach. Therefore, in our non-polar energy prediction step, the functional group scoring method is used whenever it works. Otherwise, we utilize the nearest neighbor approach. Our solvation prediction protocol is illustrated in a flowchart shown in Fig. 3.

FIG. 3.

A flowchart for the solvation free energy prediction protocol.

FIG. 3.

A flowchart for the solvation free energy prediction protocol.

Close modal

1. Datasets and force fields

We consider a total of 668 molecules, and the structures of these molecules are downloaded from the Pubchem project. The experimental solvation free energies of these molecules are collected from the literature. This dataset contains molecules from the Sampl blind solvation prediction projects, ranging from SAMPL0 to SAMPL4;25,26,34,35 and the remaining molecules in our dataset are collected from the literature.4,48,85 Coincidentally, there is a considerable overlap of our database with Mobley’s solvation database, which is available from http://mobleylab.org/resources.html. More precisely, 589 molecules in our database are already covered by Mobley’s. The detailed information of our dataset is provided in the supplementary material.

For both the standard Poisson model and the KSDFT based polarizable Poisson model, we consider four types of atomic radii, namely, Amber 6, Amber bondi, Amber mbondi2,5 and ZAP960 parametrizations. Additionally, for the standard Poisson model, three sets of charge assignments, namely, OpenEye-AM1-BCC v1 parameters,40 Gasteiger,24 and Mulliken,5 are tested. Our MIBPB solver6,27,84 is utilized to solve the Poisson interface problem to obtain electrostatic solvation energies. The probe radius is set to 1.4 Å in all PB calculations.

For the KSDFT based polarizable Poisson model, the Poisson interface problem is coupled with the generalized KSDFT in a self-consistent approach. The charge density used by Poisson interface problem is calculated in ab initio approach by the generalized KSDFT, here generalization due to the additional solute-solvent reaction field energy is included in the effective KS potential. The Poisson interface problem is employed for calculating the reaction field energy. This approach is modified from our earlier differential geometry based polarizable PB model.9 We iteratively couple the SIESTA75 and the MIBPB solver,6,27,84 in which a sharp solvent-solute interface is employed. The SIESTA software with additional reaction field energy is used for charge density calculation, and our in-house software is used for reaction field energy calculation. A uniform mesh size of 0.25 Å is used for solving the Poisson equation in both the standard Poisson and Polarizable Poisson model. In the polarizable Poisson model, the computed change densities are mapped to the uniform mesh with the conservation of the total charge.

For the molecules containing iodine atoms, the current level of DFT method used in this work, including the Gaussian software, cannot handle this element appropriately. Therefore, for a uniform comparison, we ignore molecules containing iodine atoms. There is a similar situation for bromine atoms — there is no appropriate pseudo-potential for this atom in the SIESTA software.75 Therefore, molecules involving bromine atoms are excluded in our KSDFT based polarizable PB calculations. Table III lists four molecules in SAMPLx that are not considered in some of our predictions.

TABLE III.

Molecules in SAMPLx sets involving bromine and/or iodine atoms.

Test set Molecule
SAMPL0  Benzyl bromide 
SAMPL1  Bromacil 
SAMPL2  5-bromouracil 
5-iodouracil 
SAMPL3  None 
SAMPL4  None 
Test set Molecule
SAMPL0  Benzyl bromide 
SAMPL1  Bromacil 
SAMPL2  5-bromouracil 
5-iodouracil 
SAMPL3  None 
SAMPL4  None 

For both the Poisson model and the polarizable Poisson model, the ESES software50 is used for solute molecular surface generation. The MIBPB software6,27,84 is used for solving the Poisson model. The polarizable Poisson model is solved by the self-consistent coupled MIBPB and SIESTA software.75 

2. Atomic surface area and molecular volume calculation

In our non-polar solvation free energy model, both atomic surface areas and surface enclosed volume are required. In a recently developed Eulerian solvent excluded surface (ESES) package, a second-order accurate numerical scheme for surface area calculations and a third-order accurate numerical scheme for volume estimations have been developed.50 A weighted Voronoi diagram algorithm is implemented to partition a molecular surface area into atomic surface areas. These schemes have been intensively validated by a large number of test examples. In this work, both atomic surface areas and molecular volume are calculated directly by using our ESES software package.50 

3. Validation of atomic surface area based non-polar model

In this work, instead of using the classical non-polar model that includes the van der Waals interaction between the solvent and solute non-polar interaction83 as shown in Eq. (7), we utilize the atomic surface area model given in Eq. (9) for the corresponding effects. In this part, we consider a set of numerical test to verify the validity of this treatment. Our numerical results indicate that the atomic surface area model provides results as good as those obtained by the van der Waals based non-polar model.

The van der Waals interaction Eq. (7) is modeled as a semi-continuous and semi-atomic potential. More specifically, we consider the 6-12 LJ potential for modeling the van der Waals interaction between the continuum solvent and atomistic solute molecules.83 The probe radius of the solvent in the LJ potential is set to be 1.4 Å, which is the same as that used in all other calculations. Table IV lists the root mean square (RMS) errors with four types of atomic radii and four charge parametrization methods for the prediction of SAMPL0’s solvation free energies. From these results, it is seen that for a given force field parametrization, the atomic surface area approach provides slightly more accurate non-polar solvation free energy prediction. As shown in our earlier work,83 the performance of the semi-continuous and semi-atomic LJ potential is sensitive to the choice of the probe radius, due to its nonlinear dependence on the radius, whereas the SES based atomic surface area approach is less sensitive to the probe radius. We conclude that in general the atomic surface area based non-polar model is at least as good as the LJ potential based one for modeling the non-polar solvation free energy. Therefore, the atomic surface area based non-polar model is employed in the rest of this paper.

TABLE IV.

The RMS errors of the solvation free energy prediction with atomic surface area and van der Waals interaction models of non-polar solvation free energy for SAMPL0 test set. The molecule in the SAMPL0 set that contains Br atom is excluded from this comparison. All results are in unit kcal/mol.

Non-polar model Radius AM1-BCC Mulliken Gasteiger SIESTA
Atomic surface area  Amber 6  1.30  1.27  1.23  0.99 
Amber Bondi  1.40  1.30  1.30  0.93 
Amber Mbondi2  1.41  1.34  1.33  1.10 
ZAP9  1.33  1.39  1.37  1.08 
van der Waals  Amber 6  1.42  1.31  1.34  0.93 
Amber Bondi  1.44  1.30  1.30  1.05 
Amber Mbondi2  1.45  1.38  1.30  1.18 
ZAP9  1.44  1.39  1.32  1.32 
Non-polar model Radius AM1-BCC Mulliken Gasteiger SIESTA
Atomic surface area  Amber 6  1.30  1.27  1.23  0.99 
Amber Bondi  1.40  1.30  1.30  0.93 
Amber Mbondi2  1.41  1.34  1.33  1.10 
ZAP9  1.33  1.39  1.37  1.08 
van der Waals  Amber 6  1.42  1.31  1.34  0.93 
Amber Bondi  1.44  1.30  1.30  1.05 
Amber Mbondi2  1.45  1.38  1.30  1.18 
ZAP9  1.44  1.39  1.32  1.32 

1. Leave-one-out prediction of 668 molecules

We first examine the proposed nearest neighbor approach by the leave-one-out test of 668 molecules. In this examination, we select one molecule at a time and use all other molecules as the training set to predict the selected one’s solvation free energy. This process is applied systematically to all the molecules in the whole dataset of 668 molecules. Four different atomic radius sets are considered, together with four different charge force fields. Our results are illustrated in Fig. 4, and the associated values are listed in Table V. In general, all radius sets and charge force fields perform similarly well. The maximum RMS error is below 1.7 kcal/mol for all methods over all molecules. More specifically, Bondi and Mbondi radii offer the best overall results. For charge force field, AM1-BBC charges appear to provide the best predictions and their RMS errors are less than 1.5 kcal/mol for all the four radius sets. The optimal result obtained with AM1-BBC charges and ZAP9 radii has an RMS error of 1.33 kcal/mol. Figure 5 plots the optimal results on the leave-one-out test.

FIG. 4.

The leave-one-out error of the whole training set of 668 molecules with 16 different charge and radius parametrizations. The nearest neighbor approach is employed for solvation free energy prediction.

FIG. 4.

The leave-one-out error of the whole training set of 668 molecules with 16 different charge and radius parametrizations. The nearest neighbor approach is employed for solvation free energy prediction.

Close modal
TABLE V.

The RMS errors of the leave-one-out test of the solvation free energy prediction with different methods. All results are in unit kcal/mol.

Radius AM1-BCC Mulliken Gasteiger SIESTA
Amber 6  1.47  1.49  1.65  1.65 
Amber Bondi  1.34  1.48  1.66  1.40 
Amber Mbondi2  1.33  1.49  1.68  1.42 
ZAP9  1.33  1.39  1.70  1.53 
Radius AM1-BCC Mulliken Gasteiger SIESTA
Amber 6  1.47  1.49  1.65  1.65 
Amber Bondi  1.34  1.48  1.66  1.40 
Amber Mbondi2  1.33  1.49  1.68  1.42 
ZAP9  1.33  1.39  1.70  1.53 
FIG. 5.

Correlations of optimal leave-one-out test results for 668 solute molecules. Left: Results parametrized with AM1-BCC charges and MBondi2 radii. Pearson correlation is 0.956 and R2 is 0.913. Right: Results parametrized with AM1-BCC charges and ZAP9. Pearson correlation is 0.955 and R2 is 0.911. In both cases, RMS errors are 1.33 kcal/mol.

FIG. 5.

Correlations of optimal leave-one-out test results for 668 solute molecules. Left: Results parametrized with AM1-BCC charges and MBondi2 radii. Pearson correlation is 0.956 and R2 is 0.913. Right: Results parametrized with AM1-BCC charges and ZAP9. Pearson correlation is 0.955 and R2 is 0.911. In both cases, RMS errors are 1.33 kcal/mol.

Close modal

For a set of 643 molecules, which largely overlaps with present dataset, Mobley and Guthrie reported an RMS error of 1.51 kcal/mol.57 Our results indicate that the present nearest neighbor approach can achieve highly accurate predictions for the solvation free energies for all the atomic radius and charge force fields.

2. SAMPLx blind predictions

In this section, we present results of the blind solvation free energy predictions based on the proposed protocol. All the SAMPL0-SAMPL4 challenges for solvation free energies are considered. We predict the solvation free energies with a leave-SAMPLx-out approach, in which the SAMPLx molecules and their experimental solvation free energies are regarded as unknown while information of other molecules is utilized to predict selected SAMPL test set, based on molecular formulas.

We first consider SAMPL0 test set, in which molecules are diverse. One molecule in this test set contains Br atom, for which our polarizable PB model does not an appropriate pseudo-potential. The structures of the SAMPL0 molecules can be found in the literature.60 Tables S1-S4 of the supplementary material present the solvation free energies for the SAMPL0 test set obtained from both experiments and blind predictions using different charges and atomic radii parametrizations. The left chart of Fig. 6 shows the plot of RMS errors for each charge and atomic radius combination. It is clear that the change in atomic radii has a minor influence on the accuracy of predictions, while the change in charge force fields has a much more significant influence. In other words, the solvation free energy prediction for this test set is more sensitive to the charge parametrization than the atomic radii parametrization. In general, our KSDFT based polarizable Poisson model provides better predictions than those of other charge force fields. It is noted that the SIASTA based polarizable Poisson model with Amber Bondi radius parameters delivers the best solvation free energy prediction. This result is depicted in the right chart of Fig. 6. The associated RMS error (0.93 kcal/mol) appears to be better than that in the literature60 (i.e., 1.71 kcal/mol for the full SAMPL0 test set of 17 molecules). The best prediction in the literature has an RMS error of 1.34 kcal/mol.42 Even though for the SIESTA based ab initio charge calculation, the molecule contains the Br atom is neglected, based on the general pattern that when any other force field used, the prediction for that omitted molecules is very well. We believe with proper pseudopotential for Br atom, our polarizable Poisson model can deliver an accurate prediction for this molecule also. Moreover, for this test set, all the other three charge parametrizations can provide a similar level of solvation free energy prediction.

FIG. 6.

The prediction results for the SAMPL0 blind test set. The left chart shows the RMS errors between the experimental and prediction solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL0 test set.

FIG. 6.

The prediction results for the SAMPL0 blind test set. The left chart shows the RMS errors between the experimental and prediction solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL0 test set.

Close modal

We next consider the SAMPL1 challenge set for solvation predictions.34 This set contains not only a largest number of 63 molecules, which is the largest among all the Sampl test sets, but also many molecules with extremely complicated structures. The detailed description of this set is given in the literature.34 Most molecules in this set are druggable and very complex. The difficulty of this test set comes from two aspects: on one hand, the structures of molecules in this set are very complicated. On the other hand, the knowledge in the training set that can be used for predicting the solvation free energies of these molecules is rare or insufficient. In addition to the molecular complexity, the reported experimental solvation free energies also admit large uncertainty.34 Most earlier computational predictions report RMS errors of 3–4 kcal/mol when some extremely complex molecules are excluded. Some best prediction for the whole set has an RMS error of 2.45 kcal/mol.42 The best performance was shown to give an RMS error of 2.4 kcal/mol on a subset of the SAMPL1 test set that contains only 56 molecules.52 Tables S5-S8 of the supplementary material present the solvation free energies for the SAMPL1 test set obtained from both experiments and blind predictions using different charges and atomic radii. Figure 7 plots the present blind predictions. One molecule (bromacil) that contains a Br atom is considered by all the charge models except for the polarizable PB method. It is noted that two of present approaches, namely, the AM1-BCC semi-empirical charges and the ab initio charge, provide relatively accurate predictions for this test set. When Amber Mondi2 atomic radii together with the SIESTA charge calculation is applied, the optimal solvation free energy prediction is achieved with RMS error 2.82 kcal/mol. When AM1-BCC charges and the Amber Bondi radii are used, the prediction for the whole SAMPL1 set without a molecule that contains Br atom has an RMS error of 3.06 kcal/mol, and the RMS error is 3.07 kcal/mol for the same test with all molecules. For this test set, the large prediction RMS error is mainly due to the extremely large error in predicting solvation free energies for some complex molecules, for which the RMS error can be as large as 15 kcal/mol. This unreasonable prediction is due to the fact that inappropriate molecule force field parametrization yields unreasonable electrostatic solvation free energies, which in turn leads to erroneous solvation free energy prediction. Similar to the SAMPL0 test set, the prediction is more sensitive to the charge parametrization. Nevertheless, the atomic radius parametrization also plays a critical role in the prediction accuracy for this test set.

FIG. 7.

The prediction results for the SAMPL1 blind test set, the left chart shows the RMS errors between the experimental and prediction solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL1 test set.

FIG. 7.

The prediction results for the SAMPL1 blind test set, the left chart shows the RMS errors between the experimental and prediction solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL1 test set.

Close modal

SAMPL2 is another difficult test set with almost the same level of difficulty as the SAMPL1 test set.44 Compared with SAMPL1 test set, it contains a few complex molecules, and most molecules in this test set are drug like ones as well. Contrary to molecules in SAMPL1 test set, this set has less uncertainty in the experimental solvation free energies. As shown in Tables S9-S12 of the supplementary material, the experimental solvation free energies of this test set distribute over a wide range. Using all-atom molecular dynamics simulations and multiple starting conformations for blind prediction, Klimovich and Mobley reported an RMS error of 2.82 kcal/mol over the whole set and 1.86 kcal/mol over all the molecules except several hydroxyl-rich compounds.44 Some best prediction has an RMS error of 1.59 kcal/mol.42 In the present work, the molecule containing an I atom (5-iodouracil) is excluded in all calculations. Additionally, 5-bromouracil has a Br atom which is excluded in the polarizable PB model. Tables S9-S12 of the supplementary material list the details of the present predictions. The RMS errors from various radius and charge force fields are given Fig. 8. Apparently, this test set has a strong force field dependence as well. The RMS errors vary very much from one charge force field to another. However, the performance of these predictions has a weak radius dependence. The best prediction, obtained from a combination of Amber MBondi2 radius parameters and AM1-BCC charge force field, or Amber Bondi radius together with modified SIESTA charge, both have the RMS error of 1.90 kcal/mol when the molecule with a Br atom is excluded in the prediction. When all molecules are included, the combination of Amber MBondi2 radius and AM1-BCC charge parametrization gives the optimal prediction of RMS error 1.96 kcal/mol as depicted in the right chart of Fig. 8. Compared to the prediction of SAMPL1 test set, these predictions are more accurate, due to two reasons. First, the molecules in this set are slightly simpler and experimental uncertainly is less severe for the deterministic prediction. Second, in our parametrization of the non-polar solvation free energy prediction, we have more similar molecules in training set, which enables us to obtain better non-polar solvation free energy parameters in the nearest neighbor based approach. In contrast, in the SAMPL1 prediction, when the nearest neighbor approach is applied, the nearest molecules selected from the training set differ much from SAMPL1 molecules.

FIG. 8.

The prediction results for the SAMPL2 blind test set; the left chart shows the RMS errors between the experimental and prediction solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL2 test set.

FIG. 8.

The prediction results for the SAMPL2 blind test set; the left chart shows the RMS errors between the experimental and prediction solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL2 test set.

Close modal

SAMPL3 test set is a relatively easy one with 36 solute molecules.25 Its molecular structures are less versatile than the earlier test sets. Additionally, in this test set, there is no molecule that involves Br or I atom. One of difficulties in the prediction of this set is the lack of similar molecules in our database when all the SAMPL3 molecules are left out. The best prediction in the literature offers an RMS error of 1.29 kcal/mol.42 Our blind predictions and experimental results for SAMPL3 test set are given in Tables S13-S16 of the supplementary material. The RMS errors of our blind predictions for the SAMPL3 test set are plotted in Fig. 9. In this case, the accuracy of predictions also depends strongly on charge force fields and weakly on radius parameters. In general, Gasteiger charges are superb for this test set and their RMS errors are always smaller than 1 kcal/mol. Our best prediction, obtained from the combination of ZAP9 radius parameters and Gasteiger charge force field, has a small RMS error of 0.78 kcal/mol and is depicted in the right chart of Fig. 9. In general Gasteiger charge performs very poor in the solvation free energy prediction for previous tests on complex molecules, while we see that this charge parametrization method is superior for chlorinated hydrocarbon molecules in the present test set. Unfortunately, there is no uniformly optimal parametrization for all the molecules. This fact motivates us to seek a solvation free energy prediction that does not heavily depend on the force fields parametrization.

FIG. 9.

The prediction results for the SAMPL3 blind test set; the left chart shows the RMS errors between the experimental and prediction solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL3 test set.

FIG. 9.

The prediction results for the SAMPL3 blind test set; the left chart shows the RMS errors between the experimental and prediction solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL3 test set.

Close modal

SAMPL4 test set is studied by using the proposed methods as well. A key feature of this test set is that its molecules are diversified. As shown in Tables S17-S20 of the supplementary material, this test set also involves a wide range of solvation free energies. However, the structures of these molecules are not as complex as those in both SAMPL1 and SAMPL2 test sets, which indicates a slightly easier task for solvation free energy prediction. This test set was studied by a number of researchers in the literature and the best prediction in the literature has the RMS error of 1.2 kcal/mol.59 Figure 10 illustrates the RMS errors of our predictions for a total of 16 charge and radius combinations. Compared with those in the literature, all of our predictions are of high quality. Our best result has the RMS error of 1.03 kcal/mol and the corresponding results are given in the right chart of Fig. 10.

FIG. 10.

The prediction results for the SAMPL4 blind test set, the left chart shows the RMS errors between the experimental and prediction solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL4 test set.

FIG. 10.

The prediction results for the SAMPL4 blind test set, the left chart shows the RMS errors between the experimental and prediction solvation free energies. The right chart shows the optimal predictions of the solvation free energies for the SAMPL4 test set.

Close modal

Table VI lists the root mean square errors (RMSEs), mean errors (MEs), Pearson correlations (Rs), and P-values (Ps) of our blind predictions of solvation free energies for all five test sets using a total of 16 charge and radius implementations. Some comments are in order. First, all predictions are quite sensitive to charge force fields and relatively less dependent on radius parameter selection. For SAMPL0 test set, although predictions with the AM1-BCC charge set have similar correlations as those obtained with other charge force fields, their MEs are significantly larger. However, AM1-BCC charge force field does a relatively better job in SAMPL1 and SAMPL2. Additionally, it is difficult to identify a clear winner over all the test sets — some approaches perform better in one or two test sets, but do not do well in the rest test sets. This phenomenon highlights the difficulty of designing optimal models for solvation analysis. Moreover, the KSDFT based polarizable Poisson model is more often to provide better predictions over most charge force fields and radius parameters. This indicates a potential to develop better physical models by improving the quantum charge density calculation. Further, the Pearson correlation becomes an interesting measure in SAMPL3 test set. Results obtained with AM1-BCC and Mulliken charge sets have particularly low correlations, although their RMSEs are quite good. Results for SAMPL1 and SAMPL4 test sets are quite consistent in terms of their RMSE, ME, and correlations. Furthermore, our P-value analysis indicates that most results very reliable, except for cases when the AM1-BCC and Mulliken charge forces fields are applied to SAMPL3. However, all P-values are smaller than the standard cutoff, 0.05. Finally, we point out that the blind prediction results presented in the present work are the state-of-the-art compared to those in the literature.21,25,26,34,35,42,55,56,58–60,68,69

TABLE VI.

The RMSE, mean error (ME), Pearson correlation (R) and P-value (P) of solvation free energy predictions with different methods. Values inside parentheses are for all molecules, while those outside parentheses are for molecules without Br atoms. All RMSEs and MEs are in unit kcal/mol. The boldface highlights best results.

AM1-BCC Mulliken Gasteiger SIESTA
Test set Radius RMSE ME R P RMSE ME R P RMSE ME R P RMSE ME R P
SAMPL0  Amber 6  1.30 (1.26)  0.78 (0.76)  0.96 (0.96)  7.97 × 10−9 (1.67 × 10−9 1.27 (1.25)  −0.18 (−0.12)  0.95 (0.94)  6.91 × 10−8 (1.78 × 10−8 1.23 (1.20 −0.16 (−0.13)  0.98 (0.94)  5.99 × 10−8 (1.45 × 10−8 0.99 (NA)  0.13 (NA)  0.96 (NA)  3.79 × 10−9 (NA) 
  Amber Bondi  1.40 (1.37)  0.88 (0.86)  0.95 (0.95)  1.70 × 10−8 (3.75 × 10−9 1.30 (1.27)  −0.26 (−0.20)  0.94 (0.94)  8.10 × 10−8 (2.14 × 10−8 1.30 (1.27)  −0.27 (−0.24)  0.94 (0.94)  1.05 × 10−7 (2.64 × 10−8 0.93 (NA)  −0.12 (NA)  0.94 (NA)  1.71 × 10−9 (NA) 
  Amber Mbondi2  1.41 (1.37)  0.91 (0.88)  0.95 (0.96)  1.15 × 10−7 (2.48 × 10−8 1.34 (1.32)  0.27 (0.21)  0.94 (0.94)  1.05 × 10−7 (2.78 × 10−8 1.33 (1.29)  −0.28 (−0.25)  0.93 (0.94)  1.28 × 10−7 (3.25 × 10−8 1.10 (NA)  −0.17 (NA)  0.95 (NA)  9.51 × 10−9 (NA) 
  ZAP9  1.33 (1.29)  0.44 (0.44)  0.96 (0.96)  4.09 × 10−9 (9.93 × 10−10 1.39 (1.37)  0.00 (0.05)  0.94 (0.94)  8.47 × 10−8 (2.10 × 10−8 1.37 (1.33)  0.11 (0.12)  0.94 (0.94)  5.96 × 10−8 (1.54 × 10−8 1.08 (NA)  −0.21 (NA)  0.96 (NA)  4.89 × 10−9 (NA) 
SAMPL1  Amber 6  3.26 (3.27)  0.83 (0.88)  0.74 (0.74)  5.23 × 10−12 (3.18 × 10−12 4.74 (4.77)  −2.22 (−2.28)  0.74 (0.75)  3.87 × 10−12 (2.12 × 10−12 4.92 (4.96)  −0.93 (−1.02)  0.69 (0.69)  6.40 × 10−10 (4.36 × 10−10 2.92 (NA)  −0.52 (NA)  0.81 (NA)  1.93 × 10−15 (NA) 
  Amber Bondi  3.06 (3.07 0.96 (0.99)  0.79 (0.79)  2.61 × 10−14 (1.39 × 10−14 4.65 (4.68)  −2.22 (−2.28)  0.78 (0.78)  5.81 × 10−14 (3.35 × 10−14 5.52 (5.55)  −1.16 (−1.26)  0.67 (0.68)  1.81 × 10−9 (1.21 × 10−9 2.89 (NA)  −1.15 (NA)  0.88 (NA)  9.76 × 10−21 (NA) 
  Amber Mbondi2  3.29 (3.30)  1.17 (1.22)  0.82 (0.82)  2.40 × 10−16 (1.45 × 10−16 5.39 (5.41)  −2.32 (−2.39)  0.80 (0.80)  7.07 × 10−15 (3.82 × 10−15 4.76 (4.82)  −0.56 (−0.67)  0.60 (0.60)  2.64 × 10−7 (2.07 × 10−7 2.82 (NA)  −1.05 (NA)  0.87 (NA)  3.72 × 10−20 (NA) 
  ZAP9  4.26 (4.35)  1.26 (1.30)  0.81 (0.81)  2.52 × 10−15 (1.29 × 10−15 6.16 (6.16)  −3.01 (−3.06)  0.73 (0.73)  1.28 × 10−11 (7.52 × 10−12 5.33 (5.45)  −0.66 (−0.82)  0.64 (0.64)  2.61 × 10−8 (2.26 × 10−8 3.76 (NA)  0.31 (NA)  0.82 (NA)  5.14 × 10−16 (NA) 
SAMPL2  Amber 6  2.09 (2.11)  −0.59 (−0.65)  0.95 (0.95)  1.34 × 10−15 (3.90 × 10−15 3.51 (3.59)  1.56 (1.65)  0.86 (0.88)  2.93 × 10−9 (1.59 × 10−9 4.78 (4.86)  2.51 (2.65)  0.77 (0.78)  1.68 × 10−6 (1.01 × 10−6 3.46 (NA)  1.18 (NA)  0.85 (NA)  8.10 × 10−9 (NA) 
  Amber Bondi  1.95 (1.97)  −0.17 (−0.26)  0.96 (0.96)  9.32 × 10−16 (4.36 × 10−16 3.38 (3.47)  1.55 (1.69)  0.88 (0.90)  8.77 × 10−10 (4.92 × 10−10 4.62 (4.72)  2.45 (2.61)  0.79 (0.80)  6.77 × 10−7 (4.29 × 10−7 1.90 (NA)  0.87 (NA)  0.96 (NA)  1.55 × 10−16 (NA) 
  Amber Mbondi2  1.90 (1.96 −0.36 (−0.46)  0.96 (0.96)  9.87 × 10−16 (5.99 × 10−16 3.55 (3.66)  2.46 (2.62)  0.86 (0.89)  7.96 × 10−7 (5.59 × 10−7 4.65 (4.76)  1.64 (1.79)  0.78 (0.80)  2.90 × 10−9 (1.91 × 10−9 2.35 (NA)  1.21 (NA)  0.95 (NA)  1.17 × 10−14 (NA) 
  ZAP9  2.05 (2.03)  −0.37 (−0.41)  0.95 (0.95)  1.18 × 10−14 (2.40 × 10−15 3.19 (3.15)  2.36 (2.37)  0.89 (0.91)  6.43 × 10−7 (2.06 × 10−7 4.56 (4.51)  1.52 (1.45)  0.79 (0.81)  1.50 × 10−10 (3.88 × 10−11 1.93 (NA)  1.13 (NA)  0.97 (NA)  2.91 × 10−17 (NA) 
SAMPL3  Amber 6  1.28  0.38  0.47  3.55 × 10−3  1.42  0.72  0.51  1.67 × 10−3  0.97  −0.16  0.86  1.85 × 10−11  1.08  0.34  0.81  2.98 × 10−9 
  Amber Bondi  1.47  −0.56  0.47  3.56 × 10−3  1.58  0.85  0.51  1.65 × 10−3  0.82  −0.09  0.86  1.89 × 10−11  1.16  0.45  0.81  2.95 × 10−9 
  Amber Mbondi2  1.47  −0.56  0.37  2.43 × 10−2  1.58  0.85  0.39  2.02 × 10−2  0.82  −0.09  0.86  2.28 × 10−11  1.16  0.39  0.70  1.50 × 10−6 
  ZAP9  1.55  −0.68  0.33  4.93 × 10−2  1.28  −0.37  0.59  1.55 × 10−4  0.78  0.31  0.89  2.98 × 10−13  1.33  −0.59  0.50  1.84 × 10−3 
SAMPL4  Amber 6  1.28  −0.14  0.97  3.28 × 10−30  1.20  0.11  0.95  7.77 × 10−25  1.08  0.18  0.96  3.42 × 10−27  1.41  0.15  0.96  8.09 × 10−26 
  Amber Bondi  1.12  0.06  0.96  3.53 × 10−27  1.41  0.31  0.94  5.08 × 10−22  1.10  0.33  0.96  2.95 × 10−27  1.07  0.10  0.96  1.58 × 10−27 
  Amber Mbondi2  1.09  0.15  0.96  1.02 × 10−27  1.33  0.14  0.94  1.84 × 10−22  1.03  −0.08  0.96  4.64 × 10−26  1.04  0.00  0.96  8.87 × 10−28 
  ZAP9  1.12  −0.01  0.96  2.30 × 10−27  1.12  0.07  0.96  2.7 × 10−26  1.09  −0.04  0.96  1.41 × 10−26  1.32  0.36  0.95  1.20 × 10−23 
AM1-BCC Mulliken Gasteiger SIESTA
Test set Radius RMSE ME R P RMSE ME R P RMSE ME R P RMSE ME R P
SAMPL0  Amber 6  1.30 (1.26)  0.78 (0.76)  0.96 (0.96)  7.97 × 10−9 (1.67 × 10−9 1.27 (1.25)  −0.18 (−0.12)  0.95 (0.94)  6.91 × 10−8 (1.78 × 10−8 1.23 (1.20 −0.16 (−0.13)  0.98 (0.94)  5.99 × 10−8 (1.45 × 10−8 0.99 (NA)  0.13 (NA)  0.96 (NA)  3.79 × 10−9 (NA) 
  Amber Bondi  1.40 (1.37)  0.88 (0.86)  0.95 (0.95)  1.70 × 10−8 (3.75 × 10−9 1.30 (1.27)  −0.26 (−0.20)  0.94 (0.94)  8.10 × 10−8 (2.14 × 10−8 1.30 (1.27)  −0.27 (−0.24)  0.94 (0.94)  1.05 × 10−7 (2.64 × 10−8 0.93 (NA)  −0.12 (NA)  0.94 (NA)  1.71 × 10−9 (NA) 
  Amber Mbondi2  1.41 (1.37)  0.91 (0.88)  0.95 (0.96)  1.15 × 10−7 (2.48 × 10−8 1.34 (1.32)  0.27 (0.21)  0.94 (0.94)  1.05 × 10−7 (2.78 × 10−8 1.33 (1.29)  −0.28 (−0.25)  0.93 (0.94)  1.28 × 10−7 (3.25 × 10−8 1.10 (NA)  −0.17 (NA)  0.95 (NA)  9.51 × 10−9 (NA) 
  ZAP9  1.33 (1.29)  0.44 (0.44)  0.96 (0.96)  4.09 × 10−9 (9.93 × 10−10 1.39 (1.37)  0.00 (0.05)  0.94 (0.94)  8.47 × 10−8 (2.10 × 10−8 1.37 (1.33)  0.11 (0.12)  0.94 (0.94)  5.96 × 10−8 (1.54 × 10−8 1.08 (NA)  −0.21 (NA)  0.96 (NA)  4.89 × 10−9 (NA) 
SAMPL1  Amber 6  3.26 (3.27)  0.83 (0.88)  0.74 (0.74)  5.23 × 10−12 (3.18 × 10−12 4.74 (4.77)  −2.22 (−2.28)  0.74 (0.75)  3.87 × 10−12 (2.12 × 10−12 4.92 (4.96)  −0.93 (−1.02)  0.69 (0.69)  6.40 × 10−10 (4.36 × 10−10 2.92 (NA)  −0.52 (NA)  0.81 (NA)  1.93 × 10−15 (NA) 
  Amber Bondi  3.06 (3.07 0.96 (0.99)  0.79 (0.79)  2.61 × 10−14 (1.39 × 10−14 4.65 (4.68)  −2.22 (−2.28)  0.78 (0.78)  5.81 × 10−14 (3.35 × 10−14 5.52 (5.55)  −1.16 (−1.26)  0.67 (0.68)  1.81 × 10−9 (1.21 × 10−9 2.89 (NA)  −1.15 (NA)  0.88 (NA)  9.76 × 10−21 (NA) 
  Amber Mbondi2  3.29 (3.30)  1.17 (1.22)  0.82 (0.82)  2.40 × 10−16 (1.45 × 10−16 5.39 (5.41)  −2.32 (−2.39)  0.80 (0.80)  7.07 × 10−15 (3.82 × 10−15 4.76 (4.82)  −0.56 (−0.67)  0.60 (0.60)  2.64 × 10−7 (2.07 × 10−7 2.82 (NA)  −1.05 (NA)  0.87 (NA)  3.72 × 10−20 (NA) 
  ZAP9  4.26 (4.35)  1.26 (1.30)  0.81 (0.81)  2.52 × 10−15 (1.29 × 10−15 6.16 (6.16)  −3.01 (−3.06)  0.73 (0.73)  1.28 × 10−11 (7.52 × 10−12 5.33 (5.45)  −0.66 (−0.82)  0.64 (0.64)  2.61 × 10−8 (2.26 × 10−8 3.76 (NA)  0.31 (NA)  0.82 (NA)  5.14 × 10−16 (NA) 
SAMPL2  Amber 6  2.09 (2.11)  −0.59 (−0.65)  0.95 (0.95)  1.34 × 10−15 (3.90 × 10−15 3.51 (3.59)  1.56 (1.65)  0.86 (0.88)  2.93 × 10−9 (1.59 × 10−9 4.78 (4.86)  2.51 (2.65)  0.77 (0.78)  1.68 × 10−6 (1.01 × 10−6 3.46 (NA)  1.18 (NA)  0.85 (NA)  8.10 × 10−9 (NA) 
  Amber Bondi  1.95 (1.97)  −0.17 (−0.26)  0.96 (0.96)  9.32 × 10−16 (4.36 × 10−16 3.38 (3.47)  1.55 (1.69)  0.88 (0.90)  8.77 × 10−10 (4.92 × 10−10 4.62 (4.72)  2.45 (2.61)  0.79 (0.80)  6.77 × 10−7 (4.29 × 10−7 1.90 (NA)  0.87 (NA)  0.96 (NA)  1.55 × 10−16 (NA) 
  Amber Mbondi2  1.90 (1.96 −0.36 (−0.46)  0.96 (0.96)  9.87 × 10−16 (5.99 × 10−16 3.55 (3.66)  2.46 (2.62)  0.86 (0.89)  7.96 × 10−7 (5.59 × 10−7 4.65 (4.76)  1.64 (1.79)  0.78 (0.80)  2.90 × 10−9 (1.91 × 10−9 2.35 (NA)  1.21 (NA)  0.95 (NA)  1.17 × 10−14 (NA) 
  ZAP9  2.05 (2.03)  −0.37 (−0.41)  0.95 (0.95)  1.18 × 10−14 (2.40 × 10−15 3.19 (3.15)  2.36 (2.37)  0.89 (0.91)  6.43 × 10−7 (2.06 × 10−7 4.56 (4.51)  1.52 (1.45)  0.79 (0.81)  1.50 × 10−10 (3.88 × 10−11 1.93 (NA)  1.13 (NA)  0.97 (NA)  2.91 × 10−17 (NA) 
SAMPL3  Amber 6  1.28  0.38  0.47  3.55 × 10−3  1.42  0.72  0.51  1.67 × 10−3  0.97  −0.16  0.86  1.85 × 10−11  1.08  0.34  0.81  2.98 × 10−9 
  Amber Bondi  1.47  −0.56  0.47  3.56 × 10−3  1.58  0.85  0.51  1.65 × 10−3  0.82  −0.09  0.86  1.89 × 10−11  1.16  0.45  0.81  2.95 × 10−9 
  Amber Mbondi2  1.47  −0.56  0.37  2.43 × 10−2  1.58  0.85  0.39  2.02 × 10−2  0.82  −0.09  0.86  2.28 × 10−11  1.16  0.39  0.70  1.50 × 10−6 
  ZAP9  1.55  −0.68  0.33  4.93 × 10−2  1.28  −0.37  0.59  1.55 × 10−4  0.78  0.31  0.89  2.98 × 10−13  1.33  −0.59  0.50  1.84 × 10−3 
SAMPL4  Amber 6  1.28  −0.14  0.97  3.28 × 10−30  1.20  0.11  0.95  7.77 × 10−25  1.08  0.18  0.96  3.42 × 10−27  1.41  0.15  0.96  8.09 × 10−26 
  Amber Bondi  1.12  0.06  0.96  3.53 × 10−27  1.41  0.31  0.94  5.08 × 10−22  1.10  0.33  0.96  2.95 × 10−27  1.07  0.10  0.96  1.58 × 10−27 
  Amber Mbondi2  1.09  0.15  0.96  1.02 × 10−27  1.33  0.14  0.94  1.84 × 10−22  1.03  −0.08  0.96  4.64 × 10−26  1.04  0.00  0.96  8.87 × 10−28 
  ZAP9  1.12  −0.01  0.96  2.30 × 10−27  1.12  0.07  0.96  2.7 × 10−26  1.09  −0.04  0.96  1.41 × 10−26  1.32  0.36  0.95  1.20 × 10−23 

In this work, we propose a protocol to parametrize implicit solvent models for the blind prediction of solvation free energies. The proposed protocol parametrizes non-polar solvation model by scoring and ranking algorithms while utilizes either the Poisson model or a density functional theory (DFT) based polarizable Poisson model for polar solvation free energy calculations. For the parametrization of non-polar solvation models, we first utilized the assumption that molecules with the same functional group admit the same parametrization of the non-polar solvation energy functional. For complex poly-functional-group molecules, we develop a scoring procedure to determine the optimal relative weight of each functional group. For extremely complex molecules that fail the functional group scoring method, we further develop a molecule ranking algorithm to select an optimal set of nearest neighbor molecules for parameter training. We construct atomic features for the molecule ranking. Finally, we systematically integrate the above mentioned models and algorithms into a robust protocol for blind solvation free energy prediction.

In the present work, we considered an experimental database of 668 solvation molecules, see Table S21 in the supplementary material, the largest database ever constructed for solvation, to validate our approach. Among them, SAMPL0–SAMPL4 test sets are paid special attention. For the Poisson model or DFT based polarizable Poisson model, four sets of atomic radius parameters (i.e., Amber 6, Amber bondi, Amber mbondi, and ZAP9 radii) are combined with four sets of charge force fields (i.e., AM1-BCC, Mulliken, Gasteiger, and SIESTA DFT) to arrive at a total of 16 different implementations. The resulting polar solvation free energies are utilized in our parametrization for blind predictions. We first carry out the leave-one-out validation of the whole database. The AM1-BCC charge force field delivers a low root mean square (RMS) error of 1.33 kcal/mol, which is the lowest for such a large test database, to our best knowledge. We further conduct a series of leave-SAMPLx-out blind tests. On average, the AM1-BCC parametrization in the Poisson model and DFT based polarizable Poisson model performs better than other charge force fields, especially for predicting the solvation free energies of the complex molecules. The optimal RMS errors for SAMPL0-SAMPL4 are, respectively, 0.93, 2.82, 1.90, 0.78, and 1.03 kcal/mol.

From the solvation free energy predictions, particularly on SAMPL1 and SAMPL2 test sets, we conclude that atomic charge parametrization is extremely important for the present physical models, namely, the Poisson model or the KSDFT based polarizable Poisson model. Without an appropriate charge parametrization, the prediction errors can be amplified for molecules with complex structures. In general, for four charge parametrization methods, both the semi-empirical AM1-BCC charge and the ab initio charge calculation from the generalized KSDFT can provide relatively reliable charge assignments. For this reason, a solvation free energy prediction method that does not heavily depend on the molecule parametrization is under our consideration. Essential idea is that if one does not partition the solvation free energy into two isolated parts, the prediction errors in the polar solvation free energy will not propagate to the non-polar solvation free energy prediction. Alternative, our differential geometry based solvation model that dynamically couples polar and non-polar models7,89 might also provide a less sensitive approach.

This work can be improved in a number of ways. First, the classification of the database into functional groups is not unique. Future study will explore optimal molecules partition. Additionally, the selection and computation of atomic features need to be further investigated. It is possible to construct an optimal set of atomic features for solvation analysis and prediction. Further, in the current work, molecular ranking for nearest neighbor searching is not optimal yet. More sophisticated machine learning and/or deep learning algorithms can be developed for this purpose. Finally, a more versatile DFT solvers can be utilized to further improve our in-house polarizable Poisson model. These issues are under our consideration.

See supplementary material for specific predictions of SAMPL0-SAMPL4 test sets for all force field parametrizations and molecular name, PubChem ID, and experimental solvation free energies of all 668 molecules. Additionally, more detailed datasets with coordinates, radii, and charges for 668 molecules can be downloaded from website http://weilab.math.msu.edu/MIBPB/.

This work was supported in part by NSF Grant No. IIS-1302285 and MSU Center for Mathematical Molecular Biosciences Initiative. B.W. thanks Dr. Pengfei Li, Professor Ray Luo, Professor David Mobeley, and Professor Wei Yang for helpful discussions. G.-W.W. acknowledges Nathan Baker, Michael Gilson, and Weitao Yang for useful discussions.

1.
N. A.
Baker
, “
Improving implicit solvent simulations: A Poisson-centric view
,”
Curr. Opin. Struct. Biol.
15
(
2
),
137
143
(
2005
).
2.
D.
Bashford
and
D. A.
Case
, “
Generalized Born models of macromolecular solvation effects
,”
Annu. Rev. Phys. Chem.
51
,
129
152
(
2000
).
3.
A. D.
Becke
, “
Density-functional thermochemistry. III. The role of exact exchange
,”
J. Chem. Phys.
98
(
7
),
5648
(
1993
).
4.
S.
Cabani
,
P.
Gianni
,
V.
Mollica
, and
L.
Lepori
, “
Group contributions to the thermodynamic properties of non-ionic organic solutes in dilute aqueous solution
,”
J. Solution Chem.
10
(
8
),
563
595
(
1981
).
5.
D. A.
Case
,
J. T.
Berryman
,
R. M.
Betz
,
D. S.
Cerutti
,
T. E.
Cheatham
III
,
T. A.
Darden
,
R. E.
Duke
,
T. J.
Giese
,
H.
Gohlke
,
A. W.
Goetz
,
N.
Homeyer
,
S.
Izadi
,
P.
Janowski
,
J.
Kaus
,
A.
Kovalenko
,
T. S.
Lee
,
S.
LeGrand
,
P.
Li
,
T.
Luchko
,
R.
Luo
,
B.
Madej
,
K. M.
Merz
,
G.
Monard
,
P.
Needham
,
H.
Nguyen
,
H. T.
Nguyen
,
I.
Omelyan
,
A.
Onufriev
,
D. R.
Roe
,
A.
Roitberg
,
R.
Salomon-Ferrer
,
C. L.
Simmerling
,
W.
Smith
,
J.
Swails
,
R. C.
Walker
,
J.
Wang
,
R.
Wolf
,
X.
Wu
,
D. M.
York
, and
P. A.
Kollman
, Amber 2015, University of California, San Francisco, 2015.
6.
D.
Chen
,
Z.
Chen
,
C.
Chen
,
W. H.
Geng
, and
G. W.
Wei
, “
MIBPB: A software package for electrostatic analysis
,”
J. Comput. Chem.
32
,
657
670
(
2011
).
7.
Z.
Chen
,
N. A.
Baker
, and
G. W.
Wei
, “
Differential geometry based solvation models I: Eulerian formulation
,”
J. Comput. Phys.
229
,
8231
8258
(
2010
).
8.
Z.
Chen
,
N. A.
Baker
, and
G. W.
Wei
, “
Differential geometry based solvation models II: Lagrangian formulation
,”
J. Math. Biol.
63
,
1139
1200
(
2011
).
9.
Z.
Chen
and
G. W.
Wei
, “
Differential geometry based solvation models III: Quantum formulation
,”
J. Chem. Phys.
135
,
194108
(
2011
).
10.
Z.
Chen
,
S.
Zhao
,
J.
Chun
,
D. G.
Thomas
,
N. A.
Baker
,
P. B.
Bates
, and
G. W.
Wei
, “
Variational approach for non-polar solvation analysis
,”
J. Chem. Phys.
137
,
084101
(
2012
).
11.
N.
Choudhury
and
B. M.
Pettitt
, “
On the mechanism of hydrophobic association of nanoscopic solutes
,”
J. Am. Chem. Soc.
127
(
10
),
3556
3567
(
2005
).
12.
C. J.
Cramer
and
D. G.
Truhlar
, “
Implicit solvation models: Equilibria, structure, spectra, and dynamics
,”
Chem. Rev.
99
(
8
),
2161
2200
(
1999
).
13.
C. J.
Cramer
and
D. G.
Truhlar
, “
A universal approach to solvation modeling
,”
Acc. Chem. Res.
41
,
760
768
(
2008
).
14.
R.
Daudel
, “
Quantum theory of chemical reactivity
,” in
Quantum Theory of Chemical Reactivity
(
Springer
,
1973
).
15.
M. E.
Davis
and
J. A.
McCammon
, “
Electrostatics in biomolecular structure and dynamics
,”
Chem. Rev.
94
,
509
521
(
1990
).
16.
B. N.
Dominy
and
C. L.
Brooks
III
, “
Development of a generalized Born model parameterization for proteins and nucleic acids
,”
J. Phys. Chem. B
103
(
18
),
3765
3773
(
1999
).
17.
J.
Dzubiella
,
J. M. J.
Swanson
, and
J. A.
McCammon
, “
Coupling hydrophobicity, dispersion, and electrostatics in continuum solvent models
,”
Phys. Rev. Lett.
96
,
087802
(
2006
).
18.
B. M. E.
Cances
and
J.
Tomasi
, “
A new integral equation formalism for the polarizable continuum model: Theoretical background and applications to isotropic dielectrics
,”
J. Chem. Phys.
107
,
3032
(
1997
).
19.
M.
Feig
,
W.
Im
, and
C. L.
Brooks
III
, “
Implicit solvation based on generalized Born theory in different dielectric environments
,”
J. Chem. Phys.
120
(
2
),
903
911
(
2004
).
20.
M. J.
Frisch
,
G. W.
Trucks
,
H. B.
Schlegel
,
G. E.
Scuseria
,
M. A.
Robb
,
J. R.
Cheeseman
,
G.
Scalmani
,
V.
Barone
,
B.
Mennucci
,
G. A.
Petersson
,
H.
Nakatsuji
,
M.
Caricato
,
X.
Li
,
H. P.
Hratchian
,
A. F.
Izmaylov
,
J.
Bloino
,
G.
Zheng
,
J. L.
Sonnenberg
,
M.
Hada
,
M.
Ehara
,
K.
Toyota
,
R.
Fukuda
,
J.
Hasegawa
,
M.
Ishida
,
T.
Nakajima
,
Y.
Honda
,
O.
Kitao
,
H.
Nakai
,
T.
Vreven
,
J. A.
Montgomery
, Jr.
,
J. E.
Peralta
,
F.
Ogliaro
,
M.
Bearpark
,
J. J.
Heyd
,
E.
Brothers
,
K. N.
Kudin
,
V. N.
Staroverov
,
R.
Kobayashi
,
J.
Normand
,
K.
Raghavachari
,
A.
Rendell
,
J. C.
Burant
,
S. S.
Iyengar
,
J.
Tomasi
,
M.
Cossi
,
N.
Rega
,
J. M.
Millam
,
M.
Klene
,
J. E.
Knox
,
J. B.
Cross
,
V.
Bakken
,
C.
Adamo
,
J.
Jaramillo
,
R.
Gomperts
,
R. E.
Stratmann
,
O.
Yazyev
,
A. J.
Austin
,
R.
Cammi
,
C.
Pomelli
,
J. W.
Ochterski
,
R. L.
Martin
,
K.
Morokuma
,
V. G.
Zakrzewski
,
G. A.
Voth
,
P.
Salvador
,
J. J.
Dannenberg
,
S.
Dapprich
,
A. D.
Daniels
,
Farkas
,
J. B.
Foresman
,
J. V.
Ortiz
,
J.
Cioslowski
, and
D. J.
Fox
, gaussian 09, Revision E.01, Gaussian, Inc., Wallingford, CT, 2009.
21.
J.
Fu
,
Y.
Liu
, and
J.
Wu
, “
Fast prediction of hydration free energies for sampl4 blind test from a classical density functional theory
,”
J. Comput.-Aided Mol. Des.
28
,
299
304
(
2014
).
22.
E.
Gallicchio
and
R. M.
Levy
, “
AGBNP: An analytic implicit solvent model suitable for molecular dynamics simulations and high-resolution modeling
,”
J. Comput. Chem.
25
(
4
),
479
499
(
2004
).
23.
E.
Gallicchio
,
L. Y.
Zhang
, and
R. M.
Levy
, “
The SGB/NP hydration free energy model based on the surface generalized Born solvent reaction field and novel non-polar hydration free energy estimators
,”
J. Comput. Chem.
23
(
5
),
517
529
(
2002
).
24.
J.
Gasteiger
and
M.
Marsili
, “
Iterative partial equalization of orbital electronegativity—a rapid access to atomic charges
,”
Tetrahedron
36
,
3219
3228
(
1980
).
25.
M. T.
Geballe
and
J. P.
Guthrie
, “
The SAMPL3 blind prediction challenge: Transfer energy overview
,”
J. Comput.-Aided Mol. Des.
26
,
489
496
(
2012
).
26.
M. T.
Geballe
,
A. G.
Skillman
,
A.
Nicholls
,
J. P.
Guthrie
, and
J. P.
Taylor
, “
The SAMPL2 blind prediction challenge: Introduction and overview
,”
J. Comput.-Aided Mol. Des.
24
,
259
279
(
2010
).
27.
W.
Geng
,
S.
Yu
, and
G. W.
Wei
, “
Treatment of charge singularities in implicit solvent models
,”
J. Chem. Phys.
127
,
114106
(
2007
).
28.
G. M.
Giambasu
,
T.
Luchko
,
D.
Herschlag
,
D. M.
York
, and
D. A.
Case
, “
Ion counting from explicit-solvent simulations and 3d-RISM
,”
Biophys. J.
106
,
883
894
(
2014
).
29.
M. K.
Gilson
,
M. E.
Davis
,
B. A.
Luty
, and
J. A.
McCammon
, “
Computation of electrostatic forces on solvated molecules using the Poisson-Boltzmann equation
,”
J. Phys. Chem.
97
(
14
),
Phys. Chem. Chem. Phys.
3600
(
1993
).
30.
H.
Gohlke
and
D. A.
Case
, “
Converging free energy estimates: MM-PB(GB)SA studies on the protein-protein complex Ras-Raf
,”
J. Comput. Chem.
25
(
2
),
238
250
(
2004
).
31.
J. A.
Grant
,
B. T.
Pickup
,
M. T.
Sykes
,
C. A.
Kitchen
, and
A.
Nicholls
, “
The Gaussian generalized Born model: Application to small molecules
,”
Phys. Chem. Chem. Phys.
9
,
4913
4922
(
2007
).
32.
M.
Grant
and
S.
Boyd
, “
Graph implementations for nonsmooth convex programs
,” in
Recent Advances in Learning and Control
,
Lecture Notes in Control and Information Sciences
, edited by
V.
Blondel
,
S.
Boyd
, and
H.
Kimura
(
Springer-Verlag Limited
,
2008
), pp.
95
110
, http://stanford.edu/~boyd/graph_dcp.html.
33.
M.
Grant
and
S.
Boyd
, CVX: Matlab software for disciplined convex programming, version 2.1, http://cvxr.com/cvx, 2014.
34.
J. P.
Guthrie
, “
A blind challenge for computational solvation free energies: Introduction and overview
,”
J. Phys. Chem. B
113
,
4501
4507
(
2009
).
35.
J. P.
Guthrie
, “
SAMPL4, a blind challenge for computational solvation free energies: The compounds considered
,”
J. Comput.-Aided Mol. Des.
28
,
151
168
(
2014
).
36.
F.
Hirata
 et al, “
Molecular theory of solvation
,” in
Molecular Theory of Solvation
, edited by
F.
Hirata
(
Springer
,
2003
).
37.
B.
Honig
and
A.
Nicholls
, “
Classical electrostatics in biology and chemistry
,”
Science
268
(
5214
),
1144
1149
(
1995
).
38.
D. M.
Huang
and
D.
Chandler
, “
Temperature and length scale dependence of hydrophobic effects and their possible implications for protein folding
,”
Proc. Natl. Acad. Sci. U. S. A.
97
(
15
),
8324
8327
(
2000
).
39.
B. M.
Jacopo Tomasi
and
R.
Cammi
, “
Quantum mechanical contunuum solvation models
,”
Chem. Rev.
105
,
2999
3093
(
2005
).
40.
A.
Jakalian
,
B. L.
Bush
,
D. B.
Jack
, and
C. I.
Bayly
, “
Fast, efficient generation of high-quality atomic charges. AM1-BCC model: I. Method
,”
J. Comput. Chem.
21
(
2
),
132
146
(
2000
).
41.
B.
Jayaram
,
D.
Sprous
, and
D. L.
Beveridge
, “
Solvation free energy of biomacromolecules: Parameters for a modified generalized Born model consistent with the AMBER force field
,”
J. Phys. Chem. B
102
(
47
),
9571
9576
(
1998
).
42.
C. W.
Kehoe
,
C. J.
Fennell
, and
K. A.
Dill
, “
Testing the semi-explicit assembly solvation model in the SAMPL3 community blind test
,”
J. Comput.-Aided Mol. Des.
26
,
563
568
(
2012
).
43.
J. G.
Kirkwood
, “
Theory of solution of molecules containing widely separated charges with special application to zwitterions
,”
J. Comput. Phys.
7
,
351
361
(
1934
).
44.
P. V.
Klimovich
and
D. L.
Mobley
, “
Predicting hydration free energies using all-atom molecular dynamics simulations and multiple starting conformations
,”
J. Comput.-Aided Mol. Des.
24
,
307
316
(
2010
).
45.
P.
Koehl
, “
Electrostatics calculations: Latest methodological advances
,”
Curr. Opin. Struct. Biol.
16
(
2
),
142
151
(
2006
).
46.
M.
Kreevoy
and
D.
Truhlar
, in
Investigation of Rates and Mechanisms of Reactions, Part I
, edited by
C.
Bernasconi
(
Wiley
,
New York
,
1986
), p.
13
.
47.
C.
Lee
,
W.
Yang
, and
R. G.
Parr
, “
Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density
,”
Phys. Rev. B
37
(
2
),
785
(
1988
).
48.
J.
Li
,
T.
Zhu
,
G. D.
Hawkins
,
P.
Winget
,
D. A.
Liotard
,
C. J.
Cramer
, and
D. G.
Truhlar
, “
Extension of the platform of applicability of the SM5.42R universal solvation model
,”
Theor. Chem. Acc.
103
,
9
63
(
1999
).
49.
J.-H.
Lin
,
N. A.
Baker
, and
J. A.
McCammon
, “
Bridging the implicit and explicit solvent approaches for membrane electrostatics
,”
Biophys. J.
83
(
3
),
1374
1379
(
2002
).
50.
B.
Liu
,
B.
Wang
,
R.
Zhao
,
Y.
Tong
, and
G.-W.
Wei
, “ESES: Software for Eulerian solvent excluded surface,” preprint (2016).
51.
K.
Lum
,
D.
Chandler
, and
J. D.
Weeks
, “
Hydrophobicity at small and large length scales
,”
J. Phys. Chem. B
103
(
22
),
4570
4577
(
1999
).
52.
A. V.
Marenich
,
C. J.
Cramer
, and
D. G.
Truhlar
, “
Performance of SM6, SM8, and SMD on the SAMPL1 test set for the prediction of small-molecule solvation free energies
,”
J. Phys. Chem. B
113
,
4538
4543
(
2009
).
53.
I.
Massova
and
P. A.
Kollman
, “
Combined molecular mechanical and continuum solvent approach (MM-PBSA/GBSA) to predict ligand binding
,”
Perspect. Drug Discovery Des.
18
(
1
),
113
135
(
2000
).
54.
P.
Mazzatorta
,
L.-A.
Tran
,
B.
Schilter
, and
M.
Grigorov
, “
Integration of structure-activity relationship and artificial intelligence systems to improve in silico prediction of Ames test mutagenicity
,”
J. Chem. Inf. Model.
47
(
1
),
34
38
(
2007
).
55.
D. L.
Mobley
,
C. I.
Bayly
,
M. D.
Cooper
, and
K. A.
Dill
, “
Predictions of hydration free energies from all-atom molecular dynamics simulations
,”
J. Phys. Chem. B.
13
,
4533
4537
(
2009
).
56.
D. L.
Mobley
,
K. A.
Dill
, and
J. D.
Chodera
, “
Treating entropy and conformational changes in implicit solvent simulations of small molecules
,”
J. Phys. Chem. B.
112
,
938
946
(
2008
).
57.
D. L.
Mobley
and
J. P.
Guthrie
, “
FreeSolv: A database of experimental and calculated hydration free energies, with input files
,”
J. Comput.-Aided Mol. Des.
28
,
711
720
(
2014
).
58.
D. L.
Mobley
,
S.
Liu
,
D. S.
Cerutti
,
W. C.
Swope
, and
J. E.
Rice
, “
Alchemical prediction of hydration free energies for SAMPL
,”
J. Comput.-Aided Mol. Des.
26
,
551
562
(
2012
).
59.
D. L.
Mobley
,
K. L.
Wymer
,
N. M.
Lim
, and
J. P.
Guthrie
, “
Blind prediction of solvation free energies from the SAMPL4 challenge
,”
J. Comput.-Aided Mol. Des.
28
,
135
150
(
2014
).
60.
A.
Nicholls
,
D. L.
Mobley
,
J. P.
Guthrie
,
J. D.
Chodera
,
C. I.
Bayly
,
M. D.
Cooper
, and
V. S.
Pande
, “
Predicting small-molecule solvation free energies: An informal blind test for computational chemistry
,”
J. Med. Chem.
51
,
769
799
(
2008
).
61.
M.
Orozco
and
F. J.
Luque
, “
Theoretical methods for the description of the solvent effect in biomolecular systems
,”
Chem. Rev.
100
,
4187
4225
(
2000
).
62.
I.
Park
,
Y. H.
Jang
,
S.
Hwang
, and
D. S.
Chung
, “
Poisson-Boltzmann continuum solvation models for nonaqueous solvents I. 1-Octanol
,”
Chem. Lett.
32
,
4
(
2003
).
63.
R. A.
Pierotti
, “
A scaled particle theory of aqueous and nonaqeous solutions
,”
Chem. Rev.
76
(
6
),
717
726
(
1976
).
64.
N. V.
Prabhu
,
P.
Zhu
, and
K. A.
Sharp
, “
Implementation and testing of stable, fast implicit solvation in molecular dynamics using the smooth-permittivity finite difference Poisson-Boltzmann method
,”
J. Comput. Chem.
25
(
16
),
2049
2064
(
2004
).
65.
R.
Ramirez
and
D.
Borgis
, “
Density functional theory of solvation and its relation to implicit solvent models
,”
J. Phys. Chem. B
109
,
6754
6763
(
2005
).
66.
E. L.
Ratkova
,
G. N.
Chuev
,
V. P.
Sergiievskyi
, and
M. V.
Fedorov
, “
An accurate prediction of hydration free energies by combination of molecular integral equations theory with structural descriptors
,”
J. Phys. Chem. B
114
,
12068
12079
(
2010
).
67.
C.
Reichardt
,
Solvents and Solvent Effects in Organic Chemistry
(
Wiley-VCH
,
New York
,
1990
).
68.
J.
Reinisch
and
A.
Klamt
, “
Prediction of free energies of hydration with cosmo-rs on the SAMPL4 data set
,”
J. Comput.-Aided Mol. Des.
28
,
169
173
(
2014
).
69.
R. C.
Rizzo
,
T.
Aynechi
,
D. A.
Case
, and
I. D.
Kuntz
, “
Estimation of absolute free energies of hydration using continuum methods: Accuracy of partial charge models and optimization of non-polar contributions
,”
J. Chem. Theory Comput.
2
,
128
139
(
2006
).
70.
W.
Rocchia
,
E.
Alexov
, and
B.
Honig
, “
Extending the applicability of the nonlinear Poisson-Boltzmann equation: Multiple dielectric constants and multivalent ions
,”
J. Phys. Chem.
105
,
6507
6514
(
2001
).
71.
B.
Roux
and
T.
Simonson
, “
Implicit solvent models
,”
Biophys. Chem.
78
(
1–2
),
1
20
(
1999
).
72.
A.
Savelyev
and
G. A.
Papoian
, “
Inter-DNA electrostatics from explicit solvent molecular dynamics simulations
,”
J. Am. Chem. Soc.
129
(
19
),
6060
6061
(
2007
).
73.
K. A.
Sharp
and
B.
Honig
, “
Calculating total electrostatic energies with the nonlinear Poisson-Boltzmann equation
,”
J. Phys. Chem.
94
,
7684
7692
(
1990
).
74.
D.
Shlvakumar
,
J.
Willams
,
Y.
Wu
,
W.
Damm
,
J.
Shelly
, and
W.
Sherman
, “
Prediction of absolute solvation free energies using molecular dynamics free energy perturbation and the OPLS force field
,”
J. Chem. Theory Comput.
6
(
5
),
1509
1519
(
2010
).
75.
J. M.
Soler
,
E.
Artacho
,
J. D.
Gale
,
A.
Garca
,
J.
Junquera
,
P.
Ordejn
, and
D.
Snchez-Portal
, “
The SIESTA method for ab-initio order-n materials simulation
,”
J. Phys.: Condens. Matter
14
,
2745
2779
(
2002
).
76.
F. H.
Stillinger
, “
Structure in aqueous solutions of non-polar solutes from the standpoint of scaled-particle theory
,”
J. Solution Chem.
2
,
141
158
(
1973
).
77.
A. J.
Stone
, “
Distributed multipole analysis, or how to describe a molecular charge distribution
,”
Chem. Phys. Lett.
83
(
2
),
233
239
(
1981
).
78.
J. W.
Storer
,
D. J.
Giesen
,
G. D.
Hawkins
,
G. C.
Lynch
,
C. J.
Cramer
,
D. G.
Truhlar
, and
D. A.
Liotard
, “
Solvation modeling in aqueous and nonaqueous solvent, new techniques and a reexamination of the claisen rearrangement
,” in
Structure, Energetics, and Reactivity in Aqueous Solution: Characterization of Chemical and Biological Systems
, edited by
C. J.
Cramer
and
D. G.
Truhlar
(
American Chemical Society
,
1994
), Vol.
568
, pp.
24
49
.
79.
J. M. J.
Swanson
,
R. H.
Henchman
, and
J. A.
McCammon
, “
Revisiting free energy calculations: A theoretical connection to MM/PBSA and direct calculation of the association free energy
,”
Biophys. J.
86
(
1
),
67
74
(
2004
).
80.
J. J.
Tan
,
W. Z.
Chen
, and
C. X.
Wang
, “
Investigating interactions between HIV-1 gp41 and inhibitors by molecular dynamics simulation and MM-PBSA/GBSA calculations
,”
J. Mol. Struct.: THEOCHEM
766
(
2–3
),
77
82
(
2006
).
81.
J.
Tomasi
and
M.
Persico
, “
Molecular interactions in solution: An overview of methods based on continuous distributions of the solvent
,”
Chem. Rev.
94
,
2027
2094
(
1994
).
82.
J. A.
Wagoner
and
N. A.
Baker
, “
Assessing implicit models for non-polar mean solvation forces: The importance of dispersion and volume terms
,”
Proc. Natl. Acad. Sci. U. S. A.
103
(
22
),
8331
8336
(
2006
).
83.
B.
Wang
and
G. W.
Wei
, “
Parameter optimization in differential geometry based solvation models
,”
J. Chem. Phys.
143
,
134119
(
2015
).
84.
B.
Wang
and
G.-W.
Wei
, “Accurate, robust and reliable calculations of Poisson-Boltzmann solvation and binding energies,” preprint (2016).
85.
J.
Wang
,
W.
Wang
,
S.
Huo
,
M.
Les
, and
P. A.
Kollman
, “
Solvation model based on weighted solvent accessible surface area
,”
J. Phys. Chem. B
105
,
5055
5067
(
2001
).
86.
M. L.
Wang
and
C. F.
Wong
, “
Calculation of solvation free energy from quantum mechanical charge density and continuum dielectric theory
,”
J. Phys. Chem. A
110
,
4873
4879
(
2006
).
87.
M. L.
Wang
,
C. F.
Wong
,
J. H.
Liu
, and
P. X.
Zhang
, “
Efficient quantum mechanical calculation of solvation free energies based on density functional theory, numerical atomic orbitals and Poisson-Boltzmann equation
,”
Chem. Phys. Lett.
442
,
464
467
(
2007
).
88.
A.
Warshel
and
A.
Papazyan
, “
Electrostatic effects in macromolecules: Fundamental concepts and practical modeling
,”
Curr. Opin. Struct. Biol.
8
(
2
),
211
217
(
1998
).
89.
G. W.
Wei
, “
Differential geometry based multiscale models
,”
Bull. Math. Biol.
72
,
1562
1622
(
2010
).

Supplementary Material