Grid Inhomogeneous Solvation Theory (GIST) has proven useful to calculate localized thermodynamic properties of water around a solute. Numerous studies have leveraged this information to enhance structure-based binding predictions. We have recently extended GIST toward chloroform as a solvent to allow the prediction of passive membrane permeability. Here, we further generalize the GIST algorithm toward all solvents that can be modeled as rigid molecules. This restriction is inherent to the method and is already present in the inhomogeneous solvation theory. Here, we show that our approach can be applied to various solvent molecules by comparing the results of GIST simulations with thermodynamic integration (TI) calculations and experimental results. Additionally, we analyze and compare a matrix consisting of 100 entries of ten different solvent molecules solvated within each other. We find that the GIST results are highly correlated with TI calculations as well as experiments. For some solvents, we find Pearson correlations of up to 0.99 to the true entropy, while others are affected by the first-order approximation more strongly. The enthalpy-entropy splitting provided by GIST allows us to extend a recently published approach, which estimates higher order entropies by a linear scaling of the first-order entropy, to solvents other than water. Furthermore, we investigate the convergence of GIST in different solvents. We conclude that our extension to GIST reliably calculates localized thermodynamic properties for different solvents and thereby significantly extends the applicability of this widely used method.
INTRODUCTION
The solvation free energy of a compound in different solvents is of key importance to many fields of chemistry because it determines the partition coefficient, i.e., the tendency of the solute to be found in one solvent or another.1–3 The octanol–water partition coefficient is a key descriptor of small molecules and is related to membrane permeability of drugs as well as their ADME (absorption, distribution, metabolism, excretion) properties.4 Many experimental techniques, such as reverse-phase liquid chromatography5 and parallel artificial membrane permeability (PAMPA),6,7 depend on the distribution of the analyte between phases of different polarities.
Several methods have been devised to describe the solvation free energy and the structure of water around a solute.8–12 In silico approaches include free energy calculations using Thermodynamic Integration (TI),13 Kirkwood-Buff theory,14,15 scaled particle theory,16,17 and methods based on density fluctuations.18,19
In the well-known 3D-RISM method,20 the solvent molecular density is approximated by separate atomic density distributions.21 It has been demonstrated that the molecular distribution can be approximately reconstructed from the atomic densities.21 3D-RISM can be used with different solvents22 as well as different salt strengths.23 While the accuracy of the original 3D-RISM method is limited,24 remarkable improvements have been made using so-called pressure corrections.23,25
In Inhomogeneous Solvation Theory (IST),26 the system coordinates are expressed relative to a central solute molecule. While RISM methods generally solve the occurring integrals in an iterative manner, IST-based methods generally approximate them using molecular densities obtained from MD simulation. While this greatly increases the calculation time, it has the advantage that the density distribution underlying the free energy calculation is physically accurate within the limitations of the force field.27
Various implementations are based on IST, among those WaterMap28,29 and GIST.30,31 Their main limitation is that usually only the two-body entropy is considered, neglecting all higher order terms. Attempts have been made32,33 to compute three-body entropies, but the high computational demand makes this impractical for routine calculations. However, Chen et al.34 have shown that the higher order terms in water can be approximated with a reasonable accuracy as −0.4 times the two-body entropy using a set of small molecules in water. This number is similar to literature results that correlate first and higher order terms in the conformational entropy of proteins.35,36
In theory, GIST could be used with any rigid solvent molecule. However, current implementations are strongly aimed toward water as the solvent. Nevertheless, they have contributed to tackle diverse research problems.30,31,37–45
Recently, we expanded our GIST implementation toward chloroform as a solvent.46 This can be highly beneficial, since the calculated transfer free energy can be used to estimate macrocycle passive membrane permeability.46 Experimentally, the hydrophobicity of a molecule is most often defined as the distribution between water and a hydrophobic solvent, traditionally octanol. We used chloroform because it can be modeled as rigid. In general, a measure for hydrophobicity such as the octanol–water partition coefficient is extremely beneficial in drug discovery.47,48 The importance of this quantity is shown by its inclusion in the SAMPL6 challenge.49 For small molecules, the calculation of these partition coefficients is usually rather simplistic but highly reliable.50–60 However, a physics-based model allows a more in-depth analysis of the partition coefficient. GIST is such a physics-based approach and, in theory, could be used for all rigid solvents.
Since the results from calculation with chloroform were promising,46,61 we here introduce a much more generalized version of the GIST algorithm, which allows all rigid solvent molecules to be analyzed. We apply our algorithm to compute cross-solvation free energies62 for a set of ten compounds, i.e., solvation free energies for each of the compounds dissolved in all the others. We show that our algorithm works as expected and produces highly accurate solvation free energies in several of the solvents. It also allows us to test whether the linear correlation between first and higher order terms observed for water34 also generalizes to other solvents. Furthermore, we show that the resulting molecular maps are valuable to analyze the systems in question.
METHODS
Systems
Figure 1 shows an overview of the solvent molecules used to test our extended and generalized GIST implementation. This dataset was presented in a publication by Kashefolgheta et al.62 with solid experimentally determined cross-solvation free energies. The cross-solvation matrix consists of the solvation free energies of each compound solvated in any of the other molecules. We chose ten molecules out of their dataset, which features a rigid backbone of heavy atoms. This is important because GIST treats each solvent molecule as a rigid body. Some of the molecules contain rotatable torsions around methyl groups. Throughout this work, we disregard the entropy of methyl torsions, assuming that it will not be strongly affected by the interaction with the solute.
Overview of the solvent molecules used to benchmark our algorithm. The systems are taken from the study by Kashefolgheta et al.62
Overview of the solvent molecules used to benchmark our algorithm. The systems are taken from the study by Kashefolgheta et al.62
GIST theory
Grid inhomogeneous solvation theory (GIST) as devised by Nguyen et al.30,31,41,63 is a method to calculate the free energy of solvation of a solute in a solvent based on the inhomogeneous fluid solvation theory (IST) of Lazaridis.26,64–67 This section aims to give a short overview of the basics of GIST.
GIST aims to compute the free energy of solvation according to Ben Naim’s standard process of solvation, which involves transferring the solute molecule from a fixed position in the gas phase to a fixed position in the solution.68,69 This is equal to the free energy of transferring a solute between gas and liquid phases of equal molarity, although the enthalpy and entropy contributions are different.26,68,69
GIST solves the spatial integrals appearing in IST by discretizing them onto a three-dimensional grid centered on the solute, thereby approximating them as sums over the grid voxels. The free energy of solvation is estimated from the difference between the mixture and the bulk solvent. The free energy of solvation of each grid voxel is split into an energetic and an entropic contribution,
where G represents the free energy, E is the energy contribution, T is the temperature, S is the entropy contribution, and rk represents the coordinates of voxel k.
The energy is computed from the force field. Traditionally, interactions in GIST are computed under the minimum image convention, although a particle mesh Ewald (PME)-based algorithm, which mirrors the energy of MD engines more closely, was published recently.34 The energy is further split into a solvent–solvent term and a solute–solvent term,
where the solute is denoted by u and the solvent by v.
Both terms are easily computed from the force field energy by averaging over all frames in the current voxel k. To avoid double-counting, only half of the solvent–solvent mean energy is added.
In GIST, previously only two-body solute–solvent entropy terms were used to approximate the entropic contribution while terms of higher order are neglected. Recent studies found that while the energy contribution to the GIST free energy of solvation compares quite favorably to thermodynamic integration (TI) calculations, the entropy contribution compares less favorably.34,37,40
Since GIST computes a free energy difference between the solution and the pure solvent rather than the total free energy of the system, terms that affect both states equally cancel out. For instance, the vibrations of the solvent have been demonstrated to be rather unaffected by small solutes such as methane.70
In the calculation, the first order (two-body) entropic contribution can either be split into a translation entropy and an orientation entropy of the solvent molecules with respect to the solute [Eq. (3)] or a combined entropy can be estimated from the six-dimensional (translation and orientation) distribution (see below). To bring the entropy term more in line with the TI results, Chen et al.34 suggest approximating the higher order entropy terms as −0.4 times the solvent–solute first order entropy. Interestingly, that factor is similar to that found by Polyansky et al.35 for the correlation between first and higher order terms in the conformational entropy of proteins. Equation (4) shows the correction for the entropy, with the higher order approximation already added,
The translational () and orientational first order entropy () are evaluated using a first nearest neighbor approach,71
where R is the ideal gas constant, γ is the Euler-Mascheroni constant that compensates an asymptotic bias in the entropy estimator, Nk is the total number of solvent molecules in voxel k, Nf is the total number of frames, ρ0 is the reference density of the bulk solvent, dtrans,i is the Cartesian distance of the solvent molecule i to its nearest neighbor in the same voxel, and Δωi is the angular distance of the solvent molecule i to its nearest neighbor in the same voxel rotationally. While the translational distance is straight-forwardly defined as the Euclidean distance, the angular distance [Eq. (7)] is defined as the distance between two quaternions (qi, qj) representing the orientations of solvent molecules,72,73
Here, |qi · qj| denotes the absolute value of the quaternion’s dot product. We note that this does not take the solvent symmetry into account, which would affect the entropy of the solution and that of the pure solvent equally and thereby cancel out in the GIST result.
In Eq. (6), the cumulative probability density of distances is approximated by Δω3/(6π). This probability density corresponds to the volume of a ball in the space of rotations. However, the exact volume is analytically given by . We updated our implementation to use this equation unless the user requests the old volume calculation.
Another approach to calculate the entropy contributions is the direct approximation of the combined six-dimensional (translation + orientation) integral [Eq. (8)]. For this approach, the six-dimensional distance between neighbors is needed. The straight-forward approach is to use the ℓ2-norm, i.e., , where d is the total, six-dimensional distance between the solvent molecules,
As with the orientational entropy, Eq. (8) uses an approximate volume term. In Eq. (8), the volume of a ball is approximated given the six-dimensional distance as
However, the true value for this volume is given by Eq. (10).74 The error of the approximation in Eq. (10) is below 5% if the rotational distance is below 1. However, it increases at larger distances.
This approximation is one reason for the entropy convergence issue discussed in a recent publication by Chen et al.34 Furthermore, the expectation value of dtrans is higher for larger solvents, which on average also results in a higher rotational distance to the nearest neighbor. The effect on the bulk entropy of toluene is shown in the supplementary material, Fig. S1, and the relative error of the approximate formula is shown in the supplementary material, Fig. S2.
Thus, we computed the exact sphere volume for distances from 0 to 10 Å with a spacing of 0.01 Å using the following equation:74
Here, dtrans and Δω are the integration variables, and l is an arbitrary (implementation-dependent) constant defining the distance that corresponds to 1 rad of rotational distance (1 Å in all current GIST implementations). The correction term is given in the following equation, and the resulting approximation is divided by the correction term at the correct position of d:
During the GIST calculation, fcorr(d) is evaluated through linear interpolation and is used to correct the sphere volume for the nearest neighbor entropy. This significantly reduces the error in the entropy, especially when the sampling is low.
Implementation
Since we used multiple different solvents in this work, the definition of the quaternion had to be consistent for all molecules. Additionally, a very general definition was used to ensure that this algorithm would work with rigid solvent molecules beyond the ones included here. In the original GIST implementation, the center of the molecule—used for the calculation of the translational entropy and the binning—is the oxygen atom. The quaternion of water is calculated as the rotation from the lab coordinate system onto a coordinate system defined by the two OH vectors. Since one of them is defined first, this results in the two hydrogen atoms being unique.
In this study, we instead use the center of mass (COM) to define the center of the molecule. The quaternion is constructed from a selection of three atoms, which should define a rigid substructure of the molecule. For example, if the molecule contains a rotatable methyl group but is otherwise rigid, it is preferable to define the molecular orientation via the heavy atoms rather than the hydrogens.
In our implementation, the user can choose three atoms to define this rigid substructure. Otherwise, three atoms are chosen automatically by searching for the atom with the most bonds, as well as two atoms that are bonded to it. If there is an atom in the molecule that has more than two bonds to heavy (non-hydrogen) atoms, we exclude hydrogens from the search, otherwise, we include them.
In this study, we defined the rigid substructure of methanol to be the C–O–H atoms, rather than the H–C–H substructure, which was chosen automatically. For all other atoms, we used the automatic selection algorithm.
These conditions ensure that for water the quaternion is built from the two O–H vectors, as in the original GIST implementation. However, this implementation assumes that the solvent orientation can be described as a rigid body and is badly suited for flexible solvents, due to the inherent inability of IST to deal with the entropy of internal degrees of freedom.
Furthermore, we improved the entropy algorithm using the correction for the distance in translation-rotation space defined by Eq. (11). This correction is negligible for small total distances between the solvent molecules. It becomes important for larger distances, i.e., lower sampling. Second, we improved the calculation of the nearest neighbor distance by extending the search toward voxels further away until the distance to the nearest neighbor is below the smallest distance to voxels that were not yet searched. Both these improvements lead to a better convergence of the entropy calculation and allow us to increase the integration cutoff for the final ΔG integrals. The effect of both these changes is shown in the supplementary material, Fig. S1.
Our updated cpptraj version has several additional flags to control the new behavior. The “nocom” flag specifies that the solvent position should be defined by a solvent atom (as in previous versions) rather than the center of mass. The “rigidatoms” flag allows the user to manually choose a rigid substructure of the molecule that defines the quaternion. The “oldnnvolume” flag specifies that the reference volume for the nearest-neighbor entropy should be calculated using the old equations rather than the updated ones. The “nnsearchlayers” flag specifies the maximum number of layers of voxels that should be used when searching for the nearest neighbor of a molecule. To avoid an unnecessary slowdown, the search stops early if it can be proven that the nearest neighbor was already found.
MD simulations
The structures of different molecules were generated with the molecular operating environment (MOE).75 The parameters were derived using an antechamber of AmberTools1976 and the GAFF277 force field. The partial charges were calculated with the RESP78 procedure and Gaussian 1679 using HF/6-31G*. The solute molecules were placed in different solvent boxes with 2700 solvent molecules using LeaP and an adaption of the shell script by Daniel Roe included in the AMBER18 suite. For water, the TIP3P80 model was used.
The graphics processing unit (GPU) implementation of the pmemd81 module of AMBER1876 was used for the restrained simulation on the GPU. To maintain a constant temperature of 300 K, we deployed a Langevin82 thermostat with a collision frequency of 2 ps−1. To keep constant atmospheric pressure, a Berendsen et al.83 barostat with a pressure relaxation time of 2 ps was used. Long-range electrostatics were treated with the particle-mesh Ewald method and a van der Waals cut-off of 8 Å was used. For all simulations, a time step of 1 fs was employed. All nonsolvent atom positions were restrained using a harmonic potential with a weight of 1000 kcal/mol Å2. All simulations, except those with aniline as a solvent, were run for 100 ns, collecting frames every 10 ps, resulting in 10 000 frames per simulation. For aniline as a solvent, simulations of 1 µs were performed, collecting frames every 100 ps, resulting in 10 000 frames per simulation. This is to ensure that the distance found in the nearest neighbor search is not artificially reduced by slow diffusion of the solvent, as shown in the section titled Results (subsection Convergence).
TI calculations
For TI calculations, 11 lambda windows were used per solvent–solute pair, resulting in a difference between lambda windows of 0.1. The simulation was not restrained, and softcore potentials84 were used with an α parameter of 0.5. Each lambda window was simulated for 10 ns, except for calculations involving aniline as a solvent, which were prolonged to 50 ns due to slower convergence. The remaining settings are the same as for the GIST calculation.
Pymbar was used to calculate the solvation free energies from the TI simulations.85
The uncertainty of the free energies was estimated by splitting each trajectory into three equal parts and computing the sample standard deviation based on the resulting free energies.
To obtain the entropy contribution of the free energy from TI, the change in internal energy ΔU is needed.86 However, the convergence of end-state energies in TI is rather slow, since energy fluctuations of the entire simulation box are included, although large portions of the box do not interact with the solute. Fortunately, the GIST energy converges more quickly since it allows us to focus on the proximity of the solute. Therefore, we approximate the change in internal energy by the integral of the GIST energy terms (computed as explained below, in the subsection titled Analysis). We then compute the total entropy contribution TΔStotal as
where ΔA is the free energy from TI and ΔE is the change in internal energy. We validated the accuracy of this approach using the systems where a molecular species is dissolved in itself, since the hydration energy of those systems can be trivially computed as Etot/nmols − Eself, where Etot is the potential energy of the system, nmols is the number of molecules, and Eself is the self-interaction energy of a molecule. We compare this value to the energy computed from GIST and from the TI end-states and find a root-mean-square error (RMSE) of 0.6 kcal/mol using GIST and 1.6 kcal/mol using 100 ns simulations of each TI end-state.
The uncertainty of the entropy contribution was computed using
where σ denotes the sample standard deviation of the respective quantity.
GIST calculations
GIST calculations were performed using an updated version of cpptraj. The changes in the entropy calculation are already merged into the official GitHub version (https://github.com/Amber-MD/cpptraj.git), and the extension to multiple solvents will be incorporated as well.
Energy calculations were performed on the GPU using the minimum image convention, rather than the previously published PME implementation.34
For the production GIST runs, the temperature was set to be 300 K, a grid size of 200 × 200 × 200 Å3 and a spacing of 0.5 Å was used to envelop the entire simulation box in the grid. The reference density of each solvent was calculated from an MD simulation of a box containing 2700 solvent molecules. The solvent positions were assigned using the center of mass (COM). We always defined the first molecule as the “solute” and all others as the “solvents.”
To test the reproducibility of the GIST results, we split the trajectories into three parts and performed GIST calculations individually. We estimate the uncertainty using the sample standard deviation based on the three individual results.
Analysis
For the convergence analyses in energy and entropy, we grouped the grid voxels into shells based on their distance to the closest solvent atom. The shells were calculated up to a distance of 20 Å with a shell thickness dr of 0.2 Å. The energy and entropy values were integrated over each shell.
From this radial convergence analysis, we calculated the reference energies and entropies for each solute/solvent pair using the average between 15 and 20 Å from the solute. The only exception is water as a solvent, where we reduced the distance to 13–16 Å to avoid the edge of the solvent box. The reference values for the entropy of each molecule dissolved in itself are shown in the supplementary material, Fig. S3. Referencing the entropy is only necessary because the bulk entropy does not reach zero with finite sampling. This non-zero bulk value can be due to inaccuracies in the entropy algorithm or due to a correlation between subsequent frames in solvents with long relaxation times. While the former is significantly improved by our updates to the entropy calculation, as shown in the supplementary material, Fig. S1, the latter is still relevant when using aniline as a solvent, as shown in the subsection titled Convergence.
For the total solvation free energy, we used distance cutoffs corresponding to three hydration shells in each solvent. supplementary material, Table S1, shows the cutoff values. We note that the cutoff for water (8 Å) is higher than what we used previously.87 This is possible due to our changes in the entropy calculation, which improve the convergence in bulk. The post-processing was performed using our in-house gisttools, package, which can be obtained from GitHub (https://github.com/liedllab/gisttools).
RESULTS
Convergence
We performed simulations of different molecules solvated in themselves, e.g., water in water, to analyze the convergence of our new GIST implementation with respect to the distance to the central molecule, the simulation time, and the number of frames. Figure 2 shows the “bulk” value of the GIST entropy computed at a distance of 20 Å to the solute. We find that the entropy in most solvents is well converged (i.e., close to zero) using 10000 frames from 100 ns simulation. However, some solvents show non-zero bulk values at higher frame numbers, indicating that the autocorrelation time exceeds the separation between frames. This is by far the most pronounced in aniline. Increasing the simulation time to 1000 ns clearly improves the convergence. While the average entropy converges better at a lower number of frames, the individual voxel values have higher fluctuations. We, therefore, chose to use 10 000 frames for all analyses while increasing the simulation time to 1000 ns whenever the solvent is aniline. Furthermore, we subtract a reference entropy corresponding to the bulk value from our GIST results before computing integrals to remove the remaining bias.
Average of the GIST entropy contribution TΔSsix at a distance of 20 Å to the “solute” molecule for different numbers of frames and different simulation lengths. First row: 100 ns simulations. Second row: 1000 ns simulations.
Average of the GIST entropy contribution TΔSsix at a distance of 20 Å to the “solute” molecule for different numbers of frames and different simulation lengths. First row: 100 ns simulations. Second row: 1000 ns simulations.
Even in solvents with good convergence, the bulk entropy (i.e., the reference value of the entropy) is not exactly zero. In the supplementary material, Fig. S3, we show the entropy reference values obtained from 10000 frames of a 100 ns simulation for every molecule dissolved in itself. We find a large improvement for all molecules except aniline, where the strong autocorrelation between the frames leads to a strongly negative bulk entropy.
In the supplementary material, we show further convergence analyses for all solvents, including the convergence with distance to the solute (Figs. S4 and S5) as well as different energy and entropy quantities (Fig. S6). For the latter analyses, the GIST grids were integrated within 10 Å of the solute.
Since water has been extensively studied in prior work using GIST, we checked the water–water reference energy (computed as explained in the section titled Methods) to see whether our approach matches the value from the AMBER manual. The two values are rather close, where the AMBER manual reports 9.533 kcal/mol per solvent molecule, and our calculations lead to 9.554 kcal/mol. The difference of about 0.2% stems from the larger box size in our study, which leads to slightly different energies under the minimum image convention.
Although the energy per molecule is rather consistent between implementations, it is still important that the reference energy is computed with the same algorithm for the energy calculation (PME or minimum image convention), as well as roughly the same box size. In our case, the integration region for the GIST energy contains on average 81–199 solvent molecules, depending on the system. Therefore, a small difference of 0.02 kcal/mol per solvent amount to 4 kcal/mol in total.
Comparison to experiment and TI
We performed TI calculations on every solute/solvent combination of our dataset and computed the free energy of solvation as well as the energy and entropy contributions. We computed the total solvation entropy TΔStotal using the free energy values from TI and the energy from GIST (as explained in the section titled Methods, subsection TI calculations). In Fig. 3, we compare the GIST entropy TΔSfirst to TΔStotal and also show error bars from trajectory splitting as explained in the section titled Methods.
TΔStotal plotted against the GIST entropy for each of the ten investigated solvent molecules. Error bars for GIST were computed by splitting the trajectories into three parts of equal length and computing the sample standard deviation. The error bars represent the sample standard deviation from trajectory splitting, as explained in the section titled Methods.
TΔStotal plotted against the GIST entropy for each of the ten investigated solvent molecules. Error bars for GIST were computed by splitting the trajectories into three parts of equal length and computing the sample standard deviation. The error bars represent the sample standard deviation from trajectory splitting, as explained in the section titled Methods.
In Table I, we use the same data to compute linear correction terms for each solvent. In a previous study, Chen et al.34 showed that a linear correlation between the higher and first order entropies exists for water, such that ΔStotal = 0.6 × ΔSfirst. Our results indicate that such a relationship also exists for some other solvents, although the correlation factor is different. In particular, the slope of the linear correlation is almost one for benzene, pyridine, dimethylacetamide, tetrachloromethane, and toluene, indicating that the solvation entropy is dominated by first-order terms in those solvents. Furthermore, we find a good correlation in chloroform and in water with slopes of 0.75 and 0.62. This indicates a linear correlation between first and higher order entropies.
Linear fit parameters for the relation between TΔStotal (estimated by TI) and TΔSfirst (estimated by GIST). The intercept values are given in kcal/mol. R is the Pearson correlation. RMSE is the root-mean-squared error of the linear regression in kcal/mol, and MUE is the mean unsigned error, also in kcal/mol.
Solvent . | Slope . | Intercept . | R . | P value . | RMSE . | MUE . |
---|---|---|---|---|---|---|
Acetone | 0.89 | 1.43 | 0.95 | 2.00 × 10−5 | 0.34 | 0.27 |
Aniline | 0.41 | −0.39 | 0.67 | 3.50 × 10−2 | 0.90 | 0.71 |
Benzene | 1.12 | 2.69 | 0.96 | 9.60 × 10−6 | 0.34 | 0.29 |
Chloroform | 0.75 | 0.37 | 0.94 | 4.20 × 10−5 | 0.38 | 0.31 |
Dimethylacetamide | 0.96 | 2.00 | 0.99 | 8.80 × 10−8 | 0.27 | 0.24 |
Methanol | 0.52 | 0.11 | 0.82 | 3.40 × 10−3 | 0.65 | 0.47 |
Pyridine | 0.95 | 2.38 | 0.9 | 3.50 × 10−4 | 0.52 | 0.47 |
Tetrachloromethane | 0.99 | 1.49 | 0.97 | 4.70 × 10−6 | 0.30 | 0.21 |
Toluene | 0.90 | 1.03 | 0.97 | 5.70 × 10−6 | 0.25 | 0.20 |
Water | 0.62 | 0.93 | 0.95 | 1.90 × 10−5 | 0.36 | 0.32 |
Solvent . | Slope . | Intercept . | R . | P value . | RMSE . | MUE . |
---|---|---|---|---|---|---|
Acetone | 0.89 | 1.43 | 0.95 | 2.00 × 10−5 | 0.34 | 0.27 |
Aniline | 0.41 | −0.39 | 0.67 | 3.50 × 10−2 | 0.90 | 0.71 |
Benzene | 1.12 | 2.69 | 0.96 | 9.60 × 10−6 | 0.34 | 0.29 |
Chloroform | 0.75 | 0.37 | 0.94 | 4.20 × 10−5 | 0.38 | 0.31 |
Dimethylacetamide | 0.96 | 2.00 | 0.99 | 8.80 × 10−8 | 0.27 | 0.24 |
Methanol | 0.52 | 0.11 | 0.82 | 3.40 × 10−3 | 0.65 | 0.47 |
Pyridine | 0.95 | 2.38 | 0.9 | 3.50 × 10−4 | 0.52 | 0.47 |
Tetrachloromethane | 0.99 | 1.49 | 0.97 | 4.70 × 10−6 | 0.30 | 0.21 |
Toluene | 0.90 | 1.03 | 0.97 | 5.70 × 10−6 | 0.25 | 0.20 |
Water | 0.62 | 0.93 | 0.95 | 1.90 × 10−5 | 0.36 | 0.32 |
For methanol and aniline, we find comparably poor correlations of 0.82 and 0.67. This might be due to convergence issues, as shown below for aniline, or due to nonlinear contributions of the higher-order entropies and internal degrees of freedom.
For all solvents except aniline, we find relatively low sample standard deviations compared to the absolute values, as shown in Fig. 3. This indicates that both the GIST and TI calculations are relatively well converged at the simulation times chosen in this study. Aniline is an exception due to the very slow diffusion of aniline molecules in the GAFF2 force field, which was already noted in prior MD studies.88
Last, we analyzed the solvation free energy of each solute in each solvent, leading to a ten times ten matrix of solvation free energy values. In Fig. 4, we visualized the correlation between TI calculations and the GIST calculations. We observe a Pearson R of 0.63 over the entire matrix.
Results of the GIST calculation vs the results of the TI calculation. The 100 data points are depicted as points, shown as a black line is the linear fit. Orange: aniline as a solvent. Blue: other solvents. First row: uncorrected GIST free energies. Second row: free energy using a linear correction term for the entropy of each solvent. Third row: comparison between TI and experimental data.
Results of the GIST calculation vs the results of the TI calculation. The 100 data points are depicted as points, shown as a black line is the linear fit. Orange: aniline as a solvent. Blue: other solvents. First row: uncorrected GIST free energies. Second row: free energy using a linear correction term for the entropy of each solvent. Third row: comparison between TI and experimental data.
We note that the worst predictions are found for aniline. This might be a result of poor convergence, since we have shown in Fig. 2 that the bulk entropy of aniline is significantly below zero unless the spacing between MD frames is very long, indicating a long autocorrelation time. Additionally, it is known from the literature that the self-diffusion of aniline using the GAFF2 force field is very slow.88 The prediction for solutions in aniline is significantly better using TI than using GIST and omitting aniline as the solvent increases the correlation between GIST and TI to 0.79.
However, there is one datapoint (the point at the lowest experimental free energy in Fig. 4), which is also overpredicted using TI, pointing toward a force field issue rather than a problem with convergence. To test this hypothesis, we performed GIST calculations of pyridine in aniline with scaled aniline charges. The results are shown in the supplementary material, Fig. S7.
Correcting the entropy using the linear models defined in Table I leads to an R value of 0.84 for the experiment, and 0.97 for the TI calculations. The raw energy and entropy values used to produce Fig. 4 are shown in the supplementary material, in Table S2 (GIST energy), Table S3 (GIST entropy), Table S4 (GIST free energy), and Table S5 (TI free energy). The experimental reference data are taken from the study by Kashefolgheta et al.62
Spatial interpretation
Figure 5 shows the localized entropy, enthalpy, and free energy of solvation for chloroform and dimethylacetamide dissolved in benzene and water. We observe strong differences in the solvation properties of dimethylacetamide depending on the solvent. In contrast to water, which predominantly solvates the oxygen atom of the amide group, benzene is more localized to the parts above and below the center of the amide group. For chloroform, we find that both solvents are predominantly located close to the hydrogen atom. Entropy and energy contributions to the solvation free energy are readily available from GIST and allow for a more detailed analysis of the solvation free energy. Energy-entropy compensation can be observed to some extent for the two solutes in both solvents. Usually, more favorable energy contributions are accompanied by unfavorable entropy contributions. This is visible from the large blue areas in the part of the figure depicting the energy. These are almost identical to the green areas in the entropy part.
Per-molecule-normalized solvation free energy (bottom) and its two contributions, the two-body entropy (middle) and enthalpy (top). Each quantity is shown for chloroform and dimethylacetamide as solutes, and for benzene and water as solvents. Free energy contributions below −1.25 and −0.5 kcal/mol per solvent are depicted in light and dark purple, entropy contributions −TΔSsix below −2.5 and −0.5 kcal/mol per solvent are depicted in light and dark green, and areas with solvation energy below −2.5 and −0.5 kcal/mol per solvent are shown in light and dark blue. The GIST grids are coarse-grained to 1 Å resolution to reduce noise in the visualization.
Per-molecule-normalized solvation free energy (bottom) and its two contributions, the two-body entropy (middle) and enthalpy (top). Each quantity is shown for chloroform and dimethylacetamide as solutes, and for benzene and water as solvents. Free energy contributions below −1.25 and −0.5 kcal/mol per solvent are depicted in light and dark purple, entropy contributions −TΔSsix below −2.5 and −0.5 kcal/mol per solvent are depicted in light and dark green, and areas with solvation energy below −2.5 and −0.5 kcal/mol per solvent are shown in light and dark blue. The GIST grids are coarse-grained to 1 Å resolution to reduce noise in the visualization.
DISCUSSION
In this study, we present a newly implemented, generalized GIST algorithm, which can be applied to all rigid solvent molecules. In combination with our recent work,33 GIST calculations of arbitrary solvent mixtures are now possible. Using the center of mass instead of a center atom, combined with two algorithmic improvements that we included, shows promising results. On the one hand, it permits us to analyze any rigid solvent. Furthermore, the changes in the underlying calculation of the entropy now allow the entropy to reach zero in bulk, even with smaller amount of sampling required.
In most cases, we find that the GIST calculations are well converged using 10000 frames out of 100 ns simulation. In the case of aniline, however, the convergence is much slower. In this case, the entropy values do not converge to 0 for bulk-like solvents, but to some negative value, which is both dependent on the total number of frames and the total amount of simulation time. This value is closer to zero when using few frames and long simulation times, showing that the nearest neighbor in aniline is still autocorrelated, i.e., the next frame is more likely to be the nearest neighbor as would be expected from statistics. It appears that for aniline, the relaxation time of molecular orientation is orders of magnitude slower than it should be from experiments. This is consistent with literature results reporting the very slow movement of aniline molecules simulated using the GAFF2 forcefield88 as well as with prior work by our own group showing that nitrogen atoms are often poorly represented by point-charge models.89,90
In the current work, we decided to prolong all calculations involving aniline as a solvent to 1 µs for the GIST runs or 50 ns per window for the TI calculations. Nevertheless, the prediction of solvation free energies is still worse than for other solvents.
For the rather apolar solvents benzene, toluene, pyridine, and tetrachloromethane, we find excellent agreement between the first-order entropy from GIST and the total entropy calculated from the TI free energy. This shows that the solvation entropy in those solvents is dominated by first-order terms. In water, we find a correlation to the exact entropy of 0.95 with a slope of 0.62. Similarly, we find a good correlation in chloroform, although with a slope of 0.75. This shows that higher order terms are also relevant, but that they can be easily approximated by a linear correction factor as suggested by Chen et al.34 In some other solvents, we find comparably poor correlations with a slope significantly below 1, showing that higher order entropies are relevant but not linearly related to the first order entropy, at least for the dataset investigated here.
In general, we find a correlation of 0.63 to experimental solvation free energies over the whole dataset, which improves to 0.79 when excluding aniline as a solvent. The correlation further improves when using linear correction terms for each solvent. Since the entropies per solvent shown in Fig. 3 are highly correlated with the total entropy from TI, we conclude that GIST is very accurate within the same solvent, but the accuracy should be benchmarked when comparing between different solvents.
In Fig. 4, there is one outlier that exists in GIST as well as in the TI calculation, pointing toward a problem with the underlying force field. This datapoint represents pyridine dissolved in aniline. Since it is known that simple point charges are a poor representation for nitrogen atoms and especially amine groups,89,90 we surmised that this deficiency of the force field might lead to wrong pyridine–aniline interactions. To test this, we performed GIST calculations where we linearly scaled down the aniline charges, as shown in the supplementary material, Fig. S7. We find that there is a strongly unfavorable entropic contribution to the pyridine–aniline interaction, which is quickly removed at lower aniline charges while the energy contribution remains more stable. This indicates that the distribution of aniline around the dissolved pyridine might be too rigid due to the high charge on the nitrogen (about −0.966). Of course, it cannot be assumed that such a simple change of the aniline description would lead to an overall improvement of the force field, but it points toward a problem for this specific system.
Visualizing the localized free energy contributions shows interesting differences in the solvation of various solute–solvent pairs. We show on the one hand very similar patterns for some small molecules, e.g., chloroform. Chloroform has a very similar profile no matter which solvent, where most of the interactions are located at the hydrogen atom. However, this is not always the case. In our dataset, especially dimethylacetamide shows differences for the various solvents. We have shown that the energy is especially favorable above and below the amide bond solvated in benzene, however, solvated in water it is more favorable near the oxygen atom.
It is worth noting here that we are using the conventional approach to calculate the energy in GIST. Chen et al.,34 published a GIST algorithm using the Particle Mesh Ewald method to calculate the energy, which presumably improves the consistency between MD simulations and GIST energy values.
CONCLUSION
Here, we present an extension to the GIST algorithm that allows the calculation of solvation free energies for all solvent molecules that can be modeled as rigid.
We show that the convergence is similar to that in water for most systems, with the exception of aniline, which seems to be an issue with the parameterization using GAFF2. We show that the first-order entropy is an excellent approximation of the total solvation entropy for several solvents, especially relatively apolar ones such as toluene or tetrachloromethane. Furthermore, the solvation entropy in water and chloroform can be excellently described by linear scaling factors of about 0.75 and 0.62, as suggested by Chen et al.34 for water. However, we find worse correlations in methanol and in aniline. While the poor prediction in aniline is probably due to convergence issues, the lower accuracy in methanol might be due to approximations in the GIST method. Since the energy contribution in GIST is expected to be very accurate, also the free energies are accurate whenever the first-order approximation is justified.
Furthermore, our improvements in the entropy calculation allow us to analyze larger chunks of the calculation box even with limited sampling, since the bulk entropy converges toward zero more quickly at modest numbers of simulation frames. Since these improvements are already incorporated into cpptraj, they will benefit future GIST studies regardless of the used solvent.
The results calculated using GIST are inherently localized and can, thus, be used to identify differences in solvation, which can help to uncover and overcome limitations in current models. We, therefore, conclude that GIST provides highly accurate free energy calculations combined with detailed three-dimensional information, whenever the first-order approximation is justified.
SUPPLEMENTARY MATERIAL
See the supplementary material for calculated values for the enthalpy, entropy, and free energy of solvation for all solute–solvent pairs, the cutoff that was used for the calculation of the solvation free energy for each solvent, figures with convergence information as well as reference entropies for the GIST calculations in all different solvents, a figure showing the relative error of the approximate volume terms in the nearest neighbor entropy, and GIST integrals of pyridine in aniline with scaled aniline charges.
ACKNOWLEDGMENTS
We thank Dr. Yin Wang for his technical support and his help with the simulation setup.
This work was supported by the Austrian Science Fund (FWF): Standalone Project Nos. P30565, P30737, P34518, and DOC30. The computational results presented were achieved, in part, using the HPC infrastructure LEO of the University of Innsbruck and the Vienna Scientific Cluster (VSC).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
F.W. wrote parts of the manuscript, helped with programming, and performed research. J.K. wrote parts of the manuscript and programmed the extension of GIST. V.J.H. helped with writing and validation of the results. F.H. helped with planning the research and simulation setup. A.S.K. helped with writing and planning the research. M.L.F.-Q. helped with writing and supervised the research. K.R.L. supervised the research. F.W. and J.K. contributed equally to this work.
DATA AVAILABILITY
The data that support the findings of this study are available within the article and its supplementary material. Experimental reference data are taken from the literature. Raw data such as trajectories or GIST output files are available from the corresponding author upon reasonable request. The starting structures and force field parameters used for the small molecules can be found on the GitHub page of our group (https://github.com/liedllab/md-setup-gist-cross-solvation.git).