A streamlined molecular-dynamics workflow for computing solubilities of molecular and ionic crystals

Computing the solubility of crystals in a solvent using atomistic simulations is notoriously challenging due to the complexities and convergence issues associated with free-energy methods, as well as the slow equilibration in direct-coexistence simulations. This paper introduces a molecular-dynamics workflow that simplifies and robustly computes the solubility of molecular or ionic crystals. This method is considerably more straightforward than the state-of-the-art, as we have streamlined and optimised each step of the process. Specifically, we calculate the chemical potential of the crystal using the gas-phase molecule as a reference state, and employ the S0 method to determine the concentration dependence of the chemical potential of the solute. We use this workflow to predict the solubilities of sodium chloride in water, urea polymorphs in water, and paracetamol polymorphs in both water and ethanol. Our findings indicate that the predicted solubility is sensitive to the chosen potential energy surface. Furthermore, we note that the harmonic approximation often fails for both molecular crystals and gas molecules at or above room temperature, and that the assumption of an ideal solution becomes less valid for highly soluble substances.


I. INTRODUCTION
The solubility quantifies the maximum amount of a material, known as the solute, that can be dissolved in a solvent at equilibrium.2][3] For example, to ensure that a drug is absorbed at a suitable rate, it should have an appropriately high or low solubility, 4 so it is useful to estimate the solubility as part of the screening process even before the drug is synthesised experimentally.[9][10][11][12][13][14][15][16][17][18][19][20] However, predicting solubility computationally poses challenges for several reasons.First, the thermodynamic equilibrium between the solute and the solution is governed by both energetic and entropic factors.Addressing the entropic components often demands intricate and costly computations.Second, it is not straightforward to design the thermodynamic cycles necessary to perform these computations.Finally, it is important to model the molecular interactions accurately.Empirical force fields are often parameterised to describe either the liquid or the solid phase of a material accurately, but capturing both simultaneously is much more challenging.
The solubility of a crystal corresponds to the solution concentration  at which there is an equilibrium between the solid solute and the solution; at a known pressure  and temperature , the equilibrium is reached when the chemical potentials of the two phases are equal,  crystal (, ) =  sol (, , ).In principle, it may be possible to determine this equilibrium using direct-coexistence simulations 21,22 by placing the crystal in the same simulation box as a solution, and determining what the concentration of the solution is once the amount of crystal a) Electronic mail: ar732@cam.ac.uk b) Electronic mail: bingqing.cheng@ist.ac.at no longer changes. 23,24However, such simulations are difficult in practice, since crystal dissolution and precipitation are complex kinetic processes which usually require equilibration times beyond typical timescales of molecular-dynamics (MD) simulations, particularly so when either solutes or solvents are large molecules and the systems of necessity entail large numbers of atoms.Moreover, many atomic and molecular systems have numerous polymorphs, i.e. several distinct possible crystal structures.Although typical solubility ratios of different crystalline polymorphs are within a factor of two, there are exceptions, 25 and each polymorph can in principle have significantly different solubility properties. 1,3Predicting which polymorph will be the most stable or the most soluble under specific conditions requires a consideration of numerous possible crystal structures, further complicating computational efforts.
0][31] For atomic crystals, the Einstein crystal method 26,32 can be used to obtain the absolute free energy by exploiting artificial thermodynamic integration 26 to switch between the Einstein crystal potential ( 0 ), for which each atom is attached to its equilibrium position by a harmonic spring, and the potential of interest ( 1 ).For molecular crystals of flexible molecules, the extended Einstein crystal method was developed, 7,8,11,33 where the molecular crystal is transformed into an Einstein crystal of independent molecules via a multi-step workflow, including turning on additional harmonic restraints on selected atoms of each molecule to control the orientation of the molecules, subsequently turning on Van der Waals and electrostatic interactions between molecules, and finally turning off all harmonic restraints.This workflow can be somewhat convoluted, and it can be difficult to determine which atoms needs to be tethered and what spring strength is needed on each tethered atom.
The free energy of the molecule in the solution can be computed as the free-energy difference between a solvated and a gas-phase molecule.In a dilute solution, this can be determined by computing the chemical potential of a single solute molecule in a box of solvent using artificial thermodynamic integration by switching on the interactions between the solute and the solvent. 7n practice, such an approach suffers from origin singularities when the particle is first inserted. 34This singularity can be circumvented using the cavity method: a cavity can be switched on and increased in size to accommodate the solute molecule, the solute molecule then added, and finally the cavity is gradually switched off. 7For sufficiently dilute solutions, an ideal-dilute solution approximation can be used to estimate the chemical potential at other concentrations; in other words, if Henry's law applies to the solute,  sol () =  sol ( → 0) +  B  ln , where  is the mole fraction of the solute.If the solute is not very poorly soluble, it is likely that deviations from ideality will necessitate several laborious calculations of the solvation free energy as the concentration increases. 11he chemical potentials of the solute in solution and in the crystal must have a common origin.In the procedure outlined above, this is achieved by the stepwise calculation of the extended Einstein crystal's free energy.Since the calculation passes through a freely rotating state, the rotational partition function cancels out, and as long as the intramolecular partition functions in the crystal and the solution are the same, a consistent thermodynamic state is obtained. 7,11In the context of computing solid-liquid phase equilibria, it has been suggested that a simpler alternative approach can be used which avoids this effective stepwise solid-fluid transformation; 9,10 by computing the absolute free energy of the molecule in the gas phase explicitly, the liquid or solution free energy can be determined relative to the same origin as the molecular crystal.
In this work, we introduce a set of steps that simplify the calculation of solubilities.We use a combination of thermodynamic integration (TI), 26 free-energy perturbation (FEP) 35 and the S0 method 30 to compute the chemical potentials required.In our approach, we employ a Debye crystal reference for solid phases, reducing the need for manual intervention compared to the extended Einstein-crystal method previously discussed.We show how we can compute chemical potentials with a common baseline by using a gas-phase Debye molecule reference.We showcase this workflow by calculating the solubilities of an ionic solid (sodium chloride) and molecular crystals with varied solubilities, including paracetamol in water, paracetamol in ethanol, and urea in water.

II. METHODS
The general workflow is illustrated in Fig. 1.The absolute chemical potentials of the crystalline phase, the ideal-gas molecule and the solute are each computed separately.By 'absolute', we mean that the chemical potentials have a common origin, irrespective of the reference state from which they are computed.Specifically, the baseline of the chemical potential for a system comprising molecules with internal bonding, irrespective of the phase, is perhaps most naturally taken to be the chemical potential of an isolated molecule with intramolecular interactions.For instance, when determining the chemical potential of a crystalline phase relative to the isolated atoms, we should deduct the baseline chemical potential of the isolated molecule.This procedure mirrors constructing a thermodynamic cycle between the crystal and the solution via the gas-phase molecule state.
For the crystal, we use a reference Debye harmonic crystal [Subsec.II A].We then determine the baseline chemical poten-tial of an isolated molecule in the gas phase; to do this, we again use a harmonic reference state in which all pairs of atoms are connected with harmonic springs, and separately add analytical chemical potentials of translational and rotational degrees of freedom [Subsec.II B].Finally, for the solute, we split the calculation into two parts.We first compute the solvation free energy at a certain low concentration using a combination of TI and FEP [ § II C 1], and then determine the chemical potential of the solution as a function of concentration with the S0 method [ § II C 2].

A. Crystalline states
We use the Debye crystal reference 27,[36][37][38] as the starting point of TI.This reference gives a chemical potential baseline of isolated atoms; we account for this in the next step of the calculation by shifting the baseline to that of isolated molecules.The Debye crystal has all pairs of atoms connected with harmonic springs (as schematically shown in Fig. 1(a)), and it has the same mass-weighted hessian matrix as the physical crystal.The hessian can be computed by expanding the potential energy surface around the (local) minimum-energy state in a numerical normal-mode approximation. 39The eigenvalues of the hessian matrix,   , correspond to the square roots of the normal-mode angular frequencies,   = √   , where  identifies each of the normal modes.The Helmholtz energy of the Debye crystal with a constrained centre of mass (CM) at  0 is where the sum excludes the three normal modes with zero eigenvalues corresponding to translations of the entire system.
For TI, we define a simple linear scaling of the overall potential of a system,  () =  1 + (1 − ) 0 , 40 so that the potential changes from the reference harmonic potential  0 to the potential of interest  1 as the parameter  changes from 0 to 1. Using artificial thermodynamic integration ('-TI'), the Helmholtz energy of the system governed by  1 is 26,27,32,38 Once we have the Helmholtz energy, we can obtain the Gibbs energy, 27 e.g.via  =  + , and hence the chemical potential  = /.Since the harmonic approximation is just the potential energy expanded to quadratic terms, at sufficiently low temperatures, the integral is smooth and the procedure is efficient.As the temperature increases, however, anharmonic effects begin to play a role and the -TI becomes progressively more difficult.In particular, rotations about single bonds within each molecule can be activated at higher temperatures, and although in principle this should not lead to a discontinuity in ⟨ 1 −  0 ⟩  at  = 1, the potential energy difference can become large and the numerical integration may require many data points and long equilibration times.It is thus preferable to perform the -TI at sufficiently low temperatures when the integral is well-behaved.Once the Gibbs energy is determined at the initial low temperature  0 , we can find how it changes with temperature by We use the normal-mode approximation to construct a reference Debye crystal with a known free energy, and then use thermodynamic integration (TI) to switch off the Debye harmonic interaction as the potential of interest is switched on.(b) Isolated molecule.We construct a normal-mode Debye-molecule landscape with a known free energy.We constrain the molecule's centre of mass and rotational motion, shown pictorially with a cyan pin, and account for these contributions analytically.
We then follow one of two routes: we either (1) use artificial thermodynamic integration until most of the potential is switched on and compute the final part by a free-energy perturbation (FEP) method, or (2) use thermodynamic integration to a tethered interacting molecule, and then another thermodynamic integration to switch off the tethering.(c) Solution.We start with the pure solvent or a very dilute solution.We use a free-energy perturbation to add an additional solute molecule with a soft potential, and then use either thermodynamic integration or further free-energy perturbations to switch the soft potential to the fully interacting system.Finally, we use the S0 method to compute how the chemical potential changes from this initial low concentration as the solute is concentrated.
numerically integrating the Gibbs-Helmholtz equation, where  =  +  is the enthalpy of the system.

B. Gas-phase molecule
To obtain a consistent origin for the absolute chemical potentials of the crystal and the solution, we subtract the free energy of the gas-phase molecule from the free energy of the crystal referenced to the isolated atom state.To compute the free energy of the isolated molecule, we perform a -TI between a harmonic reference and the physical system [Fig.1(b)].It is advantageous, and indeed often necessary, to constrain the centre of mass (CM) and the rotational degrees of freedom of the molecule when performing TI, and then one can add explicit free-energy contributions from the free ideal-gas particle to correct for the CM constraint ( cm ), as well as the rigid-body rotations of the entire molecule ( rot ).
The Helmholtz energy associated with the free CM in a volume  at temperature  is and the rotational Helmholtz energy is 41 where  1 ,  2 and  3 are the principal components of the moment of inertia tensor.The rotational partition function sometimes entails a division by a symmetry number to account for the number of equivalent rotations of the molecule; however, in this framework, we do not do so because the same degeneracies are also ignored when sampling the free energy of molecular crystals with restricted molecular orientations.When performing the -TI from the reference system to the potential of interest, although in principle one can use any harmonic or ideal-gas reference, choosing a reference harmonic molecule that has the same frequency modes and equilibrium configuration as the real molecule is statistically efficient.Selecting such a reference system follows the same procedure as setting up the Debye crystal in the previous section, except that a total of 6 degrees of freedom are removed (3 translational and 3 rotational).The classical Helmholtz energy of such a constrained Debye molecule at temperature  0 is where   is the angular frequency of normal mode .
Finally, the Helmholtz energy of the real molecule with a fixed CM and orientation can be obtained using -TI (Eq.2).As with the Debye crystal, in order to improve the numerical convergence of this thermodynamic integration, this step should be performed at some low temperature  0 at which the system is quasi-harmonic.Although for fairly rigid molecules this procedure works well, for molecules with different conformers or rotating bonds, the integrand can still adopt very large values as  → 1.To circumvent this, one can envisage two approaches, as illustrated in Fig. 1(b).The first approach is to perform a FEP in lieu of the final stage of the TI from 1 −  to 1, i.e.

𝐴(𝜆
where  is a small number close to zero.The second approach is to perform a TI between the harmonic reference and a tethered molecule (the physical system plus harmonic springs connecting each atom to its equilibrium position), and another TI between the tethered and the untethered molecule.Finally, the temperature dependence of the free energy from  0 to  can be computed using the Helmholtz-energy analogue of Eq. ( 3).

Solvation free energy
The solvation free energy is the free-energy difference between a solute molecule that is fully interacting with a dilute solution and the same molecule in the ideal-gas state.To compute it, in principle we can use -TI [Eq.(2)] between these two states directly.Here the potential of interest  1 corresponds to the molecule fully interacting with its surroundings at some concentration of the solution.The reference potential  0 corresponds to the solute molecule not interacting with its surroundings, but where the intramolecular energy of the molecule, as well as the interactions between solvents, are the same as in  1 .Typically, we use a solution with one molecule dissolved in a box of the solvent for this purpose.
Many procedures for the hamiltonian switching are available. 14,42The simplest is the linear  scaling and get the free energy difference using Eq. ( 2).However, a naïve implementation of the -TI results in a divergent integrand in Eq. ( 2) as  → 0: when the 'ghost' molecule and the rest of the system are not interacting, atoms can overlap and cause infinity in  1 −  0 .To circumvent this problem, one can instead perform a FEP at the end point 43 (as highlighted using a blue arrow in Fig. 1(c).),i.e.

𝐴(𝜆
and then compute the integral in Eq. ( 2) from  to 1 in  only.
Another approach that avoids numerical convergence issues in TI is the use of multiple-step FEP using a -dependent soft-core potential.For example, a possible soft-core functional form of the Lennard-Jones potential is 42 where  LJ and  LJ are appropriate energy and distance units, and  LJ controls the switching protocol.Similarly, a soft-core Coulomb interaction is where   and   are the charges on the two interacting atoms,  0 is the electric constant,  C is the dielectric constant, and  C controls the switching.When using multiple-step FEP, one uses a number of intermediate states, for example,  (  ) with   = / for  ∈ {0, 1, . . .,  − 1}.We have found it to be statistically efficient to combine a forward and a backward FEP at half steps.For example, to compute the free-energy difference between two potential-energy surfaces  a and  b , one runs the simulation using the potential  1/2 = ( a +  b )/2, and the estimator is

Concentration-dependent chemical potential
In the previous step, we computed the solvation free energy, and hence chemical potential  sol 0 , at some low concentration  0 .To find how this quantity changes as a function of solution concentration, we use the recently proposed S0 method, 30 as shown schematically by the green arrow in Fig. 1(c).The S0 method is based on the thermodynamic relationship between fluctuations in particle numbers and derivatives of the chemical potentials with respect to the molar concentration, 29 and only uses the static structure factors computed from equilibrium MD simulations. 30Specifically, we perform multiple equilibrium MD simulations at different concentrations and find  sol () by numerical integration with respect to ln , where the subscripts M and S denote solute and solvent molecules, respectively. 0 M-M is the static structure factor in the  → 0 limit between solute molecules and  0 M-S is the equivalent between solute and solvent molecules.

A. Sodium chloride in water
To model NaCl and water, we used the JC/SPC/E 23 force field, with the Lennard-Jones interactions truncated at 1 nm with tail corrections, and long-range Coulomb interactions were treated using a particle-particle particle-mesh solver.Since NaCl is made up of individual ions, we do not need to explicitly compute the free energy of the gas-phase molecule in this case, as the reference state for the solvation free energy and the crystal free energy are both a pair of ideal-gas NaCl ions with an analytic free energy.
For the solid phase, we performed a -TI from the Debye crystal to the physical system at 298.15 K.The result is consistent with a -TI first performed at a lower  of 50 K and then an integration along an isobar [Eq.3].The chemical potential for each ion pair in the solid phase 298.15K is (−190.145± 0.015) kcal mol −1 .
To compute the solvation free energy, we used i-PI 44 to perform TI between two states: (i) a pair of Na and Cl ions that are non-interacting and a box of water (with 510 water molecules), and (ii) Na and Cl ions that are fully interacting with water and with each other and a box of water.Crucially, a combination of a stochastic velocity rescaling thermostat 45 and a weak local Langevin thermostat was used to ensure sufficient equilibration of the ghost molecule and efficient temperature control.The time step was 1 fs, and each MD run lasted for ∼1.5 ns.We used FEP from  = 0 to  = 0.0001, and linear  scaling in the TI over a dense grid for the rest.The FEP at  = 0 is necessary because the TI integrand is divergent at that point due to the singularity of the LJ and the Coulomb potential related to the ions.In Fig. 2(a), we show the comparison of the excess chemical potential  sol 0 () computed using FEP and the combination of TI and FEP (TIFEP), 43 as a function of the switching parameter .At  < 0.02, both methods yield similar results.At higher , however, the direct FEP has a much higher statistical error and a systematic bias.Our estimated solvation free energy is  sol 0 = (−177.30± 0.05) kcal mol −1 .The final step is to obtain the  sol of NaCl ion pairs at different concentrations.Excess chemical potentials for NaCl ion pairs ( ex NaCl ) were computed using the S0 method (Eq.( 12)) in Ref. 30.Simulations of NaCl water solutions at different molar concentrations were performed using LAMMPS 46 at 298.15 K and 1 bar; more details can be found in Ref. 30.The ideal chemical potentials is given by the first two terms on the right-hand side of Eq. ( 12), and is shown by the dashed black curve in Fig. 2(b).By combining the ideal and excess parts of the chemical potential, we computed  sol of NaCl ion pairs at different salt molalities  (i.e. the chemical amount of the solute per unit mass of the solvent), shown as the blue curve in Fig. 2(b).In the same figure, we also show the chemical potential of the ion pairs in the crystal,  crystal [grey horizontal line].The point at which the blue curve and grey line cross gives the solubility of NaCl in water; for this model, we estimate the solubility to be (3.87 ± 0.14) mol NaCl kg −1 water .This estimate is lower than the experimental result of 6.15 mol kg −1 at 25 °C, 47 but is consistent with previous simulation results computed using osmotic ensemble Monte Carlo, 48 the Bennett acceptance ratio method 49 and thermodynamic integration, 18 all employing the same force field.

B. Urea in water
1][52][53][54] For our investigation, we used the CHARMM36 55 parameterisation from CHARMM-GUI, 56,57 since unlike several other commonly used empirical potentials, we verified that the experimentally known crystalline polymorphs are at least mechanically stable at suitable thermodynamic conditions.The potential entails combinations of Lennard-Jones and Coulomb interactions, coupled with bond, angle, dihedral and improper constraints that maintain a relatively rigid molecular structure. 58We do not use any further rigid-body constraints.We remark that since harmonic bonds are used, simulations should not be run with the Nosé-Hoover thermostat, 59,60 and alternatives such as stochastic velocity rescaling 45 or the Langevin thermostat should be used instead.For the solution phase, water molecules are modelled with a potential compatible with the urea CHARMM force field; the water model entails a three-site potential with Coulomb and Lennard-Jones interactions at each water atom, and harmonic bond and angle constraints.
For the bulk of our work, we considered two crystalline polymorphs of urea, namely forms I 61 and B1 62 (Fig. 3).For both polymorphs, we performed a -TI first at a low temperature (25 K) using a Debye crystal reference, followed by TI in  along the 1 bar isobar.The free energy computed for the crystals is thus referenced to isolated atoms.For the gas phase, the workflow for computing the free-energy reference to the isolated atoms is similar to that of crystals, except that both translational and rotation are constrained in the TI simulations.The analytic translational and rotational free-energy corrections (Eq.( 4) and Eq. ( 5)) are added at the end.The volume of the gas used in Eq. ( 4) was 1000 3 0 , where  0 is the Bohr radius.The chemical potentials of the crystalline polymorphs ( I and  B1 ) are calculated by subtracting the free energy of the gas-phase molecule from the free energies of the crystal polymorphs.These chemical potentials are shown in Fig. 3(a).Consistently with previous work, 62,63 form I is the most stable form at ambient pressure.By way of comparison, the harmonic-approximation chemical potentials  I h and  B1 h are also plotted in Fig. 3(a); the difference to the true chemical potentials accounting for anharmonicity is rather large in both cases.
The solvation free energy is computed using the multiple-step FEP with the soft-core potentials, with the switching parameters  LJ = 0.5 and  C = 10 in Eqs ( 9) and (10).We used a total of 20 windows, and performed the FEP in the middle step as described in Eq. ( 11).We confirmed that using more windows did not change the estimate.By contrast, a forward or backward FEP would require many more windows to reach convergence.Each independent run lasts for ∼5 ns.We used a time step of 1 fs since the resulting  sol 0 is consistent with that computed using a smaller time step of 0.1 fs.The calculations were performed using i-PI. 44A combination of a stochastic velocity rescaling thermostat 45 and a weak local Langevin thermostat was used for ergodic sampling of the solute molecule and efficiency.The pressure was kept at 1 bar.As discussed above, the computed solvation free energy is already referenced to the gas-molecule state.
As a sanity check, we computed the solvation free energy of urea in its own melt, which is just the chemical potential of the melt  melt (the red symbols in Fig. 3(a)). melt computed at different temperatures agree well with the values obtained from TI of the melted phase with respect to  (Eq.( 3)), as shown using the red curve in Fig. 3(a).The crossover between  melt and  I or  B1 is the melting point of the specific crystalline phase, and we estimate  I m = (425±7) K and  B1 m = (405±5) K.We then performed independent direct-coexistence simulations in elongated boxes (3840 atoms for form I with typical box dimensions of 71 Å × 22 Å × 22 Å and 5760 atoms for form B1 with typical box dimensions of 90 Å × 27 Å × 21 Å).We first determined the equilibrium lattice parameters as a function of temperature at 1 bar, then melted half the box along  at a high temperature whilst keeping the remaining half the molecules frozen, and finally evolving the system at fixed   until one phase grew at the expense of the other.The melting point for form I determined with this method is in the range 415 K to 430 K, and for form B1 it is in the range 400 K to 410 K. Since both ranges are consistent with estimates from free-energy calculations, this suggests that the computed chemical potentials had a consistent choice of baseline, and the correctness of the workflow.The experimental melting point of form I urea is 406 K; 64 the close agreement with simulation is perhaps somewhat surprising given the simplicity of the force field.
We computed the solvation free energy of urea in water ( sol 0 ) at 20 °C intervals between 0 °C and 100 °C.Once a 'ghost' urea molecule was added, the simulation box contained 1 urea molecule and 123 water molecules, corresponding to a concentration of about 0.49 mol dm −3 .These results are indicated using orange symbols with error bars in Fig. 3(a).To obtain chemical potentials of the solute at other concentrations, we used the S0 method.We simulated urea-water mixtures at different molar concentrations using LAMMPS 46 at the temperatures specified above and a pressure of 1 bar.The simulation box contained about 10,000 molecules.We used a time step of 1 fs, with runs of ∼1 ns each.To compute (k), we collected a snapshot per 1000 steps of the trajectory.We used the positions of oxygen atoms as the positions of water molecules, solubility of solubility of  / °C  / K I / mol kg and the positions of carbon atoms as the positions of urea molecules.The (k) values were then fitted to the Ornstein-Zernike form 30 with a maximum cutoff in the wave vector The  0 values between urea-urea and ureawater can then be used to compute the concentration-dependent  sol () using Eq.(12).As an example,  sol at 20 °C at different molalities is shown as the orange curve in Fig. 3(b).The shaded area indicates the statistical uncertainty, which principally comes from the error in the estimation of  sol 0 .The ideal chemical potential is shown as the dashed black curve.Finally, the chemical potentials of the crystalline forms I and B1 are plotted as blue and purple horizontal lines.As the concentration increases, the solution becomes less ideal.The crossover points between the orange curve and the two horizontal lines indicate the solubilities of the corresponding phases.In this case, the ideal-solution assumption turns out to be rather accurate in the solubility prediction.However, the agreement becomes worse at other temperatures considered; e.g. at 80 °C, it would lead to an underestimate in the solubility by about 15 %.We summarise the solubilities at other temperatures in Table I.The solubilities increase steeply with temperature, and the solubility of form B1 is higher than that of form I at all temperatures considered.
It has been shown that using different urea models can result in considerably different thermodynamic properties in solution. 51To check the role of the choice of force field in determining the solubility, we used the AMBER-GAFF-ESP-2018 force field (GAFF) 65,66 for both urea and water.We performed the solubility calculations for urea form I in water with GAFF, following the same workflow as with CHARMM. 67n Fig. 3(c), we show  sol at 20 °C computed using GAFF as a function of urea molality.At this temperature, the solubility of form I of urea is (5.1 ± 0.3) mol kg −1 , about 100 times the solubility predicted for the CHARMM potential under the same conditions.
Experimentally, urea dissolves very readily in water; the solubility in 100 g of water ranges between ∼70 g (12 mol kg −1 ) at 0 °C to ∼700 g (120 mol kg −1 ) at 100 °C. 64,68,69These solubilities are considerably higher than the CHARMM predictions obtained in Table I.The GAFF solubility at 20 °C is closer to, but still lower than, the experimental value.On the other hand, the crystalline phase is less well described by the GAFF potential than by CHARMM; at temperatures above 20 °C, form I is not dynamically stable under the GAFF force field, and readily transforms into an analogue of form III in MD simulations.These results thus demonstrate the sensitivity of the solubility prediction on the assumed potential energy surface, and the challenge of accurately modelling both the crystalline and the solution phases with the same potential.

C. Paracetamol in water and in ethanol
We used the CHARMM36 55 parameterisation from CHARMM-GUI 56,57 for paracetamol, water and ethanol, since all common crystalline polymorphs of paracetamol are at least metastable with this potential.
Specifically, we considered two crystalline polymorphs of paracetamol, namely forms I 70 and II. 71For both polymorphs, we computed the free energy of the Debye crystal at 25 K and then used -TI to the potential of interest, followed by a TI in  along the 1 bar isobar.For both polymorphs, it is important to perform the -TI from the Debye crystal at  ≲ 40 K, as the internal rotational modes become activated at higher , causing numerical problems with the integrand of Eq. ( 2) as  → 1.We used a time step of 0.25 fs and, as with the case of urea above, a combination of a stochastic velocity rescaling thermostat 45 and a weak local Langevin thermostat.
For the gas phase, we computed the free-energy difference between the Debye molecule reference and the molecule with a constrained CM and rotations at  = 100 K.Because rotations about bonds are possible and lead to different conformers, vanilla -TI is thwarted by divergences in the integrand.Instead, both the tethering approach and the FEP at the end point, as outlined in the Methods section, help to circumvent the issue.We used a time step of 0.25 fs, and the same combined thermostat as for the solid phases.The analytic translational and rotational free-energy corrections are added in the end.The volume of the gas used in Eq. ( 4) was 1587 3 0 .The total free energy of the gas molecule at 100 K was estimated to be (−41.84± 0.04) kcal mol −1 .The free energies at other temperatures were then determined using TI along the 1 bar isobar.
The chemical potentials of the crystalline polymorphs ( I and  II ) are baseline adjusted, so the reference state is the gas-phase molecule.These chemical potentials are shown in Fig. 4(a), together with the harmonic-approximated  I h and  II h in dashed lines; the difference is of the order of 1 kcal mol −1 .
For computing solvation free energies, we used the multiplestep FEP with the soft-core potentials described in Sec.II C 1, with  LJ = 0.5 and  C = 10.We used a total of 20 windows and performed the FEP in the middle step as described above.We confirmed that using more windows did not change the estimate.For the solvation free energy of paracetamol in water ( sol-water 0 ), the simulation box contained 1 paracetamol molecule and 118 water molecules ( 0 = 0.47 mol dm −3 ).The time step was 1 fs, and each independent run lasted about 1.5 ns.For the solvation free energy of paracetamol in ethanol ( sol-EtOH 0 ), we used a simulation box comprising 1 paracetamol molecule and 73 ethanol molecules ( 0 = 0.23 mol dm −3 ), and each independent FEP run lasted about 5 ns.These solvation free energies ( sol-water 0 and  sol-EtOH 0 ) are shown in Fig. 4(a).The large gap between the two sets of solvation energies indicates that paracetamol solvation in ethanol is rather more favourable than in water when using the CHARMM force field.
In addition, we computed the solvation free energy of paracetamol in its own melt,  melt , shown using the the red symbols in Fig. 4(a).These values agree well with the TI results of the melt (Eq.( 3)) [red curve in Fig. 4(a)].We estimate the melting points of form I and II to be  I m = (440 ± 11) K and  II m = (407 ± 11) K.3][74] We took an elongated box of crystal form I (roughly 107 Å × 29 Å × 35 Å with 504 molecules, with  =  = 90 • and  ≈ 98 • ), equilibrated it at 1 bar and a range of temperatures to determine optimal box parameters, melted half the box at a high temperature and locally equilibrated the system.We then performed interface-pinning simulations in which only the  component of the pressure was coupled to a barostat.We used a Steinhardt-Ten Wolde-style order parameter 75,76 on the carbonyl oxygen to classify molecules as being either in the crystal or the liquid phase, and then, using PLUMED, 77 applied a harmonic restraint with spring constant  to bias the number of crystalline molecules to be close to half the total molecules in the system,  0 ≈ /2.We computed the chemical-potential difference between form I and the melt via Δ = ( 0 − ⟨ cryst ⟩), 38,72 by determining the average number of crystalline molecules as a function of temperature.Using this approach, we estimate that the melting point, at which Δ = 0, is (451 ± 5) K.These results are consistent with Fig. 4(a), giving us confidence that all the relevant factors have been considered in the free-energy calculations.The experimental melting point of form I paracetamol is 442 K 78 (169 °C), in close agreement with simulation despite the assumptions of the force field.
For the concentration dependence of  sol , we used the S0 method.We simulated paracetamol-water and paracetamolethanol mixtures at different molar concentrations using LAM-MPS, 46 although for paracetamol-water this proved not to be necessary, since at low concentrations the solution is nearly ideal.The simulation box contained about 10,000 molecules.The time step was 1 fs and each run lasted about 1 ns.Snapshots where taken every 1000 steps, and the co-ordinates of oxygen atoms in water, nitrogen atoms in paracetamol, and the carbon bonded to oxygen in ethanol were used for recording the positions of the corresponding molecules.In the estimation of  0 , the cutoff in the wave vector was  2 cut = 0.0025 Å −2 .
We show in Fig. 4(b) the solution chemical potential  sol for paracetamol in ethanol at 20 °C as a function of the paracetamol molality [green curve].The chemical potentials of the two crystalline polymorphs are plotted as blue and purple horizontal lines.At higher concentrations, the solution becomes less ideal, and indeed the ideal-solution assumption would lead to an overestimate in the solubilities by a factor of two to three.The solubilities of paracetamol in water or ethanol at all temperatures considered are provided in Tables II and III.The solubilities increase with temperature in both water and ethanol, and the solubility of form II is about double that of form I under all conditions considered.
The CHARMM22 potential with SwissParam 79 parameters has been used to identify the source of stabilisation of different polymorphs of paracetamol relative to DFT-level calculations; 80 this work illustrates that empirical force fields likely do not capture all the relevant physics.However, the CHARMM36 force field with optimised partial charges has been shown to be a satisfactory potential for studying the nucleation of paracetamol in acetonitrile, 81,82 and this reparameterised potential has also been used to investigate the solubility of paracetamol in ethanol 11 using the cavity method.In this work, previous experimental values [83][84][85][86] were averaged to obtain a solubility mole fraction of 0.0585 ± 0.0040 at 20 °C, or a molality of 1.35 mol kg −1 .The solubility of paracetamol with the CHARMM36 potential with reoptimised partial charges was (0.085 ± 0.014) 11 (or 2.0 mol kg −1 ), a slight overestimate compared to the experimental result.By contrast, using the CHARMM-GUI parameterisation of the charges results in a slight underestimate of the solubility ( = 0.82 mol kg −1 , or  = 0.036).Both parameterisations appear to be reasonable, but the solubility is very sensitive to the details of the model's parameters.However, the solubility of paracetamol in water at 20 °C is  = 0.0845 mol kg −183 (or  = 0.0015), increasing to ∼ = 0.48 mol kg −1 ( = 0.0086) at 70 °C, 87    interactions of paracetamol with water are accounted for less well.

IV. CONCLUSIONS
In summary, we present a workflow that enables simple and robust computation of solubilities of molecular or ionic crystals.As illustrated in Fig. 1, we compute the free energies of the gas, the crystal and the solution phases separately.The workflow mainly uses thermodynamic integration from reference systems with known free energies to physical systems, with free-energy perturbations incorporated at the end points of the TI to avoid numerical issues and to increase efficiency.Compared to the state-of-the-art methods, 7,8,10,11,20,33 we streamline many steps.For example, we compute the chemical potential of the crystal using the gas-phase molecule as a baseline, which avoids the convoluted many-step switching in the extended Einstein crystal method. 7,8Using Debye crystals or Debye molecules as the starting points of the TI ensures high statistical efficiency, as these reference potential-energy surfaces closely resemble the potentials of interest.Constraining the CM and the rotation of the gas-phase molecule further reduces the sampling space.For the solute, the FEP at half steps (Eq.( 11)) requires fewer windows than forward or backward FEP.We use the S0 method to compute the concentration dependence of the chemical potential of the solute, which overcomes the idealsolution assumption and avoids separate solvation free-energy calculations at different concentrations that would otherwise be necessary. 11e have applied the workflow to systems of a range of solubilities: sodium chloride in water, urea in water, paracetamol in water, and paracetamol in ethanol.We additionally computed the solid-liquid equilibria for urea and paracetamol using the same workflow, for validation.From these calculations, we have observed that the harmonic approximation for both the gas molecules and the crystals often breaks down at higher temperatures.Even at room temperature, for molecular crystals, the error can be of the order of a kcal mol −1 , as in the case of urea and paracetamol.The ideal-solution assumption can also fail: it is reasonably accurate for dilute solutions, but can result in an error of a factor of two in the solubility predictions of more soluble substances, such as paracetamol in ethanol.
We have shown that quantitative solubility predictions hinge on the accuracy of the potentials chosen.For instance, the solubility of urea in water at 20 °C predicted using GAFF is two orders of magnitude higher than the CHARMM analogue.Solubility calculations could thus provide a useful tool for parameterising empirical potentials.Beyond enhancing empirical force fields, an alternative approach entails the use of machine-learning potentials (MLPs), which capture the accuracy of quantum-mechanical calculations, but at a considerably reduced computational cost. 88Given that MLPs have the capacity to model accurately the intricate interactions at the atomic and molecular levels that are crucial for describing molecular solutions, they stand out as a promising tool for refining solubility predictions.Nonetheless, due to the higher computational demands of MLPs compared to classical force fields, integrating them with an efficient workflow, such as ours, becomes essential to ensure the tractability of calculations.
In this paper, we have focused on classical chemical potentials, but of course one could also account for the influence of nuclear quantum effects (NQEs).For instance, to obtain the chemical potentials of the molecule in solution, one can still use the workflow presented in Fig. 1(c) while using the path-integral molecular dynamics (PIMD) formalism 89 to represent the whole system.][92] We expect that our workflow for calculating solubilities will be useful across various technologically significant systems.For example, for electrolytes, the ability to compute solubilities easily may deepen our understanding of the solvation of conducting ions in batteries.Similarly, in drug design, a clearer insight into solubility properties of drugs would enable drug delivery and absorption to be optimised.Moreover, the method can shed light on the precipitation of crystals from solutions, the behaviour of co-solvents, as well as extraction and purification methods in the chemical industry.

Figure 1 A
Figure1A schematic illustration of the molecular-dynamics workflow for determining chemical potentials.(a) Crystalline solute.We use the normal-mode approximation to construct a reference Debye crystal with a known free energy, and then use thermodynamic integration (TI) to switch off the Debye harmonic interaction as the potential of interest is switched on.(b) Isolated molecule.We construct a normal-mode Debye-molecule landscape with a known free energy.We constrain the molecule's centre of mass and rotational motion, shown pictorially with a cyan pin, and account for these contributions analytically.We then follow one of two routes: we either (1) use artificial thermodynamic integration until most of the potential is switched on and compute the final part by a free-energy perturbation (FEP) method, or (2) use thermodynamic integration to a tethered interacting molecule, and then another thermodynamic integration to switch off the tethering.(c) Solution.We start with the pure solvent or a very dilute solution.We use a free-energy perturbation to add an additional solute molecule with a soft potential, and then use either thermodynamic integration or further free-energy perturbations to switch the soft potential to the fully interacting system.Finally, we use the S0 method to compute how the chemical potential changes from this initial low concentration as the solute is concentrated.

Figure 2
Figure 2 (a) Chemical potential of solvation  sol 0 of an NaCl ion pair computed using free-energy perturbation (FEP) and a combination of thermodynamic integration and free-energy perturbation (TIFEP).(b) The blue curve shows the chemical potentials  sol for NaCl ion pairs as a function of molality  at 298.15 K.The dashed black line is the ideal chemical potential.The grey horizontal line indicates the chemical potential of the ion pairs in the solid phase.The statistical uncertainties are indicated by the widths of the curves.

Figure 3
Figure 3 (a) Chemical potentials of urea in different phases computed using the CHARMM force field.The blue and purple solid curves show the chemical potentials of crystal form I ( I ) and crystal form B1 ( B1 ), respectively.The dashed curves show the harmonic free energies of the corresponding phases.The red crosses with error bars show the estimate of the chemical potential of melted urea ( melt ), and the red curve shows the values from the thermodynamic integration of the liquid along the 1 bar isobar.The blue and the purple shaded areas show estimates of the melting temperatures of forms I and B1 with uncertainty, computed using the coexistence method.The orange symbols show the solvation free energy  sol 0 at the concentration  0 = 0.49 mol dm −3 with error estimates.(b) Chemical potentials of urea in solution of different molalities  (i.e. the amount of urea dissolved in a kilogram of water) computed using the CHARMM force field.(c) Chemical potentials of urea in solution of different molalities computed using the GAFF force field.In (b) and (c), the orange curve shows the chemical potentials  sol for urea as a function of the urea molality.The dashed black line is the ideal chemical potential.The blue and purple horizontal lines indicates the chemical potential of phase I and B1.The statistical uncertainties are indicated by the width of the curves.

Figure 4
Figure 4 (a) Chemical potentials of paracetamol in different phases computed using the CHARMM force field.The blue and purple solid curves show the chemical potentials of paracetamol in crystal forms I ( I ) and II ( II ), respectively.The dashed curves show the harmonic free energies of the corresponding phases.The red crosses with error bars show the estimate of the chemical potential of the paracetamol melt ( melt ), and the red curve shows the values from the TI of the liquid along the 1 bar isobar.The blue shaded area shows the estimate of the melting temperature of phase I with statistical uncertainty computed using the interface-pinning method for solid-liquid coexistence.The orange symbols show the solvation chemical potentials  sol-water 0 of paracetamol in water at  0 = 0.47 mol dm −3 with error estimates, and the green symbols show the solvation chemical potentials  sol-EtOH 0 in ethanol at  0 = 0.23 mol dm −3 .(b) The green curve shows the chemical potentials  sol for paracetamol dissolved in ethanol (EtOH) as a function of molality  (i.e.amount of paracetamol per kilogram of ethanol).The dashed black line is the ideal chemical potential.The blue and purple horizontal lines indicate the chemical potentials of forms I and II, respectively.Statistical uncertainties are indicated by the widths of the curves.

Table I
Solubilities of urea in water expressed as molalities at different temperatures for the CHARMM potential.
which is very different from the values we have obtained, suggesting that the

Table III
Solubility of paracetamol in ethanol expressed as molalities at different temperatures.