The chemical potential of a component in a solution is defined as the free energy change as the amount of that component changes. Computing this fundamental thermodynamic property from atomistic simulations is notoriously difficult because of the convergence issues involved in free energy methods and finite size effects. This Communication presents the so-called S0 method, which can be used to obtain chemical potentials from static structure factors computed from equilibrium molecular dynamics simulations under the isothermal–isobaric ensemble. This new method is demonstrated on the systems of binary Lennard-Jones particles, urea–water mixtures, a NaCl aqueous solution, and a high-pressure carbon–hydrogen mixture.

## INTRODUCTION

The chemical potential of a solute as a function of its concentration is pivotal for understanding many important physical and biological processes, such as osmosis, the solvation of organic molecules, the behavior of electrolytes, precipitation of crystals from solutions, and phase equilibria of mixtures.

Despite their massive importance, computing the chemical potential of solutions from atomistic simulations is notoriously difficult and has not become a routine task. This is because the existing methods^{1–9} apply with many caveats and are often restricted to a subset of systems. For example, methods based on Monte Carlo particle insertion and removal^{10,11} can have numerical convergence issues^{5} and do not work for dense fluids. Thermodynamic integration or overlapping distribution method from the real solution to a reference fluid^{1,2} involves running multiple simulations with different coupling constants as well as along different thermodynamic paths, may have singularity problems at the end points of integration,^{7,8} and is difficult to use for molecules with complex topology. The direct coexistence^{6} can have long equilibration time, and it only works at conditions close to the solubility limit. Kirkwood–Buff integrals^{12,13} suffer from severe finite size effects.^{13,14}

Here, we propose a generic, easy-to-use method (the S0 method) for computing chemical potentials of solutes at different concentrations in solutions. This method uses the thermodynamic connection between composition fluctuations and the derivative of chemical potential with respect to concentration. In practice, only the static structure factors computed from equilibrium isothermal–isobaric (NPT) molecular dynamics (MD) simulations at different solute concentrations are needed. We benchmark the method against the systems of binary Lennard-Jones particles, urea–water, NaCl aqueous solution, and a high-pressure carbon–hydrogen mixture.

## THEORY

Consider a large binary particle reservoir with fixed numbers of type A and type B particles in the NPT ensemble (illustrated in Fig. 1). The Gibbs free energy for this reservoir can be expressed as

which indicates that each particle type has a constant chemical potential.

Inside the NPT ensemble, a fixed volume *V*, which is large but much smaller than the reservoir, with permeable boundary can be regarded a grand canonical ensemble (*μ*VT). The corresponding grand potential is

where *Q* is the canonical partition function of the *V*-region with the numbers of the two types of particles *n*_{A} and *n*_{B}. By taking the derivatives of Ω with respect to the chemical potentials, one gets the particle number fluctuations Δ*n*_{A} = *n*_{A} − ⟨*n*_{A}⟩ and Δ*n*_{B} = *n*_{B} − ⟨*n*_{B}⟩,

where ⟨…⟩ indicates the expectation value of an observable in the grand canonical ensemble. As derived in the seminal paper of Kirkwood–Buff,^{12} such equilibrium fluctuations can be used obtain the derivatives of *μ* at constant (*P*, *T*) conditions, e.g.,

where *c*_{A} = ⟨*n*_{A}⟩/*V* equals the concentration of type A particles in the NPT reservoir. Reference 12 then relates the particle number fluctuations to the Kirkwood–Buff integrals of radial distribution functions. Instead, here we take a different route and evaluate ⟨Δ*n*_{A}Δ*n*_{B}⟩ using static structure factors detailed below.

The instantaneous density field of particle number inside the NPT ensemble is

where $riA(t)$ is the position of atom *i* of type *A* at time *t*, and the average number density is

where $riA(t)$ is the position of atom *i* of type *A* at time *t*, and for isotropic systems, $\rho A(1)(r)=cA$. Note that ⟨…⟩_{NPT} is the NPT ensemble average, while ⟨…⟩ without subscript indicates *μ**VT* average. To consider the two-body correlations between the density at different points in space, **r**′ and **r**″,

These density correlation functions from the NPT ensemble encode the particle fluctuations inside the *μ*VT ensemble of volume *V* because

One can do a Fourier expansion of the instantaneous density field in space **r** inside the *μ*VT ensemble with volume *V*, e.g.,

As the density field is a real function, $\rho \u0303A(\u2212k,t)=\rho \u0303A\u22c6(k,t)$, where the latter is the complex conjugate.

The static structure factor between two types of particles (A–A, A–B, or B–B) is defined as

It is easy to verify that *S*_{AB}(**k**) is the Fourier expansion of $\rho AB(2)(r\u2032,r\u2033)$, i.e.,

Combining Eq. (10) with (13), the structure factor is related to the particle number fluctuations via

Furthermore, the Kirkwood–Buff integral (KBI)^{12} between components *A* and *B* is related to the structure factor by

Importantly, although the Fourier expansions above are performed inside the volume *V* of the *μ**VT* ensemble, for isotropic liquid with translational invariance, *S*_{AB}(**k**) should be the same in all parts of the larger NPT ensemble—including the whole volume of the NPT ensemble. To obtain *S*_{AB}(**k**) from NPT simulations of finite fluid systems with periodic boundary conditions, one can only let one dimension of the simulation box fluctuate with barostat, while only collecting *S*(**k**) for **k** along the plane of the two fixed dimensions. Even better, one can also perform the Fourier expansion on the scaled coordinates and obtain the static structure factors using

where $r\u0302(t)=r(t)\u27e8l\u27e9NPT/l(t)$ and *l*(*t*) is the instantaneous dimension of the supercell. The scaling procedure is rigorous at the thermodynamic limit, where the NPT and the NVT ensembles are equivalent.

To determine lim_{k→0}*S*_{AB}(**k**) from MD simulations of finite system sizes, one can compute *S*(**k**) at small **k** under the NPT ensemble and extrapolate to the **k** → 0 case using the Ornstein–Zernike form,^{15}

Finally, to compute chemical potentials, one can run multiple equilibrium NPT simulations with different concentrations and then obtain $\mu A(cA=\u27e8cA\u27e9NPT)$ using numerical integration with respect to ln *c*_{A},

referenced to the chemical potential $\mu A0$ at a standard molar concentration $cA0$. One can conveniently select this reference to be the pure state. Strictly speaking, Eq. (19) provides the relative chemical potential instead of the absolute value, but only the former is a physical observable. The first two terms on the right-hand side of Eq. (19) give the ideal-mixture chemical potential *μ*^{id} and the third term is the excess chemical potential *μ*^{ex} = *k*_{B}*T* ln(*γ*_{A}) where *γ*_{A} is the activity coefficient of solute A. The ratio $\gamma A\u2032=1SAA0\u2212SAB0cA/cB$ is related to the activity coefficient by $\gamma A\u2032=1+d\u2061ln(\gamma A)/d\u2061lncA$.

In addition, although one can evaluate *μ*_{A}(*c*_{A}) and *μ*_{B}(*c*_{B}) separately using Eq. (19) from the same simulations, one can obtain *μ*_{A} from *μ*_{B} employing the Gibbs–Duhem equation under constant P,T conditions,^{16}

To automatically satisfy the Gibbs–Duhem equation with the S0 method, one can use

where *χ*_{A} and *χ*_{B} = 1 − *χ*_{A} are the molar fraction of *A* and *B*, respectively.

It is worth discussing finite size effects in the current approach and, in particular, the difference with the KBI method. Typically, when using the KBI, one either starts from the pair correlation functions or collects *n*_{A} and *n*_{B} inside a fixed volume V during NPT simulations.^{14,17} However, the open boundary of *V* imposes very large finite size effects,^{14} and even for *μ**VT* systems with hundreds of thousands of atoms, a significant fraction is found lying on the boundary.^{18} Without finite size corrections, the KBI approach is hardly applicable.^{14,17} In contrast, the computation of *S*(**k**) in an NPT ensemble avoids such boundary effects. In addition, the effects coming from using a finite wavelength (2*π*/**k**) are partly corrected by the physically inspired extrapolation [Eq. (18)].

## BINARY LENNARD-JONES SYSTEM

A binary mixture (A, B) of Lennard-Jones (LJ) fluids was simulated using the same potential as Ref. 14: a purely repulsive 6–12 LJ potential truncated and shifted with cutoff radius 2^{1/6}*σ*, and with *σ*_{AA} = *σ*_{BB} = *σ*_{AB} = 1, *ϵ*_{AA} = 1.2, *ϵ*_{BB} = 1.0, and $\u03f5AB=\u03f5AA+\u03f5BB/2$. Constant temperature and pressure were enforced through a stochastic velocity rescaling thermostat and Nose–Hoover barostat. Three system sizes, i.e., total number of atoms, 4000, 23 328, and 108 000, were considered in the range of mole fractions of type A particles, *χ*_{A} = 0.05, …, 1.0. Simulations were carried out using LAMMPS^{19} with a time step of 10^{−3}. The total simulation time was 10^{6} simulation steps for the two larger systems and 10^{7} for *N* = 4000. To compute *S*(**k**), we collected a snapshot per 2000 steps of the trajectory.

Figure 2(a) shows *S*_{AB}(**k**) computed from NPT simulations at different system sizes for a selected molar fraction of particles. The dashed curves are fits to Eq. (18) with a maximum cutoff in the wavevector *k*_{cut} = 2*π* × 0.2*σ*^{−1}, although we found the fitted values for *S*^{0} are insensitive to this choice as long as the corresponding wavelengths are larger than the atomic spacing. Although the *S*_{AB}(**k**) values at small **k**-vectors are scattered, due to the many *S*_{AB}(**k**) values available, the fitted values of $SAB0$ have extremely small statistical errors, as shown in the legend of Fig. 2(a). Even with a small size of 4000 particles, the estimate for $SAB0$ is converged. Such insensitivity to the system size is again confirmed from Fig. 2(b), which shows the extrapolated values of $SAB0$ at different molar fractions. As a benchmark, in Fig. 2(c), we compare the excess chemical potential $\mu Aex$ and $\mu Bex$, with the reference $cA0$ and $cB0$ set to the concentrations of pure A and pure B [Eq. (19)], computed using the S0 method and Widom particle insertion.^{20} Widom insertion simulations were performed with the same system size (23 328 atoms), 4 × 10^{6} time step, and one particle insertion per time step. The magnitudes of *μ*^{ex} are quite small, and they only differ slightly between A and B. *μ*^{ex} values for both A and B are larger when *χ*_{A} is higher because type A particles have a stronger repulsive core. However, the *μ*^{ex} values and their trend, which are smaller than the statistical errors of the Widom particle insertion method, can be captured by the S0 method.

## UREA IN WATER

We analyzed the MD trajectories of urea/water mixtures from Ref. 21, simulated using the Kirkwood–Buff derived force field^{22} and SPC/E water at 1 atm pressure and 300 K temperature. Four urea molar concentrations were considered (2, 4, 6, and 8M) with the total number of urea and water molecules, i.e., system size, equal to 13 000–16 000. A comparison of the derivative of the activity coefficient $\gamma urea\u2032=1+d\u2061ln(\gamma urea)/d\u2061lncurea$ with previous results is shown in Fig. 3. The S0 results agree closely with those of Ref. 14, which uses a KBI with finite size corrections, and with Exp. 1.^{22,23}

## NaCl AQUEOUS SOLUTION

Simulations of NaCl water solutions at different molar concentrations (0.1–9.3M) were performed using LAMMPS^{19} at 298.15 K and 1 bar. The JC/SPC/E^{6} force field was used, with the Lennard-Jones interactions truncated at 1 nm, long-range Coulomb interactions treated using a particle–particle particle-mesh solver, and the constraints for the rigid water molecules enforced by SHAKE. A fixed amount of 32 640 water molecules together with 128–10 880 NaCl ion pairs were used. The time step was 2 fs, the total number of steps were 2 000 000, and the Nose–Hoover thermostat and barostat were used.

Figure 4 shows the excess chemical potentials for NaCl ion pairs $(\mu NaClex)$ and water $(\mu waterex)$ at different salt molality *m* (mol NaCl/kg water) calculated using the S0 method, which was integrated using Eq. (19), and we found that employing Eq. (21) rendered fully consistent values. $\mu NaClex$ and $\mu waterex$ are compared with previous results computed using Osmotic Ensemble Monte Carlo,^{16} the Bennett acceptance ratio method,^{25} and thermodynamic integration.^{26} Note that the value of $\mu NaClex$ depends on the standard chemical potential reference, and each study handles this differently (we set $\mu NaClex(m=0.1)=0$), so only the change as a function of *m* is physically meaningful. All four studies compared in Fig. 4 are fairly consistent, while the results of the S0 method agree particularly well with those of Ref. 25.

## HIGH-PRESSURE CARBON–HYDROGEN MIXTURE

The implications for this C–H mixture in the context of planetary science will be discussed in Ref. 27; here, we focus on how adding hydrogen to liquid carbon changes the chemical potential of carbon atoms. We used a machine learning potential developed in Ref. 27 and performed NPT simulations at different *N*_{C}/*N*_{H} ratios at T = 5000 K and P = 100 GPa. The system sizes, i.e., total number of atoms, lie between 11 232 and 82 944.

In Fig. 5, we show the derivative of the activity coefficient $(\gamma C\u2032)$ and excess chemical potential $\mu Cex$ of carbon at different molar fractions. The S0 method predicts $\mu Cex$ values that are in good agreement with the results obtained by coexistence simulations,^{27} but it can be used for the regime of low carbon concentration, which becomes prohibitive using the coexistence approach. Interestingly, $\gamma C\u2032$ shows a nonmonotonic behavior with a minimum at *N*_{H}/*N*_{C} = 2 and maxima at *N*_{H}/*N*_{C} = 1 and 4. These magic numbers are probably related to the chemical bonds between C and H.

To conclude, we present the S0 method to compute the chemical potential of mixtures just from equilibrium MD simulations at the NPT ensemble by simply computing static structure factors. We demonstrate the generality and robustness of the S0 method on diverse systems, including a model LJ system, organic molecule in water, aqueous electrolyte, and a high-pressure solution. In principle, the S0 method is also applicable to larger molecules such as polymers or large-molecule solvents. Extension to multicomponent systems is also straightforward: To compute the chemical potential of a certain species, one can simply treat the remaining particles as the second species. Compared with the previous methods, such as particle insertion^{10,11} and thermodynamic integration,^{1,2} the S0 method is more generally applicable and works particularly well for dense fluids with complex interactions and high solute concentrations. Extension to solid solutions may be possible by following the derivation of Ref. 28. We envisage the S0 method will largely simplify the computation of the chemical potential of complex solutions and make them routine tasks.

## SUPPLEMENTARY MATERIAL

The supplementary material contains a detailed derivation of the statistical mechanical framework and additional information of the molecular dynamics simulations.

## ACKNOWLEDGMENTS

I thank Daan Frenkel for providing feedback on an early draft and for stimulating discussions, Debashish Mukherji and Robinson Cortes-Huerto for sharing the trajectories for urea–water mixtures, and Aleks Reinhardt for useful suggestions on the manuscript.

## AUTHOR DECLARATIONS

### Conflict of Interest

The author have no conflicts to disclose.

### Author Contributions

**Bingqing Cheng**: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

All PYTHON scripts and simulation input files generated for the study are in the SI repository https://github.com/BingqingCheng/S0, Ref. 29.