We present an accurate and efficient method to obtain Kirkwood-Buff (KB) integrals in the thermodynamic limit from small-sized molecular dynamics simulations. By introducing finite size effects into integral equations of statistical mechanics, we derive an analytical expression connecting the KB integrals of the bulk system with the fluctuations of the number of molecules in the corresponding closed system. We validate the method by calculating the activity coefficients of aqueous urea mixtures and the KB integrals of Lennard-Jones fluids. Moreover, our results demonstrate how to identify simulation conditions under which computer simulations reach the thermodynamic limit.

One of the most significant achievements of statistical mechanics is the connection between macroscopic thermodynamic quantities and the microscopic components and interactions of a physical system. This relation is routinely employed in computer simulations, e.g., molecular dynamics (MD), where structural, nanoscale features of a system can be easily measured and related to macroscopic emergent properties. However, this protocol is still a source of major concern.^{1} In particular, the physical systems under examination in experiments are open, that is, they exchange energy and matter with the environment or, at least, are composed by an Avogadro number of molecules. By contrast, MD simulations are usually constrained to work under canonical (microcanonical) conditions with fixed temperature (energy), volume *V*_{0}, and number of particles *N*_{0}. Finite number of particles and small volumes used to compute physical observables thus introduce various kinds of finite size effects, whose impact needs to be quantified if one aims at extracting thermodynamic quantities from simulations.

In this study, we include finite size effects into statistical mechanics integral equations, namely the Kirkwood-Buff^{2} equations, to derive accurate relations between thermodynamic properties and structural quantities obtained from nanoscale MD simulations. We use simple scaling arguments to keep track of two effects originating from the finite size of the simulated system: *ensemble* effects due to the fixed number of particles and *boundary* effects due to the small size of the subdomain used to compute physical observables.^{3} The derived analytical expression is used to obtain Kirkwood-Buff (KB) integrals in the thermodynamic limit. We validate this approach with a mixture of Lennard-Jones particles and employ it to study the solvation thermodynamics of prototypical urea/water liquid mixtures. In the latter case, we compare the results with previous computational studies^{4} and available experimental data.^{5–7}

We focus our attention on the Kirkwood-Buff (KB) theory,^{2} as it is arguably the most successful framework to investigate the properties of complex liquid mixtures. KB theory relates the local structure of a system to density fluctuations in the grand canonical ensemble, which are in turn related to equilibrium thermodynamic quantities such as the compressibility, the partial molar volumes, and the derivatives of the chemical potentials.^{8} Despite the fact that KB theory was formulated more than sixty years ago, it enjoys renewed interest in computational soft-matter and statistical physics communities. Recent works, in fact, have shown promising applications related to solvation of biomolecules^{9} and potential uses to compute multicomponent diffusion in liquids,^{10} and to study complex phenomena such as self-assembly of proteins.^{11}

For a multicomponent fluid of species *i*, *j* in equilibrium at temperature *T*, the Kirkwood-Buff integral (KBI) is defined as

with *δ _{ij}* the Kronecker delta. The superscript (o) indicates that this definition holds for an open system, i.e., a system described by the grand canonical ensemble. This means that we compute fluctuations of the number of particles in a subdomain of volume

*V*embedded in a reservoir whose size goes to infinity. Thus, 〈

*N*〉 is the average number of

_{i}*i*-particles inside

*V*, or

*ρ*= 〈

_{i}*N*〉/

_{i}*V*. Moreover, $ g i j o ( r 12 ) $ is the multicomponent radial distribution function (RDF) of the infinite system, with

**r**

_{12}=

**r**

_{2}−

**r**

_{1}.

To evaluate the integral on the right hand side (r.h.s.) of Eq. (1), we assume that the fluid is homogeneous, and we use translational invariance that implies the limit *V* → ∞. Therefore, the transformation **r**_{2} → **r** = **r**_{2} − **r**_{1} gives $ G i j \u221e $, the KB integrals of the bulk system

Eq. (2) is strictly valid in the thermodynamic limit. However, it is extensively used in computer simulations despite the fact that one is usually constrained to study closed systems (c), i.e., systems described by the canonical or microcanonical ensembles. More precisely, in computer simulations one calculates the quantity

where $ g i j c ( r ) $ is the multicomponent RDF of the finite system and *R* is a cutoff distance.

To justify the use of Eq. (3), it is always assumed that $ g i j o ( r ) \u2248 g i j c ( r ) $, which implies that the volume of the simulation box is large enough to locally reproduce the open system, or grand canonical, thermodynamics. In addition, it is assumed that *R* ≫ *ζ*, with *ζ* being the structural correlation length of the liquid. This implies that $ g i j c ( r \u2265 R ) =1$, that is, structural correlations between points separated by a distance *R* vanish and the integral of Eq. (3) can be safely truncated.

We thus face two different finite sizes simultaneously at play: the finite—and often too small—number of particles that constitute the simulated system (ensemble); and the extension of the subregion in which some physical observable is computed (boundary effects), which is at risk of being smaller than or too close to the system’s correlation length. It is therefore only fair to expect that results obtained using Eq. (3) might not be fully reliable. As a matter of fact, the minimal errors introduced by using Eq. (3) yield dramatic explicit finite size effects that become apparent when comparing standard MD with semi-grand canonical simulations.^{12}

Here, instead of using Eq. (3), we explicitly include both finite size effects into an expression, analogous to Eq. (1), defined for a finite system. Let us recall that in computer simulations one considers systems with total fixed number of particles *N*_{0} and volume *V*_{0} (see Fig. 1). Thus we define^{13}

or, to rephrase it, we define a quantity *G _{ij}*(

*V*,

*V*

_{0}) that is evaluated by computing fluctuations of the number of particles in finite subdomains of volume

*V*inside a simulation box of volume

*V*

_{0}. The average number of

*i*-particles 〈

*N*〉′ ≡ 〈

_{i}*N*〉

_{i}_{V,V0}depends on both subdomain and simulation box volumes, and it is therefore conceptually different from the average used in Eq. (1). Moreover, the integral on the r.h.s. of Eq. (4) should be evaluated for the RDF of the finite system with volume

*V*

_{0}by using a finite integration domain

*V*.

As a first step, we introduce a correction to ensemble effects accounting for the fact that $ g i j c ( r ) \u2248 g i j o ( r ) $ holds only in the thermodynamic limit. This implies that the asymptotic behaviour of the tail of $ g i j c ( r ) $ should be included to account for small differences at large values of *r*. We propose to use an expression for the multicomponent pair correlation functions of the form

based on the asymptotic limit $ g i j c ( | r | \u226b \zeta ) =1\u2212 ( \delta i j / \rho i + G i j \u221e ) / V 0 $ discussed in Ref. 8. As expected, when the total volume *V*_{0} → ∞ we recover $ g i j c ( r ) = g i j o ( r ) $. We include Eq. (5) in the integral on the r.h.s. of Eq. (4) and obtain

In Eq. (5) we neglect *O*(1/*V*_{0}) contributions that include derivatives of $ g i j o ( r ) $ with respect to *ρ _{i}*, as well as $O ( 1 / V 0 2 ) $ terms, because their contribution to ensemble size effects becomes negligible upon integration.

Second, for the boundary effects, we correct for the finite domain in which we compute the integral on the r.h.s of Eq. (6), which indeed includes the RDF of the open system. However, the volume *V* is finite and the transformation **r**_{2} → **r** = **r**_{2} − **r**_{1} cannot be applied directly since the system is no longer translationally invariant. Alternatively, we take into account the different integration domains defined by the volumes *V* and *V*_{0},^{13} and write

with *S* the surface area of the subdomain *V* and $ \alpha i j \u2032 $ a proportionality constant. The meaning of Eq. (7) can be understood as follows. Both integrals, *I _{VV}* and

*I*

_{VV0}, are equal to zero for

*r*

_{12}>

*ζ*. Nevertheless, for values of

*r*

_{12}close to the boundary of the subdomain V, i.e.,

**r**inside

_{1}*V*and

**r**outside the subdomain with

_{2}*r*

_{12}<

*ζ*, there are contributions to

*I*

_{VV0}that are missing in

*I*. Consequently, the difference

_{VV}*I*

_{VV0}−

*I*must be proportional to the surface area of the subdomain

_{VV}*V*, so that $ \u222b V \u222b V 0 \u2212 V d r 1 d r 2 [ g i j o ( r 12 ) \u2212 1 ] \u221dS$.

^{14}Replacing Eq. (7) in Eq. (6), we obtain

where for convenience we use $S/V\u221d1/ V 1 3 $. The constant *α _{ij}* depends only on intensive thermodynamic system properties such as density and temperature.

To compute the integral on the r.h.s. of Eq. (8), we assume that *V*_{ζ} < *V* < *V*_{0} with *V*_{ζ} = 4*πζ*^{3}/3 and implicitly neglect finite size effects due to oscillations of the RDF beyond the volume *V*. Accordingly, we assume that periodic boundary conditions are applied such that the system becomes translationally invariant. We use the coordinate transformation **r**_{2} → **r** = **r**_{2} − **r**_{1} to obtain an integral equal to the one in Eq. (2), since for |**r**| > *ζ*, $ g i j o =1$. Finally, we rewrite Eq. (8) as

It is worth noting that Eq. (9) gives both the correct thermodynamic limit and asymptotic behaviour^{8} of *G _{ij}*(

*V*,

*V*

_{0}), provided

*V*>

*V*

_{ζ}. The former is obtained in the limit (

*V*,

*V*

_{0}) → ∞ that gives $ G i j ( ( V , V 0 ) \u2192 \u221e ) = G i j \u221e $. The latter holds when

*V*=

*V*

_{0}, and gives

*G*(

_{ij}*V*=

*V*

_{0},

*V*

_{0}) = −

*δ*/

_{ij}*ρ*.

_{i}The two finite size corrections discussed above can be easily identified in Eq. (9): (i) the two *V*/*V*_{0} terms (ensemble effects) appear from using an asymptotic correction to the pair correlation functions, Eq. (5), which includes explicitly the impact of the finite size of the simulation box *V*_{0}. This correction is similar to the one deduced from the Taylor expansion in terms of *N*_{0} applied to the RDF of closed single-component systems.^{3} (ii) The 1/*V*^{1/3} term (boundary effect) results from considering finite integration domains explicitly. Using a different approach, namely arguments based on the thermodynamics of small systems,^{15} a correction of this form was proposed recently for single component systems^{16} and extended for multicomponent systems.^{17} However, the results obtained using this method neglect the finite size of the simulation box, because the underlying theory is only valid for finite systems embedded in a reservoir of infinite volume.^{16}

Finally, we emphasise that Eq. (9) allows us to evaluate $ G i j \u221e $, the infinite system KBI, by computing *G _{ij}*(

*V*,

*V*

_{0}). In practice, we perform a single NVT MD simulation at fixed

*V*

_{0}and, for the production trajectory, compute the fluctuations of the number of particles as a function of the volume

*V*of the subdomain. We recall that extrapolations of thermodynamic properties from finite size computer simulations have been applied in the past to study single-component idealised systems. Examples include the calculation of the compressibility of the Ising lattice gas

^{18}and hard disk fluids,

^{19}and the investigation of the gas-liquid transition in two-dimensional Lennard-Jones fluids,

^{20}as well as the calculation of the elastic constants of model solids.

^{21}

We define $\lambda \u2261 V / V 0 1 3 $ and rewrite Eq. (9) as

This expression depends on *λ* and the volume of the simulation box *V*_{0}. Since in MD simulations *V*_{0} is fixed, the effect of the size of the simulation box on Eq. (10) reduces to a global shift that leaves the general form of the equation unchanged. This implies that results for different values of *V*_{0} superpose and it is possible to calculate $ G i j \u221e $, and therefore the open system RDF given the order of approximation implicit in Eq. (5), even from relatively small MD simulations.

To validate Eq. (10), we first consider a binary mixture (*A*, *B*) of Lennard-Jones (LJ) fluids. We use a purely repulsive 6-12 LJ potential truncated and shifted with cutoff radius 2^{1/6}*σ*. The potential parameters are chosen as *σ* = *σ _{AA}* =

*σ*=

_{BB}*σ*= 1, and

_{AB}*ϵ*= 1.2,

_{AA}*ϵ*= 1.0 with

_{BB}*ϵ*= (

_{AB}*ϵ*+

_{AA}*ϵ*)/2. All the results are expressed in LJ units with mass

_{BB}*m*= 1, time

*σ*(

*m*/

*ϵ*)

_{BB}^{1/2}, temperature

*ϵ*/

_{BB}*k*, and pressure

_{B}T*σ*

^{3}/

*ϵ*. Simulations were carried out using ESPResSo++

_{BB}^{22}with a time step of

*δt*

^{∗}= 10

^{−3}. Constant temperature was enforced through a Langevin thermostat with damping coefficient

*γ*

^{∗}= 1.0. Two system sizes,

*N*

_{0}= 23 328 and 108 000, were considered in the range of mole fractions of

*A*-molecules

*C*= 0.1, …, 1.0. The number density has been adjusted around

_{A}*ρ*

^{∗}= 0.864, thus

*V*

_{0}≈ (30

*σ*)

^{3}and (50

*σ*)

^{3}, such that pressure remains nearly constant at

*P*

^{∗}= 9.8. We performed equilibration runs of 64 ⋅ 10

^{6}MD steps (16 ⋅ 10

^{6}for

*N*

_{0}= 108 000 systems) and production runs of 2 ⋅ 10

^{6}MD steps. To compute

*G*(

_{ij}*λ*), we selected 800 frames per trajectory and for each frame identified 1000 randomly positioned subdomains with volumes ranging from (2

*σ*)

^{3}<

*V*<

*V*

_{0}.

Fig. 2 shows results corresponding to *C _{A}* = 0.3 (a) and

*C*= 0.5 (b) mole fractions. In both cases a linear regime for

_{A}*λ*≤ 0.3, indicated by vertical lines, appears. In this interval we fit a line with slope $ G i j \u221e $ and intersection

*α*. By replacing these values in Eq. (10) we obtain curves that superpose remarkably well on the simulation data over the entire range.

_{ij}In this linear regime, *λ*^{3} ≈ 0 thus $ G i j ( V , V 0 ) \u2248 G i j \u221e $, provided *V* > *V*_{ζ}. This suggests that the region defined by *λ* ≤ 0.3 corresponds to the regime where the canonical and grand canonical ensembles coincide, i.e., the thermodynamic limit. Moreover, this also implies $ g i j c ( r ) \u2248 g i j o ( r ) $. In practical terms, if we seek an accurate value of KBIs using the standard expression (3) for $ G i j R $, i.e., obtaining $ g i j c ( r ) $ from simulations and integrating up to a cutoff *R*, for a LJ system with correlation length *ζ* = 10*σ*, we would need to run a simulation with, at least, *V*_{0} = (50*σ*)^{3}. In other words, the volume of the subdomain *V* > *V*_{ζ} should be *at most* 3% of the volume *V*_{0} in order to get $ G i j R \u2248 G i j \u221e $.^{12}

To test this argument, we compare the results obtained using Eq. (10) with the quantity $ G i j R $ obtained from Eq. (3). Fig. 3 shows $ G A B R $ obtained with *V*_{0} = 50*σ* (red line) and *V*_{0} = 30*σ* (blue line). The horizontal line indicates the value of $ G A A \u221e $ obtained from Eq. (10) and Fig. 2. Clearly, upon increasing the size of the simulation box, $ G A A R $ converges to the $ G A A \u221e $ calculated with our method.

Finally, we use Eq. (10) to study the solvation thermodynamics of urea/water mixtures. In a system with cosolvent *U* (urea) and solvent *W* (water) we have, at pressure *P*, temperature *T*, and density *ρ _{U}*, the derivative of urea activity coefficient defined as

with *γ _{U}* and

*k*

_{B}

*T*ln

*γ*the urea activity coefficient and chemical potential, respectively.

_{U}^{8}

Urea/water mixtures were simulated^{23} in GROMACS,^{24} using the Kirkwood-Buff derived force field^{6} and SPC/E water.^{25} Four urea molar concentrations were considered (2.0, 3.99, 6.00, and 8.02 M). NPT simulations were performed with pressure and temperature enforced with Berendsen barostat (0.5 ps coupling) and thermostat (0.1 ps coupling) to ensure 1 atm pressure and 300 K temperature, respectively.

We compute *γ _{UU}* and report the results in Fig. 4. We compare our data with two experiments, Exp. 1

^{5,6}(solid line) and Exp. 2

^{7}(dashed line), and with simulation results using Eq. (3).

^{4}The latter simulation results (blue squares) tend to follow the behaviour observed in Exp. 1. This is not surprising since the force field

^{6}used in the simulations was parameterised to reproduce the activity coefficients reported in Exp. 1. Nevertheless, by using our method with the same simulation sizes and trajectories of Ref. 4, we obtain results in decidedly better agreement with Exp. 1. This suggests that our method is more accurate than using Eq. (3) and can be profitably employed to investigate complex molecular fluids.

In conclusion, by explicitly including ensemble and boundary finite size effects into integral equations of statistical mechanics, we presented here a method to extract thermodynamic properties of a physical system from small-sized atomistic simulations. In particular, the combination of small-sized simulation setups and the analytical relation established here allows us to determine quantitatively the activity coefficients of aqueous urea mixtures. We also identified the thermodynamic limit for MD simulations based on the ratio between the volume of the domain in which a physical observable is computed and the volume of the simulation box. Finally, we envisage a beneficial usage of this method to study solvation properties of other complex mixtures of biological interest and to characterise the phase diagram of molecular fluids.

R.C.H. thanks Debashish Mukherji for many stimulating discussions and for kindly sharing with us his trajectories for urea water mixtures. We thank Roberto Menichetti, Aoife Fogarty, and Cristina Greco for a critical reading of the manuscript. R.C.H. gratefully acknowledges the Alexander von Humboldt Foundation for financial support.