We present a multilevel Monte Carlo simulation method for analyzing multi-scale physical systems via a hierarchy of coarse-grained representations, to obtain numerically exact results, at the most detailed level. We apply the method to a mixture of size-asymmetric hard spheres, in the grand canonical ensemble. A three-level version of the method is compared with a previously studied two-level version. The extra level interpolates between the full mixture and a coarse-grained description where only the large particles are present—this is achieved by restricting the small particles to regions close to the large ones. The three-level method improves the performance of the estimator, at fixed computational cost. We analyze the asymptotic variance of the estimator and discuss the mechanisms for the improved performance.

A key challenge in the molecular simulation of soft matter is posed by the separation of length-scales between its microscopic description and the existence or emergence of mesoscopic structure. In such cases, one often relies on coarse-grained (CG) descriptions of the system that (approximately) integrate out microscopic degrees of freedom1–3 to yield a tractable simplified model. Examples in the soft-matter context include polymers,4,5 biomolecules,6–8 and colloidal systems.9,10 Such CG descriptions are essential for multi-scale modeling approaches.11,12 However, they are not usually exact, and the associated coarse-graining errors are often difficult to assess.

Such CG models have been studied extensively in colloidal systems with depletion interactions.9,13,14 The typical example is a mixture of relatively large colloidal particles with much smaller non-adsorbing polymers that generate effective attractions between the colloids. This can drive de-mixing, crystallization, or gelation, depending on the context. Model systems in this context include the Asakura–Oosawa (AO) model9 where the CG model can even be exact if the disparity between colloid and polymer radii is large enough. The theoretical tractability of the AO model arises from a simplified modeling assumption that polymer particles act as spheres that can interpenetrate.

Alternatively, one may consider a mixture where both the colloids and the depletant are modeled as hard spheres. From a theoretical perspective, this is an interesting model in its own right as it undergoes a fluid–fluid de-mixing phase separation for large enough size-disparities and concentrations.10,15–17 This happens despite the lack of attractive forces between the particles in the model and can be attributed to geometric packing effects of the big and small spheres.

Direct simulation of such mixtures is very challenging because of the large number of small particles. Accurate CG models are also available in this context,10 but the CG representations are not exact: their errors can be detected by accurate computer simulation of the full (FG) mixtures. Hence, such models are natural testing grounds for theories and simulation methods associated with coarse-graining.

In this context, we recently developed a method17,18 that links a CG description with the underlying fine-grained (FG) description. We call this the two-level method because the CG and FG models describe the same system with different levels of details. The method was validated by computations on the AO system,18 where it provided numerically exact results for the FG model, even in the regime where the CG description is not quantitatively accurate. The methodology was also applied to the hard sphere mixture,17 where it provided a quantitative analysis of the critical point associated with de-mixing of the large and small particles.

These previous results rely on the idea that properties of the FG model can be estimated in terms of some CG quantity, with an additive correction that accounts for the coarse-graining error. This is an importance sampling method, familiar in equilibrium statistical mechanics from free-energy perturbation theory,19 which involves reweighting between two thermodynamic ensembles. In the present context, the reweighting factors depend on the free energy of the small spheres, computed in a system where the large particles are held fixed. This free energy can be estimated by an annealing process based on Jarzynski’s equality20–22 that slowly introduces small particles to fixed CG configurations.

In this paper, we present an extension of the two-level method that incorporates additional intermediate levels to improve the overall performance. Specifically, we introduce a step in the annealing process where small particles are partially inserted in regions close to big particles. Before finishing the small-particle insertion, we then replace weighted sets of configurations with unweighted ones, duplicating configurations with large weight and deleting ones with low weight. This resampling step allows us to make optimal use of the information available at the intermediate stage, focusing our subsequent computations on configurations that matter.

This general approach fits in the framework of sequential Monte Carlo (SMC).23–25 Such algorithmic ideas have been successfully applied in applications across disciplines under various names, including population Monte Carlo26 or the go-with-the-winners strategy.27 Examples in computational physics include the pruned-enriched Rosenbluth method for polymers,28 the cloning method for rare events,29 and diffusion quantum Monte Carlo.30 We combine the SMC method with an additional variance reduction strategy. Instead of estimating the FG average directly, we combine a CG estimate with estimates of subsequent level differences using the previous levels as control variate. This is the idea behind multilevel Monte Carlo methods.31–33 The combination of a difference estimate with SMC has been previously investigated, for example, in Refs. 3436. As in Ref. 17, we develop a general method alongside its application to highly size-asymmetric binary hard-sphere mixtures, which provide a challenging but well understood example to benchmark our algorithm.

This paper is organized as follows: In Sec. II, we introduce the hard-sphere mixture model. In Sec. III, we summarize the setup of the two-level method before presenting our extension to three (or more) levels. The three-level method requires an intermediate level for the hard-sphere mixture, whose details we discuss in Sec. IV. In Sec. V, we present a numerical test of the method and compare its performance against the two-level method, and in Sec. VI, we present convergence results. We conclude in Sec. VII.

Throughout this work, we illustrate the multilevel method with an example system, which is a mixture of large and small hard spheres at size ratio 10:1. This system is challenging for simulation because the big particles may display interesting collective behavior (in particular, a critical point), but the dominant computational cost for simulation comes from the large number of small particles.

However, despite our focus on this single example, we emphasize that the multilevel method is presented in a general way, which should also be applicable in other systems with a separation of length scales.

The example system is a mixture of big and small particles, whose diameters are σB and σS, respectively. We consider a periodic box [0, L]3 of linear size L and we work in the grand canonical ensemble. (This choice is particularly relevant for analysis of de-mixing, where the number density of large particles is a suitable order parameter.37)

In a given configuration, the numbers of big and small particles are N and n, respectively; the position of the ith big particles is Ri, while the position of the jth small particle is rj. We denote the configurations of big and small particles by C=(N;R1,,RN) and F=(n;r1,,rn), respectively, and the full configuration is denoted X=(C,F).

Since the particles are hard, the temperature plays no role in the following, and so we set the temperature as kBT = 1 without any loss of generality. The equilibrium distribution of the mixture is described by a probability density

pF(C,F)=1ΞFeμBN+μSnUF(C,F),
(1)

where the subscript F indicates that we refer to the FG model, μB and μS are the chemical potentials for the large and small particles, respectively, and ΞF is the grand canonical partition function. The particles are hard (non-overlapping), so the potential energy is

UF(C,F)=,if any particles overlap,0,otherwise.
(2)

This pF is normalized as 1=pF(C,F)dCdF, the precise meaning of these integrals is given in Appendix A 1.

Within this setting, the dimensionless parameters of the model are the ratio of the particle diameters σB/σS, the system size parameter L/σB, and the two chemical potentials μS and μB. In practice, μS is more naturally parameterized by the associated reservoir volume fraction ηSr, which we relate to μB via an accurate equation of state.38 

Our multi-level method is designed for accurate estimates of properties of the large particles. Specifically, we consider observable quantities of interest A=A(C) that only depend on the large particles. (Examples are discussed in Sec. II C: see also Fig. 3.) Our aim is to compute the equilibrium average of A, that is,

AF=A(C)pF(C,F)dCdF.
(3)

Since A(C) does not depend on F, it is natural to define the marginal distribution for the big particles

pF(C)=pF(C,F)dF
(4)

so that AF=A(C)pF(C)dC. A similar situation occurs in the context of statistics, where one seeks to analyze the behavior of a few quantities of interest in a high-dimensional system: in that context, the small-particle degrees of freedom in (3) would be referred to as nuisance parameters. This means that their values are not required to compute the quantity of interest, but their statistical properties strongly affect the average of this quantity.

If samples for the marginal distribution pF(C) could be generated by an MC method for the big particles alone, this would make the system much more tractable by simulation. This is a central idea in coarse-grained modeling.2 However, the complexity of packing of the small hard spheres means that pF(C) is a complex distribution, and it is not possible to sample it exactly. A great deal of effort has gone into developing CG models that approximate this distribution with high accuracy.10,16,39

A suitable CG model is an equilibrium distribution with probability density

pC(C)=1ΞCeμBNUC(C),
(5)

where ΞC is the partition function, and the CG (effective) interaction energy is

UC(C)=NΔμ+i=1N1j=i+1NV2(|RiRj|),
(6)

where V2 is a pairwise interaction potential. Averages with respect to the CG model are denoted as

AC=A(C)p(C)dC.
(7)

For a suitably chosen V2, the coarse distribution pC(C) can be an accurate approximation to pF(C). For the CG model in this work, we take the accurate potential V2 = VRED, developed by Roth, Evans, and Dietrich.10 Following Ref. 17, we choose Δμ such that the distributions of N coincide for FG and CG models.

Throughout the paper, we benchmark our numerical methods by considering the hard-sphere mixture with fixed parameters, as follows: We take the ratio of particle sizes (σB/σS) = 10, the linear size of the periodic system is L = 31σS, and the small-particle (reservoir) volume fraction is ηSr=0.2. This volume fraction is large enough to generate a significant depletion attraction between the large particles, but not strong enough to cause de-mixing of the large and small particles.17 

Aspects of the CG and FG models are illustrated in Figs. 1(a)1(c) for these parameters. In particular, we show representative configurations of the CG and FG models, as well as a plot of the RED potential. While direct grand canonical Monte Carlo (GCMC) sampling of the full mixture is possible, in principle, it should be apparent from Fig. 1(c) that this would be intractable because insertion of large particles in such a fluid is hardly ever possible. Advanced MC methods39,40 might be applicable, but these tend to struggle when the volume fraction gets large. This motivates the development of two-level and multi-level methods.

FIG. 1.

An overview of the levels of the example system from Sec. II C and the structure of the three-level method. (a) A sample of the CG model pC with N = 11 big particles. (b) The two-body potential VRED used for the CG model. (c) A sample of the full FG model pF that has the big particle configuration as (a). This system contains n = 8842 small particles. (d) A sample of the partially inserted model pI used in the three-level algorithm. The small particles are primarily inserted around big particles, reducing the number to n = 4473. (e) A sketch of the two- and three-level method: starting with a population of CG configurations, we can directly compute importance weights by simulating an annealing process introducing the small particles (upper path, two-level method). Alternatively, we introduce a partially inserted intermediate level where we interrupt this annealing process and resample to boost relevant configurations (lower path, three-level method).

FIG. 1.

An overview of the levels of the example system from Sec. II C and the structure of the three-level method. (a) A sample of the CG model pC with N = 11 big particles. (b) The two-body potential VRED used for the CG model. (c) A sample of the full FG model pF that has the big particle configuration as (a). This system contains n = 8842 small particles. (d) A sample of the partially inserted model pI used in the three-level algorithm. The small particles are primarily inserted around big particles, reducing the number to n = 4473. (e) A sketch of the two- and three-level method: starting with a population of CG configurations, we can directly compute importance weights by simulating an annealing process introducing the small particles (upper path, two-level method). Alternatively, we introduce a partially inserted intermediate level where we interrupt this annealing process and resample to boost relevant configurations (lower path, three-level method).

Close modal

Figure 2 highlights properties of the distribution of the number of big particles for the CG model when varying the effective large-particle chemical potential μBeff=μBΔμ. In particular, Fig. 2(b) shows that increasing μBeff in the CG model leads to a non-monotonic behavior in the variance of the particle number N (analogous to the compressibility of the model). This maximum indicates that the system has a tendency for de-mixing at larger ηSr (one expects a divergent compressibility at the critical point, if one exists). In the following, we fix μB at the value corresponding to this maximum—the relatively large fluctuations at this point are challenging for the multi-level model because the distributions pC(C) and pF(C) are broader, requiring good sampling. The corresponding CG system has an average of N ≈ 11.6 big particles, occupying around 20% of the available volume.

FIG. 2.

Properties of a big-particle-only hard sphere model with RED potential when varying the effective chemical potential μBeff as defined in the main text. (a) The average number of big particles ⟨N⟩. (b) The variance of the number of big particles var(N), which is maximized around μBeff=7.

FIG. 2.

Properties of a big-particle-only hard sphere model with RED potential when varying the effective chemical potential μBeff as defined in the main text. (a) The average number of big particles ⟨N⟩. (b) The variance of the number of big particles var(N), which is maximized around μBeff=7.

Close modal

For the specific quantities that we will compute for this mixture, Fig. 3 shows the expectations of the big-particle pair correlation function g(r) and the distribution of the number of big particles P(N). Results are shown for both CG and FG models (in the FG case, the results are computed using the two-level method). For both quantities of interest, the CG model provides an accurate but not exact description of the model. In particular, the CG model underestimates the pair correlation at the point where two big particles are in contact. The distributions of the number of big particles in Fig. 3(b) are both unimodal: both the FG and CG systems are well below the critical point of demixing.

FIG. 3.

Two quantities of interest for the binary hard-sphere system, computed for the CG and FG model from Sec. II C. (a) The big-particle pair correlation function g(r). Apart from underestimating its value at the touch of two big particles, the CG approximation captures the behavior accurately. (b) The distribution of the number of big particles N. By the choice of Δμ, the average number of big particles of the CG and FG models coincide. Both models are clearly well below the critical point of demixing.

FIG. 3.

Two quantities of interest for the binary hard-sphere system, computed for the CG and FG model from Sec. II C. (a) The big-particle pair correlation function g(r). Apart from underestimating its value at the touch of two big particles, the CG approximation captures the behavior accurately. (b) The distribution of the number of big particles N. By the choice of Δμ, the average number of big particles of the CG and FG models coincide. Both models are clearly well below the critical point of demixing.

Close modal

Compared to the critical hard-sphere mixture discussed in Ref. 17, the system we consider here is smaller and has a lower volume fraction ηSr of the small particles. This is still challenging for conventional Monte Carlo algorithms but can be simulated fast enough to evaluate the performance and compare the computational methods discussed here. Furthermore, the lower small-particle volume fraction helps with the construction of the intermediate level in Sec. IV, whose underlying approximation decays as ηSr increases (see  Appendix B).

This section reviews the two-level method of Refs. 17 and 18 and then lays out its three-level extension. The presentation of the method is intended to be generic and applicable to a variety of systems. However, we first introduce the key ideas using the example and illustrations of Fig. 1 for the hard-sphere mixture.

The two-level method is constructed with the scale separation of the mixture in mind: it splits the simulation of the big and small spheres into two stages by first simulating a CG system of large particles alone and computing ⟨AC. Then, differences between ⟨AC and ⟨AF are computed by a reweighting (importance sampling) method. The weight factors for this computation are obtained by an annealing step, where the small particles are slowly inserted into the system, with the large particles held fixed [see Fig. 1(e)]. The advantage of this procedure is that large particle motion only happens in the CG simulation where the small particles are absent—there is no scale separation in this case so simulations are tractable. Similarly, insertion of the small particles happens in a background of fixed large particles, so these annealing simulations do not suffer long time scales associated with large-particle motion. This makes for tractable simulations in scale-separated systems, as long as the CG model is sufficiently accurate: see Refs. 17 and 18 for further discussion.

In practice, the simulation effort for two-level computations is dominated by the annealing step. The weighting factors are required to high accuracy, which means that the annealing must be done gradually. Moreover, the weights are subject to numerical uncertainties that tend to be large in systems with many small particles. This limits the method to systems of moderate size, with moderate ηSr (see Ref. 17).

We show in this work that such problems can be reduced by breaking the annealing process into several stages—this is the idea of the three-level method [Fig. 1(e)]. Specifically, we start (as before) with a population of configurations of the CG model. We perform a first annealing step where the small particles are added in regions that are close to large ones. The information from this step is used in a resampling process, which partially corrects the coarse-graining error by discarding some of the configurations from the population and duplicating others. (This idea is similar to go-with-the-winners.27) Finally, the second annealing step inserts the small particles in the remaining empty regions, arriving at configurations of the FG model. Hence, the end point is the same as the two-level method, but the annealing route is different.

In practice, the effectiveness of the three-level method relies on a clear physical understanding of the intermediate (partially inserted) system in order to decide which configurations to discard in the resampling step. For the hard-sphere case, that issue will be discussed in Sec. IV; a more general discussion is given in Sec. VII. The remainder of this section describes the two- and three-level methods in more detail.

We review the two-level method of Refs. 17 and 18. For a general presentation, we assume that CG and FG models exist with configurations C and X=(C,F), respectively. In the case of hard spheres, C and F correspond to configurations of the large and small spheres, respectively.

The two-level method is an importance sampling41 (or reweighting) computation, closely related to the free-energy perturbation method of Zwanzig.19 We use the grand canonical Monte Carlo (GCMC) method to sample MC configurations from pC, and these are denoted by C1,C2,,CMC. Then, the CG average can be estimated as

ÂC=1MCj=1MCA(Cj).
(8)

As the sampling is increased (MC), we have ÂCAC. However, if the coarse-graining error

Δ=AFAC
(9)

is significant, ÂC does not provide an accurate estimate of ⟨AF.

To address this problem, we use an annealing procedure based on Jarzynski’s equality20 that starts from a coarse configuration C and populates the fine degrees of freedom F̂; at the same time, it generates a random weight Ŵ(C) with the property that

Ŵ(C)J=ξpF(C)pC(C),
(10)

where the angle brackets with subscript J indicate an averaging over the annealing process (analogous to Jarzynski’s equality20), and ξ is a constant (independent of C). The details of the annealing process are given in  Appendix A. It is applied to a set of MF coarse configurations, again denoted by C1,C2,,CMF, which are typically a subset of the MC CG configurations above.

For later convenience, we define

Ŵn(C)=Ŵ(C)/ξ.
(11)

In practical applications, the constant ξ is not known, but its effect can be controlled by defining the self-normalized weight

ŵ(Cj)=Ŵ(Cj)1MFi=1MFŴ(Ci).
(12)

Since the Cj are representative of pC, the denominator in ŵ converges to ξ as MF and so ŵ(Cj)Ŵn(Cj). Then, the estimator

ÂF=1MFj=1MFŵ(Cj)A(Cj)
(13)

converges to ⟨AF as MF. (In the case that Ŵ is not random then this procedure recovers the free energy perturbation theory of Zwanzig.19)

The annealing process has one useful additional property: Let the joint probability density for the weight and the fine degrees of freedom be κ(Ŵ,F̂|C), which is normalized as κ(Ŵ,F̂|C)dF̂dŴ=1. We show in  Appendix A that

Ŵκ(Ŵ,F̂C)dŴ=ξpF(C,F̂)pC(C).
(14)

This formula is the essential property of the annealing procedure, which is required for the operation of the method. Additionally integrating over F̂ shows that (14) ensures that (10) also holds. This means in turn that if B=B(C,F) is an observable quantity that depends on both coarse and fine degrees of freedom,

B̂F=1MFj=1MFŵ(Cj)B(Cj,F̂j)
(15)

converges to ⟨BF as MF.

This method can be easily improved without extra computational effort. The key idea31–33 is to estimate the FG average as the sum of the CG average and the coarse-graining error (9),

AF=AC+Δ.
(16)

Then, use importance sampling to estimate Δ, as

Δ̂=1MFj=1MFŵ(Cj)1A(Cj).
(17)

Finally, a suitable estimator for the FG average is obtained by combining the estimate of the coarse-graining error with the corresponding CG quantity,

ÂF,Δ=ÂC+Δ̂.
(18)

This estimator converges to ⟨AF in the limit where MC, MF. As discussed in Ref. 18, the variance of the estimate Δ̂ is typically smaller than that of ÂF, and the CG estimate ÂC is cheap to compute accurately. Thus, the combined difference estimator ÂF,Δ is typically more accurate at fixed computational cost.

The importance sampling methodology has a useful physical interpretation, which we explain for the example of the hard-sphere mixture. If we consider a fixed configuration of the large particles, the grand canonical partition function for the small particles is

Ξ[C,μS]=eμSnUF(C,F)dF.
(19)

As the system is annealed (the small particles are inserted), we estimate (19) by a free-energy method based on Jarzynski’s equality20 (see  Appendix A for details). Since the annealing is stochastic, this yields an estimate of the partition function, which we denote by Ξ̂[C,μS]. Moreover, this estimate is unbiased Ξ̂[C,μS]J=Ξ[C,μS]. Hence, we can take

Ŵ(C)=Ξ̂[C,μS]eUC(C),
(20)

and using (4) and (5), we see that (10) holds, with ξ = (ΞCF).

Physically, the CG model is constructed so that the Boltzmann factor eUC(C) is a good estimate of the small-particle partition function Ξ[C,μS]. If this is the case, the model is accurate. The two-level methodology uses estimates of the small-particle partition function (or, equivalently, their free energy) and compares it with the assumptions that were made about this quantity in the CG model. By analyzing the differences between these quantities, the differences between CG and FG models can be quantified. The effectiveness of this method for numerical simulation of the mixtures of large and small particles was discussed in Refs. 17 and 18.

The distribution of the importance weights ŵ(C) impacts the accuracy of the resulting FG estimate ÂF. Additionally, it serves as a useful indicator of the accuracy of the CG model and the variance of the free energy computation. To give an example, we apply the two-level method to the example problem from Sec. II C. In Fig. 4, we show the empirical distribution of 18 000 weights of the example system that are computed using an accurate annealing process; we use these computations as the reference solution in Sec. V. This illustrates a situation where the two-level method is applicable, where no single sample dominates and only very few samples have a weight larger than 10.

FIG. 4.

The empirical distribution of weights ŵ(C) of the two-level method for 18 000 coarse samples CpC applied to the example system from Sec. II C.

FIG. 4.

The empirical distribution of weights ŵ(C) of the two-level method for 18 000 coarse samples CpC applied to the example system from Sec. II C.

Close modal

If one considers less accurate CG models, the variance of the weights increases, and the tail of their distribution gets heavier. Eventually, one would reach a situation where a few samples dominate the weighted sum (13). For accurately computed weights ŵ(C), such a breakdown of the two-level method indicates that the CG model is not sufficiently accurate. This behavior provides a useful feedback loop that can be used to iterate on the CG model itself.18 

We now present the three-level method for estimation of A(C)F.

1. Coarse level

We start by generating M0 samples of the CG model, denoted by C01,,C0M0. The subscript indicates the step within the algorithm (which is 0 for the initial sampling of coarse configurations). The CG average of A can be estimated similarly to (8)

ÂC3L=1M0j=1M0A(C0j).
(21)

2. Intermediate level

In addition to the CG and FG models, the three-level method also relies on an intermediate set of configurations, which correspond in the hard-sphere mixture to the system where the small particles have been inserted in regions close to the large ones (see Fig. 5). This state is described by an equilibrium probability distribution

pI(C,F)=1ΞIeμBN+μSnUI(C,F),
(22)

where UI(C,F) is an interaction energy. Its construction for the hard-sphere mixture will be discussed in Sec. IV.

FIG. 5.

Schematic representation of the small particle insertion process during the two stages of the three level method.

FIG. 5.

Schematic representation of the small particle insertion process during the two stages of the three level method.

Close modal

The first annealing step of the three-level algorithm applies the two-level method, with the FG distribution pF replaced by pI. This part of the algorithm closely follows Sec. III B, and we give a brief discussion that mostly serves to fix notation. We start with a set of M1 coarse configurations that are samples of pC; they are denoted by C11,C12,,C1M1, where the subscript 1 indicates the intermediate stage of the three-level method. (These will typically be a subset of the configurations that were generated on the coarse level.)

For each coarse configuration C1, we anneal the fine degrees of freedom of the system F̂1 to arrive at the intermediate level and generate a random weight Ŵ1(C1) with the property

Ŵ1(C1)J=ξ1pI(C1)pC(C1),
(23)

with a constant ξ1 independent of C1. (For the hard spheres, we recall that particles are inserted preferentially in regions close to large ones, and this is illustrated in the top row of Fig. 5.)

As before, we define Ŵ1n(C1)=Ŵ1(C1)/ξ1. Again the constant ξ1 is generally not known, so we define the self-normalized weight

ŵ1(C1j)=Ŵ1(C1j)1M1i=1M1Ŵ1(C1i),
(24)

which converges to Ŵ1n(C1j) as M1. Then, the estimator

ÂI3L=1M1j=1M1ŵ1(C1j)A(C1j)
(25)

converges to ⟨AI as M1. Similar to (14), the joint probability density κ1(Ŵ1,F̂1C1) of the weight and fine degrees of freedom at the intermediate level, defined by the annealing process, fulfills

Ŵ1κ1(Ŵ1,F̂1C1)dŴ1=ξ1pI(C1,F̂1)pC(C1).
(26)

Hence, similar to (15), we also obtain

B̂I3L=1M1j=1M1ŵ1(C1j)B(C1j,F̂1j),
(27)

which converges to ⟨BI as M1.

3. Fine level

At the end of the intermediate level, we have M1 large-particle configurations. For each configuration C1j, the process of annealing to the intermediate level also provided the weight ŵ1(C1j) and the small-particle configuration F̂1j. This information can be used to build a set of configurations that are representative of pI. This procedure is called resampling, its validity in this example relies on the property (26) of the annealing procedure. This is the part of the method that is similar to population-based sampling approaches, such as SMC28 or go-with-the-winners.27 The idea is that one should focus the effort of the annealing process onto coarse configurations that are typical of the full system, and to discard those which are atypical, see Fig. 6 for a visualization of this step.

FIG. 6.

Visualization of the resampling step. We start with a population of weighted configurations (top row), where the weighting is depicted by a star rating. The goal of the resampling step is to randomly transform the weighted population into an unweighted one that has, on average, the same empirical distribution. We achieve this by duplicating large-weight configurations and deleting small-weight configurations, yielding an unweighted population of configurations (bottom row).

FIG. 6.

Visualization of the resampling step. We start with a population of weighted configurations (top row), where the weighting is depicted by a star rating. The goal of the resampling step is to randomly transform the weighted population into an unweighted one that has, on average, the same empirical distribution. We achieve this by duplicating large-weight configurations and deleting small-weight configurations, yielding an unweighted population of configurations (bottom row).

Close modal

We write X̂1j=(C1j,F̂1j) for the full configuration that is obtained by the annealing procedure at the intermediate level. The resampled configurations will be denoted by X21,X22,,X2M2; they are representative of the intermediate level pI. There are M2 of them, and the subscript 2 indicates the final stage of the three-level method. The simplest resampling method (multinomial resampling) is that each X2i is obtained by copying one of the X̂1j, chosen at random with probability ŵ1(C1j). In applications, one typically replaces this by a lower variance resampling scheme like residual resampling; see Ref. 42 for a comparison of commonly used variants.

We then perform the second annealing step that starts from an intermediate level configuration X2=(C2,F2) and anneals the fine degrees of freedom from the intermediate to the fine level, yielding F̂2 and a weight Ŵ2(X2); details are given in  Appendix A. For the hard sphere system, this involves further insertion of small particles to fill the system and generate realistic configurations of the full mixture. This procedure is shown in the bottom row of Fig. 5.

Since the starting point of the annealing procedure is X2, the joint probability density of the annealing process κ2(Ŵ2,F̂2X2) depends on both large and small particles. Therefore, the analog of (26) requires an additional average over the small particles of the starting configuration

Ŵ2κ2(Ŵ2,F̂2C2,F2)pI(F2C2)dŴ2dF2=ξ2pF(C2,F̂2)pI(C2)
(28)

for some constant ξ2. Note that pI(F|C)=pI(C,F)/pI(C). Similar to (23), the weights Ŵ2(X2) have the property

Ŵ2(C2,F2)JpI(F2C2)dF2=ξ2pF(C2)pI(C2).
(29)

From here, we proceed as before. We define the normalized weight Ŵ2n(X2)=Ŵ2(X2)/ξ2 and its self-normalized estimate

ŵ2(X2j)=Ŵ2(X2j)1M2i=1M2Ŵ2(X2i).
(30)

Since the X2j are representative of pI, it follows from (28) that observables of the coarse system A can be estimated as

ÂF3L=1M2j=1M2ŵ2(X2j)A(C2j),
(31)

which converges to ⟨AF as M2. Similar to (15), we can also obtain a consistent FG estimates of observable quantities B that depend both on coarse and fine degrees of freedom by

B̂F3L=1M2j=1M2ŵ2(X2j)B(C2j,F̂2j).
(32)

Following the same variance reduction strategy as in Sec. III B, we can define a difference estimator of the FG average, which is expected to have lower statistical uncertainty: let

Δ̂I3L=1M1j=1M1ŵ1(C1j)1A(C1j),
(33)
Δ̂F3L=1M2j=1M2ŵ2(X2j)1A(C2j).
(34)

Then,

ÂF,Δ3L=ÂC3L+Δ̂I3L+Δ̂F3L
(35)

is a consistent estimator of ⟨AF, analogous to (18).

4. General features of the three-level method

A few comments on the three-level method are in order. First, there is a simple generalization to four or more levels by splitting the annealing procedure into more than two stages. As such, the method is an example of a sequential Monte Carlo (SMC) algorithm (which is sometimes more descriptively referred to as sequential importance sampling and resampling28,43–45). We note from (10) that the weights obtained from the annealing step are random, and this is not the standard situation in SMC, but similar ideas have been previously studied in Refs. 4649. Combining an SMC algorithm with a difference estimate as in (35) has been investigated in Refs. 3436.

Second, we observe that the key distinction between the two- and three-level algorithms is the resampling step at the intermediate level. Without this, the three-level method reduces to a simple two-level method with an arbitrary stop in the middle of the annealing process. As noted above, the resampling process is designed to partially correct differences between the CG and FG models. This relies on a good accuracy of the intermediate level (otherwise the wrong configurations might be discarded, which hinders numerical accuracy). On the other hand, we note that for sufficiently large numbers of samples M0, M1, M2, the method does provide accurate FG estimates, even if the CG and intermediate level models are not extremely accurate. The distinction between the different methods comes through the number of samples that are required to obtain accurate FG results.

Third, note that the ideal situation for difference estimation is that the three terms in (35) get successively smaller. That is, the coarse estimate is already close to ⟨AF, the intermediate-level estimate provides a large part of the correction, and the fine-level correction is small. In this case, it is natural to use a tapering strategy where the number of samples used at each level decreases,

M0>M1>M2.
(36)

This allows a fixed computational budget to be distributed evenly between the various levels to minimize the total error.

As noted above, the intermediate probability distribution pI must be designed carefully in order for the resampling part of the three-level method to be effective. We now describe how this is achieved for the hard sphere mixture.

To motivate the intermediate level, recall Fig. 5, and note that defining a suitable CG model is equivalent to an estimate of the small-particle free energy in the final (fully inserted) state. The physical idea of the intermediate level is that the free energy associated with the first stage of insertion may be hard to estimate (because of the complicated packing of the small particles around the large ones), but the free energy difference associated with the second stage should be easier (because it corresponds to insertion into large empty regions where the packing of the small particles is similar to that of a homogeneous fluid, whose free energy can be estimated based on analytic approximations). A combination of these ideas yields an intermediate level that represents the big particle statistics more accurately than the CG model. Similar ideas have been considered before in multi-scale simulation,4,50,51 in particular the problem of estimating the small-particle free energy has some similarities to estimation of solvation free energies (where the depletant here plays the role of a solvent).

We start by analyzing the small particles, so we fix the large particles in some configuration C. The idea of the intermediate level is to first insert small particles only in a region close to the large particles C, and then use this information to make the intermediate marginal distribution pI(C) match the FG marginal pF(C) as closely as possible. The structure of the intermediate level is depicted in the bottom row of Fig. 1(e), and an example configuration is shown in Fig. 1(d). We implement this idea by introducing an effective (one-body) potential that acts on the small particles. We first define

dist(r,C)=minj=1,,N|rRj|
(37)

to be the distance from the point r to the nearest large particle. Small-particle insertion is suppressed in regions far from large particles by a potential energy term

Ũ(C,F)=j=1nEC(r),
(38)

where

EC(r)=ε(dist(r,C))
(39)

and the function

ε(r)=0,r<δfree,ssin2(rδfree)π2l,δfreer<δfree+l,s,rδfree+l
(40)

interpolates from zero (for small distances r) to the value s at large r. This function acts as a smoothed out step function, where δfree is the position of the step and l its width. In Figs. 1(e) and 5, areas where EC(r)>0 are indicated by blue shaded regions, in which the insertion of small particles is suppressed.

Then, define a grand-canonical probability distribution for the small particles in the partially inserted (intermediate) system as

p̃I(FC)=1Ξ̃I[C,μS]eμSnUF(C,F)Ũ(C,F).
(41)

This distribution is normalized as p̃I(FC)dF=1. It depends on the three parameters s, δfree, l, as well as the underlying parameters of the hard sphere mixture model.

The next step is to construct the weights Ŵ1(C). For consistency with (22), we write the intermediate-level distribution in the form

pI(C,F)=1ΞIeμBN+μSnUF(C,F)Ũ(C,F)Φcorr(C).
(42)

As discussed above, the term Φcorr should be designed so that the respective coarse-particle marginals pI(C) and pF(C) match as closely as possible. Using (4), (19), and (41), we can show that a perfect match requires Φcorr(C)=Φex(C) with

Φex(C)=logΞ̃I[C,μS]ΞF[C,μS]ϕ0,
(43)

where ϕ0 is an irrelevant constant. Since the Ξs in (43) are partition functions, determination of Φex reduces to computation of the free energy difference between the non-homogeneous small particle distributions of the partially and fully inserted system. We now explain how Φcorr is defined, as an approximation to Φex.

As a preliminary step for estimating Φex, we first consider the grand potential Φ for the small particles, in a system with no large particles, where the small particles feel an (arbitrary) smooth potential E=E(r). The grand potential of this system is

Φ[E;μS]=logeμSnUF(0,F)j=1nE(rj)dF,
(44)

where 0 indicates the large-particle configuration with no particles at all (N = 0).

If E varies slowly in space, a simple approach to this integral is to assume that the system is locally the same as a homogeneous system in equilibrium—similar to the local density approximation.52 In this case,

Φ[E;μS]Vp(μSE(r))dr,
(45)

where p is the pressure, expressed as a function of the chemical potential.

However, this approximation is not sufficiently accurate for the current application. To this end, we include a correction to account for inhomogeneities, as a squared gradient term Φ[E;μS]Φsq[E;μS] with

Φsq[E;μS]=Vp(μSE(r))+g(μSE(r))|E(r)|2dr.
(46)

(Within a gradient expansion, this is the first correction that is consistent with rotational and inversion symmetry.)

We show in Appendix B 1 that g can be estimated as

g(μ)=3η2πσS32q2S(μ;q)q=0,
(47)

where S(μq) is the structure factor of the small hard-sphere system. For a numerical estimate of this Φsq, we estimate the pressure p by the accurate equation of state from Ref. 38, and g is estimated from (47) using the structure factor from Ref. 53. A numerical example demonstrating the accuracy of this second order approximation for a non-homogeneous hard-sphere fluid can be found in Appendix B 2.

We are now in a position to approximate Φex in terms of Φsq. This (analytical) calculation is illustrated in Fig. 7. We require an estimate of Φex, which is the free-energy difference between the partially inserted and fully inserted systems in panels (a) and (b). This is achieved as a sum of three free-energy differences. In the first step, the large particles are removed and the small-particle fluid is re-equilibrated, to fill up the remaining space, leading to panel (c). Then, the confining potential Ũ is removed and the small particles fully inserted, leading to (d). Finally, the large particles are re-inserted and the small particles re-equilibrated again, leading to (b).

FIG. 7.

Illustration of the computation of Φcorr, which is an estimate of the free energy difference Φex between panels (a) and (b) [see (48)]. As described in the text, this difference is computed as a sum of three differences using the integration path [(a),(c),(d),(b)]. The free energy difference between (c) and (d) is Φ[0;μS]Φ[EC;μS]. We make the approximation that the differences between [(a),(c)] and [(d),(b)] are equal and opposite, and this should be accurate if the shaded blue regions in panel (a) are well-separated in space from the large particles. Combining this assumption with the square gradient approximation (46) yields Φcorr(C) in (50) as a numerically tractable estimate of Φex.

FIG. 7.

Illustration of the computation of Φcorr, which is an estimate of the free energy difference Φex between panels (a) and (b) [see (48)]. As described in the text, this difference is computed as a sum of three differences using the integration path [(a),(c),(d),(b)]. The free energy difference between (c) and (d) is Φ[0;μS]Φ[EC;μS]. We make the approximation that the differences between [(a),(c)] and [(d),(b)] are equal and opposite, and this should be accurate if the shaded blue regions in panel (a) are well-separated in space from the large particles. Combining this assumption with the square gradient approximation (46) yields Φcorr(C) in (50) as a numerically tractable estimate of Φex.

Close modal

To make this precise, define ΦC[E;μS] as the grand potential of the small particles in the potential E, where the large particles are also included, with configuration C. Then, the desired free energy difference between panels (a) and (b) is

Φex(C)=ΦC[0;μS]ΦC[EC;μS],
(48)

where we took ϕ0 = 0.

From the definitions in Sec. IV A, the free energy difference between panels (c) and (d) is Φ[0;μS]Φ[EC;μS], from (38) and (44). Our central approximation is that the free energy difference between panels (a) and (c) is (approximately) equal and opposite of the difference between (d) and (b) because the local environment of the large particles is the same in both cases. (The only differences are in regions far from any large particles.) At this level of approximation, the free energy differences between [(a),(b)] and [(c),(d)] are equal,

Φex(C)Φ[0;μS]Φ[EC;μS].
(49)

Finally, the right-hand side can be estimated by the square gradient approximation (46), yielding Φex(C)Φcorr(C) with

Φcorr(C)=Φsq[0;μS]Φsq[EC;μS].
(50)

Operation of the three-level method requires numerical estimates of this Φcorr, which includes the integral in (46). Moreover, its value is exponentiated when computing weight factors Ŵ, so these numerical estimates are required to high accuracy. This is a non-trivial requirement because the integrand is constant on regions far from the big particles, but it varies much more rapidly when these particles are approached. In such situations, adaptive quadrature schemes are appropriate: we use the cuhre algorithm of the cuba library54 that uses globally adaptive subdivision to refine its approximations in the relevant regions of space. Note, however, that while the choice of the numerical integrator influences the intermediate level, small errors in estimation of this integral will be corrected by the second annealing step, so such errors do not affect the consistency of our numerical estimators.

Given this choice of Φcorr(C), the intermediate level distribution pI of (42) has been completely defined, although it still depends on the three parameters δfree, s, l that appear in the function ɛ(r). We also note that given the approximations made, it is not expected that this pI is optimal (its marginal pI(C) does not match pF(C) perfectly). Subsection IV C discusses the parameter choices, and some possibilities for correction factors that can be added to Φcorr in order to address specific sources of error.

In fixing the parameters δfree, s, l, several considerations are relevant. First, if s is too small or δfree is too large, the potential EC has little effect on the system and the small particles are not restricted to be close to the large ones. In this case, pI ends up close to pF, and there is little benefit from the intermediate level. On the other hand, the accuracy of Φsq is greatest when the gradient of the potential EC is small, and this favors small s and large l, δfree. In practice, it is also convenient if the two annealing stages insert similar numbers of particles, so that their computational costs are similar. For the example system in Sec. II C, we will present results for a suitable parameter set

δfree=0.5σS,s=4.4,l=3.5σS.
(51)

We have also tested other values, a few comments are given below.

We will consider several variants of the intermediate level. We denote by pI(1) the distribution defined by (42) and (50), with parameters (51). Figure 8 shows how the quantities of interest differ between the CG and FG models, and the corresponding differences between the intermediate level and the FG model. Here, Δg(r) is the difference between g(r) for the FG model and the distribution of interest (which is either the CG distribution pC or one of the variants of the intermediate distribution). ΔP(N) is the corresponding difference in the probability that the system has N large particles.

FIG. 8.

Estimated accuracy of the CG and the intermediate levels from the main text for the example from Sec. II C. We show the difference between the respective CG and intermediate level estimates and the true FG estimate for the pair correlation function g(r) in (a) and the distribution of big particles in (b). The FG estimates of these quantities of interest are shown in Fig. 3.

FIG. 8.

Estimated accuracy of the CG and the intermediate levels from the main text for the example from Sec. II C. We show the difference between the respective CG and intermediate level estimates and the true FG estimate for the pair correlation function g(r) in (a) and the distribution of big particles in (b). The FG estimates of these quantities of interest are shown in Fig. 3.

Close modal

For the value of g(r) at contact, we see that the intermediate level pI(1) corrects around half of the deviation between CG and FG models. However, the probability distribution of the N has the opposite situation that the intermediate level is less accurate than the CG model. [This is partly attributable to the fact that Δμ in Eq. (6) has been chosen to make the CG model accurate.]

To explore the behavior of the intermediate level, we constructed two variants of pI. The aim is to understand why pI(1) has inaccuracies and to (partially) correct for them. There are two main approximations in the intermediate level pI(1): the first is (49) and the second is that Φ[EC,μS] can be approximated by the square-gradient approximation (46). The first approximation neglects a significant physical phenomenon in these systems, which is a layering effect of the small particles around the large ones. This is illustrated in Fig. 9 by the radial distribution function gBS0 between large and small particles (measured in a system with a single large particle). One sees that there is typically an excess of small particles close to the large ones, followed by a deficit [gBS0(r)<1], and a (weak) second layer.

FIG. 9.

The layering of small particles (σS = 1) around one big particle (σB = 10) at volume fraction ηS = 0.2. The displayed pair correlation function gBS(r) given the radius from the center of the big particle shows that small particles form layers of higher and lower concentration that vanish with increased distance from the big particle.

FIG. 9.

The layering of small particles (σS = 1) around one big particle (σB = 10) at volume fraction ηS = 0.2. The displayed pair correlation function gBS(r) given the radius from the center of the big particle shows that small particles form layers of higher and lower concentration that vanish with increased distance from the big particle.

Close modal

For (49) to be accurate, the intermediate level should have enough small particles to capture this layering so that the particles being inserted in the second annealing stage are not strongly affected by the presence of the large particles. However, computational efficiency requires that δfree is not too large, so these layers are not fully resolved at the intermediate level. To partially account for this effect, we make an ad hoc replacement of μS in (46) by an effective chemical potential μlay(r), which is chosen such that the corresponding reservoir volume fraction ηSlay(r) satisfies

ηSlay(r)ηSr=gBS0(dist(r,C)).
(52)

In estimating the free energy of the small particles that are inserted in the second level of annealing, this adjustment to Φsq helps to counteract the error made in (49), leading to an updated potential Φcorr,2. The intermediate level constructed in this way is denoted by pI(2). The results of Fig. 8 show that this variant is (somewhat) more accurate than pI(1). However, the intermediate level still tends to have a smaller number of large particles than the full (FG) mixture.

To investigate this further, we took 800 representative CG configurations. For each one, we estimate the error associated with the approximation (50)

ΔΦcorr=ΦexΦcorr,2.
(53)

Results are shown in Fig. 10. One sees that the errors are of order unity (note that Φex itself is of order 104 so this is a small relative error; see below); there is a systematic trend that Φcorr,2 underestimates Φex when N is large. To correct this error, we introduce an additional correction term to Φcorr,2,

Φcorr,3(C)=Φcorr,2(C)+αcorrN,
(54)

and denote the intermediate level constructed in this way by pI(3).

FIG. 10.

The error of the free energy prediction for 800 configurations sampled from the coarse distribution pC, grouped by their number of big particles. Each dot represents the difference of an estimate of the predicted small-particle free energy used in the intermediate level pI(2) against an estimate of the full free energy. We correct for the noticeable trend in the error with a linear ad-hoc correction term, displayed in black.

FIG. 10.

The error of the free energy prediction for 800 configurations sampled from the coarse distribution pC, grouped by their number of big particles. Each dot represents the difference of an estimate of the predicted small-particle free energy used in the intermediate level pI(2) against an estimate of the full free energy. We correct for the noticeable trend in the error with a linear ad-hoc correction term, displayed in black.

Close modal

A least squares fit to Fig. 10 suggests to take αcorr = 0.076; in practice, this tends to over-correct the error in Φcorr,2(C), and we find better performance with a smaller value

αcorr=0.058.
(55)

However, the performance of the method depends only weakly on the specific choice of αcorr, which is discussed in  Appendix C. For all following results, we define the intermediate level pI=pI(3) to use the potential Φcorr,3.

An important aspect of the three-level method is the self-consistency of the general approach. The intermediate level variants pI(1) and pI(2) were constructed on a purely theoretical basis. The corresponding results in Fig. 8 indicated good performance, but that the distribution of N had a systematic error. This error was quantified precisely in Fig. 10, which enabled an improvement to the intermediate level. In principle, this procedure could be repeated to develop increasingly accurate variants of pI. That approach would be useful if (for example) one wanted to consider increasingly large systems, where the requirements for the accuracy of pI become increasingly demanding.

One way to see the effect of system size is to note that Fig. 10 required the estimation of ΦC[EC,μS] and ΦC[0,μS], whose values are of order 1 × 104. Since the free energies are exponentiated in the weights for resampling, an absolute error of ±1 is required on these free energies, while their absolute values are extensive in the system size. Hence, one sees that accurate estimates of the free energy are required: their relative error is required to be of the order of the inverse volume of the system.

In this section, we apply the three-level method to the example from Sec. II C using the intermediate level from Sec. IV, with the parameters defined in (51) and (55). The parameters and the annealing schedules are chosen such that, on average, the first and second steps have the same computational effort; see  Appendix A for details.

It can be proven49 that the three-level method provides accurate results, in the limit where the population sizes M0, M1, M2 are all large. In particular, we expect the estimators ÂF3L,ÂF,Δ3L to all obey central limit theorems (CLTs), and the two-level estimators ÂF,ÂF,Δ behave similarly. Detailed results are given in Sec. VI. The important fact is that, for large populations, the variances of the estimators behave as

Var(Â)1MΣ,
(56)

where M is the relevant population size and Σ is called the asymptotic variance (it depends on the observable A and on which specific estimator is used). In general, the estimators may have a bias, which is also of order 1/M. This means that the uncertainty in our numerical computations is dominated by the random error, whose typical size is Σ/M, and the mean squared error is given by the variance MSE(Â)=Var(Â), to leading order.

This gives us an easy way to measure and compare the performance of the different estimators. Suppose that we require an estimate of A with a prescribed mean squared error. The associated computational cost can be identified with the population size M and is given by (56) as MΣ/MSE(Â). Clearly, estimators with small Σ should be preferred. In practice, we do not compare computational costs at fixed error, instead we compare variances Var(Â) at fixed M. For any two algorithms (and assuming that M is large), the ratio of these variances approximates the ratio of the Σ’s, which can then be interpreted as a ratio of computational costs (at fixed MSE). Numerical results are presented in Sec. V B.

We note that the theoretical results for convergence do not require that the coarse or intermediate levels are accurate. However, one easily sees18 that serious inaccuracies in these levels lead to very large Σ. In such cases, one may require prohibitively large populations to obtain accurate results.

In this section, we demonstrate (for the example in Sec. II C) that we do not require very large populations for the three-level method and that the numerical results are consistent with (56). After that, we estimate the asymptotic variances for the two-level and three-level methods. We will find that introducing the third level improves the numerical performance, corresponding to a reduction in Σ.

To this end, we investigate the pair correlation g(r) of the big particles. As seen in Fig. 8(a), the coarse approximation of g(r) has a substantial error, especially when two big particles are in contact. To quantify this specific effect, we define the coordination number Nc, which is the number of large particles within a distance r1 of a given large particle. (For a given configuration, this quantity is estimated as an average over the large particles. We take r1 ≈ 10.73σS to be the first minimum of g(r) of the CG model.) For our example, the coordination number for the FG and CG systems are given by

NcF1.61,NcC1.56.
(57)

To illustrate the reliable performance of the method, we take a simple example with M0 = 4 × 105 and M1 = M2 (no tapering) and we focus on the difference estimator ÂF,Δ3L, which we expect to be the most accurate. The corresponding numerical estimate of g(r) is denoted by ĝ(r), binned using 40 equidistant bins at positions rj between r = 10 and r = 12. Figure 11(a) shows estimates of the difference between ĝ(r) and its true value, as the population size increases. (The FG result was estimated independently by the two-level method using a large value of MF = 18 000.) A population M2 of several thousand is sufficient for an accuracy better than 0.5 in each bin of g(r).

FIG. 11.

Estimating the pair-correlation function g(r) from Fig. 3(a) with the two- and three-level method. (a) The difference of three-level estimates ĝ(r) with increasing numbers M1 = M2 number of particles against a reference value of g(r) that was computed with the two-level method. (b) The error of the binned values in (a) as defined in (58). The dotted black line displays the expected asymptotic Monte Carlo convergence rate of M21/2.

FIG. 11.

Estimating the pair-correlation function g(r) from Fig. 3(a) with the two- and three-level method. (a) The difference of three-level estimates ĝ(r) with increasing numbers M1 = M2 number of particles against a reference value of g(r) that was computed with the two-level method. (b) The error of the binned values in (a) as defined in (58). The dotted black line displays the expected asymptotic Monte Carlo convergence rate of M21/2.

Close modal

For smaller M2, fluctuations in the measured ĝ(r) are apparent in Fig. 11(a). To estimate their size, we define the error for a single run of the three-level method by summing over the bins,

(error)2=j=140|ĝ(rj)g(rj)|2.
(58)

Hence, one expects from (56) that this error decays with increasing population, proportional to MF1/2. Figure 11(b) shows an estimate of (58), which is consistent with this expected scaling.

We now investigate whether the three-level method does indeed improve on the performance of the (simpler) two-level method of Refs. 17 and 18. The key question is whether the resampling step is effective in focusing the computational effort on the most important configurations of the big particles.

We recall from above that removing the resampling step from the three-level method leads to a two-level method, where the annealing process is paused at the intermediate level and then restarted again. In order to test the effect of resampling, we compare these two schemes, keeping the other properties of the algorithm constant, including the annealing schedule. (To test the overall performance, one might also optimize separately the annealing schedules for the two-level and three-level algorithms and compare the total computational time for the two methods to obtain a result of fixed accuracy. However, such an optimization would be very challenging, so instead we focus on the role of resampling.)

As a very simple quantity of interest, we take the co-ordination number Nc. We run the whole algorithm Nruns independent times, and we estimate Nc for each run. This can be done using several different estimates of Nc. These are (i) the two-level estimates ÂF and ÂF,Δ from (13) and (18); (ii) the corresponding three-level estimates ÂF3L and ÂF,Δ3L of (31) and (35), in which we also vary the ratio M1: M2, to see the effects of tapering.

All comparisons are done with a fixed total computational budget. We have chosen parameters such that the first and second annealing stages have the same (average) computational cost. This means we need to hold MT = (M1 + M2)/2 constant during tapering. The two-level method takes MF = MT (because the single step of annealing in the two-level method has the same cost as the two annealing steps of the three-level method). For the coarse level estimates ÂC and ÂC3L (which are used in computation of ÂF,Δ and ÂF,Δ3L), the CG computations are cheap so we take M0 = MC = 6 × 106. This is large enough that the numerical errors on these coarse estimates are negligible in comparison to the errors from higher levels.

For each version of the algorithm, we measure the sample variance of the Nruns estimates. Results are shown in Fig. 12 for Nruns = 60 and MT = 500. The error bars are computed by the bootstrap method.55 It is useful that the variance of all these estimators is expected to be proportional 1/MT: this means that reducing the variance by a factor of α requires that the computational effort is increased by the same factor. Hence, the ratio of variances of two estimators is a suitable estimate for the ratio of their computational costs.

FIG. 12.

The sample variance of Nruns = 60 independent estimates of the coordination number Nc. We compare results using a two-level method as well as three-level methods, with and without tapering. Furthermore, we give the results for the final-level (left) and difference (right) variants of the estimator. The error bars are computed via bootstrap; their interpretation is, however, not obvious as the different estimators are highly correlated; see the main text.

FIG. 12.

The sample variance of Nruns = 60 independent estimates of the coordination number Nc. We compare results using a two-level method as well as three-level methods, with and without tapering. Furthermore, we give the results for the final-level (left) and difference (right) variants of the estimator. The error bars are computed via bootstrap; their interpretation is, however, not obvious as the different estimators are highly correlated; see the main text.

Close modal

When carrying out these runs, each estimator was computed by performing annealing on the same set of coarse configurations, to ensure direct comparability. (More precisely, we take a set of 700 representative configurations that are used for the method with M1:M2 = 7:3, and other versions of the method used a subset of these 700.) In addition, it is possible to share some of the annealing runs when computing the different estimators (while always keeping the 60 different runs completely independent). This freedom was exploited as far as possible, which reduces the total computational effort. However, it does mean that the calculations of the different estimators are not at all independent of each other.

All three-level estimators have a reduced standard deviation compared to their two-level equivalents, demonstrating the usefulness of the intermediate resampling step. In all cases, the difference estimate outperforms its equivalent final-level estimate; this effect is stronger for the three-level estimate, providing evidence that the intermediate stop additionally improves the quality of the control variate in the difference estimates.

The effect of introducing tapering from MI = 600 to MF = 400 is difficult to assess, given the statistical uncertainties in this example. The variance of the tapered final-level estimator is very close to the non-tapered one, despite averaging over fewer configurations. This is possible since we start with more samples in the CG model that improves the sampling at the intermediate step, where we then resample to keep relevant configurations. As the results for the 700–300 tapering show, the tapering rate needs to be chosen carefully as a too aggressive rate can decrease the performance quickly.

Overall, the numerical tests in this section provide strong evidence of the benefit of the intermediate resampling. For our example, switching from a two-level to a three-level difference estimator substantially reduces the variance, from around 0.0029 for the two-level method to 0.0016 at a fixed computational budget. As discussed just below (56), the ratio of these numbers can be interpreted as the ratio of costs for the two- and three-level method: the conclusion for this case is that including the intermediate level reduces the cost by ∼45%. This demonstrates a significant speedup in this specific case, which provides a proof-of-principle of the approach.

In Sec. V, we have seen that the three-level method outperforms the two-level method in numerical tests, both for the final-level as well as the difference version of the estimator. In this section, we provide convergence results for both algorithms and compare their asymptotic performance as the number of configurations goes to infinity.

The proof is general, but it does require some assumptions on the models of interest. First, for every allowed CG configuration [that is, configurations C with pC(C)>0], we assume that the quantity of interest A is bounded. In addition, the probability density pC(C) must be non-zero whenever pI(C) is non-zero, and similarly pI(C,F) must be non-zero whenever pF(C,F) is non-zero.

The two-level method has been previously analyzed in Ref. 18. We summarize its key properties. It was noted in Sec. III B that ÂFAF as MF (specifically, this is convergence in probability56). We also expect a CLT for this quantity: as in Eq. (56), the distribution of the error (ÂFAF) converges to a Gaussian with mean zero and variance ΣF/MF. We will derive a formula for this variance, which will be compared later with the corresponding quantity for the three-level model.

For compact notation, it is convenient to define the re-centered quantity of interest

Ar(C)=A(C)AF.
(59)

A significant contribution to ΣF comes from the randomness of the annealing procedure, and this can be quantified as

v(C)=VarJŴn(C),
(60)

where the variance is again with respect to the annealing procedure (from coarse to fine). Then, following Ref. 18, it can be shown that

ΣF=Ar(C)2[w(C)2+v(C)]C,
(61)

where w(C)=Ŵn(C)J=pF(C)/pC(C), so one identifies w(C)2+v(C) as the mean square weight obtained from the annealing procedure. Similarly, the estimator Δ̂ that appears in the difference estimate ÂF,Δ also obeys a CLT, with variance Var(Δ̂)ΣF,Δ/MF, where

ΣF,Δ=Ar(C)2w(C)2+v(C)1C+VarC(A)VarF(A).
(62)

As discussed in Ref. 18, if the computational cost of the coarse model is low, MC can be taken large enough that the variance of the coarse estimator ÂC is negligible, in which case (18) implies Var(ÂF,Δ)Var(Δ̂), and hence

Var(ÂF,Δ)1MFΣF,Δ.
(63)

Comparing (61) and (62)—which give the variances of ÂF and ÂF,Δ, respectively—the term v(C) in (61) is replaced by v(C)1 in (62), which reduces the variance of the estimator. We expect, in general, that VarC(A) and VarF(A) should be similar in magnitude, in which case these terms in (62) should have little effect. Hence, one expects that the estimator ÂF,Δ has lower variance than ÂF. This is consistent with the results of Fig. 12.

The results (61) and (63) are based on the property that each estimator is a sum of (nearly) independent random variables, which means that we can immediately apply standard Monte Carlo convergence results.41 This is not possible for the three-level method, since the resampling step correlates the configurations. This makes the analysis of SMC-type algorithms challenging, but widely applicable results are available.43,57,58 The three-level method in Sec. III C is an implementation of a random-weight SMC method that has been analyzed in Ref. 49.

To analyze the variance of the three-level method, we require results analogous to (61), which depend on the mean square weights associated with the annealing procedure. To this end, define the average of the final level weight

w2(X2)=Ŵ2n(X2)J,
(64)

which fulfills (29). Similar to (60), the variance of this weight is

v2(X2)=VarJŴ2n(X2).
(65)

The averages in these equations are with respect to the second annealing step (from intermediate to fine level), starting at configuration X2 (see Sec. III C 3).

For the contribution to the asymptotic variance of the first annealing step, it is important to consider a product of weight factors: Ŵ1n(C1)w2(X̂1). The first factor in this product is the random weight Ŵ1n that is obtained by annealing from the coarse to the intermediate level, leading to the intermediate configuration is X̂1=(C1,F̂1). The second factor is the averaged weight w2(X̂1) from (64) associated with the second (subsequent) annealing step. Combining (26) and (29), the average of the product is

Ŵ1n(C1)w2(X̂1)J=pF(C1)/pC(C1)=w(C1),
(66)

and the corresponding variance is

v1(C1)=VarJŴ1n(C1)w2(X̂1).
(67)

Hence, w(C1)2+v1(C1) is the mean square value of Ŵ1n(C1)w2(X̂1) with respect to the annealing process: this turns out to be a relevant quantity for the asymptotic variance.

The number of configurations M1, M2 can be varied between steps of the three-level method. We formulate the asymptotic variance in the average number of configurations

MT=12(M1+M2).
(68)

If the two annealing steps have comparable cost, we can then directly compare the variances for different tapering rates at fixed MT. Define also

c=M12MT,c̄=1c=M22MT.
(69)

Then, a direct application of Theorem 2.1 of Ref. 49 gives a CLT for ÂF3L: for large MT, we have

Var(ÂF3L)1MTΣF3L
(70)

with asymptotic variance

ΣF3L=12cΣF,13L+12c̄ΣF,23L
(71)

with

ΣF,13L=Ar(C)2w(C)2+v1(C)C,ΣF,23L=Ar(C)2w2(X)2+v2(X)I.
(72)

The physical interpretation of these formulas will be discussed in Subsection VI C.

Computing the asymptotic variance of the three-level difference estimator ÂF,Δ3L is more difficult, since it involves difference of non-trivially correlated samples. For some examples of multilevel difference estimators, upper bounds on the asymptotic variance have been developed in Refs. 35 and 36. A detailed analysis of these bounds in the context of our algorithm is beyond the scope of this paper.

To understand the differences between the two- and three-level method, we compare the asymptotic variances of their corresponding final level estimators ΣF in (61) and ΣF3L in (71). The variance of the three-level method has two contributions ΣF,13L and ΣF,23L; they are the variances of two-level methods from the coarse to the fine model and the intermediate to the fine model, respectively. The first term ΣF,13L is therefore directly related to ΣF, where the variance of the importance weight v(C) has been replaced by v1(C).

In order to make quantitative comparisons, we again consider the three-level method without intermediate resampling. As discussed in Sec. V B, this is a two-level method with a specific annealing process that consists of the concatenation of the two annealing processes of the three-level method. For the concatenated annealing process, we have

Ŵn(C)=Ŵ1n(C)Ŵ2n(X̂),
(73)

where X̂=(C,F̂) is generated by the first annealing stage. This means that

v(C)=VarJŴ1n(C)Ŵ2n(X̂),
(74)

where the variance is now over the randomness of both annealing processes. Comparing (74) to (67), we see that v1(C) computes the variance of the same importance weight, but after averaging over the second annealing stage in (64). We can apply Jensen’s inequality56 to show that

v(C)v1(C).
(75)

By definitions (61) and (71), this directly implies

ΣF,13LΣF.
(76)

For the case without tapering c = 1/2, the three-level method, therefore, trades a reduction in the variance of the importance weights from coarse to fine in ΣF,13L for the addition of a term ΣF,23L that corresponds to the variance of a two-level method going from the intermediate to the fine level. The possibility of tapering, i.e., c ≠ 1/2, further allows us to optimize the distribution of computation effort between the two stages, which is particularly useful if ΣF,23LΣF,13L.

For our application to hard sphere mixture example in Sec. II C, the annealing process is computationally expensive and the resulting weights are noisy. We are therefore in the situation where the variance v(C) contributes substantially to the overall variance, and where we have constructed an intermediate in Sec. IV that improves on the CG model. Following the discussion above, this is the setting where we expect the three-level method to improve upon a two-level method, which is confirmed by the numerical results in Sec. V. Further discussion of the effect of resampling on random-weight SMC methods can be found in Ref. 49.

We have introduced a three- and multilevel extension of the two-level simulation method first discussed in Ref. 18. We have applied this method to a highly size-asymmetric binary hard-sphere system. As shown in the numerical test in Sec. V and theoretical results in Sec. VI, the introduction of intermediate resampling that distinguishes the two-level from the three-level method can lead to substantial improvements in performance by reducing the variance in importance weights and by allowing efficient allocation of resources between levels via tapering.

In the application to binary hard-sphere systems, the introduction of an intermediate level required us to construct a semi-analytic estimate of the free energy of a system with partially inserted small particles. For this, we have combined a highly accurate square-gradient theory with pre-computed ad hoc corrections, yielding an intermediate level that substantially improves the accuracy of the investigated quantities of interest compared to the initial coarse level. Furthermore as we show in  Appendix C, the three-level method appears robust with respect to slight deviations of the intermediate level.

Compared to our numerical example, Ref. 17 applied the two-level method to larger and more dense systems than considered here, to investigate the critical point of demixing. This was achieved by replacing the two-body CG model with RED potential used in this publication by a highly accurate two- and three-body potential. The computation of accurate effective potentials entails a substantial upfront computational cost (compared to our construction of the intermediate level), but for the hard sphere mixtures, this results in a CG level that is more accurate than our intermediate level. Despite the challenges of keeping the variance of the importance weights under control for large systems, this turned out to be more efficient overall.

We have emphasized throughout that our three-level modeling approach is generally applicable whenever a suitable intermediate level can be constructed. We can identify two main scenarios where this might be attempted. The first scenario is illustrated by the binary hard sphere mixture, which is a two-scale system by construction (there are two species). In this situation, there is no obvious intermediate level, and a careful construction is required, to design one. Our results show that this strategy is possible—it is worthwhile in this example because the system is very challenging to characterize by other methods, so the effort of constructing the intermediate level is worthwhile.

The second scenario—where we may expect a multilevel method to be particularly useful—is that a multi-scale system admits a true hierarchy of coarse-graining, such as a system of long-chain polymers. We can coarse-grain a polymer chain by representing groups of monomers by their center of masses with suitable effective interactions.5,59 By varying the number of monomers per group, we get a hierarchy of CG models that could be targeted by a multilevel method. For such methods to be efficient in such a scenario, we require high accuracy of the CG models, and an efficient annealing process to introduce the finer degrees of freedom analogous to the introduction of the small spheres in the hard sphere mixture. Fulfilling these requirements is still challenging and requires considerable physical insight about the specific polymer system of interest, but the hierarchical structure of the system hints that a suitable method might be fruitfully extended to more than three levels, with commensurately increased performance gains.

In both scenarios, careful thought is required to apply the three-level (or multi-level) methods: our approach is far from being a black-box method. Still, the results presented here show that it can be applied in a practical (challenging) computational problem.

A separate limitation of multilevel methods is that the population of unique coarse configurations is fixed from the start and reduces with each subsequent resampling step. This is closely related to the sample depletion effect commonly observed effect in particle filtering, and SMC methods in general.25,60 For the multilevel method, we can address this by following each resampling step with a number of Markov Chain Monte Carlo (MCMC) steps to decorrelate duplicated configurations and further explore the system at the current level of coarse-graining.60 While such an approach is not feasible for the hard-sphere system where intermediate MCMC is limited by the cost of computing the required approximations, we expect this to be beneficial, for example, whenever intermediate physical systems are described in terms of effective, few-body interactions.

We end with a comment on the implementation of these methods. The introduction of intermediate levels increases the complexity of the code required to simulate the systems. It requires adding an intermediate stage to the annealing process and computing the required integrals (see Sec. IV). Additionally, when implementing the algorithm for the use on compute clusters, the resampling step requires the communication between all nodes. However, we emphasize that while these extra steps require some extra programming, none of the additional steps of the three-level method have added significant computational cost in our example.

To conclude, our results show that the multilevel method can effectively make use of intermediate levels when available, leading to improvements in performance at fixed computational cost. We look forward to further applications of multilevel methods in physical simulations.

This work was supported by the Leverhulme Trust through Grant No. RPG–2017–203.

The authors have no conflicts to disclose.

Paul B. Rohrbach: Investigation (lead); Methodology (lead); Validation (equal); Writing – original draft (lead). Hideki Kobayashi: Investigation (supporting); Methodology (supporting). Robert Scheichl: Conceptualization (equal); Funding acquisition (equal); Writing – review & editing (equal). Nigel B. Wilding: Conceptualization (equal); Funding acquisition (equal); Writing – review & editing (equal). Robert L. Jack: Conceptualization (equal); Funding acquisition (equal); Supervision (equal); Writing – original draft (supporting).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

1. Grand canonical ensemble

We define the grand canonical ensemble of the hard sphere mixture discussed in Sec. II. Recall that kBT = 1. For the system of interest, the equilibrium average of a quantity of interest A(C) in (3) is defined as

AF=1ΞFN=0n=0eμBN+μSnn!N!σB3NσS3n×A(C)eUF(C,F)dR1dRNdr1rn,
(A1)

where each particle position is integrated over the periodic domain [0,L]3. For ease of notation, we introduce the integration measures dC,dF that include the prefactors accounting for the indistinguishability of particles that appear in (A1), which then becomes

AF=1ΞFA(C)eμBN+μSnUF(C,F)dCdF,
(A2)

consistent with (1) and (3). By definition, we require that pF is normalized as pF(C,F)dCdF=1, so we have

ΞF=N=0n=0eμBN+μSnn!N!σB3NσS3neUF(C,F)dR1dRNdr1rn.
(A3)

The relevant quantities of the CG model ⟨AC are defined analogously.

2. Estimation of the partition function

The implementation of the two- and three-level methods requires the computation of the small-particle partition function that appears in the importance weights. We use a method based on Jarzynski’s equality20,21 that yields an unbiased estimator; see also Ref. 18. In the statistics literature, this is also known as annealed importance sampling.22 We first give a short summary of the method in Appendix A 2 a and then discuss how the annealing processes are implemented for the two- and three-level method in Appendix A 2 b. The parameters used for the numerical tests are given in Appendix A 2 c.

a. Theoretical details

We derive an annealing process that inserts the small particles F for a fixed big particle configuration C. This process produces weighted configurations that correctly characterize the FG distribution. We closely follow the results from Appendix A of Ref. 18; see also Refs. 2022 and 61.

Let ps(C,F) and pe(C,F) be two probability distributions for the FG model of the form

pα(C,F)=1ΞαeΦα(C,F),α{s,e}.
(A4)

The corresponding marginal distributions are pα(C)=pα(C,F)dF. The distributions ps, pe are the start and end point of an annealing process, with a sequence of intermediate distributions

pk(C,F)=1ΞkeΦk(C,F),k=0,,K,
(A5)

where p0 = ps and pK = pe.

Let C be a sample from ps(C): this configuration remains fixed during the annealing process. We anneal the small particles, as follows: first sample an initial small particle configuration F from ps(FC), the conditional distribution of ps. This distribution is p0 so write F0=F and set k = 1: then apply a sequence of MC steps with transition kernel qC,k(Fk1Fk) that is in detailed balance with the small particle distribution pk(FC). Iterate this process for k = 1, …, K − 1: this yields a sequence of small-particle configurations (F0,F1,,FK1). The big-particle configuration C stays fixed throughout this process.

The relevant results of this procedure are the final small-particle configuration F̂=FK1 and an annealing weight

ŴA=ek=1KΦk(C,Fk1)Φk1(C,Fk1).
(A6)

Given the initial coarse configuration (C,F), the MC steps define a probability distribution over the weight ŴA and the final small particle configuration F̂, which we denote by

κ(ŴA,F̂C,F).
(A7)

Given the initial configuration (C,F), averages with respect to the annealing process are denoted by ⟨·⟩J.

We now show that this annealing process produces weighted samples of pe, up to a constant. More specifically,

ŴAκ(ŴA,F̂C,F)ps(FC)dŴAdF=ΞeΞspe(C,F̂)ps(C).
(A8)

This implies that averaging over the start distribution ps and the annealing process yields

ŴAB(C,F̂)J,ps=ΞeΞsB(C,F)pe
(A9)

for any function B=B(C,F), which may depend on both big and small particles.

To show (A8), we compute the average over the annealing process explicitly

ŴAκ(ŴA,F̂C,F)ps(FC)dŴAdF=ek=1KΦk(C,Fk1)Φk1(C,Fk1)k=1K1qC,k(Fk1Fk)×ps(F0C)dF0dFK2.
(A10)

By rearranging the factors in the exponential, the right-hand side of (A10) becomes

k=1K1qC,k(Fk1Fk)eΦk(C,Fk1)Φk(C,Fk)×eΦK(C,FK1)ps(F0C)eΦ0(C,F0)dF0dFK2.
(A11)

Detailed balance of the Markov kernels qC,k implies

qC,k(Fk1Fk)eΦk(C,Fk1)Φk(C,Fk)=qC,k(FkFk1),
(A12)

and by definition

eΦK(C,FK1)ps(F0C)eΦ0(C,F0)=ΞeΞspe(C,FK1)ps(C).
(A13)

Using (A12) and (A13), Eq. (A11) simplifies to

k=1K1qC,k(FkFk1)dF0dFK2ΞeΞspe(C,FK1)ps(C).
(A14)

Since qC,k(FkFk1) is a normalized probability density for Fk1, we can perform the integrals in (A14) one by one, yielding (A8).

b. Application to the two- and three-level method

This section describes the details of the annealing processes used in the two- and three-level method. We first discuss its implementation for the two-level method before showing how to split this process into two stages for the three-level method.

The two-level method starts with samples C of the CG model pC. We describe the annealing process,18 which produces a weight Ŵ(C) and small particle configuration F̂ that fulfills (14). Since we have no initial small particle distribution, we cannot directly apply the results of Appendix A 2 a and need to proceed in two steps. Let

pμS(FC)=pF(FC)=1Ξ[C,μS]eμSnUF(C,F)
(A15)

be the distribution of small particles around a fixed big-particle configuration C, where we now explicitly note the dependence on the small particle chemical potential μS. Computing the unnormalized importance weight Ŵ(C) from (20) requires an estimate of the partition function of the small particles Ξ[C,μS]. For a system with a small value of the chemical potential μ0μS, we can directly estimate this quantity

Ξ[C,μ0]=1Ppμ0(C)n=0
(A16)

as it is the reciprocal probability of having zero small particles in a system with fixed C. For small enough μ0, this value is close to 1 and can be estimated quickly by a GCMC simulation that decorrelates quickly due to the low density of small particles. Since we can compute this value to a very low variance at negligible cost, we consider our estimate of Ξ[C,μ0] it to be exact and we neglect the influence of its fluctuations on the overall variance of the method. Furthermore, we assume that we can generate samples from the low chemical potential distribution of small particles pμ0(FC).

Starting with a sample F of the initial small particle distribution pμ0(FC), we can now apply steps of the annealing process defined in Appendix A 2 a. We define the steps of the annealing process by slowly increasing the chemical potential of the small particles μk in K + 1 steps, from μ0 to μK = μS while keeping the CG distribution fixed. More specifically, we simulate an annealing process for the sequence of probability distributions

pC,μk(C,F)=pC(C)pμk(FC),k=0,,K,
(A17)

yielding an annealing weight ŴA and a fine-particle configuration F̂ as described previously.

Averaging over the initial distribution of small particles and the annealing process and using (A8), (A15), and (A17) yields

ŴAκ(ŴA,F̂C,F)pμ0(FC)dŴAdF=Ξ[μS,C]Ξ[μ0,C]pF(F̂C).
(A18)

Combining this with (19), we have

Ξ[μS,C]Ξ[μ0,C]pF(F̂C)=ΞFΞCpF(C,F̂)pC(C)eUC(C)Ξ[μ0,C].
(A19)

Thus, we scale the weight that is produced by the annealing process

Ŵ(C)=Ξ[μ0,C]eUC(C)ŴA.
(A20)

For this weight, the annealing process fulfills (14) when we include the sampling from the distribution of the initial small particles Fpμ0(FC) as part of the annealing process.

For the three-level method, we split the annealing process discussed above into two consecutive steps. The first part follows exactly the same steps as above, where the annealing process increases the chemical potential μk from a small value μ0 to μS. The only difference is that we include the potential Ũ of the intermediate distribution: in place of (A15), we have

pμS(FC)=pI(FC)
(A21)

so that small particle insertion is suppressed in regions far from large particles. As before, the annealing process results in a (scaled) weight Ŵ1(C) and small particle configuration F̂1 that now fulfills (26).

For the second step of the three-level method, we need to define an annealing process that fulfills (28). We start with a sample X2=(C2,F2) from the intermediate level pI. Since this configuration already contains small particles, we can directly apply the results of Appendix A 2 a to anneal from pI(C,F) to pF(C,F). This is achieved by a sequence of intermediate annealing distributions that increase the parameter δ of the potential (40) so that the volume available to the small particles is slowly increased. This is done in K + 1 steps from the parameter δ0 = δfree (the intermediate level) to a final value

δK=maxrdist(r,C2),
(A22)

at which point the suppression potential does not affect any point in the domain. Then, the intermediate level with δK corresponds to the fine-level distribution, up to the correction factor eΦcorr(C) in (42) that only depends on the big particles. Following from (A8), this annealing process with scaled weight Ŵ2(C)=ŴAeΦcorr(C) and final small particle configuration F̂2 fulfills the property (28).

c. Annealing schedules for simulations

The importance weights produced by the annealing process are unbiased, and this feature is independent of the details of the annealing schedules. In this sense, the algorithm is valid for any schedule. However, the variance of the computed importance weights depends strongly on the choice of schedule.

The initial chemical potential μ0 is chosen such that, in a system with no big particles, and there would be an average of n̄0=0.01 small particles present, that is, the initial reservoir volume fraction is ηS,0r=0.01/L3.

For the first stage annealing process, we increase the chemical potential in steps Δμk such that the average change in the number of small particles would be δn̄k=0.2, in a system where no big particles were present. For the second stage, we increase δk in fixed steps Δδk = σS/20 000. In both cases, we run one GCMC sweep between each step.

To compute accurate FG reference results used, for example, in Figs. 4 and 11, we apply the two-level method using the same annealing strategy as for the first stage of the three-level method but with δn̄k=0.05 for increased accuracy. Note that, for the numerical tests in Figs. 12 and 15 that directly compare the performance of the two- and three-level method, the two-level method uses the same annealing schedule as the three-level method outlined earlier. The only difference is the lack of resampling.

1. Perturbative approximation of non-homogeneous hard-sphere fluid

This section derives (47) of the main text. To this end, consider a homogeneous hard sphere fluid at chemical potential μ and add a perturbing potential

E(r)=asin(qr)
(B1)

in (44). In a finite periodic system, q should be a reciprocal lattice vector. We aim to estimate the free energy difference between the perturbed and homogeneous system. For this, we follow the steps of the local density approximation discussed in Ref. 52; see also Chap. 6 of Ref. 62. We approximate this difference as

δΦ[E]=ΦΦ[E]p(μE)p(μ)+g(μE)|E|2dr.
(B2)

To compute g, we need to investigate both sides of Eq. (B2). Starting with the right-hand side, we assume that g is smooth; therefore, we can approximate it for small a by a constant g(μE)g(μ). To compute the integral of the pressure difference in (B2), we expand around μ,

p(μE)p(μ)dr=p(μ)E(r)+p(μ)2E(r)2dr+O(a3)=V4a2p(μ)+O(a3),
(B3)

and integrate the gradient correction term

g(μ)|E|2dr=q2a2g(μ)cos2(qr)dr=V2q2a2g(μ).
(B4)

Overall, we obtain

p(μE)p(μ)+g(μE)|E|2drV2a2q2g(μ)+12p(μ).
(B5)

To investigate the left-hand side of (B2), we look at perturbations of the free energy in the limit of small a. We can express the derivatives of the free energy in a as equilibrium averages of the perturbed system. For a = 0,

Φ[E]a=ρqμ=0,
(B6)
2Φ[E]a2=|ρq|2μ,
(B7)

with ρq=j=1nsin(qrj). The first average is zero by translational invariance. For the second average, use that |ρq|2μ=nμS(μ;q)/2, where S(μq) is the structure factor of the fluid62 at chemical potential μ. So,

2Φ[E]a2=nμ2S(μ;q)=3VπσS3ηS(μ;q).
(B8)

Now differentiate (B2) twice with respect to a, yielding

3VπσS3ηS(μ;q)Vq2g(μ)+12p(μ),
(B9)

where the left-hand side used (B8) and the right-hand side was approximated with (B5) before differentiation.

Finally, differentiate with respect to q and send |q| → 0: we can identify the second order term of the square-gradient approximation as

g(μ)=3η2πσS32q2S(μ;q)q=0,
(B10)

which is (47).

2. Accuracy of the square-gradient approximation

The intermediate level that is constructed in Sec. IV relies on the approximation (46) for the free energy of a non-homogeneous hard-sphere fluid. This section discusses the accuracy of this approximation for a system that only contains small particles in an external potential.

We consider a grand-canonical ensemble of small hard-spheres (σS = 1) in a periodic box V = [0,L]3 of length L = 10σS without any big particles. We perturb this system by a one-dimensional cosine potential with m periods

Em,cos(r)=2cos2mπ10r+1.
(B11)

We apply this potential to the first component r1 of the small particle positions r = (r1, r2, r3). The potential Em,cos has a fixed maximal strength of 4. We vary the steepness of the potential by varying the number of periods of the cosine m = 1, 2, 3; as m increases, the potential changes more rapidly. We expect this to make our approximations increasingly inaccurate, as it is constructed under the assumption that the derivatives of the external potential are small.

We have computed the predicted free energy (46) in this system, as well as the cruder approximation (45), which lacks the square gradient approximation. We compare these values with the true free energy, computed via thermodynamic integration, and investigate the dependence of the accuracy of the approximation on the small particle volume fraction ηSr and the steepness of the external potential. The results of this computation are shown in Fig. 13.

FIG. 13.

A numerical test of the approximation accuracy of the square-gradient method for computing free energies of non-homogeneous hard-sphere fluids. (a) The free energy Φ of the hard-sphere fluid described in Appendix B 2 with a cosine potential with 1, 2, and 3 periods within the simulation box as a function of the volume fraction η. (b)–(d) The difference between the free energy and its pressure integral approximation in (46) for a cosine potential with one (b), two (c) or three (d) periods, with (solid line) and without (dashed lines) the square-gradient term.

FIG. 13.

A numerical test of the approximation accuracy of the square-gradient method for computing free energies of non-homogeneous hard-sphere fluids. (a) The free energy Φ of the hard-sphere fluid described in Appendix B 2 with a cosine potential with 1, 2, and 3 periods within the simulation box as a function of the volume fraction η. (b)–(d) The difference between the free energy and its pressure integral approximation in (46) for a cosine potential with one (b), two (c) or three (d) periods, with (solid line) and without (dashed lines) the square-gradient term.

Close modal

Figure 13(a) shows that the absolute value of the free energy depends weakly on the number of periods m in the external potential. As expected, the error of the approximation methods increases substantially in Figs. 13(b)13(d), as the number of periods m increases, as does the steepness of the cosine potential Em,cos. The absolute prediction error also increases in the volume fraction ηSr. For all m considered here, the square-gradient method (46) substantially outperforms the prediction (45). For m = 1, where the external potential varies the slowest, the square-gradient approximation is nearly exact, which confirms the accuracy of the square-gradient factor g=g(μ) in Appendix B 1. In addition to the increasing error on increasing m, the relative improvement of the square-gradient method over the simple approximation decreases. This indicates that higher order terms in the derivative of the potential start to become more important.

The choice of the 1d cosine potential Em,cos here is motivated by the use of a half-period of the cosine to introduce the suppression potential in the construction of the intermediate level in Sec. IV. For the numerical examples in Sec. V, the suppression potential has a (half-period) length of l = 3.5 and a maximal strength of s = 4.4. In terms of the maximal squared gradient that appears in this potential, it lies between the cases m = 1 and m = 2. This indicates that our square-gradient approximation is appropriate for the use of predicting the free energy of the partially inserted system in the binary hard-sphere example.

The construction of the intermediate level in Sec. IV included an ad hoc correction term that was identified using preliminary computations. In this section, we take another look at this parameter and discuss its importance for the performance of the three-level method. For this, we consider four values for the correction factor

αcorr0,0.04,0.058,0.076
(C1)

between αcorr = 0 (no ad hoc correction, which means pI=pI(2)), and αcorr = 0.076 (corresponding to a linear least-square fit to the weights in Fig. 10). The effect of the choice of correction factor is displayed in Fig. 14 where we show the differences between the intermediate levels and the FG estimate for the two quantities of interest (see Fig. 3 for their FG averages). The ad-hoc correction has a noticeable influence on the quantities of interest, especially for the pair correlation function g(r) when the two particles are almost touching (r ≈ 10). This is in the relevant region for the coordination number Nc, which was measured as part of the performance test in Sec. V B.

FIG. 14.

The difference between the estimates at the intermediate and final level for (a) the pair correlation function g(r) and (b) the distribution of the number of big particles for different values of the ad-hoc correction factor αcorr.

FIG. 14.

The difference between the estimates at the intermediate and final level for (a) the pair correlation function g(r) and (b) the distribution of the number of big particles for different values of the ad-hoc correction factor αcorr.

Close modal

To determine how much the three-level method depends on αcorr, we repeated this performance test for the different values in (C1). The measured sample variances for Nruns = 60 independent realizations of the estimators are shown in Fig. 15. As before, the results for different αcorr are highly correlated because they share the same configurations. The (bootstrap) standard errors of the values in Fig. 15 are comparable to the ones shown in Fig. 12; the same caveats apply and we have omitted them here for clarity of presentation.

FIG. 15.

The sample variance of 60 independent estimates of the coordination number Nc for different values of the ad hoc correction factor αcorr (see also Fig. 12).

FIG. 15.

The sample variance of 60 independent estimates of the coordination number Nc for different values of the ad hoc correction factor αcorr (see also Fig. 12).

Close modal

The main takeaway from Fig. 15 is that for our example, the three-level estimators without tapering outperform the corresponding two-level estimators, independent of the choice of αcorr. That is the method appears to be robust with respect to modifications of the intermediate level, even if the mean quantity of interest differs significantly between intermediate distributions.

Given the low number of samples, the exact variance figures should not be over-interpreted. Nevertheless, the trends in Fig. 15 illustrate two aspects of the SMC methodology that are relevant to applications. First, the details of the intermediate level become more important when we increase the tapering rate, as one can, for example, see by comparing the four different values for the final-level estimator ÂF3L without tapering and with 7:3 tapering. Second, this effect is dampened for the difference estimators. The general robustness of the difference estimator in the example considered here supports our assertion that for appropriately defined levels, it should be the preferred estimator when applying the three-level method in practice.

1.
C. N.
Likos
, “
Effective interactions in soft condensed matter physics
,”
Phys. Rep.
348
,
267
439
(
2001
).
2.
W. G.
Noid
,
J.-W.
Chu
,
G. S.
Ayton
,
V.
Krishna
,
S.
Izvekov
,
G. A.
Voth
,
A.
Das
, and
H. C.
Andersen
, “
The multiscale coarse-graining method. I. A rigorous bridge between atomistic and coarse-grained models
,”
J. Chem. Phys.
128
,
244114
(
2008
).
3.
C.
Peter
and
K.
Kremer
, “
Multiscale simulation of soft matter systems–from the atomistic to the coarse-grained level and back
,”
Soft Matter
5
,
4357
4366
(
2009
).
4.
M.
Praprotnik
,
L.
Delle Site
, and
K.
Kremer
, “
A macromolecule in a solvent: Adaptive resolution molecular dynamics simulation
,”
J. Chem. Phys.
126
,
134902
(
2007
).
5.
C.
Pierleoni
,
B.
Capone
, and
J.-P.
Hansen
, “
A soft effective segment representation of semidilute polymer solutions
,”
J. Chem. Phys.
127
,
171102
(
2007
).
6.
T. E.
Ouldridge
,
A. A.
Louis
, and
J. P. K.
Doye
, “
Structural, mechanical, and thermodynamic properties of a coarse-grained DNA model
,”
J. Chem. Phys.
134
,
085101
(
2011
).
7.
B. M.
Mladek
,
J.
Fornleitner
,
F. J.
Martinez-Veracoechea
,
A.
Dawid
, and
D.
Frenkel
, “
Procedure to construct a multi-scale coarse-grained model of DNA-coated colloids from experimental data
,”
Soft Matter
9
,
7342
7355
(
2013
).
8.
A. J.
Pak
and
G. A.
Voth
, “
Advances in coarse-grained modeling of macromolecular complexes
,”
Curr. Opin. Struct. Biol.
52
,
119
126
(
2018
).
9.
S.
Asakura
and
F.
Oosawa
, “
On interaction between two bodies immersed in a solution of macromolecules
,”
J. Chem. Phys.
22
,
1255
1256
(
1954
).
10.
R.
Roth
,
R.
Evans
, and
S.
Dietrich
, “
Depletion potential in hard-sphere mixtures: Theory and applications
,”
Phys. Rev. E
62
,
5360
(
2000
).
11.
M.
Karplus
, “
Development of multiscale models for complex chemical systems: From H + H2 to biomolecules (Nobel lecture)
,”
Angew. Chem., Int. Ed.
53
,
9992
10005
(
2014
).
12.
A.
Warshel
, “
Multiscale modeling of biological functions: From enzymes to molecular machines (Nobel lecture)
,”
Angew. Chem., Int. Ed.
53
,
10020
10031
(
2014
).
13.
H. N. W.
Lekkerkerker
,
W. C.-K.
Poon
,
P. N.
Pusey
,
A.
Stroobants
, and
P. B.
Warren
, “
Phase behaviour of colloid + polymer mixtures
,”
Europhys. Lett.
20
,
559
564
(
1992
).
14.
W. C. K.
Poon
, “
The physics of a model colloid–polymer mixture
,”
J. Phys.: Condens. Matter
14
,
R859
(
2002
).
15.
T.
Biben
and
J.-P.
Hansen
, “
Phase separation of asymmetric binary hard-sphere fluids
,”
Phys. Rev. Lett.
66
,
2215
(
1991
).
16.
M.
Dijkstra
,
R.
van Roij
, and
R.
Evans
, “
Phase diagram of highly asymmetric binary hard-sphere mixtures
,”
Phys. Rev. E
59
,
5744
(
1999
).
17.
H.
Kobayashi
,
P. B.
Rohrbach
,
R.
Scheichl
,
N. B.
Wilding
, and
R. L.
Jack
, “
Critical point for demixing of binary hard spheres
,”
Phys. Rev. E
104
,
044603
(
2021
).
18.
H.
Kobayashi
,
P. B.
Rohrbach
,
R.
Scheichl
,
N. B.
Wilding
, and
R. L.
Jack
, “
Correction of coarse-graining errors by a two-level method: Application to the Asakura-Oosawa model
,”
J. Chem. Phys.
151
,
144108
(
2019
).
19.
R. W.
Zwanzig
, “
High-temperature equation of state by a perturbation method. I. Nonpolar gases
,”
J. Chem. Phys.
22
,
1420
1426
(
1954
).
20.
C.
Jarzynski
, “
Nonequilibrium equality for free energy differences
,”
Phys. Rev. Lett.
78
,
2690
(
1997
).
21.
G. E.
Crooks
, “
Path-ensemble averages in systems driven far from equilibrium
,”
Phys. Rev. E
61
,
2361
(
2000
).
22.
R. M.
Neal
, “
Annealed importance sampling
,”
Stat. Comput.
11
,
125
139
(
2001
).
23.
N. J.
Gordon
,
D. J.
Salmond
, and
A. F. M.
Smith
, “
Novel approach to nonlinear/non-Gaussian Bayesian state estimation
,” in
IEE Proceedings F-Radar and Signal Processing
(
IET
,
1993
), Vol. 140, pp.
107
113
.
24.
P.
Del Moral
,
Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications
(
Springer
,
2004
), Vol. 88.
25.
A.
Doucet
and
A. M.
Johansen
, “
A tutorial on particle filtering and smoothing: Fifteen years later
,” in
Handbook of Nonlinear Filtering
, edited by
D.
Crisan
and
B.
Rozovsky
(
Oxford University Press,
2011
), Vol. 12, p.
3
.
26.
Y.
Iba
, “
Population Monte Carlo algorithms
,”
Trans. Jpn. Soc. Artif. Intell.
16
,
279
286
(
2001
).
27.
P.
Grassberger
, “
Go with the winners: A general Monte Carlo strategy
,”
Comput. Phys. Commun.
147
,
64
70
(
2002
).
28.
H.-P.
Hsu
and
P.
Grassberger
, “
A review of Monte Carlo simulations of polymers with PERM
,”
J. Stat. Phys.
144
,
597
637
(
2011
).
29.
C.
Giardina
,
J.
Kurchan
,
V.
Lecomte
, and
J.
Tailleur
, “
Simulating rare events in dynamical processes
,”
J. Stat. Phys.
145
,
787
811
(
2011
).
30.
P. J.
Reynolds
,
D. M.
Ceperley
,
B. J.
Alder
, and
W. A.
Lester
, Jr
, “
Fixed-node quantum Monte Carlo for molecules
,”
J. Chem. Phys.
77
,
5593
5603
(
1982
).
31.
M. B.
Giles
, “
Multilevel Monte Carlo path simulation
,”
Oper. Res.
56
,
607
617
(
2008
).
32.
V. H.
Hoang
,
C.
Schwab
, and
A. M.
Stuart
, “
Complexity analysis of accelerated MCMC methods for Bayesian inversion
,”
Inverse Probl.
29
,
085010
(
2013
).
33.
T. J.
Dodwell
,
C.
Ketelsen
,
R.
Scheichl
, and
A. L.
Teckentrup
, “
A hierarchical multilevel Markov chain Monte Carlo algorithm with applications to uncertainty quantification in subsurface flow
,”
SIAM/ASA J. Uncertainty Quantif.
3
,
1075
1108
(
2015
).
34.
A.
Jasra
,
K.
Kamatani
,
K. J. H.
Law
, and
Y.
Zhou
, “
Multilevel particle filters
,”
SIAM J. Numer. Anal.
55
,
3068
3096
(
2017
).
35.
A.
Beskos
,
A.
Jasra
,
K.
Law
,
R.
Tempone
, and
Y.
Zhou
, “
Multilevel sequential Monte Carlo samplers
,”
Stochastic Processes Appl.
127
,
1417
1440
(
2017
).
36.
P.
Del Moral
,
A.
Jasra
, and
K. J. H.
Law
, “
Multilevel sequential Monte Carlo: Mean square error bounds under verifiable conditions
,”
Stochastic Anal. Appl.
35
,
478
498
(
2017
).
37.
A. D.
Bruce
and
N. B.
Wilding
, “
Computational strategies for mapping equilibrium phase diagrams
,”
Adv. Chem. Phys.
127
,
1
(
2003
).
38.
J.
Kolafa
,
S.
Labík
, and
A.
Malijevský
, “
Accurate equation of state of the hard sphere fluid in stable and metastable regions
,”
Phys. Chem. Chem. Phys.
6
,
2335
2340
(
2004
).
39.
D. J.
Ashton
,
N. B.
Wilding
,
R.
Roth
, and
R.
Evans
, “
Depletion potentials in highly size-asymmetric binary hard-sphere mixtures: Comparison of simulation results with theory
,”
Phys. Rev. E
84
,
061136
(
2011
).
40.
D. J.
Ashton
and
N. B.
Wilding
, “
Grand canonical simulation of phase behaviour in highly size-asymmetrical binary fluids
,”
Mol. Phys.
109
,
999
1007
(
2011
).
41.
C. P.
Robert
and
G.
Casella
,
Monte Carlo Statistical Methods
(
Springer
,
2004
), Vol. 2.
42.
R.
Douc
and
O.
Cappé
, “
Comparison of resampling schemes for particle filtering
,” in
ISPA 2005. Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis
(
IEEE
,
2005
), pp.
64
69
.
43.
O.
Cappé
,
E.
Moulines
, and
T.
Rydén
,
Inference in Hidden Markov Models
, Springer Series in Statistics (
Springer-Verlag
,
Berlin, Heidelberg
,
2005
).
44.
P.
Del Moral
,
A.
Doucet
, and
A.
Jasra
, “
Sequential Monte Carlo samplers
,”
J. R. Stat. Soc., Ser. B: Stat. Methodol.
68
,
411
436
(
2006
).
45.
E. L.
Ionides
,
C.
Bretó
, and
A. A.
King
, “
Inference for nonlinear dynamical systems
,”
Proc. Natl. Acad. Sci. U. S. A.
103
,
18438
18443
(
2006
).
46.
P.
Fearnhead
,
O.
Papaspiliopoulos
, and
G. O.
Roberts
, “
Particle filters for partially observed diffusions
,”
J. R. Stat. Soc., Ser. B: Stat. Methodol.
70
,
755
777
(
2008
).
47.
P.
Fearnhead
,
O.
Papaspiliopoulos
,
G. O.
Roberts
, and
A.
Stuart
, “
Random-weight particle filtering of continuous time processes
,”
J. R. Stat. Soc., Ser. B: Stat. Methodol.
72
,
497
512
(
2010
).
48.
C.
Naesseth
,
F.
Lindsten
, and
T.
Schon
, “
Nested sequential Monte Carlo methods
,” in
International Conference on Machine Learning
(
PMLR
,
2015
), pp.
1292
1301
.
49.
P. B.
Rohrbach
and
R. L.
Jack
, “
Convergence of random-weight sequential Monte Carlo methods
,” arXiv:2208.12108 (
2022
).
50.
H.
Rafii-Tabar
,
L.
Hua
, and
M.
Cross
, “
A multi-scale atomistic-continuum modelling of crack propagation in a two-dimensional macroscopic plate
,”
J. Phys.: Condens. Matter
10
,
2375
(
1998
).
51.
M.
Praprotnik
,
L.
Delle Site
, and
K.
Kremer
, “
Adaptive resolution molecular-dynamics simulation: Changing the degrees of freedom on the fly
,”
J. Chem. Phys.
123
,
224106
(
2005
).
52.
R.
Evans
, “
The nature of the liquid–vapour interface and other topics in the statistical mechanics of non-uniform, classical fluids
,”
Adv. Phys.
28
,
143
200
(
1979
).
53.
M. L.
de Haro
and
M.
Robles
, “
The structure factor and equation of state of hard-sphere fluids
,”
J. Phys.: Condens. Matter
16
,
S2089
(
2004
).
54.
T.
Hahn
, “
Cuba—A library for multidimensional numerical integration
,”
Comput. Phys. Commun.
168
,
78
95
(
2005
).
55.
B.
Efron
, “
Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods
,”
Biometrika
68
,
589
599
(
1981
).
56.
D.
Williams
,
Probability with Martingales
(
Cambridge University Press
,
1991
).
57.
R.
Douc
and
E.
Moulines
, “
Limit theorems for weighted samples with applications to sequential Monte Carlo methods
,”
Ann. Stat.
36
,
2344
2376
(
2008
).
58.
H. P.
Chan
and
T. L.
Lai
, “
A general theory of particle filters in hidden Markov models and some applications
,”
Ann. Stat.
41
,
2877
2904
(
2013
).
59.
G.
D’Adamo
,
R.
Menichetti
,
A.
Pelissetto
, and
C.
Pierleoni
, “
Coarse-graining polymer solutions: A critical appraisal of single- and multi-site models
,”
Eur. Phys. J.: Spec. Top.
224
,
2239
2267
(
2015
).
60.
D.
Crisan
and
A.
Doucet
, “
Convergence of sequential Monte Carlo methods
,” Signal Processing Group, Department of Engineering, University of Cambridge, Technical Report CUEDIF-INFENGrrR38, No. 1,
2000
.
61.
H.
Oberhofer
and
C.
Dellago
, “
Efficient extraction of free energy profiles from nonequilibrium experiments
,”
J. Comput. Chem.
30
,
1726
1736
(
2009
).
62.
J.-P.
Hansen
and
I. R.
McDonald
,
Theory of Simple Liquids: With Applications to Soft Matter
(
Academic Press
,
2013
).