We present a multilevel Monte Carlo simulation method for analyzing multi-scale physical systems via a hierarchy of coarse-grained representations, which yields numerically exact results at the most detailed level. We apply the method to a mixture of size-asymmetric hard spheres in the grand canonical ensemble. A three-level version of the method is compared with a previously studied two-level version. The extra level interpolates between the full mixture and a coarse-grained description where only the large particles are present—this is achieved by restricting the small particles to regions close to the large ones. The three-level method improves the performance of the estimator at fixed computational cost. We analyze the asymptotic variance of the estimator and discuss the mechanisms for the improved performance.

## I. INTRODUCTION

A key challenge in the molecular simulation of soft matter is posed by the separation of length-scales between its microscopic description and the existence or emergence of mesoscopic structure. In such cases, one often relies on coarse-grained (CG) descriptions of the system that (approximately) integrate out microscopic degrees of freedom^{1–3} to yield a tractable simplified model. Examples in the soft-matter context include polymers,^{4,5} biomolecules,^{6–8} and colloidal systems.^{9,10} Such CG descriptions are essential for multi-scale modeling approaches.^{11,12} However, they are not usually exact, and the associated coarse-graining errors are often difficult to assess.

Such CG models have been studied extensively in colloidal systems with depletion interactions.^{9,13,14} The typical example is a mixture of relatively large colloidal particles with much smaller non-adsorbing polymers that generate effective attractions between the colloids. This can drive de-mixing, crystallization, or gelation, depending on the context. Model systems in this context include the Asakura–Oosawa (AO) model^{9} where the CG model can even be exact if the disparity between colloid and polymer radii is large enough. The theoretical tractability of the AO model arises from a simplified modeling assumption that polymer particles act as spheres that can interpenetrate.

Alternatively, one may consider a mixture where both the colloids and the depletant are modeled as hard spheres. From a theoretical perspective, this is an interesting model in its own right as it undergoes a fluid–fluid de-mixing phase separation for large enough size-disparities and concentrations.^{10,15–17} This happens despite the lack of attractive forces between the particles in the model and can be attributed to geometric packing effects of the big and small spheres.

Direct simulation of such mixtures is very challenging because of the large number of small particles. Accurate CG models are also available in this context,^{10} but the CG representations are not exact: their errors can be detected by accurate computer simulation of the full (FG) mixtures. Hence, such models are natural testing grounds for theories and simulation methods associated with coarse-graining.

In this context, we recently developed a method^{17,18} that links a CG description with the underlying fine-grained (FG) description. We call this the *two-level* method because the CG and FG models describe the same system at different levels of detail. The method was validated by computations on the AO system,^{18} where it provided numerically exact results for the FG model, even in the regime where the CG description is not quantitatively accurate. The methodology was also applied to the hard-sphere mixture,^{17} where it provided a quantitative analysis of the critical point associated with de-mixing of the large and small particles.

These previous results rely on the idea that properties of the FG model can be estimated in terms of some CG quantity, with an additive correction that accounts for the coarse-graining error. This is an importance sampling method, familiar in equilibrium statistical mechanics from free-energy perturbation theory,^{19} which involves reweighting between two thermodynamic ensembles. In the present context, the reweighting factors depend on the free energy of the small spheres, computed in a system where the large particles are held fixed. This free energy can be estimated by an annealing process based on Jarzynski’s equality^{20–22} that slowly introduces small particles to fixed CG configurations.
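The annealing idea can be made concrete with a small numerical sketch. The code below is a toy illustration (not the paper's implementation): it estimates a free energy from samples of the annealing "work" via Jarzynski's equality ⟨e^{−W}⟩ = e^{−ΔF} (with *k*_{B}*T* = 1), using synthetic Gaussian work data for which the exact answer ΔF = μ − σ²/2 is known.

```python
import numpy as np

rng = np.random.default_rng(0)

def jarzynski_free_energy(work_values):
    """Estimate Delta F from work samples via <exp(-W)> = exp(-Delta F),
    with k_B T = 1.  Uses a log-sum-exp trick for numerical stability."""
    w = np.asarray(work_values, dtype=float)
    m = (-w).max()
    log_mean = m + np.log(np.mean(np.exp(-w - m)))  # log <e^{-W}>
    return -log_mean

# Toy model: Gaussian work distribution W ~ N(mu, sigma^2), for which
# Jarzynski's equality gives Delta F = mu - sigma^2 / 2 exactly.
mu, sigma = 2.0, 0.5
work = rng.normal(mu, sigma, size=200_000)
dF = jarzynski_free_energy(work)  # close to mu - sigma^2/2 = 1.875
```

The exponential average is dominated by rare low-work samples, which is why the annealing in the paper must be done slowly: a broad work distribution makes this estimator noisy.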

In this paper, we present an extension of the two-level method that incorporates additional intermediate levels to improve the overall performance. Specifically, we introduce a step in the annealing process where small particles are partially inserted in regions close to big particles. Before finishing the small-particle insertion, we then replace weighted sets of configurations with unweighted ones, duplicating configurations with large weight and deleting ones with low weight. This resampling step allows us to make optimal use of the information available at the intermediate stage, focusing our subsequent computations on configurations that matter.

This general approach fits in the framework of sequential Monte Carlo (SMC).^{23–25} Such algorithmic ideas have been successfully applied across disciplines under various names, including population Monte Carlo^{26} or the go-with-the-winners strategy.^{27} Examples in computational physics include the pruned-enriched Rosenbluth method for polymers,^{28} the cloning method for rare events,^{29} and diffusion quantum Monte Carlo.^{30} We combine the SMC method with an additional variance reduction strategy. Instead of estimating the FG average directly, we combine a CG estimate with estimates of subsequent level differences, using the previous levels as control variates. This is the idea behind multilevel Monte Carlo methods.^{31–33} The combination of a difference estimate with SMC has been previously investigated, for example, in Refs. 34–36. As in Ref. 17, we develop a general method alongside its application to highly size-asymmetric binary hard-sphere mixtures, which provide a challenging but well-understood example to benchmark our algorithm.

This paper is organized as follows: In Sec. II, we introduce the hard-sphere mixture model. In Sec. III, we summarize the setup of the two-level method before presenting our extension to three (or more) levels. The three-level method requires an intermediate level for the hard-sphere mixture, whose details we discuss in Sec. IV. In Sec. V, we present a numerical test of the method and compare its performance against the two-level method, and in Sec. VI, we present convergence results. We conclude in Sec. VII.

## II. HARD-SPHERE MIXTURE

Throughout this work, we illustrate the multilevel method with an example system, which is a mixture of large and small hard spheres at size ratio 10:1. This system is challenging for simulation because the big particles may display interesting collective behavior (in particular, a critical point), but the dominant computational cost for simulation comes from the large number of small particles.

However, despite our focus on this single example, we emphasize that the multilevel method is presented in a general way, which should also be applicable in other systems with a separation of length scales.

### A. Hard sphere mixture

The example system is a mixture of big and small particles, whose diameters are *σ*_{B} and *σ*_{S}, respectively. We consider a periodic box [0, *L*]^{3} of linear size *L* and we work in the grand canonical ensemble. (This choice is particularly relevant for analysis of de-mixing, where the number density of large particles is a suitable order parameter.^{37})

In a given configuration, the numbers of big and small particles are *N* and *n*, respectively; the position of the *i*th big particle is **R**_{i}, while the position of the *j*th small particle is **r**_{j}. We denote the configurations of big and small particles by $\mathcal{C}=(N;\mathbf{R}_1,\ldots,\mathbf{R}_N)$ and $\mathcal{F}=(n;\mathbf{r}_1,\ldots,\mathbf{r}_n)$, respectively, and the full configuration is denoted $\mathcal{X}=(\mathcal{C},\mathcal{F})$.

Since the particles are hard, the temperature plays no role in the following, and so we set the temperature as *k*_{B}*T* = 1 without any loss of generality. The equilibrium distribution of the mixture is described by a probability density

where the subscript F indicates that we refer to the FG model, *μ*_{B} and *μ*_{S} are the chemical potentials for the large and small particles, respectively, and Ξ_{F} is the grand canonical partition function. The particles are hard (non-overlapping), so the potential energy is

This *p*_{F} is normalized as $1=\int p_{\mathrm{F}}(\mathcal{C},\mathcal{F})\,\mathrm{d}\mathcal{C}\,\mathrm{d}\mathcal{F}$; the precise meaning of these integrals is given in Appendix A 1.

Within this setting, the dimensionless parameters of the model are the ratio of the particle diameters *σ*_{B}/*σ*_{S}, the system size parameter *L*/*σ*_{B}, and the two chemical potentials *μ*_{S} and *μ*_{B}. In practice, *μ*_{S} is more naturally parameterized by the associated reservoir volume fraction $\eta_{\mathrm{S}}^{r}$, which we relate to *μ*_{S} via an accurate equation of state.^{38}
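The paper relates $\eta_{\mathrm{S}}^{r}$ to *μ*_{S} through the accurate equation of state of Ref. 38, which is not reproduced here. As a stand-in illustration of such a mapping, the sketch below uses the classic Carnahan–Starling approximation for the *excess* chemical potential of hard spheres (in units of *k*_{B}*T*, ignoring the ideal-gas contribution, which only adds ln *η* plus constants); the numerical inversion shows how a target chemical potential determines the packing fraction.

```python
def mu_ex_cs(eta):
    """Carnahan-Starling excess chemical potential (units of k_B T)
    for hard spheres at packing fraction eta."""
    return (8*eta - 9*eta**2 + 3*eta**3) / (1 - eta)**3

def eta_from_mu_ex(mu_ex, lo=1e-10, hi=0.64, tol=1e-12):
    """Invert mu_ex_cs by bisection (mu_ex_cs is monotonic in eta
    over the fluid range considered here)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mu_ex_cs(mid) < mu_ex:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

mu = mu_ex_cs(0.2)        # excess chemical potential at eta = 0.2
eta = eta_from_mu_ex(mu)  # recovers eta ~= 0.2
```

This is only a sketch of the η ↔ μ correspondence; the production calculations in the paper use the more accurate equation of state cited above.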

Our multi-level method is designed for accurate estimates of properties of the large particles. Specifically, we consider observable quantities of interest $A=A(\mathcal{C})$ that only depend on the large particles. (Examples are discussed in Sec. II C; see also Fig. 3.) Our aim is to compute the equilibrium average of *A*, that is,

Since $A(\mathcal{C})$ does not depend on $\mathcal{F}$, it is natural to define the marginal distribution for the big particles

so that $\langle A\rangle_{\mathrm{F}}=\int A(\mathcal{C})\,p_{\mathrm{F}}(\mathcal{C})\,\mathrm{d}\mathcal{C}$. A similar situation occurs in the context of statistics, where one seeks to analyze the behavior of a few quantities of interest in a high-dimensional system: in that context, the small-particle degrees of freedom in (3) would be referred to as *nuisance parameters*. This means that their values are not required to compute the quantity of interest, but their statistical properties strongly affect the average of this quantity.

### B. Coarse-grained model

If samples from the marginal distribution $p_{\mathrm{F}}(\mathcal{C})$ could be generated by an MC method for the big particles alone, this would make the system much more tractable by simulation. This is a central idea in coarse-grained modeling.^{2} However, the complexity of packing of the small hard spheres means that $p_{\mathrm{F}}(\mathcal{C})$ is a complex distribution, and it is not possible to sample it exactly. A great deal of effort has gone into developing CG models that approximate this distribution with high accuracy.^{10,16,39}

A suitable CG model is an equilibrium distribution with probability density

where Ξ_{C} is the partition function, and the CG (effective) interaction energy is

where *V*_{2} is a pairwise interaction potential. Averages with respect to the CG model are denoted as

For a suitably chosen *V*_{2}, the coarse distribution $p_{\mathrm{C}}(\mathcal{C})$ can be an accurate approximation to $p_{\mathrm{F}}(\mathcal{C})$. For the CG model in this work, we take the accurate potential *V*_{2} = *V*_{RED}, developed by Roth, Evans, and Dietrich.^{10} Following Ref. 17, we choose Δ*μ* such that the distributions of *N* coincide for FG and CG models.

### C. Benchmark system: Parameters and observables

Throughout the paper, we benchmark our numerical methods by considering the hard-sphere mixture with fixed parameters, as follows: We take the ratio of particle sizes (*σ*_{B}/*σ*_{S}) = 10, the linear size of the periodic system is *L* = 31*σ*_{S}, and the small-particle (reservoir) volume fraction is $\eta_{\mathrm{S}}^{r}=0.2$. This volume fraction is large enough to generate a significant depletion attraction between the large particles, but not strong enough to cause de-mixing of the large and small particles.^{17}

Aspects of the CG and FG models are illustrated in Figs. 1(a)–1(c) for these parameters. In particular, we show representative configurations of the CG and FG models, as well as a plot of the RED potential. While direct grand canonical Monte Carlo (GCMC) sampling of the full mixture is possible in principle, it should be apparent from Fig. 1(c) that this would be intractable because insertion of a large particle into such a fluid almost never succeeds. Advanced MC methods^{39,40} might be applicable, but these tend to struggle when the volume fraction is large. This motivates the development of two-level and multi-level methods.

Figure 2 highlights properties of the distribution of the number of big particles for the CG model when varying the effective large-particle chemical potential $\mu_{\mathrm{B}}^{\mathrm{eff}}=\mu_{\mathrm{B}}-\Delta\mu$. In particular, Fig. 2(b) shows that increasing $\mu_{\mathrm{B}}^{\mathrm{eff}}$ in the CG model leads to non-monotonic behavior of the variance of the particle number *N* (analogous to the compressibility of the model). This maximum indicates that the system has a tendency for de-mixing at larger $\eta_{\mathrm{S}}^{r}$ (one expects a divergent compressibility at the critical point, if one exists). In the following, we fix *μ*_{B} at the value corresponding to this maximum—the relatively large fluctuations at this point are challenging for the multi-level method because the distributions $p_{\mathrm{C}}(\mathcal{C})$ and $p_{\mathrm{F}}(\mathcal{C})$ are broader, requiring good sampling. The corresponding CG system has an average of *N* ≈ 11.6 big particles, occupying around 20% of the available volume.

For the specific quantities that we compute for this mixture, Fig. 3 shows the big-particle pair correlation function *g*(*r*) and the distribution of the number of big particles *P*(*N*). Results are shown for both CG and FG models (in the FG case, the results are computed using the two-level method). For both quantities of interest, the CG model provides an accurate but not exact description. In particular, the CG model underestimates the pair correlation at the point where two big particles are in contact. The distributions of the number of big particles in Fig. 3(b) are both unimodal: both the FG and CG systems are well below the critical point of de-mixing.

Compared to the critical hard-sphere mixture discussed in Ref. 17, the system we consider here is smaller and has a lower volume fraction $\eta_{\mathrm{S}}^{r}$ of the small particles. This is still challenging for conventional Monte Carlo algorithms but can be simulated fast enough to evaluate the performance of, and compare, the computational methods discussed here. Furthermore, the lower small-particle volume fraction helps with the construction of the intermediate level in Sec. IV, whose underlying approximation degrades as $\eta_{\mathrm{S}}^{r}$ increases (see Appendix B).

## III. MULTILEVEL SIMULATION

### A. Overview

This section reviews the two-level method of Refs. 17 and 18 and then lays out its three-level extension. The presentation of the method is intended to be generic and applicable to a variety of systems. However, we first introduce the key ideas using the example and illustrations of Fig. 1 for the hard-sphere mixture.

The two-level method is constructed with the scale separation of the mixture in mind: it splits the simulation of the big and small spheres into two stages by first simulating a CG system of large particles alone and computing ⟨*A*⟩_{C}. Then, differences between ⟨*A*⟩_{C} and ⟨*A*⟩_{F} are computed by a reweighting (importance sampling) method. The weight factors for this computation are obtained by an annealing step, where the small particles are slowly inserted into the system, with the large particles held fixed [see Fig. 1(e)]. The advantage of this procedure is that large particle motion only happens in the CG simulation where the small particles are absent—there is no scale separation in this case so simulations are tractable. Similarly, insertion of the small particles happens in a background of fixed large particles, so these annealing simulations do not suffer long time scales associated with large-particle motion. This makes for tractable simulations in scale-separated systems, as long as the CG model is sufficiently accurate: see Refs. 17 and 18 for further discussion.

In practice, the simulation effort for two-level computations is dominated by the annealing step. The weighting factors are required to high accuracy, which means that the annealing must be done gradually. Moreover, the weights are subject to numerical uncertainties that tend to be large in systems with many small particles. This limits the method to systems of moderate size, with moderate $\eta Sr$ (see Ref. 17).

We show in this work that such problems can be reduced by breaking the annealing process into several stages—this is the idea of the three-level method [Fig. 1(e)]. Specifically, we start (as before) with a population of configurations of the CG model. We perform a first annealing step where the small particles are added in regions that are close to large ones. The information from this step is used in a resampling process, which partially corrects the coarse-graining error by discarding some of the configurations from the population and duplicating others. (This idea is similar to go-with-the-winners.^{27}) Finally, the second annealing step inserts the small particles in the remaining empty regions, arriving at configurations of the FG model. Hence, the end point is the same as the two-level method, but the annealing route is different.

In practice, the effectiveness of the three-level method relies on a clear physical understanding of the intermediate (partially inserted) system in order to decide which configurations to discard in the resampling step. For the hard-sphere case, that issue will be discussed in Sec. IV; a more general discussion is given in Sec. VII. The remainder of this section describes the two- and three-level methods in more detail.

### B. Two-level method

We review the two-level method of Refs. 17 and 18. For a general presentation, we assume that CG and FG models exist with configurations $\mathcal{C}$ and $\mathcal{X}=(\mathcal{C},\mathcal{F})$, respectively. In the case of hard spheres, $\mathcal{C}$ and $\mathcal{F}$ correspond to configurations of the large and small spheres, respectively.

The two-level method is an importance sampling^{41} (or reweighting) computation, closely related to the free-energy perturbation method of Zwanzig.^{19} We use the grand canonical Monte Carlo (GCMC) method to sample *M*_{C} configurations from *p*_{C}, and these are denoted by $\mathcal{C}^1,\mathcal{C}^2,\ldots,\mathcal{C}^{M_{\mathrm{C}}}$. Then, the CG average can be estimated as

As the sampling is increased (*M*_{C} → *∞*), we have $\hat{A}_{\mathrm{C}}\to\langle A\rangle_{\mathrm{C}}$. However, if the coarse-graining error

is significant, $\hat{A}_{\mathrm{C}}$ does not provide an accurate estimate of ⟨*A*⟩_{F}.

To address this problem, we use an annealing procedure based on Jarzynski’s equality^{20} that starts from a coarse configuration $\mathcal{C}$ and populates the fine degrees of freedom $\hat{\mathcal{F}}$; at the same time, it generates a random weight $\hat{W}(\mathcal{C})$ with the property that

where the angle brackets with subscript J indicate an averaging over the annealing process (analogous to Jarzynski’s equality^{20}), and *ξ* is a constant (independent of $\mathcal{C}$). The details of the annealing process are given in Appendix A. It is applied to a set of *M*_{F} coarse configurations, again denoted by $\mathcal{C}^1,\mathcal{C}^2,\ldots,\mathcal{C}^{M_{\mathrm{F}}}$, which are typically a subset of the *M*_{C} CG configurations above.

For later convenience, we define

In practical applications, the constant *ξ* is not known, but its effect can be controlled by defining the self-normalized weight

Since the $\mathcal{C}^j$ are representative of *p*_{C}, the denominator in $\hat{w}$ converges to *ξ* as *M*_{F} → *∞*, and so $\hat{w}(\mathcal{C}^j)\to\hat{W}^{\mathrm{n}}(\mathcal{C}^j)$. Then, the estimator

converges to ⟨*A*⟩_{F} as *M*_{F} → *∞*. (In the case that $\hat{W}$ is not random, this procedure recovers the free-energy perturbation theory of Zwanzig.^{19})
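The self-normalized estimator can be sketched numerically. The code below is a toy stand-in (not the paper's implementation): the "coarse" and "fine" distributions are one-dimensional Gaussians, chosen so that the exact weight ratio and the exact FG average are known, and the unnormalized weights play the role of $\hat{W}$ with the constant *ξ* left unknown.

```python
import numpy as np

rng = np.random.default_rng(1)

def self_normalized_estimate(A_vals, W_vals):
    """Self-normalized importance-sampling estimate: sum_j w_j A_j with
    w_j = W_j / sum_k W_k, so the unknown constant xi cancels."""
    W = np.asarray(W_vals, dtype=float)
    w = W / W.sum()
    return np.dot(w, A_vals)

# Toy check: "coarse" samples from N(0,1), "fine" target N(0.5,1).
# Here p_F/p_C = exp(0.5*x - 0.125) exactly, so the estimate of the
# fine-level mean <x>_F should approach 0.5.
x = rng.normal(0.0, 1.0, size=400_000)
W = np.exp(0.5 * x - 0.125)   # unnormalized weights
est = self_normalized_estimate(x, W)
```

In the paper the weights are themselves random estimates from the annealing process, but the self-normalization step has the same structure.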

The annealing process has one useful additional property: Let the joint probability density for the weight and the fine degrees of freedom be $\kappa(\hat{W},\hat{\mathcal{F}}\mid\mathcal{C})$, which is normalized as $\int\kappa(\hat{W},\hat{\mathcal{F}}\mid\mathcal{C})\,\mathrm{d}\hat{\mathcal{F}}\,\mathrm{d}\hat{W}=1$. We show in Appendix A that

This formula is the essential property of the annealing procedure, which is required for the operation of the method. Integrating additionally over $\hat{\mathcal{F}}$ shows that (14) ensures that (10) also holds. This means in turn that if $B=B(\mathcal{C},\mathcal{F})$ is an observable quantity that depends on both coarse and fine degrees of freedom,

converges to ⟨*B*⟩_{F} as *M*_{F} → *∞*.

This method can be easily improved without extra computational effort. The key idea^{31–33} is to estimate the FG average as the sum of the CG average and the coarse-graining error (9),

Then, we use importance sampling to estimate Δ, as

Finally, a suitable estimator for the FG average is obtained by combining the estimate of the coarse-graining error with the corresponding CG quantity,

This estimator converges to ⟨*A*⟩_{F} in the limit where *M*_{C}, *M*_{F} → *∞*. As discussed in Ref. 18, the variance of the estimate $\hat{\Delta}$ is typically smaller than that of $\hat{A}_{\mathrm{F}}$, and the CG estimate $\hat{A}_{\mathrm{C}}$ is cheap to compute accurately. Thus, the combined difference estimator $\hat{A}_{\mathrm{F},\Delta}$ is typically more accurate at fixed computational cost.
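A minimal sketch of this difference (control-variate) estimator, again on a toy Gaussian setup in which the exact FG average is 0.5: the CG average is computed cheaply from many coarse samples, while the coarse-graining error is estimated from a much smaller annealed subset.

```python
import numpy as np

rng = np.random.default_rng(2)

def difference_estimator(A_coarse_all, A_sub, W_sub):
    """Two-level difference estimate A_C-hat + Delta-hat, where
    Delta-hat = sum_j w_j A_j - mean_j A_j over the annealed subset
    (with self-normalized weights w_j)."""
    A_sub = np.asarray(A_sub, dtype=float)
    w = np.asarray(W_sub, dtype=float)
    w = w / w.sum()
    A_hat_C = np.mean(A_coarse_all)          # cheap: many CG samples
    Delta_hat = np.dot(w, A_sub) - A_sub.mean()
    return A_hat_C + Delta_hat

# Toy data: "coarse" = N(0,1), "fine" target = N(0.5,1), so the exact
# unnormalized weights are exp(0.5*x - 0.125) and <x>_F = 0.5.
x_all = rng.normal(0.0, 1.0, size=400_000)   # M_C coarse samples
x_sub = x_all[:40_000]                        # M_F << M_C annealed samples
W_sub = np.exp(0.5 * x_sub - 0.125)
est = difference_estimator(x_all, x_sub, W_sub)
```

The difference term only has to correct the (small) coarse-graining error, which is why its variance is typically smaller than a direct weighted FG estimate from the same *M*_{F} samples.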

The importance sampling methodology has a useful physical interpretation, which we explain for the example of the hard-sphere mixture. If we consider a fixed configuration of the large particles, the grand canonical partition function for the small particles is

As the system is annealed (the small particles are inserted), we estimate (19) by a free-energy method based on Jarzynski’s equality^{20} (see Appendix A for details). Since the annealing is stochastic, this yields an estimate of the partition function, which we denote by $\hat{\Xi}[\mathcal{C},\mu_{\mathrm{S}}]$. Moreover, this estimate is unbiased: $\langle\hat{\Xi}[\mathcal{C},\mu_{\mathrm{S}}]\rangle_{\mathrm{J}}=\Xi[\mathcal{C},\mu_{\mathrm{S}}]$. Hence, we can take

Physically, the CG model is constructed so that the Boltzmann factor $e^{-U_{\mathrm{C}}(\mathcal{C})}$ is a good estimate of the small-particle partition function $\Xi[\mathcal{C},\mu_{\mathrm{S}}]$; if it is, the model is accurate. The two-level methodology computes estimates of the small-particle partition function (or, equivalently, the free energy) and compares them with the assumptions that were made about this quantity in the CG model. By analyzing the differences between these quantities, the differences between CG and FG models can be quantified. The effectiveness of this method for numerical simulation of mixtures of large and small particles was discussed in Refs. 17 and 18.

The distribution of the importance weights $\hat{w}(\mathcal{C})$ impacts the accuracy of the resulting FG estimate $\hat{A}_{\mathrm{F}}$. It also serves as a useful indicator of the accuracy of the CG model and of the variance of the free-energy computation. To give an example, we apply the two-level method to the example problem from Sec. II C. In Fig. 4, we show the empirical distribution of 18 000 weights for the example system, computed using an accurate annealing process; we use these computations as the reference solution in Sec. V. This illustrates a situation where the two-level method is applicable: no single sample dominates, and only very few samples have a weight larger than 10.

If one considers less accurate CG models, the variance of the weights increases, and the tail of their distribution gets heavier. Eventually, one would reach a situation where a few samples dominate the weighted sum (13). For accurately computed weights $\hat{w}(\mathcal{C})$, such a breakdown of the two-level method indicates that the CG model is not sufficiently accurate. This behavior provides a useful feedback loop that can be used to iterate on the CG model itself.^{18}
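One standard way to quantify when "a few samples dominate the weighted sum" is the effective sample size from the SMC literature; this is an added diagnostic, not one used explicitly in the text, but it measures exactly the weight degeneracy described above.

```python
import numpy as np

def effective_sample_size(W):
    """ESS = (sum W)^2 / sum W^2: equals M for uniform weights and
    approaches 1 when a single sample dominates the weighted sum."""
    W = np.asarray(W, dtype=float)
    return W.sum() ** 2 / np.sum(W ** 2)

ess_uniform = effective_sample_size(np.ones(1000))           # -> 1000.0
ess_degenerate = effective_sample_size([100.0, 1e-6, 1e-6])  # -> ~1.0
```

A heavy-tailed weight distribution of the kind discussed here shows up directly as an ESS much smaller than the number of samples.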

### C. Three-level method

We now present the three-level method for estimation of $\langle A(\mathcal{C})\rangle_{\mathrm{F}}$.

#### 1. Coarse level

We start by generating *M*_{0} samples of the CG model, denoted by $\mathcal{C}_0^1,\ldots,\mathcal{C}_0^{M_0}$. The subscript indicates the step within the algorithm (which is 0 for the initial sampling of coarse configurations). The CG average of *A* can be estimated similarly to (8)

#### 2. Intermediate level

In addition to the CG and FG models, the three-level method also relies on an intermediate set of configurations, which correspond in the hard-sphere mixture to the system where the small particles have been inserted in regions close to the large ones (see Fig. 5). This state is described by an equilibrium probability distribution

where $U_{\mathrm{I}}(\mathcal{C},\mathcal{F})$ is an interaction energy. Its construction for the hard-sphere mixture will be discussed in Sec. IV.

The first annealing step of the three-level algorithm applies the two-level method, with the FG distribution *p*_{F} replaced by *p*_{I}. This part of the algorithm closely follows Sec. III B, and we give a brief discussion that mostly serves to fix notation. We start with a set of *M*_{1} coarse configurations that are samples of *p*_{C}; they are denoted by $\mathcal{C}_1^1,\mathcal{C}_1^2,\ldots,\mathcal{C}_1^{M_1}$, where the subscript 1 indicates the intermediate stage of the three-level method. (These will typically be a subset of the configurations that were generated on the coarse level.)

For each coarse configuration $\mathcal{C}_1$, we anneal the fine degrees of freedom of the system $\hat{\mathcal{F}}_1$ to arrive at the intermediate level and generate a random weight $\hat{W}_1(\mathcal{C}_1)$ with the property

with a constant *ξ*_{1} independent of $\mathcal{C}_1$. (For the hard spheres, we recall that particles are inserted preferentially in regions close to large ones, and this is illustrated in the top row of Fig. 5.)

As before, we define $\hat{W}_1^{\mathrm{n}}(\mathcal{C}_1)=\hat{W}_1(\mathcal{C}_1)/\xi_1$. Again, the constant *ξ*_{1} is generally not known, so we define the self-normalized weight

which converges to $\hat{W}_1^{\mathrm{n}}(\mathcal{C}_1^j)$ as *M*_{1} → *∞*. Then, the estimator

converges to ⟨*A*⟩_{I} as *M*_{1} → *∞*. Similar to (14), the joint probability density $\kappa_1(\hat{W}_1,\hat{\mathcal{F}}_1\mid\mathcal{C}_1)$ of the weight and fine degrees of freedom at the intermediate level, defined by the annealing process, fulfills

Hence, similar to (15), we also obtain

which converges to ⟨*B*⟩_{I} as *M*_{1} → *∞*.

#### 3. Fine level

At the end of the intermediate level, we have *M*_{1} large-particle configurations. For each configuration $\mathcal{C}_1^j$, the process of annealing to the intermediate level also provided the weight $\hat{w}_1(\mathcal{C}_1^j)$ and the small-particle configuration $\hat{\mathcal{F}}_1^j$. This information can be used to build a set of configurations that are representative of *p*_{I}. This procedure is called *resampling*; its validity in this example relies on the property (26) of the annealing procedure. This is the part of the method that is similar to population-based sampling approaches, such as SMC^{28} or go-with-the-winners.^{27} The idea is that one should focus the effort of the annealing process on coarse configurations that are typical of the full system and discard those that are atypical; see Fig. 6 for a visualization of this step.

We write $\hat{\mathcal{X}}_1^j=(\mathcal{C}_1^j,\hat{\mathcal{F}}_1^j)$ for the full configuration that is obtained by the annealing procedure at the intermediate level. The resampled configurations will be denoted by $\mathcal{X}_2^1,\mathcal{X}_2^2,\ldots,\mathcal{X}_2^{M_2}$; they are representative of the intermediate level *p*_{I}. There are *M*_{2} of them, and the subscript 2 indicates the final stage of the three-level method. The simplest resampling method (*multinomial resampling*) is that each $\mathcal{X}_2^i$ is obtained by copying one of the $\hat{\mathcal{X}}_1^j$, chosen at random with probability $\hat{w}_1(\mathcal{C}_1^j)$. In applications, one typically replaces this by a lower-variance resampling scheme such as residual resampling; see Ref. 42 for a comparison of commonly used variants.
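Both resampling schemes mentioned above can be sketched as follows; this is an illustrative implementation working on sample indices (production SMC codes differ in details such as stratification and tie-breaking).

```python
import numpy as np

rng = np.random.default_rng(3)

def multinomial_resample(weights, rng):
    """Draw M indices i.i.d. with probability proportional to the weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return rng.choice(len(w), size=len(w), p=w)

def residual_resample(weights, rng):
    """Residual resampling: keep floor(M*w_j) deterministic copies of
    sample j, then fill the remainder multinomially from the residuals."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    M = len(w)
    counts = np.floor(M * w).astype(int)
    residual = M * w - counts
    n_rest = M - counts.sum()
    if n_rest > 0:
        extra = rng.choice(M, size=n_rest, p=residual / residual.sum())
        counts += np.bincount(extra, minlength=M)
    return np.repeat(np.arange(M), counts)

mi = multinomial_resample(np.ones(5), rng)          # uniform weights
idx = residual_resample([0.1, 2.0, 0.5, 1.4], rng)  # biased to heavy weights
```

High-weight configurations appear several times in the returned index list (and are duplicated), while low-weight ones may be absent (and are discarded), exactly as in the go-with-the-winners picture.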

We then perform the second annealing step, which starts from an intermediate-level configuration $\mathcal{X}_2=(\mathcal{C}_2,\mathcal{F}_2)$ and anneals the fine degrees of freedom from the intermediate to the fine level, yielding $\hat{\mathcal{F}}_2$ and a weight $\hat{W}_2(\mathcal{X}_2)$; details are given in Appendix A. For the hard-sphere system, this involves further insertion of small particles to fill the system and generate realistic configurations of the full mixture. This procedure is shown in the bottom row of Fig. 5.

Since the starting point of the annealing procedure is $\mathcal{X}_2$, the joint probability density of the annealing process $\kappa_2(\hat{W}_2,\hat{\mathcal{F}}_2\mid\mathcal{X}_2)$ depends on both large and small particles. Therefore, the analog of (26) requires an additional average over the small particles of the starting configuration

for some constant *ξ*_{2}. Note that $p_{\mathrm{I}}(\mathcal{F}\mid\mathcal{C})=p_{\mathrm{I}}(\mathcal{C},\mathcal{F})/p_{\mathrm{I}}(\mathcal{C})$. Similar to (23), the weights $\hat{W}_2(\mathcal{X}_2)$ have the property

From here, we proceed as before. We define the normalized weight $\hat{W}_2^{\mathrm{n}}(\mathcal{X}_2)=\hat{W}_2(\mathcal{X}_2)/\xi_2$ and its self-normalized estimate

Since the $\mathcal{X}_2^j$ are representative of *p*_{I}, it follows from (28) that observables *A* of the coarse system can be estimated as

which converges to ⟨*A*⟩_{F} as *M*_{2} → *∞*. Similar to (15), we can also obtain consistent FG estimates of observable quantities *B* that depend on both coarse and fine degrees of freedom by

Following the same variance reduction strategy as in Sec. III B, we can define a difference estimator of the FG average, which is expected to have lower statistical uncertainty: let

Then,

is a consistent estimator of ⟨*A*⟩_{F}, analogous to (18).

#### 4. General features of the three-level method

A few comments on the three-level method are in order. First, there is a simple generalization to four or more levels by splitting the annealing procedure into more than two stages. As such, the method is an example of a sequential Monte Carlo (SMC) algorithm (which is sometimes more descriptively referred to as sequential importance sampling and resampling^{28,43–45}). We note from (10) that the weights obtained from the annealing step are random, and this is not the standard situation in SMC, but similar ideas have been previously studied in Refs. 46–49. Combining an SMC algorithm with a difference estimate as in (35) has been investigated in Refs. 34–36.

Second, we observe that the key distinction between the two- and three-level algorithms is the resampling step at the intermediate level. Without it, the three-level method reduces to a simple two-level method with an arbitrary stop in the middle of the annealing process. As noted above, the resampling process is designed to partially correct differences between the CG and FG models. This relies on the intermediate level being sufficiently accurate (otherwise, the wrong configurations might be discarded, which hinders numerical accuracy). On the other hand, we note that for sufficiently large numbers of samples *M*_{0}, *M*_{1}, *M*_{2}, the method does provide accurate FG estimates, even if the CG and intermediate-level models are not extremely accurate. The distinction between the different methods comes through the number of samples that are required to obtain accurate FG results.

Third, note that the ideal situation for difference estimation is that the three terms in (35) get successively smaller. That is, the coarse estimate is already close to ⟨*A*⟩_{F}, the intermediate-level estimate provides a large part of the correction, and the fine-level correction is small. In this case, it is natural to use a tapering strategy where the number of samples used at each level decreases,

This allows a fixed computational budget to be distributed across the levels so as to minimize the total error.

## IV. CONSTRUCTION OF THE INTERMEDIATE LEVEL

As noted above, the intermediate probability distribution *p*_{I} must be designed carefully in order for the resampling part of the three-level method to be effective. We now describe how this is achieved for the hard sphere mixture.

To motivate the intermediate level, recall Fig. 5, and note that defining a suitable CG model is equivalent to an estimate of the small-particle free energy in the final (fully inserted) state. The physical idea of the intermediate level is that the free energy associated with the first stage of insertion may be hard to estimate (because of the complicated packing of the small particles around the large ones), but the free-energy difference associated with the second stage should be easier (because it corresponds to insertion into large empty regions where the packing of the small particles is similar to that of a homogeneous fluid, whose free energy can be estimated based on analytic approximations). A combination of these ideas yields an intermediate level that represents the big-particle statistics more accurately than the CG model. Similar ideas have been considered before in multi-scale simulation;^{4,50,51} in particular, the problem of estimating the small-particle free energy has some similarities to the estimation of solvation free energies (where the depletant plays the role of a solvent).

We start by analyzing the small particles, so we fix the large particles in some configuration $\mathcal{C}$. The idea of the intermediate level is to first insert small particles only in a region close to the large particles of $\mathcal{C}$, and then use this information to make the intermediate marginal distribution $p_{\mathrm{I}}(\mathcal{C})$ match the FG marginal $p_{\mathrm{F}}(\mathcal{C})$ as closely as possible. The structure of the intermediate level is depicted in the bottom row of Fig. 1(e), and an example configuration is shown in Fig. 1(d). We implement this idea by introducing an effective (one-body) potential that acts on the small particles. We first define

to be the distance from the point **r** to the nearest large particle. Small-particle insertion is suppressed in regions far from large particles by a potential energy term

where

and the function

interpolates from zero (for small distances *r*) to the value *s* at large *r*. This function acts as a smoothed-out step function, where *δ*_{free} is the position of the step and *l* its width. In Figs. 1(e) and 5, regions where $E_{\mathcal{C}}(\mathbf{r})>0$ are indicated by blue shading; in these regions, the insertion of small particles is suppressed.
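The geometric ingredients of this construction can be sketched as follows. The logistic profile used for the smoothed step is an assumption for illustration only; the paper's exact functional form (given in the equation above) is not reproduced here:

```python
import numpy as np

def delta_dist(r, large_positions):
    """delta(r): distance from the point r to the nearest large-particle centre."""
    diffs = np.asarray(large_positions) - np.asarray(r)
    return np.linalg.norm(diffs, axis=1).min()

def eps(r, s, delta_free, l):
    """Smoothed step: ~0 for r << delta_free, rising to ~s for r >> delta_free,
    over a width l.  A logistic profile is used here purely for illustration."""
    return s / (1.0 + np.exp(-(r - delta_free) / l))

# suppression energy felt by a small particle at a point r0:
#   eps(delta_dist(r0, large_positions), s, delta_free, l)
```

With these two ingredients, insertion of small particles far from all large particles incurs an energy penalty of about *s*, while insertion close to a large particle is unaffected.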

Then, define a grand-canonical probability distribution for the small particles in the partially inserted (intermediate) system as

This distribution is normalized as $\int \tilde{p}_{\mathrm{I}}(\mathcal{F}\mid\mathcal{C})\,\mathrm{d}\mathcal{F}=1$. It depends on the three parameters *s*, *δ*_{free}, *l*, as well as the underlying parameters of the hard sphere mixture model.
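To give a sense of how such a grand-canonical distribution can be sampled, here is a minimal sketch of a single insertion attempt for a small particle in a one-body potential $E_{\mathcal{C}}$, using the standard grand-canonical acceptance rule. The cubic box, the diameters, and all helper names are hypothetical, and real implementations would also include deletion and displacement moves:

```python
import math, random

SIGMA_S, SIGMA_B = 1.0, 10.0   # small/large hard-sphere diameters (illustrative)

def overlaps(r, small, large):
    """Hard-core overlap test for a trial small particle at position r."""
    return (any(math.dist(r, x) < SIGMA_S for x in small)
            or any(math.dist(r, x) < 0.5 * (SIGMA_S + SIGMA_B) for x in large))

def try_insert_small(small, large, box_L, z_small, beta, E_C, rng):
    """One grand-canonical insertion attempt for a small particle subject to
    the one-body potential E_C(r): draw a random position in a cubic box,
    reject hard-core overlaps, then apply the standard GC acceptance rule
    with the extra Boltzmann factor exp(-beta * E_C(r))."""
    r = tuple(rng.random() * box_L for _ in range(3))
    if overlaps(r, small, large):
        return False
    V, N = box_L ** 3, len(small)
    acc = min(1.0, z_small * V / (N + 1) * math.exp(-beta * E_C(r)))
    if rng.random() < acc:
        small.append(r)
        return True
    return False

rng = random.Random(0)
small, large = [], []
accepted = try_insert_small(small, large, box_L=10.0, z_small=1.0,
                            beta=1.0, E_C=lambda r: 0.0, rng=rng)
```

The one-body term enters only through the extra Boltzmann factor, so insertions in the blue-shaded (suppressed) regions of Fig. 1(e) are accepted with correspondingly reduced probability.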

The next step is to construct the weights $\hat{W}_1(\mathcal{C})$. For consistency with (22), we write the intermediate-level distribution in the form

As discussed above, the term Φ^{corr} should be designed so that the respective coarse-particle marginals $p_{\mathrm{I}}(\mathcal{C})$ and $p_{\mathrm{F}}(\mathcal{C})$ match as closely as possible. Using (4), (19), and (41), we can show that a perfect match requires $\Phi^{\mathrm{corr}}(\mathcal{C})=\Phi^{\mathrm{ex}}(\mathcal{C})$ with

where *ϕ*_{0} is an irrelevant constant. Since the Ξ's in (43) are partition functions, determining Φ^{ex} reduces to computing the free energy difference between the non-homogeneous small-particle distributions of the partially and fully inserted systems. We now explain how Φ^{corr} is defined as an approximation to Φ^{ex}.

### A. Square-gradient approximation of a non-homogeneous hard sphere fluid

As a preliminary step for estimating Φ^{ex}, we first consider the grand potential Φ for the small particles in a system with no large particles, where the small particles feel an (arbitrary) smooth potential $E=E(\mathbf{r})$. The grand potential of this system is

where **0** indicates the large-particle configuration with no particles at all (*N* = 0).

If $E$ varies slowly in space, a simple approach to this integral is to assume that the system is locally the same as a homogeneous system in equilibrium—similar to the local density approximation.^{52} In this case,

where $p$ is the pressure, expressed as a function of the chemical potential.

However, this approximation is not sufficiently accurate for the current application. We therefore include a correction to account for inhomogeneities, as a squared-gradient term $\Phi[E;\mu_{\mathrm{S}}]\approx\Phi^{\mathrm{sq}}[E;\mu_{\mathrm{S}}]$ with

(Within a gradient expansion, this is the first correction that is consistent with rotational and inversion symmetry.)

We show in Appendix B 1 that $g$ can be estimated as

where *S*(*μ*; *q*) is the structure factor of the small hard-sphere system. For a numerical estimate of Φ^{sq}, we take the pressure $p$ from the accurate equation of state of Ref. 38, and estimate $g$ from (47) using the structure factor of Ref. 53. A numerical example demonstrating the accuracy of this second-order approximation for a non-homogeneous hard-sphere fluid is given in Appendix B 2.
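As a schematic of how such a square-gradient functional can be evaluated numerically, the sketch below works in one dimension with toy stand-ins for the pressure and the coefficient $g$; the paper's three-dimensional calculation with equation-of-state and structure-factor inputs is substantially more involved:

```python
import numpy as np

def phi_sq_1d(E, x, p_of_mu, g_of_mu, mu):
    """Square-gradient estimate of the grand potential for a one-body
    potential E(x), in 1D for illustration: local-density term -p(mu - E)
    plus gradient correction g(mu - E) * (dE/dx)^2, integrated by the
    trapezoid rule."""
    dEdx = np.gradient(E, x)
    f = -p_of_mu(mu - E) + g_of_mu(mu - E) * dEdx ** 2
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))

# toy inputs: ideal-gas-like pressure p(mu) = exp(mu) and constant g
x = np.linspace(0.0, 10.0, 2001)
E = 2.0 / (1.0 + np.exp(-(x - 5.0)))   # smoothed step potential
phi_lda = phi_sq_1d(E, x, np.exp, lambda m: np.zeros_like(m), mu=0.0)
phi_sq  = phi_sq_1d(E, x, np.exp, lambda m: 0.1 * np.ones_like(m), mu=0.0)
# the gradient correction shifts the grand potential above the LDA value
```

Setting `g_of_mu` to zero recovers the local-density approximation; the difference between the two evaluations isolates the inhomogeneity correction, which is largest where the potential varies steeply.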

### B. Definition of Φ^{corr}

We are now in a position to approximate Φ^{ex} in terms of Φ^{sq}. This (analytical) calculation is illustrated in Fig. 7. We require an estimate of Φ^{ex}, which is the free-energy difference between the partially and fully inserted systems in panels (a) and (b). This is achieved as a sum of three free-energy differences. In the first step, the large particles are removed and the small-particle fluid is re-equilibrated to fill up the remaining space, leading to panel (c). Then, the confining potential $\tilde{U}$ is removed and the small particles are fully inserted, leading to (d). Finally, the large particles are re-inserted and the small particles re-equilibrated again, leading to (b).

To make this precise, define $\Phi_{\mathcal{C}}[E;\mu_{\mathrm{S}}]$ as the grand potential of the small particles in the potential $E$, where the large particles are also included, with configuration $\mathcal{C}$. Then, the desired free energy difference between panels (a) and (b) is

where we took *ϕ*_{0} = 0.

From the definitions in Sec. IV A, the free energy difference between panels (c) and (d) is $\Phi[0;\mu_{\mathrm{S}}]-\Phi[E_{\mathcal{C}};\mu_{\mathrm{S}}]$, from (38) and (44). Our central approximation is that the free energy difference between panels (a) and (c) is (approximately) equal and opposite to the difference between (d) and (b), because the local environment of the large particles is the same in both cases. (The only differences are in regions far from any large particles.) At this level of approximation, the free energy differences between [(a),(b)] and [(c),(d)] are equal,

Finally, the right-hand side can be estimated by the square-gradient approximation (46), yielding $\Phi^{\mathrm{ex}}(\mathcal{C})\approx\Phi^{\mathrm{corr}}(\mathcal{C})$ with

Operation of the three-level method requires numerical estimates of this Φ^{corr}, which includes the integral in (46). Moreover, its value is exponentiated when computing the weight factors $\hat{W}$, so these numerical estimates are required to high accuracy. This is a non-trivial requirement because the integrand is constant in regions far from the big particles but varies much more rapidly as these particles are approached. In such situations, adaptive quadrature schemes are appropriate: we use the Cuhre algorithm of the Cuba library,^{54} which uses globally adaptive subdivision to refine its approximations in the relevant regions of space. Note, however, that while the choice of numerical integrator influences the intermediate level, small errors in the estimation of this integral are corrected by the second annealing step, so such errors do not affect the consistency of our numerical estimators.
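The globally adaptive strategy can be illustrated by a one-dimensional analog that repeatedly bisects the subinterval with the largest error estimate. This is a much-simplified sketch, not the deterministic multidimensional cubature rules of the Cuba library:

```python
import heapq

def adaptive_quad(f, a, b, tol=1e-8, max_subdiv=2000):
    """Globally adaptive 1D quadrature: keep a heap of subintervals and
    always bisect the one with the largest error estimate (Simpson rule
    vs. its two-panel refinement).  This mimics the *global* subdivision
    strategy of Cuhre in a much-simplified 1D setting."""
    def simpson(lo, hi):
        mid = 0.5 * (lo + hi)
        return (hi - lo) / 6.0 * (f(lo) + 4.0 * f(mid) + f(hi))

    def region(lo, hi):
        mid = 0.5 * (lo + hi)
        whole = simpson(lo, hi)
        refined = simpson(lo, mid) + simpson(mid, hi)
        err = abs(refined - whole)
        return (-err, lo, hi, refined, err)   # heap pops the largest error first

    regions = [region(a, b)]
    for _ in range(max_subdiv):
        if sum(r[4] for r in regions) < tol:
            break
        _, lo, hi, _, _ = heapq.heappop(regions)
        mid = 0.5 * (lo + hi)
        heapq.heappush(regions, region(lo, mid))
        heapq.heappush(regions, region(mid, hi))
    return sum(r[3] for r in regions)

# sharply peaked integrand: most subdivisions land near the peak at t = 0.3
val = adaptive_quad(lambda t: 1.0 / (1.0 + 100.0 * (t - 0.3) ** 2), 0.0, 1.0)
```

The global error budget drives subdivision toward the peaked region, exactly the behavior needed for an integrand that is flat far from the big particles but varies rapidly near them.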

Given this choice of $\Phi^{\mathrm{corr}}(\mathcal{C})$, the intermediate level distribution *p*_{I} of (42) has been completely defined, although it still depends on the three parameters *δ*_{free}, *s*, *l* that appear in the function *ɛ*(*r*). We also note that given the approximations made, it is not expected that this *p*_{I} is optimal (its marginal $p_{\mathrm{I}}(\mathcal{C})$ does not match $p_{\mathrm{F}}(\mathcal{C})$ perfectly). Subsection IV C discusses the parameter choices, and some possibilities for correction factors that can be added to Φ^{corr} in order to address specific sources of error.

### C. Variants of the intermediate-level distribution

In fixing the parameters *δ*_{free}, *s*, *l*, several considerations are relevant. First, if *s* is too small or *δ*_{free} is too large, the potential $E_{\mathcal{C}}$ has little effect on the system and the small particles are not restricted to be close to the large ones. In this case, *p*_{I} ends up close to *p*_{F}, and there is little benefit from the intermediate level. On the other hand, the accuracy of Φ^{sq} is greatest when the gradient of the potential $E_{\mathcal{C}}$ is small, which favors small *s* and large *l*, *δ*_{free}. In practice, it is also convenient if the two annealing stages insert similar numbers of particles, so that their computational costs are similar. For the example system in Sec. II C, we will present results for a suitable parameter set

We have also tested other values; a few comments are given below.

We will consider several variants of the intermediate level. We denote by $p_{\mathrm{I}}^{(1)}$ the distribution defined by (42) and (50), with parameters (51). Figure 8 shows how the quantities of interest differ between the CG and FG models, and the corresponding differences between the intermediate level and the FG model. Here, Δ*g*(*r*) is the difference between *g*(*r*) for the FG model and the distribution of interest (which is either the CG distribution *p*_{C} or one of the variants of the intermediate distribution). Δ*P*(*N*) is the corresponding difference in the probability that the system has *N* large particles.

For the value of *g*(*r*) at contact, we see that the intermediate level $p_{\mathrm{I}}^{(1)}$ corrects around half of the deviation between the CG and FG models. For the probability distribution of *N*, however, the situation is reversed: the intermediate level is *less* accurate than the CG model. [This is partly attributable to the fact that Δ*μ* in Eq. (6) was chosen to make the CG model accurate.]

To explore the behavior of the intermediate level, we constructed two variants of *p*_{I}. The aim is to understand why $p_{\mathrm{I}}^{(1)}$ has inaccuracies and to (partially) correct for them. There are two main approximations in the intermediate level $p_{\mathrm{I}}^{(1)}$: the first is (49) and the second is that $\Phi[E_{\mathcal{C}};\mu_{\mathrm{S}}]$ can be approximated by the square-gradient approximation (46). The first approximation neglects a significant physical phenomenon in these systems, which is a layering effect of the small particles around the large ones. This is illustrated in Fig. 9 by the radial distribution function $g_{\mathrm{BS}}^{0}$ between large and small particles (measured in a system with a single large particle). One sees that there is typically an excess of small particles close to the large ones, followed by a deficit $[g_{\mathrm{BS}}^{0}(r)<1]$, and a (weak) second layer.

For (49) to be accurate, the intermediate level should have enough small particles to capture this layering so that the particles being inserted in the second annealing stage are not strongly affected by the presence of the large particles. However, computational efficiency requires that *δ*_{free} is not too large, so these layers are not fully resolved at the intermediate level. To partially account for this effect, we make an *ad hoc* replacement of *μ*_{S} in (46) by an effective chemical potential *μ*_{lay}(**r**), which is chosen such that the corresponding reservoir volume fraction $\eta_{\mathrm{S}}^{\mathrm{lay}}(\mathbf{r})$ satisfies

In estimating the free energy of the small particles that are inserted in the second level of annealing, this adjustment to Φ^{sq} helps to counteract the error made in (49), leading to an updated potential Φ^{corr,2}. The intermediate level constructed in this way is denoted by $p_{\mathrm{I}}^{(2)}$. The results of Fig. 8 show that this variant is (somewhat) more accurate than $p_{\mathrm{I}}^{(1)}$. However, the intermediate level still tends to have a smaller number of large particles than the full (FG) mixture.

To investigate this further, we took 800 representative CG configurations. For each one, we estimated the error associated with the approximation (50)

Results are shown in Fig. 10. One sees that the errors are of order unity (note that Φ^{ex} itself is of order 10^{4} so this is a small relative error; see below); there is a systematic trend that Φ^{corr,2} underestimates Φ^{ex} when *N* is large. To correct this error, we introduce an additional correction term to Φ^{corr,2},

and denote the intermediate level constructed in this way by $p_{\mathrm{I}}^{(3)}$.

A least squares fit to Fig. 10 suggests taking *α*_{corr} = 0.076; in practice, this tends to over-correct the error in $\Phi^{\mathrm{corr},2}(\mathcal{C})$, and we find better performance with a smaller value

However, the performance of the method depends only weakly on the specific choice of *α*_{corr}, as discussed in Appendix C. For all following results, we define the intermediate level $p_{\mathrm{I}}=p_{\mathrm{I}}^{(3)}$ to use the potential Φ^{corr,3}.

### D. Discussion of intermediate level

An important aspect of the three-level method is the self-consistency of the general approach. The intermediate level variants $p_{\mathrm{I}}^{(1)}$ and $p_{\mathrm{I}}^{(2)}$ were constructed on a purely theoretical basis. The corresponding results in Fig. 8 indicated good performance, but that the distribution of *N* had a systematic error. This error was quantified precisely in Fig. 10, which enabled an improvement to the intermediate level. In principle, this procedure could be repeated to develop increasingly accurate variants of *p*_{I}. That approach would be useful if (for example) one wanted to consider increasingly large systems, where the requirements for the accuracy of *p*_{I} become increasingly demanding.

One way to see the effect of system size is to note that Fig. 10 required the estimation of $\Phi_{\mathcal{C}}[E_{\mathcal{C}};\mu_{\mathrm{S}}]$ and $\Phi_{\mathcal{C}}[0;\mu_{\mathrm{S}}]$, whose values are of order 10^{4}. Since the free energies are exponentiated in the resampling weights, an absolute accuracy of order ±1 is required on these free energies, while their absolute values are extensive in the system size. Hence, one sees that very accurate estimates of the free energy are required: their relative error must be of the order of the inverse volume of the system.

## V. NUMERICAL TESTS

In this section, we apply the three-level method to the example from Sec. II C using the intermediate level from Sec. IV, with the parameters defined in (51) and (55). The parameters and the annealing schedules are chosen such that, on average, the first and second steps have the same computational effort; see Appendix A for details.

It can be proven^{49} that the three-level method provides accurate results in the limit where the population sizes *M*_{0}, *M*_{1}, *M*_{2} are all large. In particular, we expect the estimators $\hat{A}_{\mathrm{F}}^{3\mathrm{L}}$ and $\hat{A}_{\mathrm{F},\Delta}^{3\mathrm{L}}$ to obey central limit theorems (CLTs), and the two-level estimators $\hat{A}_{\mathrm{F}}$ and $\hat{A}_{\mathrm{F},\Delta}$ behave similarly. Detailed results are given in Sec. VI. The important fact is that, for large populations, the variances of the estimators behave as

where *M* is the relevant population size and Σ is called the asymptotic variance (it depends on the observable *A* and on which specific estimator is used). In general, the estimators may also have a bias, which is of order 1/*M*. This means that the uncertainty in our numerical computations is dominated by the random error, whose typical size is $\sqrt{\Sigma/M}$, and the mean squared error is given by the variance, $\mathrm{MSE}(\hat{A})=\mathrm{Var}(\hat{A})$, to leading order.

This gives us an easy way to measure and compare the performance of the different estimators. Suppose that we require an estimate of *A* with a prescribed mean squared error. The associated computational cost can be identified with the population size *M*, which is given by (56) as $M\approx\Sigma/\mathrm{MSE}(\hat{A})$. Clearly, estimators with small Σ should be preferred. In practice, we do not compare computational costs at fixed error; instead, we compare variances $\mathrm{Var}(\hat{A})$ at fixed *M*. For any two algorithms (and assuming that *M* is large), the ratio of these variances approximates the ratio of the Σ's, which can then be interpreted as a ratio of computational costs (at fixed MSE). Numerical results are presented in Sec. V B.
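The logic of comparing variances at fixed *M* can be illustrated with a toy pair of unbiased estimators; the Gaussian samples below are stand-ins for the outputs of two algorithms, not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def estimate_var(estimator, M, n_runs=2000):
    """Sample variance of an estimator over many independent runs."""
    vals = np.array([estimator(M) for _ in range(n_runs)])
    return vals.var(ddof=1)

# Two unbiased estimators of the same quantity, with per-sample variances
# 1.0 and 0.25: toy stand-ins for two algorithms obeying Var ~ Sigma / M.
def plain(M):
    return rng.normal(0.0, 1.0, M).mean()

def improved(M):
    return rng.normal(0.0, 0.5, M).mean()

M = 100
v_plain, v_improved = estimate_var(plain, M), estimate_var(improved, M)
# v_plain ~ 1/M, and the ratio v_plain / v_improved ~ 4 estimates the
# ratio of asymptotic variances, i.e. the cost ratio at fixed MSE
```

Measuring the variance ratio at fixed *M* thus gives a direct estimate of how much cheaper the better estimator would be at a prescribed accuracy.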

We note that the theoretical results for convergence do not require that the coarse or intermediate levels are accurate. However, one easily sees^{18} that serious inaccuracies in these levels lead to very large Σ. In such cases, one may require prohibitively large populations to obtain accurate results.

In this section, we demonstrate (for the example in Sec. II C) that we do not require very large populations for the three-level method and that the numerical results are consistent with (56). After that, we estimate the asymptotic variances for the two-level and three-level methods. We will find that introducing the third level improves the numerical performance, corresponding to a reduction in Σ.

To this end, we investigate the pair correlation *g*(*r*) of the big particles. As seen in Fig. 8(a), the coarse approximation of *g*(*r*) has a substantial error, especially when two big particles are in contact. To quantify this specific effect, we define the coordination number *N*_{c}, which is the number of large particles within a distance *r*_{1} of a given large particle. (For a given configuration, this quantity is estimated as an average over the large particles. We take *r*_{1} ≈ 10.73*σ*_{S} to be the first minimum of *g*(*r*) of the CG model.) For our example, the coordination numbers for the FG and CG systems are given by

### A. Accuracy of method

To illustrate the reliable performance of the method, we take a simple example with *M*_{0} = 4 × 10^{5} and *M*_{1} = *M*_{2} (no tapering) and focus on the difference estimator $\hat{A}_{\mathrm{F},\Delta}^{3\mathrm{L}}$, which we expect to be the most accurate. The corresponding numerical estimate of *g*(*r*) is denoted by $\hat{g}(r)$ and is binned using 40 equidistant bins at positions *r*_{j} between *r* = 10 and *r* = 12. Figure 11(a) shows estimates of the difference between $\hat{g}(r)$ and its true value as the population size increases. (The FG result was estimated independently by the two-level method using a large value *M*_{F} = 18 000.) A population *M*_{2} of several thousand is sufficient for an accuracy better than 0.5 in each bin of *g*(*r*).

For smaller *M*_{2}, fluctuations in the measured $\hat{g}(r)$ are apparent in Fig. 11(a). To estimate their size, we define the error for a single run of the three-level method by summing over the bins,

Hence, one expects from (56) that this error decays with increasing population, proportional to $M_{\mathrm{F}}^{-1/2}$. Figure 11(b) shows an estimate of (58), which is consistent with this expected scaling.

### B. Measurements of variances Σ

We now investigate whether the three-level method does indeed improve on the performance of the (simpler) two-level method of Refs. 17 and 18. The key question is whether the resampling step is effective in focusing the computational effort on the most important configurations of the big particles.

We recall from above that removing the resampling step from the three-level method leads to a two-level method, where the annealing process is paused at the intermediate level and then restarted again. In order to test the effect of resampling, we compare these two schemes, keeping the other properties of the algorithm constant, including the annealing schedule. (To test the overall performance, one might also optimize separately the annealing schedules for the two-level and three-level algorithms and compare the total computational time for the two methods to obtain a result of fixed accuracy. However, such an optimization would be very challenging, so instead we focus on the role of resampling.)

As a very simple quantity of interest, we take the coordination number *N*_{c}. We run the whole algorithm *N*_{runs} independent times, and we estimate *N*_{c} for each run. This can be done using several different estimates of *N*_{c}. These are (i) the two-level estimates $\hat{A}_{\mathrm{F}}$ and $\hat{A}_{\mathrm{F},\Delta}$ from (13) and (18) and (ii) the corresponding three-level estimates $\hat{A}_{\mathrm{F}}^{3\mathrm{L}}$ and $\hat{A}_{\mathrm{F},\Delta}^{3\mathrm{L}}$ of (31) and (35), in which we also vary the ratio *M*_{1}:*M*_{2} to see the effects of tapering.

All comparisons are done with a fixed total computational budget. We have chosen parameters such that the first and second annealing stages have the same (average) computational cost. This means we hold *M*_{T} = (*M*_{1} + *M*_{2})/2 constant during tapering. The two-level method takes *M*_{F} = *M*_{T} (because the single annealing step of the two-level method has the same cost as the two annealing steps of the three-level method). For the coarse-level estimates $\hat{A}_{\mathrm{C}}$ and $\hat{A}_{\mathrm{C}}^{3\mathrm{L}}$ (which are used in the computation of $\hat{A}_{\mathrm{F},\Delta}$ and $\hat{A}_{\mathrm{F},\Delta}^{3\mathrm{L}}$), the CG computations are cheap, so we take *M*_{0} = *M*_{C} = 6 × 10^{6}. This is large enough that the numerical errors on these coarse estimates are negligible in comparison to the errors from higher levels.

For each version of the algorithm, we measure the sample variance of the *N*_{runs} estimates. Results are shown in Fig. 12 for *N*_{runs} = 60 and *M*_{T} = 500. The error bars are computed by the bootstrap method.^{55} It is useful that the variance of all these estimators is expected to be proportional to 1/*M*_{T}: this means that reducing the variance by a factor of *α* requires the computational effort to be increased by the same factor. Hence, the ratio of variances of two estimators is a suitable estimate for the ratio of their computational costs.
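A minimal sketch of such bootstrap error bars, applied to toy run-to-run estimates (the Gaussian samples are illustrative, not data from the paper):

```python
import numpy as np

def bootstrap_se_of_variance(estimates, n_boot=5000, seed=0):
    """Bootstrap standard error for the sample variance of N_runs
    independent run estimates: resample the runs with replacement,
    recompute the variance each time, and report the spread."""
    rng = np.random.default_rng(seed)
    x = np.asarray(estimates)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        boot[b] = rng.choice(x, size=x.size, replace=True).var(ddof=1)
    return boot.std(ddof=1)

runs = np.random.default_rng(2).normal(0.0, 0.05, size=60)  # 60 toy run estimates
se = bootstrap_se_of_variance(runs)   # error bar on the measured variance
```

Resampling the 60 runs with replacement avoids any distributional assumption about the estimator, which is convenient here because the quantity whose uncertainty is needed is itself a variance.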

When carrying out these runs, each estimator was computed by performing annealing on the same set of coarse configurations, to ensure direct comparability. (More precisely, we take a set of 700 representative configurations that are used for the method with *M*_{1}:*M*_{2} = 7:3, and other versions of the method used a subset of these 700.) In addition, it is possible to share some of the annealing runs when computing the different estimators (while always keeping the 60 different runs completely independent). This freedom was exploited as far as possible, which reduces the total computational effort. However, it does mean that the calculations of the different estimators are not statistically independent of each other.

### C. Performance: Discussion

All three-level estimators have a reduced standard deviation compared with their two-level counterparts, demonstrating the usefulness of the intermediate resampling step. In all cases, the difference estimate outperforms its final-level counterpart; this effect is stronger for the three-level estimate, providing evidence that the intermediate stop additionally improves the quality of the control variate in the difference estimates.

The effect of introducing tapering from *M*_{1} = 600 to *M*_{2} = 400 is difficult to assess, given the statistical uncertainties in this example. The variance of the tapered final-level estimator is very close to that of the non-tapered one, despite averaging over fewer configurations. This is possible because more samples are propagated from the CG model, which improves the sampling at the intermediate step, where we then resample to keep the most relevant configurations. As the results for the 700:300 tapering show, the tapering rate must be chosen carefully: too aggressive a rate can quickly degrade performance.

Overall, the numerical tests in this section provide strong evidence of the benefit of the intermediate resampling. For our example, switching from a two-level to a three-level difference estimator substantially reduces the variance, from around 0.0029 for the two-level method to 0.0016 at a fixed computational budget. As discussed just below (56), the ratio of these numbers can be interpreted as the ratio of costs for the two- and three-level method: the conclusion for this case is that including the intermediate level reduces the cost by ∼45%. This demonstrates a significant speedup in this specific case, which provides a proof-of-principle of the approach.

## VI. CONVERGENCE OF THE MULTILEVEL METHOD

In Sec. V, we have seen that the three-level method outperforms the two-level method in numerical tests, both for the final-level as well as the difference version of the estimator. In this section, we provide convergence results for both algorithms and compare their asymptotic performance as the number of configurations goes to infinity.

The proof is general, but it does require some assumptions on the models of interest. First, for every allowed CG configuration [that is, configurations $\mathcal{C}$ with $p_{\mathrm{C}}(\mathcal{C})>0$], we assume that the quantity of interest *A* is bounded. In addition, the probability density $p_{\mathrm{C}}(\mathcal{C})$ must be non-zero whenever $p_{\mathrm{I}}(\mathcal{C})$ is non-zero, and similarly $p_{\mathrm{I}}(\mathcal{C},\mathcal{F})$ must be non-zero whenever $p_{\mathrm{F}}(\mathcal{C},\mathcal{F})$ is non-zero.

### A. Two-level method

The two-level method has been previously analyzed in Ref. 18; we summarize its key properties here. It was noted in Sec. III B that $\hat{A}_{\mathrm{F}}\to\langle A\rangle_{\mathrm{F}}$ as *M*_{F} → ∞ (specifically, this is convergence in probability^{56}). We also expect a CLT for this quantity: as in Eq. (56), the distribution of the error $\hat{A}_{\mathrm{F}}-\langle A\rangle_{\mathrm{F}}$ converges to a Gaussian with mean zero and variance Σ_{F}/*M*_{F}. We will derive a formula for this variance, which will later be compared with the corresponding quantity for the three-level method.

For compact notation, it is convenient to define the re-centered quantity of interest

A significant contribution to Σ_{F} comes from the randomness of the annealing procedure, and this can be quantified as

where the variance is again with respect to the annealing procedure (from coarse to fine). Then, following Ref. 18, it can be shown that

where $w(\mathcal{C})=\langle \hat{W}^{n}(\mathcal{C})\rangle_{J}=p_{\mathrm{F}}(\mathcal{C})/p_{\mathrm{C}}(\mathcal{C})$, so one identifies $w(\mathcal{C})^{2}+v(\mathcal{C})$ as the mean square weight obtained from the annealing procedure. Similarly, the estimator $\hat{\Delta}$ that appears in the difference estimate $\hat{A}_{\mathrm{F},\Delta}$ also obeys a CLT, with variance $\mathrm{Var}(\hat{\Delta})\approx\Sigma_{\mathrm{F},\Delta}/M_{\mathrm{F}}$, where

As discussed in Ref. 18, if the computational cost of the coarse model is low, *M*_{C} can be taken large enough that the variance of the coarse estimator $\hat{A}_{\mathrm{C}}$ is negligible, in which case (18) implies $\mathrm{Var}(\hat{A}_{\mathrm{F},\Delta})\approx\mathrm{Var}(\hat{\Delta})$, and hence

Comparing (61) and (62), which give the variances of $\hat{A}_{\mathrm{F}}$ and $\hat{A}_{\mathrm{F},\Delta}$, respectively, the term $v(\mathcal{C})$ in (61) is replaced by $v(\mathcal{C})-1$ in (62), which reduces the variance of the estimator. We expect, in general, that Var_{C}(*A*) and Var_{F}(*A*) should be similar in magnitude, in which case these terms in (62) should have little effect. Hence, one expects that the estimator $\hat{A}_{\mathrm{F},\Delta}$ has lower variance than $\hat{A}_{\mathrm{F}}$. This is consistent with the results of Fig. 12.
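The variance reduction offered by a difference estimator can be seen in a toy model where correlated coarse and fine samples share a common fluctuation; all numbers here are illustrative, not outputs of the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(4)

M = 500
common = rng.normal(0.0, 1.0, M)              # fluctuation shared by both levels
A_fine = common + rng.normal(0.0, 0.3, M)     # fine-level samples
A_coarse = common + rng.normal(0.0, 0.3, M)   # correlated coarse samples
mean_coarse = 0.0                             # coarse average, known very precisely

plain_est = A_fine.mean()                            # final-level estimator
diff_est = mean_coarse + (A_fine - A_coarse).mean()  # difference estimator
# Var(A_fine - A_coarse) = 0.18 while Var(A_fine) = 1.09, so the difference
# estimator is far more precise when the two levels are strongly correlated
```

The coarse level acts as a control variate: subtracting it cancels the shared fluctuation, and only the (small) level-to-level difference contributes to the statistical error.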

### B. Three-level method

The results (61) and (63) are based on the property that each estimator is a sum of (nearly) independent random variables, which means that we can immediately apply standard Monte Carlo convergence results.^{41} This is not possible for the three-level method, since the resampling step correlates the configurations. This makes the analysis of SMC-type algorithms challenging, but widely applicable results are available.^{43,57,58} The three-level method in Sec. III C is an implementation of a random-weight SMC method that has been analyzed in Ref. 49.

To analyze the variance of the three-level method, we require results analogous to (61), which depend on the mean square weights associated with the annealing procedure. To this end, define the average of the final level weight

The averages in these equations are with respect to the second annealing step (from intermediate to fine level), starting at configuration $\mathcal{X}_2$ (see Sec. III C 3).

For the contribution of the first annealing step to the asymptotic variance, it is important to consider a product of weight factors: $\hat{W}_{1}^{n}(\mathcal{C}_1)\,w_2(\hat{\mathcal{X}}_1)$. The first factor in this product is the random weight $\hat{W}_{1}^{n}$ obtained by annealing from the coarse to the intermediate level, leading to the intermediate configuration $\hat{\mathcal{X}}_1=(\mathcal{C}_1,\hat{\mathcal{F}}_1)$. The second factor is the averaged weight $w_2(\hat{\mathcal{X}}_1)$ from (64) associated with the second (subsequent) annealing step. Combining (26) and (29), the average of the product is

and the corresponding variance is

Hence, $w(\mathcal{C}_1)^{2}+v_1(\mathcal{C}_1)$ is the mean square value of $\hat{W}_{1}^{n}(\mathcal{C}_1)\,w_2(\hat{\mathcal{X}}_1)$ with respect to the annealing process: this turns out to be a relevant quantity for the asymptotic variance.

The number of configurations *M*_{1}, *M*_{2} can be varied between steps of the three-level method. We formulate the asymptotic variance in terms of the average number of configurations

If the two annealing steps have comparable cost, we can then directly compare the variances for different tapering rates at fixed *M*_{T}. Define also

Then, a direct application of Theorem 2.1 of Ref. 49 gives a CLT for $\hat{A}_{\mathrm{F}}^{3\mathrm{L}}$: for large *M*_{T}, we have

with asymptotic variance

with

The physical interpretation of these formulas will be discussed in Subsection VI C.

Computing the asymptotic variance of the three-level difference estimator $\hat{A}_{\mathrm{F},\Delta}^{3\mathrm{L}}$ is more difficult, since it involves differences of non-trivially correlated samples. For some examples of multilevel difference estimators, upper bounds on the asymptotic variance have been developed in Refs. 35 and 36. A detailed analysis of these bounds in the context of our algorithm is beyond the scope of this paper.

### C. Discussion of CLTs

To understand the differences between the two- and three-level methods, we compare the asymptotic variances of their corresponding final-level estimators, Σ_{F} in (61) and $\Sigma_{\mathrm{F}}^{3\mathrm{L}}$ in (71). The variance of the three-level method has two contributions, $\Sigma_{\mathrm{F},1}^{3\mathrm{L}}$ and $\Sigma_{\mathrm{F},2}^{3\mathrm{L}}$; they are the variances of two-level methods from the coarse to the fine model and from the intermediate to the fine model, respectively. The first term $\Sigma_{\mathrm{F},1}^{3\mathrm{L}}$ is therefore directly related to Σ_{F}, with the variance of the importance weight $v(\mathcal{C})$ replaced by $v_1(\mathcal{C})$.

In order to make quantitative comparisons, we again consider the three-level method without intermediate resampling. As discussed in Sec. V B, this is a two-level method with a specific annealing process that consists of the concatenation of the two annealing processes of the three-level method. For the concatenated annealing process, we have

where $\hat{\mathcal{X}}=(\mathcal{C},\hat{\mathcal{F}})$ is generated by the first annealing stage. This means that

where the variance is now over the randomness of both annealing processes. Comparing (74) to (67), we see that $v_1(\mathcal{C})$ is the variance of the same importance weight, but after averaging over the second annealing stage in (64). We can apply Jensen's inequality^{56} to show that

For the case without tapering, *c* = 1/2, the three-level method therefore trades a reduction in the variance of the coarse-to-fine importance weights in $\Sigma_{\mathrm{F},1}^{3\mathrm{L}}$ for the addition of a term $\Sigma_{\mathrm{F},2}^{3\mathrm{L}}$ that corresponds to the variance of a two-level method going from the intermediate to the fine level. The possibility of tapering, i.e., *c* ≠ 1/2, further allows us to optimize the distribution of computational effort between the two stages, which is particularly useful if $\Sigma_{\mathrm{F},2}^{3\mathrm{L}}\ll\Sigma_{\mathrm{F},1}^{3\mathrm{L}}$.
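The variance inequality behind this trade-off is easy to demonstrate numerically: averaging the second-stage weight before taking the variance can only reduce it, by the law of total variance (equivalently, the Jensen-type argument above). The lognormal weights below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 200_000
W1 = rng.lognormal(mean=0.0, sigma=0.3, size=n)   # first-stage random weights
W2 = rng.lognormal(mean=0.0, sigma=0.3, size=n)   # second-stage random weights
w2_avg = np.exp(0.3 ** 2 / 2)                     # exact mean of W2 (a constant here)

var_full = np.var(W1 * W2)      # variance over both annealing stages, cf. v(C)
var_avgd = np.var(W1 * w2_avg)  # second stage averaged out first, cf. v1(C)
# var_avgd <= var_full always holds, by the law of total variance
```

In the toy numbers, averaging out the second stage roughly halves the weight variance, mirroring how $v_1(\mathcal{C})\le v(\mathcal{C})$ reduces the first contribution to the three-level asymptotic variance.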

For our application to the hard sphere mixture example in Sec. II C, the annealing process is computationally expensive and the resulting weights are noisy. We are therefore in the situation where the variance $v(\mathcal{C})$ contributes substantially to the overall variance, and where we have constructed (in Sec. IV) an intermediate level that improves on the CG model. Following the discussion above, this is precisely the setting where we expect the three-level method to improve upon a two-level method, as confirmed by the numerical results in Sec. V. Further discussion of the effect of resampling on random-weight SMC methods can be found in Ref. 49.

## VII. CONCLUSIONS

We have introduced a three- and multilevel extension of the two-level simulation method first discussed in Ref. 18. We have applied this method to a highly size-asymmetric binary hard-sphere system. As shown in the numerical test in Sec. V and theoretical results in Sec. VI, the introduction of intermediate resampling that distinguishes the two-level from the three-level method can lead to substantial improvements in performance by reducing the variance in importance weights and by allowing efficient allocation of resources between levels via tapering.

### A. Hard sphere model

In the application to binary hard-sphere systems, the introduction of an intermediate level required us to construct a semi-analytic estimate of the free energy of a system with partially inserted small particles. For this, we have combined a highly accurate square-gradient theory with pre-computed *ad hoc* corrections, yielding an intermediate level that substantially improves the accuracy of the investigated quantities of interest compared to the initial coarse level. Furthermore, as we show in Appendix C, the three-level method appears robust with respect to slight deviations of the intermediate level.

Compared to our numerical example, Ref. 17 applied the two-level method to larger and denser systems, to investigate the critical point of demixing. This was achieved by replacing the two-body RED potential used as the CG model in this publication with a highly accurate two- and three-body potential. The computation of accurate effective potentials entails a substantial upfront computational cost (compared to our construction of the intermediate level), but for the hard sphere mixtures, this results in a CG level that is more accurate than our intermediate level. Despite the challenges of keeping the variance of the importance weights under control for large systems, this turned out to be more efficient overall.

### B. Design principles for other potential applications

We have emphasized throughout that our three-level modeling approach is generally applicable whenever a suitable intermediate level can be constructed. We can identify two main scenarios where this might be attempted. The first scenario is illustrated by the binary hard sphere mixture, which is a two-scale system by construction (there are two species). In this situation, there is no obvious intermediate level, and a careful construction is required to design one. Our results show that this strategy is feasible; in this example, the effort of constructing the intermediate level is worthwhile because the system is very challenging to characterize by other methods.

The second scenario—where we may expect a multilevel method to be particularly useful—is that a multi-scale system admits a true hierarchy of coarse-graining, such as a system of long-chain polymers. We can coarse-grain a polymer chain by representing groups of monomers by their centers of mass with suitable effective interactions.^{5,59} By varying the number of monomers per group, we obtain a hierarchy of CG models that could be targeted by a multilevel method. For such a method to be efficient in this scenario, we require high accuracy of the CG models and an efficient annealing process to introduce the finer degrees of freedom, analogous to the introduction of the small spheres in the hard sphere mixture. Fulfilling these requirements is still challenging and requires considerable physical insight about the specific polymer system of interest, but the hierarchical structure of the system suggests that a suitable method might be fruitfully extended to more than three levels, with commensurately increased performance gains.

In both scenarios, careful thought is required to apply the three-level (or multi-level) methods: our approach is far from being a black-box method. Still, the results presented here show that it can be applied in a practical (challenging) computational problem.

A separate limitation of multilevel methods is that the population of unique coarse configurations is fixed from the start and shrinks with each subsequent resampling step. This is closely related to the sample depletion effect commonly observed in particle filtering, and in SMC methods in general.^{25,60} For the multilevel method, we can address this by following each resampling step with a number of Markov chain Monte Carlo (MCMC) steps to decorrelate duplicated configurations and further explore the system at the current level of coarse-graining.^{60} While such an approach is not feasible for the hard-sphere system, where intermediate MCMC is limited by the cost of computing the required approximations, we expect it to be beneficial, for example, whenever intermediate physical systems are described in terms of effective, few-body interactions.
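To make the remedy concrete, the resampling-plus-rejuvenation pattern can be sketched in a few lines of Python. This is a generic SMC sketch, not our production code: the configuration list and the `mcmc_step` kernel are hypothetical placeholders for the level-specific machinery.

```python
import numpy as np

def systematic_resample(weights, rng):
    """Systematic resampling: return indices of the surviving configurations,
    drawn proportionally to the (unnormalized) weights."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumsum = np.cumsum(weights / np.sum(weights))
    return np.searchsorted(cumsum, positions)

def resample_and_rejuvenate(configs, weights, mcmc_step, n_mcmc, rng):
    """Resample configurations by weight, then decorrelate the duplicates
    with a few MCMC steps targeting the current-level distribution."""
    idx = systematic_resample(weights, rng)
    new_configs = [configs[i] for i in idx]
    for _ in range(n_mcmc):
        new_configs = [mcmc_step(c, rng) for c in new_configs]
    # after resampling, the population carries uniform weights
    return new_configs, np.ones(len(new_configs)) / len(new_configs)
```

A typical call would pass the current population and importance weights, together with a kernel in detailed balance with the distribution at the current level of coarse-graining.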

We end with a comment on the implementation of these methods. The introduction of intermediate levels increases the complexity of the code required to simulate the systems. It requires adding an intermediate stage to the annealing process and computing the required integrals (see Sec. IV). Additionally, when implementing the algorithm for use on compute clusters, the resampling step requires communication between all nodes. However, we emphasize that while these extra steps require some extra programming, none of the additional steps of the three-level method added significant computational cost in our example.

To conclude, our results show that the multilevel method can effectively make use of intermediate levels when available, leading to improvements in performance at fixed computational cost. We look forward to further applications of multilevel methods in physical simulations.

## ACKNOWLEDGMENTS

This work was supported by the Leverhulme Trust through Grant No. RPG–2017–203.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Paul B. Rohrbach**: Investigation (lead); Methodology (lead); Validation (equal); Writing – original draft (lead). **Hideki Kobayashi**: Investigation (supporting); Methodology (supporting). **Robert Scheichl**: Conceptualization (equal); Funding acquisition (equal); Writing – review & editing (equal). **Nigel B. Wilding**: Conceptualization (equal); Funding acquisition (equal); Writing – review & editing (equal). **Robert L. Jack**: Conceptualization (equal); Funding acquisition (equal); Supervision (equal); Writing – original draft (supporting).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

### APPENDIX A: ENSEMBLE DEFINITIONS AND ESTIMATION OF PARTITION FUNCTIONS

#### 1. Grand canonical ensemble

We define the grand canonical ensemble of the hard sphere mixture discussed in Sec. II. Recall that *k*_{B}*T* = 1. For the system of interest, the equilibrium average of a quantity of interest $A(C)$ in (3) is defined as

where each particle position is integrated over the periodic domain [0,*L*]^{3}. For ease of notation, we introduce the integration measures $dC,dF$ that include the prefactors accounting for the indistinguishability of particles that appear in (A1), which then becomes

consistent with (1) and (3). By definition, we require that *p*_{F} is normalized as $\int p_F(C,F)\,dC\,dF = 1$, so we have

The relevant quantities of the CG model ⟨*A*⟩_{C} are defined analogously.

#### 2. Estimation of the partition function

The implementation of the two- and three-level methods requires the computation of the small-particle partition function that appears in the importance weights. We use a method based on Jarzynski’s equality^{20,21} that yields an unbiased estimator; see also Ref. 18. In the statistics literature, this is also known as annealed importance sampling.^{22} We first give a short summary of the method in Appendix A 2 a and then discuss how the annealing processes are implemented for the two- and three-level method in Appendix A 2 b. The parameters used for the numerical tests are given in Appendix A 2 c.

##### a. Theoretical details

We derive an annealing process that inserts the small particles $F$ for a fixed big particle configuration $C$. This process produces weighted configurations that correctly characterize the FG distribution. We closely follow the results from Appendix A of Ref. 18; see also Refs. 20–22 and 61.

Let $p_s(C,F)$ and $p_e(C,F)$ be two probability distributions for the FG model of the form

The corresponding marginal distributions are $p_\alpha(C) = \int p_\alpha(C,F)\,dF$. The distributions *p*_{s}, *p*_{e} are the start and end points of an annealing process, with a sequence of intermediate distributions

where *p*_{0} = *p*_{s} and *p*_{K} = *p*_{e}.

Let $C$ be a sample from $p_s(C)$: this configuration remains fixed during the annealing process. We anneal the small particles as follows: first, sample an initial small-particle configuration $F$ from $p_s(F \mid C)$, the conditional distribution of *p*_{s}. This distribution is *p*_{0}, so write $F_0 = F$ and set *k* = 1; then apply a sequence of MC steps with transition kernel $q_{C,k}(F_{k-1} \to F_k)$ that is in detailed balance with the small-particle distribution $p_k(F \mid C)$. Iterate this process for *k* = 1, …, *K* − 1: this yields a sequence of small-particle configurations $(F_0, F_1, \ldots, F_{K-1})$. The big-particle configuration $C$ stays fixed throughout this process.

The relevant results of this procedure are the final small-particle configuration $\hat{F} = F_{K-1}$ and an annealing weight

Given the initial coarse configuration $(C,F)$, the MC steps define a probability distribution over the weight $\hat{W}_A$ and the final small-particle configuration $\hat{F}$, which we denote by

Given the initial configuration $(C,F)$, averages with respect to the annealing process are denoted by ⟨·⟩_{J}.
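In generic form, the annealing loop described above can be sketched as follows. This is a minimal sketch of the standard annealed-importance-sampling recursion, with hypothetical dimensionless energy functions `U_k` (so that $p_k(F \mid C) \propto e^{-U_k(F)}$ at fixed $C$) and a user-supplied MC kernel; the concrete weight expression used in the paper may differ in bookkeeping details.

```python
def anneal(F0, energies, mcmc_kernel, rng):
    """Annealed importance sampling / Jarzynski sketch (hypothetical API).

    energies: list [U_0, ..., U_K] of dimensionless energy functions; the
    k-th annealing distribution is p_k(F) proportional to exp(-U_k(F)).
    mcmc_kernel(F, U, rng): one MC sweep in detailed balance with exp(-U).
    Returns the final configuration F_{K-1} and log W_A, where
    W_A = exp(-sum_k [U_k(F_{k-1}) - U_{k-1}(F_{k-1})]).
    """
    F, log_w = F0, 0.0
    K = len(energies) - 1
    for k in range(1, K + 1):
        log_w -= energies[k](F) - energies[k - 1](F)  # accumulate "work"
        if k < K:                                     # K-1 MC moves in total
            F = mcmc_kernel(F, energies[k], rng)
    return F, log_w
```

With an identity kernel the log-weight telescopes to $-(U_K - U_0)$ evaluated at the initial configuration, which is a convenient sanity check for an implementation.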

We now show that this annealing process produces weighted samples of *p*_{e}, up to a constant. More specifically,

This implies that averaging over the start distribution *p*_{s} and the annealing process yields

for any function $B=B(C,F)$, which may depend on both big and small particles.

To show (A8), we compute the average over the annealing process explicitly

By rearranging the factors in the exponential, the right-hand side of (A10) becomes

Detailed balance of the Markov kernels $q_{C,k}$ implies

and by definition

##### b. Application to the two- and three-level method

This section describes the details of the annealing processes used in the two- and three-level method. We first discuss its implementation for the two-level method before showing how to split this process into two stages for the three-level method.

The two-level method starts with samples $C$ of the CG model *p*_{C}. We describe the annealing process,^{18} which produces a weight $\hat{W}(C)$ and small-particle configuration $\hat{F}$ that fulfill (14). Since we have no initial small-particle distribution, we cannot directly apply the results of Appendix A 2 a and need to proceed in two steps. Let

be the distribution of small particles around a fixed big-particle configuration $C$, where we now explicitly note the dependence on the small-particle chemical potential *μ*_{S}. Computing the unnormalized importance weight $\hat{W}(C)$ from (20) requires an estimate of the partition function of the small particles $\Xi[C, \mu_S]$. For a system with a small value of the chemical potential *μ*_{0} ≪ *μ*_{S}, we can directly estimate this quantity

as it is the reciprocal probability of having zero small particles in a system with fixed $C$. For small enough *μ*_{0}, this value is close to 1 and can be estimated quickly by a GCMC simulation that decorrelates rapidly due to the low density of small particles. Since we can compute this value with very low variance at negligible cost, we consider our estimate of $\Xi[C, \mu_0]$ to be exact and neglect the influence of its fluctuations on the overall variance of the method. Furthermore, we assume that we can generate samples from the low-chemical-potential distribution of small particles $p_{\mu_0}(F \mid C)$.
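This zero-occupancy estimator can be sketched as follows; here `sample_n_small` is a hypothetical stand-in for one GCMC sweep at chemical potential *μ*_{0} with $C$ held fixed, returning the instantaneous small-particle count.

```python
import numpy as np

def estimate_xi_mu0(sample_n_small, n_sweeps, rng):
    """Estimate Xi[C, mu_0] as the reciprocal of the empirical probability
    of observing zero small particles in a low-chemical-potential GCMC run.
    `sample_n_small(rng)` is a hypothetical sampler for the particle count."""
    zero_count = sum(1 for _ in range(n_sweeps) if sample_n_small(rng) == 0)
    return n_sweeps / zero_count
```

As a sanity check, for an ideal (non-interacting) reservoir with mean occupancy $\bar{n}$, the zero-occupancy probability is $e^{-\bar{n}}$, so the estimator should return approximately $e^{\bar{n}}$.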

Starting with a sample $F$ of the initial small-particle distribution $p_{\mu_0}(F \mid C)$, we can now apply steps of the annealing process defined in Appendix A 2 a. We define the steps of the annealing process by slowly increasing the chemical potential of the small particles *μ*_{k} in *K* + 1 steps, from *μ*_{0} to *μ*_{K} = *μ*_{S}, while keeping the CG distribution fixed. More specifically, we simulate an annealing process for the sequence of probability distributions

yielding an annealing weight $\hat{W}_A$ and a fine-particle configuration $\hat{F}$ as described previously.

Averaging over the initial distribution of small particles and the annealing process and using (A8), (A15), and (A17) yields

Combining this with (19), we have

Thus, we scale the weight that is produced by the annealing process

For this weight, the annealing process fulfills (14) when we include the sampling from the distribution of the initial small particles $F \sim p_{\mu_0}(F \mid C)$ as part of the annealing process.

For the three-level method, we split the annealing process discussed above into two consecutive steps. The first part follows exactly the same steps as above, where the annealing process increases the chemical potential *μ*_{k} from a small value *μ*_{0} to *μ*_{S}. The only difference is that we include the potential $\tilde{U}$ of the intermediate distribution: in place of (A15), we have

so that small particle insertion is suppressed in regions far from large particles. As before, the annealing process results in a (scaled) weight $\hat{W}_1(C)$ and small-particle configuration $\hat{F}_1$ that now fulfills (26).

For the second step of the three-level method, we need to define an annealing process that fulfills (28). We start with a sample $X_2 = (C_2, F_2)$ from the intermediate level *p*_{I}. Since this configuration already contains small particles, we can directly apply the results of Appendix A 2 a to anneal from $p_I(C,F)$ to $p_F(C,F)$. This is achieved by a sequence of intermediate annealing distributions that increase the parameter *δ* of the potential (40) so that the volume available to the small particles is slowly increased. This is done in *K* + 1 steps from the parameter *δ*_{0} = *δ*_{free} (the intermediate level) to a final value

at which point the suppression potential does not affect any point in the domain. Then, the intermediate level with *δ*_{K} corresponds to the fine-level distribution, up to the correction factor $e^{\Phi_{\mathrm{corr}}(C)}$ in (42) that only depends on the big particles. Following from (A8), this annealing process with scaled weight $\hat{W}_2(C) = \hat{W}_A e^{-\Phi_{\mathrm{corr}}(C)}$ and final small-particle configuration $\hat{F}_2$ fulfills the property (28).

##### c. Annealing schedules for simulations

The importance weights produced by the annealing process are unbiased, and this feature is independent of the details of the annealing schedules. In this sense, the algorithm is valid for any schedule. However, the variance of the computed importance weights depends strongly on the choice of schedule.

The initial chemical potential *μ*_{0} is chosen such that, in a system with no big particles, there would be an average of $\bar{n}_0 = 0.01$ small particles present; that is, the initial reservoir volume fraction is $\eta_{S,0}^{r} = 0.01/L^3$.

For the first-stage annealing process, we increase the chemical potential in steps Δ*μ*_{k} such that the average change in the number of small particles would be $\delta\bar{n}_k = 0.2$ in a system where no big particles were present. For the second stage, we increase *δ*_{k} in fixed steps Δ*δ*_{k} = *σ*_{S}/20 000. In both cases, we run one GCMC sweep between each step.
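For intuition, a schedule of this type can be sketched under the simplifying, purely illustrative assumption of an ideal reservoir with $\bar{n}(\mu) = z e^{\mu}$; the actual simulations are based on the reservoir's measured particle-number response rather than this closed form.

```python
import math

def chemical_potential_schedule(mu0, mu_s, dn=0.2, z=1.0):
    """Hypothetical schedule: assuming an ideal reservoir where the mean
    particle number is n_bar(mu) = z * exp(mu), choose chemical-potential
    steps so that n_bar increases by dn per step, capping at mu_s."""
    schedule = [mu0]
    n = z * math.exp(mu0)
    while schedule[-1] < mu_s:
        n += dn
        schedule.append(min(math.log(n / z), mu_s))
    return schedule
```

Note that the steps Δ*μ*_{k} produced this way shrink as the density grows, concentrating computational effort where insertion becomes harder.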

To compute accurate FG reference results used, for example, in Figs. 4 and 11, we apply the two-level method using the same annealing strategy as for the first stage of the three-level method but with $\delta\bar{n}_k = 0.05$ for increased accuracy. Note that, for the numerical tests in Figs. 12 and 15 that directly compare the performance of the two- and three-level methods, the two-level method uses the same annealing schedule as the three-level method outlined earlier. The only difference is the lack of resampling.

### APPENDIX B: DETAILS OF THE INTERMEDIATE LEVEL

#### 1. Perturbative approximation of non-homogeneous hard-sphere fluid

This section derives (47) of the main text. To this end, consider a homogeneous hard sphere fluid at chemical potential *μ* and add a perturbing potential

in (44). In a finite periodic system, **q** should be a reciprocal lattice vector. We aim to estimate the free energy difference between the perturbed and homogeneous systems. For this, we follow the steps of the local density approximation discussed in Ref. 52; see also Chap. 6 of Ref. 62. We approximate this difference as

To compute $g$, we need to investigate both sides of Eq. (B2). Starting with the right-hand side, we assume that $g$ is smooth; therefore, we can approximate it for small *a* by a constant $g(\mu - E) \equiv g(\mu)$. To compute the integral of the pressure difference in (B2), we expand around *μ*,

and integrate the gradient correction term

Overall, we obtain

To investigate the left-hand side of (B2), we look at perturbations of the free energy in the limit of small *a*. We can express the derivatives of the free energy with respect to *a* as equilibrium averages of the perturbed system. For *a* = 0,

with $\rho_q = \sum_{j=1}^{n} \sin(q \cdot r_j)$. The first average is zero by translational invariance. For the second average, use that $\langle |\rho_q|^2 \rangle_\mu = \langle n \rangle_\mu S(\mu; q)/2$, where *S*(*μ*; *q*) is the structure factor of the fluid^{62} at chemical potential *μ*. So,

Now differentiate (B2) twice with respect to *a*, yielding

where the left-hand side used (B8) and the right-hand side was approximated with (B5) before differentiation.

Finally, differentiate with respect to *q* and send |*q*| → 0: we can identify the second order term of the square-gradient approximation as

which is (47).

#### 2. Accuracy of the square-gradient approximation

The intermediate level that is constructed in Sec. IV relies on the approximation (46) for the free energy of a non-homogeneous hard-sphere fluid. This section discusses the accuracy of this approximation for a system that only contains small particles in an external potential.

We consider a grand-canonical ensemble of small hard spheres (*σ*_{S} = 1) in a periodic box *V* = [0,*L*]^{3} of length *L* = 10*σ*_{S} without any big particles. We perturb this system by a one-dimensional cosine potential with *m* periods

We apply this potential to the first component *r*_{1} of the small particle positions **r** = (*r*_{1}, *r*_{2}, *r*_{3}). The potential $E_{m,\cos}$ has a fixed maximal strength of 4. We vary the steepness of the potential by varying the number of periods of the cosine, *m* = 1, 2, 3; as *m* increases, the potential changes more rapidly. We expect this to make our approximations increasingly inaccurate, as they are constructed under the assumption that the derivatives of the external potential are small.
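One plausible parameterization of such a potential, assuming it oscillates between 0 and the maximal strength 4 with *m* full periods across the box (the phase and offset conventions here are our assumption, not reproduced from the paper), is:

```python
import math

def E_m_cos(r1, m, L=10.0, strength=4.0):
    """Hypothetical 1d cosine perturbation: m full periods across a box of
    length L, oscillating between 0 and the maximal strength (assumed form)."""
    return 0.5 * strength * (1.0 + math.cos(2.0 * math.pi * m * r1 / L))
```

With this convention the potential is maximal at *r*_{1} = 0 and vanishes at the half-period points, and the maximal gradient grows linearly with *m*.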

We have computed the predicted free energy (46) in this system, as well as the cruder approximation (45), which lacks the square gradient approximation. We compare these values with the true free energy, computed via thermodynamic integration, and investigate the dependence of the accuracy of the approximation on the small particle volume fraction $\eta Sr$ and the steepness of the external potential. The results of this computation are shown in Fig. 13.

Figure 13(a) shows that the absolute value of the free energy depends weakly on the number of periods *m* in the external potential. As expected, the error of the approximation methods increases substantially in Figs. 13(b)–13(d) as the number of periods *m*, and with it the steepness of the cosine potential $E_{m,\cos}$, increases. The absolute prediction error also increases with the volume fraction $\eta_S^r$. For all *m* considered here, the square-gradient method (46) substantially outperforms the prediction (45). For *m* = 1, where the external potential varies the slowest, the square-gradient approximation is nearly exact, which confirms the accuracy of the square-gradient factor $g = g(\mu)$ in Appendix B 1. In addition to the error increasing with *m*, the relative improvement of the square-gradient method over the simple approximation decreases. This indicates that higher-order terms in the derivatives of the potential become more important.

The choice of the 1*d* cosine potential $E_{m,\cos}$ here is motivated by the use of a half-period of the cosine to introduce the suppression potential in the construction of the intermediate level in Sec. IV. For the numerical examples in Sec. V, the suppression potential has a (half-period) length of *l* = 3.5 and a maximal strength of *s* = 4.4. In terms of the maximal squared gradient that appears in this potential, it lies between the cases *m* = 1 and *m* = 2. This indicates that our square-gradient approximation is appropriate for predicting the free energy of the partially inserted system in the binary hard-sphere example.

### APPENDIX C: INFLUENCE OF THE *AD HOC* CORRECTION FACTOR

The construction of the intermediate level in Sec. IV included an *ad hoc* correction term that was identified using preliminary computations. In this section, we take another look at this parameter and discuss its importance for the performance of the three-level method. For this, we consider four values for the correction factor

between *α*_{corr} = 0 (no *ad hoc* correction, which means $p_I = p_I^{(2)}$) and *α*_{corr} = 0.076 (corresponding to a linear least-squares fit to the weights in Fig. 10). The effect of the choice of correction factor is displayed in Fig. 14, where we show the differences between the intermediate levels and the FG estimate for the two quantities of interest (see Fig. 3 for their FG averages). The *ad hoc* correction has a noticeable influence on the quantities of interest, especially for the pair correlation function *g*(*r*) when the two particles are almost touching (*r* ≈ 10). This is the relevant region for the coordination number *N*_{c}, which was measured as part of the performance test in Sec. V B.

To determine how much the three-level method depends on *α*_{corr}, we repeated this performance test for the different values in (C1). The measured sample variances for *N*_{runs} = 60 independent realizations of the estimators are shown in Fig. 15. As before, the results for different *α*_{corr} are highly correlated because they share the same configurations. The (bootstrap) standard errors of the values in Fig. 15 are comparable to the ones shown in Fig. 12; the same caveats apply and we have omitted them here for clarity of presentation.

The main takeaway from Fig. 15 is that, for our example, the three-level estimators without tapering outperform the corresponding two-level estimators, independent of the choice of *α*_{corr}. That is, the method appears to be robust with respect to modifications of the intermediate level, even if the mean quantity of interest differs significantly between intermediate distributions.

Given the low number of samples, the exact variance figures should not be over-interpreted. Nevertheless, the trends in Fig. 15 illustrate two aspects of the SMC methodology that are relevant to applications. First, the details of the intermediate level become more important when we increase the tapering rate, as one can see, for example, by comparing the four different values for the final-level estimator $\hat{A}_F^{3L}$ without tapering and with 7:3 tapering. Second, this effect is dampened for the difference estimators. The general robustness of the difference estimator in the example considered here supports our assertion that, for appropriately defined levels, it should be the preferred estimator when applying the three-level method in practice.

## REFERENCES

_{2}to biomolecules (Nobel lecture)