In the Monte Carlo many-body perturbation (MC-MP) method, the conventional correlation-correction formula, which is a long sum of products of low-dimensional integrals, is first recast into a short sum of high-dimensional integrals over electron-pair and imaginary-time coordinates. These high-dimensional integrals are then evaluated by the Monte Carlo method with random coordinates generated by the Metropolis–Hasting algorithm according to a suitable distribution. The latter algorithm, while advantageous in its ability to sample nearly any distribution, introduces autocorrelation in sampled coordinates, which, in turn, increases the statistical uncertainty of the integrals and thus the computational cost. It also involves wasteful rejected moves and an initial “burn-in” step as well as displays hysteresis. Here, an algorithm is proposed that directly produces a random sequence of electron-pair coordinates for the same distribution used in the MC-MP method, which is free from autocorrelation, rejected moves, a burn-in step, or hysteresis. This direct-sampling algorithm is shown to accelerate second- and third-order Monte Carlo many-body perturbation calculations by up to 222% and 38%, respectively.

## I. INTRODUCTION

We have developed a family of Monte Carlo (MC) renditions of *ab initio* many-body methods^{1–10} (see also Refs. 11–26 for other methods that wed quantum Monte Carlo^{27–34} with *ab initio* many-body methods). The Monte Carlo many-body perturbation (MC-MP) methods, which compute the perturbation corrections to energies in a scalable manner, are available through third order.^{1,4} The perturbation corrections to electron binding energies can be computed by the Monte Carlo many-body Green’s function (MC-GF) methods, also developed through third order.^{2,10} The basis-set-incompleteness corrections are evaluable stochastically by combining the explicitly correlated (F12) technique^{35–40} with MC-MP2, producing MC-MP2-F12,^{8} or with MC-GF2, generating MC-GF2-F12.^{9} These methods are applicable to calculations on molecules using atom-centered Gaussian-type orbitals widely used in quantum chemistry. The MC-MP2 and MC-GF2 methods have been extended to one-dimensional periodic solids.^{5}

The basic idea underlying the MC-MP/GF methods for molecules is simple. Taking MC-MP2 as an example, we can write the energy correction as

where the precise definition of *f*^{(2)} can be found in the supplementary material, $ri[n]$ is the coordinates of the *i*th electron in the *n*th MC step, and *τ*^{[n]} is the imaginary-time coordinate in the *n*th MC step. The imaginary-time coordinates are distributed randomly but according to the weight function *ω*_{τ}(*τ*), which is chosen to be an exponential function.^{10} Similarly, the coordinates of the electron pair (**r**_{1}, **r**_{2}) [the other pair (**r**_{3}, **r**_{4}) similarly] are distributed according to the weight function

where *r*_{12} is the distance between **r**_{1} and **r**_{2}. Here, a form of *g*(**r**) suitable for calculations on molecules is obtained by expansion in auxiliary basis functions as

where *φ*_{i}(**r**) is an atom-centered *s*-type Gaussian-type orbital (GTO) of the form

where *c*_{i} is its height, *α*_{i} is its width, and **R**_{i} is its center. The normalization constant, *N*_{ω}, is defined as

which is evaluated analytically by a two-electron repulsion integral program.^{41}

In each MC step of an MC-MP2 calculation, at least two electron pairs, (**r**_{1}, **r**_{2}) and (**r**_{3}, **r**_{4}), are propagated by the Metropolis–Hastings algorithm^{42,43} (hereafter simply the Metropolis algorithm) so that they are eventually distributed according to the normalized weight functions *ω*(**r**_{1}, **r**_{2}) and *ω*(**r**_{3}, **r**_{4}). In contrast, the imaginary time *τ* is obtained by inverse transform sampling according to the exponential weight function.^{10} The integrand *f*^{(2)} is evaluated, and *I*^{[N]} as an MC estimate of *E*^{(2)} is accumulated along with its variance,

which is related to the statistical uncertainty *s* by *s* = *σ*/*N*^{1/2}, although the blocking algorithm^{44} is used in practice (see below).

The MC-MP/GF methods possess several distinct advantages over their conventional deterministic counterparts: (1) Since each processor can perform its own nearly independent MC integration, these methods achieve a high parallel efficiency. For example, an MC-GF3 code executes in parallel at an efficiency of 98.8% on as many as 4096 central processing unit (CPU) cores.^{10} (2) They are readily deployable to graphical processing unit (GPU) accelerators,^{7,10} with MC-MP2 running 2700 times faster on 256 GPUs than on 16 CPU cores.^{7} (3) The operation and memory costs *per MC step* are extremely low, growing no higher than quadratically (often linearly) with system size. This is because no computation, transformation, or storage of two-electron repulsion integrals in any basis is needed in the algorithm. (4) Most importantly, the operation cost *to solutions* has a more favorable size dependence (scaling) than the deterministic counterpart. For example, the observed scaling of the cost of MC-MP2 to reach a given statistical uncertainty is *O*(*n*^{3}),^{7} where *n* is the system size, in contrast to the *O*(*n*^{5}) scaling^{45} of the deterministic MP2. Other deterministic low-scaling methods, such as tensor-hypercontraction (THC) MP2,^{46–48} tend to achieve speedup for a certain class of molecules (such as spatially extended systems with no itinerant electrons) at the expense of incurring small, systematic errors (“biases”) in their results. In contrast, the MC-MP/GF methods have lower cost scaling for general molecules at the price of introducing statistical errors (but no biases). Other stochastic low-scaling approaches^{13,17,18,49} involve the computation and/or transformation of two-electron repulsion integrals, the nominal *O*(*n*^{4}) and *O*(*n*^{5}) processes, which are completely avoided by the MC-MP/GF methods.

Of these advantages, the first two render the MC-MP/GF methods readily deployable on nearly any computer architectures, while the last two imply that they will eventually outperform the deterministic counterparts in an application to a large enough system.

However, these advantages are not without penalty. One is that every result produced has an associated statistical uncertainty, a trait common to every MC method but alien to deterministic methods. This uncertainty can, of course, be reduced by increasing the number of MC samples. Another related disadvantage is that the cost functions have large prefactors, making the crossover point where the MC-MP/GF methods become actually faster than the best deterministic implementations far off along the system-size axis.^{40} Therefore, the reduction of this prefactor, which is equivalent to compressing the statistical uncertainty for molecules of any sizes, is of vital importance for the MC-MP/GF methods.

An effective method of reducing the statistical uncertainty in an MC integration is to enhance its sampling. In the MC-MP/GF methods, random coordinates of electron pairs in a given distribution have been produced by the Metropolis algorithm.^{42,43} This algorithm is a powerful tool that may be used to sample nearly any arbitrary distribution. However, it has several weaknesses, the severest of which may be that as a Markov sampling technique,^{50–52} it generates an autocorrelated random sequence, slowing the convergence of an MC integration.

A related weakness has to do with the fact that the distribution generated by the Metropolis algorithm is guaranteed to converge at the desired distribution only in the limit of an infinite number of samples.^{50–52} A likely scenario where this poses a problem is when the MC-MP/GF methods are applied to a system with multiple well-separated molecules; the trajectories of electron pairs may well be trapped in one molecule and unable to cross the barrier to reach another molecule—a severe case of hysteresis caused by autocorrelation. The redundant-walker algorithm^{3} may partially mitigate this difficulty by generating an initial distribution spreading across the system. However, this is *ad hoc*, and a more general, robust remedy is desired.

Less significant weaknesses of the Metropolis algorithm are the need for a “burn-in” period in the simulation and the existence of wasteful rejected moves, which can add up to 50% of the total moves. See Sec. II A for the details on the burn-in period. It should be acknowledged that in some cases, the waste caused by rejection of moves can be nearly entirely offset by the reuse of values obtained during previous MC steps. However, the MC-MP algorithm is not one such case as the imaginary-time coordinate is updated at every MC step. This necessitates that the majority of the MC-MP algorithm must be performed regardless of rejection.

In this article, we show that a random sequence of electron-pair coordinates forming the distribution of Eq. (4) can be generated *directly*, i.e., without a Markov sampling technique such as the Metropolis algorithm. This *direct-sampling* algorithm introduced here for the MC-MP/GF methods, therefore, has no autocorrelation, hysteresis, burn-in step, or rejected moves. This is achieved by applying several carefully chosen transformations to the distribution function of Eq. (4) so that it is decoupled into a product of one- and two-dimensional distributions. If a low-dimensional distribution is one whose cumulative distribution function (CDF) is analytically invertible, a random sequence is generated by inverse transform sampling.^{53,54} If not, a root-finding algorithm is used to invert the CDF numerically. In either case, the resulting random sequence of electron-pair coordinates consists of independent random variables and is thus completely uncorrelated. In addition, any segment of these sequences immediately forms the distribution, showing no hysteresis. No burn-in step or rejected moves occur in the algorithm, either. We will demonstrate the algorithm’s utility and performance for MC-MP2 and MC-MP3 although it should serve to improve the entire family of methods.

## II. SAMPLING TECHNIQUES

Before describing the direct-sampling algorithm in Sec. III, a brief discussion on sampling techniques is presented. The Metropolis algorithm^{42,43} and the inverse transform sampling method (also known as the quantile method or inverse-CDF)^{53,54} are the two most relevant algorithms discussed below. Specialized methods to sample standard distributions also exist, such as the Box–Muller transformation^{55} or the Irwin–Hall distribution^{56,57} for the normal distribution. They are also used in an efficient implementation of the proposed direct-sampling algorithm.

### A. The metropolis algorithm

The Metropolis algorithm can sample any valid probability density function (PDF). It performs a distorted random walk through coordinate space, which is a two-step process: In the first step, a trial position **x**^{[T]} is generated by a random “move” from the previous position, **x**^{[n−1]}, i.e.,

where Δ**x** is a random offset. The second step decides if this move is accepted or rejected. It is accepted at the probability,

with w(**x**) being the PDF, whereupon **x**^{[n]} = **x**^{[T]}; otherwise, the move is rejected and **x**^{[n]} = **x**^{[n−1]}. In the limit *n* → *∞*, the sequence {**x**^{[i]}|1 ≤ *i* ≤ *n*} is guaranteed^{42,43} to form the distribution w(**x**).

In this algorithm, an initial burn-in step is necessary, during which all generated samples are discarded. It chiefly accomplishes two tasks: First, it randomizes the initial values of **x** so that they already somewhat resemble the PDF. Second, it tunes the average size of the random offset |Δ**x**| so that the accept–reject ratio reaches an appropriate value of approximately 50%.^{50} Choosing the right value for this ratio is a nuanced topic.^{58–62} Since no MC integration is carried out during this burn-in step, the efficiency of the Metropolis algorithm is slightly lessened, though it is usually not a significant issue.

The samples generated by the Metropolis algorithm are not independent since the value of **x**^{[n]} clearly depends on the value of **x**^{[n−1]}; they are correlated. Because of this, the standard formula [Eq. (8)] underestimates the variance, and we use the blocking algorithm of Flyvbjerg and Petersen^{44} in the MC-MP/GF methods to properly account for the autocorrelation. In this algorithm, the variance, {*σ*(*b*)}^{2}, and statistical uncertainty, *s*(*b*), for MC-MP2 are evaluated as

where *b* is the number of blocking transformations, **x**^{[n]} stands for $(r1[n],r2[n],r3[n],r4[n],\tau [n])$ collectively, and *ω*(**x**^{[n]}) represents the denominator of Eq. (8). A more accurate estimate of the variance occurs at the value of *b* where {*σ*(*b*)}^{2} plateaus, and it is always greater than *σ*^{2} of Eq. (8), quantifying the autocorrelation of samples from the Metropolis algorithm.

### B. Inverse transform sampling

Inverse transform sampling^{53,54} is one of the most important algorithms that generates a random sequence of samples *directly* (i.e., without some Markov sampling technique) according to any one-dimensional PDF. For a one-dimensional PDF, w(*x*), its associated cumulative density function (CDF), W(*x*), is defined by

where *x*_{0} is the lower bound of the domain of *x*. A point *x* is generated at the probability,

which may be rewritten as

where W^{−1}(*p*) is called the quantile function of w(*x*). The above equation may be viewed as a mapping of the uniform distribution of *p* on the internal [0, 1) onto the distribution of *x* according to w(*x*). If W(*x*) is invertible analytically [i.e., the quantile function, W^{−1}(*p*), is readily available], then *x* is immediately obtained by evaluating the right-hand side of Eq. (15). If not, Eq. (14) is solved numerically for *x* through a root-finding algorithm.

Sampling from a higher-dimensional PDF may be achieved by first factoring the PDF into as many low-dimensional distributions as possible and then applying inverse transform sampling to these factors separately. An inverse transform sampling procedure for a PDF whose dimensionality is higher than one is possible by employing marginal and conditional distributions (see below), though it rapidly becomes intractable with increasing dimensionality.

## III. THE DIRECT-SAMPLING ALGORITHM

The direct sampling of the distribution of an electron pair according to the six-dimensional PDF of Eq. (4) is achieved by first factoring it into one- and two-dimensional PDFs by a series of coordinate transformations and then applying the inverse transform sampling method to each of these factors. The details are now given below.

We first expand Eq. (4) as

where we introduced the composite index *n* = (*i* − 1) *n*_{g} + *j* and $pn=Nwi,j/N\omega $. It may be reminded that *φ*_{i}(**r**) is a single or “primitive” *s*-type GTO (a radial Gaussian function multiplied by the spherical harmonic *Y*_{00}), and w_{n}(**r**_{1}, **r**_{2}) is an importance primitive,

which is normalized by the analytically evaluable integral^{41}

The distribution for an electron pair given by Eq. (18) takes the form of a mixture distribution,^{52} whose underlying distributions are the importance primitives. This is sampled by first choosing one of its importance primitives, w_{n}(**r**_{1}, **r**_{2}), according to its probability of selection, *p*_{n}, and then sampling the selected distribution. The remainder of this section is, therefore, devoted to the procedure used to sample an importance primitive.

Attempting to sample an importance primitive of Eq. (19) is a rather hopeless endeavor as it is a coupled six-dimensional function in Cartesian coordinates. Hence, a series of coordinate transformations is applied so as to decouple the importance primitive into a product of lower-dimensional distributions, which may lend themselves to an inverse transform sampling.

The first coordinate transformation moves the two GTOs in the importance primitive such that their centers are symmetric about the origin and are aligned along the *z*-axis. Let the *i*th GTO be on the positive *z*-axis. Then, the centers of the GTOs in the transformed coordinates are (0, 0, ±*ζ*_{n}) with

where **R**_{i} is the center of the *i*th GTO in the original Cartesian coordinates. The Cartesian coordinates $r\u0303$ in the new coordinate system are related to the ones **r** in the original system by

with *θ*_{1} = −arccos(Δ*Z*_{n}/2*ζ*_{n}), *θ*_{2} = −arctan(Δ*Y*_{n}/Δ*X*_{n}), Δ*X*_{n} = *X*_{i} − *X*_{j}, and **R**_{n} = (**R**_{i} + **R**_{j})/2, where $R^x(\theta )$ is a spatial rotation by *θ* about the *x* axis and $T^(\u2212Rn)$ is a spatial translation by −**R**_{n}.

Next, the six-dimensional Cartesian coordinates of the electron pair, $(r\u03031,r\u03032)=(x\u03031,y\u03031,z\u03031,x\u03032,y\u03032,z\u03032)$, are transformed to the center-of-mass-like $(x\u0303,y\u0303,z\u0303)$ and relative spherical coordinates (*r*, *θ*, *ϕ*), where *r* is the inter-electronic distance, *θ* is the polar angle, and *ϕ* is azimuthal angle. They are related to one another by

with

and

With this transformation, the importance primitive becomes

where **J** is the Jacobian matrix associated with the coordinate transformation, whose determinant is simply

Here, $Nx;\mu ,\sigma $ stands for the normal distribution with center *μ* and width *σ*, i.e.,

whereas $U\varphi ;a,b$ is the uniform distribution on the interval [*a*, *b*), namely,

The only two-dimensional function here, $wr\theta r,\theta ;\zeta ,\gamma $, is the joint distribution of the inter-electronic distance *r* and the polar angle *θ*, which reads

Therefore, the second transformation factors the importance primitive into three one-dimensional normal distribution functions, one one-dimensional uniform distribution function, and one two-dimensional function of Eq. (37). The coordinates ($x\u0303,y\u0303,z\u0303$) that are normally distributed can be sampled by any one of the well-known algorithms,^{55–57} and the uniformly distributed coordinate (*ϕ*) is sampled trivially.

Two separate approaches, both based on the inverse transform sampling method, are necessary for sampling the two-dimensional distribution, $wr\theta r,\theta ;\zeta ,\gamma $. The approach depends on the value of *ζ* as Eq. (37) becomes indeterminate at *ζ* = 0. The latter corresponds to the case where the two GTOs share the same center. The more general approach for *ζ* ≠ 0 is explained first, followed by the other approach for the special case of *ζ* = 0.

For *ζ* ≠ 0, a two-step approach is used to generate a random sequence of (*r*, *θ*) according to $wr\theta r,\theta ;\zeta ,\gamma $: First, a random value of *r* is generated from its marginal distribution. Second, *θ* is produced from $wr\theta r,\theta ;\zeta ,\gamma $, given the value of *r* determined in the first step.

The marginal distribution of *r*, $wrr;\zeta ,\gamma $, is obtained by integrating *θ* out of Eq. (37), namely,

whose CDF [cf. Eq. (13)] is given by

Since this CDF cannot be easily invertible, the inverse transform sampling of $wrr;\zeta ,\gamma $ must proceed numerically.

Here, we use Halley’s method (also known as second-order Householder’s method).^{63} It is a cubically convergent iterative root-finding algorithm for the equation, *f*(*x*) = 0, in which a systematically more accurate root is obtained by the recursion,

starting with some appropriate initial guess, *x*_{0}. Rewriting Eq. (14) as W_{r}(*r*; *ζ*, *γ*) − *p* = 0, we set *f*(*r*) = W_{r}(*r*; *ζ*, *γ*) − *p*, which implies *f*′(*r*) = W_{r}′(*r*; *ζ*, *γ*) = w_{r}(*r*; *ζ*, *γ*) and $f\u2032\u2032(r)=Wr\u2032\u2032(r;\zeta ,\gamma )$. The above recursion then becomes

with

For a random number *p* from the uniform distribution on the domain of [0, 1) and an initial guess *r*_{0}, a random number *r* from the distribution of w_{r}(*r*; *ζ*, *γ*) is obtained by iteratively substituting *r*_{n} in Eq. (42) up to a sufficiently large *n*. As the initial guess, we used the expectation value of the distribution w_{r}(*r*; *ζ*, *γ*) [Eq. (39)], i.e.,

For a value of *r* thus determined, the polar angle (*θ*) is distributed according to

where the factor of w_{r}(*r*; *ζ*, *γ*) arises as a normalizing constant. The CDF of w_{θ}(*θ*; *r*, *ζ*, *γ*) is

which can be analytically inverted to give the quantile function,

Hence, the inverse transform sampling method works analytically, converting a uniform random distribution of *p* into a random distribution of *θ* by evaluating the right-hand side of the above equation.

The second approach for *ζ* = 0, where the joint distribution of (*r*, *θ*) becomes indeterminate, is now discussed. Taking the *ζ* → 0^{+} limit of Eq. (37), we find

where R(*r*, *σ*) is a Rayleigh distribution,

and

Both of these distributions are readily sampled by inverse transform sampling using their quantile functions.

Once all of the coordinates $(x\u0303,y\u0303,z\u0303,r,\theta ,\varphi )$ have been generated, they are back-transformed to $(x\u03031,y\u03031,z\u03031,x\u03032,y\u03032,z\u03032)$ using Eq. (23) through (28). They are then converted to the electron-pair coordinates in the original Cartesian coordinates (*x*_{1}, *y*_{1}, *z*_{1}, *x*_{2}, *y*_{2}, *z*_{2}) by applying the inverse of Eq. (22).

## IV. RESULTS

### A. Computational details

The MC-MP2 and MC-MP3 calculations were performed on four molecules: methane (*r*_{CH} = 1.0848 Å), ammonia (*r*_{NH} = 1.0123 Å, *r*_{HH} = 1.6011 Å), water (*r*_{OH} = 0.9573 Å, *r*_{HH} = 1.5140 Å), and d-leucine (see the supplementary material for the geometry). All calculations were accelerated by the redundant-walker algorithm^{3} using 128 electron pairs.

The cc-pVDZ atomic-orbital basis set^{64} was used for all molecules. The frozen core approximation was invoked for all calculations. The auxiliary basis sets, which define the weight functions according to Eq. (4) through (6) that are sampled directly here, are specified in Table I. The rationales for their parameter choices are as follows: Typically, two *s*-type auxiliary GTOs are placed on each atom. If the atomic-orbital (AO) basis set contains diffuse functions, the use of more than two *s*-type auxiliary GTOs per atom can reduce the variance.^{9} In the case of two auxiliary GTOs per atom, the height of the first, *c*_{1}, is set to be the number of valence electrons in that atom. Its exponent, *α*_{1}, is chosen to be approximately equal to the exponent of the most diffuse primitive GTO in the most highly contracted *s*-type contracted GTO in the AO basis set. The height of the second auxiliary GTO, *c*_{2}, is set to be one-tenth of *c*_{1}. Its exponent, *α*_{2}, is approximately equal to the exponent of the most diffuse GTO in the least highly contracted *s*-type contracted GTO.

Atom . | i
. | c_{i}
. | α_{i}
. |
---|---|---|---|

H | 1 | 1.0 | 0.40 |

2 | 0.1 | 0.10 | |

C | 1 | 4.0 | 0.50 |

2 | 0.4 | 0.14 | |

N | 1 | 5.0 | 0.80 |

2 | 0.5 | 0.18 | |

O | 1 | 6.0 | 1.00 |

2 | 0.6 | 0.20 |

Atom . | i
. | c_{i}
. | α_{i}
. |
---|---|---|---|

H | 1 | 1.0 | 0.40 |

2 | 0.1 | 0.10 | |

C | 1 | 4.0 | 0.50 |

2 | 0.4 | 0.14 | |

N | 1 | 5.0 | 0.80 |

2 | 0.5 | 0.18 | |

O | 1 | 6.0 | 1.00 |

2 | 0.6 | 0.20 |

The MC performance improvement, which measures the speedup of an alternate (“alt”) MC integration scheme to a reference (“ref”) scheme, is defined as^{50–52}

where *T* is the time per MC step and *σ*^{2} is the variance of the integrand. In this study, “alt” and “ref” designate the MC-MP2/MP3 methods with and without the direct-sampling algorithm, respectively.

The factors in the performance improvement (*η*) were measured separately. The times per MC step (*T*) were determined through relatively short simulations lasting 16 384 steps per CPU core, while the variances (*σ*^{2}) were obtained from more lengthy joint MC-MP2/MP3 simulations taking 155 355 136 MC steps for water, methane, and ammonia and 416 874 496 MC steps for d-leucine. A burn-in period of 100 000 MC steps needed to be performed with the Metropolis algorithm only. The cost of this additional computation is noted separately. It is not included in the performance measurement of *T*_{ref} since its overall impact varies as a function of the total simulation length. All calculations employed the same executable capable of sampling with either the Metropolis algorithm or the proposed direct-sampling algorithm to ensure that neither algorithm is unfairly disadvantaged.

Statistical uncertainties with up to ⌊ log _{2}*N*⌋ blocking transformations were accumulated concurrently in both MC-MP2 and MC-MP3 simulations with *N* MC steps. The correct estimate of the statistical uncertainty *s*(*b*) corresponds to the value of *b* where {*σ*(*b*)}^{2} plateaus [see Eqs. (11) and (12)], which occurs at *b* ≈ 10 and 8 for the MC-MP2 and MC-MP3 methods, respectively (see below).

All calculations were performed using Cray XE nodes on the Blue Waters supercomputer at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign. An XE node has two eight-core AMD 6276 CPUs and 64 GB of RAM with a peak performance of 313 GFLOPS. The timing measurements were made on a single XE node using all of its processors, whereas the variance determination used a varied number of XE nodes to reach sufficiently converged estimates.

### B. Performance analysis

Table II presents the average times required to perform a single MC step utilizing either the direct-sampling algorithm or the Metropolis algorithm. It is observed that the difference (*T*_{alt} − *T*_{ref}) in the average time is nearly constant irrespective of the method or molecular size. The direct-sampling algorithm needs to perform the transformations described in Sec. III in addition to the evaluation of the weight function [Eq. (4)], which is the most time-consuming step of both sampling algorithms, explaining the difference.

Method . | Molecule . | T_{alt}
. | T_{ref}
. | T_{alt} − T_{ref}
. | T_{ref}/T_{alt}
. | T_{burn}
. |
---|---|---|---|---|---|---|

MC-MP2 | Methane | 1.76 | 1.59 | 0.17 | 0.90 | 12.6 |

Ammonia | 1.54 | 1.40 | 0.14 | 0.91 | 11.2 | |

Water | 1.34 | 1.21 | 0.13 | 0.90 | 9.4 | |

d-Leucine | 10.79 | 10.55 | 0.24 | 0.98 | 37.8 | |

MC-MP3 | Methane | 21.49 | 21.36 | 0.13 | 0.99 | 12.8 |

Ammonia | 20.79 | 20.57 | 0.22 | 0.98 | 11.3 | |

Water | 22.26 | 22.14 | 0.12 | 0.99 | 9.3 | |

d-Leucine | 32.21 | 31.79 | 0.42 | 0.99 | 37.7 |

Method . | Molecule . | T_{alt}
. | T_{ref}
. | T_{alt} − T_{ref}
. | T_{ref}/T_{alt}
. | T_{burn}
. |
---|---|---|---|---|---|---|

MC-MP2 | Methane | 1.76 | 1.59 | 0.17 | 0.90 | 12.6 |

Ammonia | 1.54 | 1.40 | 0.14 | 0.91 | 11.2 | |

Water | 1.34 | 1.21 | 0.13 | 0.90 | 9.4 | |

d-Leucine | 10.79 | 10.55 | 0.24 | 0.98 | 37.8 | |

MC-MP3 | Methane | 21.49 | 21.36 | 0.13 | 0.99 | 12.8 |

Ammonia | 20.79 | 20.57 | 0.22 | 0.98 | 11.3 | |

Water | 22.26 | 22.14 | 0.12 | 0.99 | 9.3 | |

d-Leucine | 32.21 | 31.79 | 0.42 | 0.99 | 37.7 |

The ratio of the average times (*T*_{ref}/*T*_{alt}) approaches unity with increasing molecule size and/or perturbation order. This is because these increases cause the runtime of an MC step to become longer regardless of the sampling method, obscuring the small constant difference in the average time per step. Including the burn-in time into *T*_{ref} would push the ratio of the average times closer to unity. For shorter simulations taking, on average, less than 80 000 MC step per CPU core, the direct-sampling algorithm would be faster than the metropolis algorithm. Therefore, the cost increase incurred by the direct-sampling algorithm is either negligible or non-existent in any calculation of practical interest.

The variances of the MC-MP2/MP3 integrands from the Metropolis algorithm were estimated through the blocking algorithm of Flyvbjerg and Petersen^{44} [Eq. (11)]. They are plotted as a function of the number of blocking transformations *b* in Figs. 1 and 2 for MC-MP2 and MC-MP3, respectively. An accurate estimate of the variance occurs in the smooth plateau region that begins after the initial rise and ends when the variance becomes unstable. For MC-MP2, this region begins after as few as six blocking transformations (*b* = 6) and ends after approximately fifteen (*b* = 15). Similarly, *b* spans the range of five to ten for MC-MP3. We, therefore, choose to set *b* = 10 to estimate the variances for MC-MP2 and *b* = 8 for MC-MP3 in the remaining analysis.

The measured variances from the direct-sampling and Metropolis algorithms are given in Table III. For MC-MP2, the variances are reduced by a factor ranging from 2.09 to 3.57 by switching from the Metropolis to direct-sampling algorithm. For MC-MP3, the variances are compressed by a smaller factor ranging from 1.24 to 1.40. In both cases, the direct-sampling algorithm is superior to the Metropolis algorithm, consistent with the lack of autocorrelation in the former.

Method . | Molecule . | $\sigma alt2$ . | $\sigma ref2$ . | $\sigma ref2/\sigma alt2$ . |
---|---|---|---|---|

MC-MP2 | Methane | 0.191 | 0.535 | 2.80 |

Ammonia | 0.169 | 0.479 | 2.84 | |

Water | 0.130 | 0.466 | 3.57 | |

d-Leucine | 5.29 · 10^{2} | 1.10 · 10^{3} | 2.09 | |

MC-MP3 | Methane | 0.648 | 0.883 | 1.36 |

Ammonia | 0.427 | 0.546 | 1.28 | |

Water | 0.169 | 0.236 | 1.40 | |

d-Leucine | 2.96 · 10^{4} | 3.66 · 10^{4} | 1.24 |

Method . | Molecule . | $\sigma alt2$ . | $\sigma ref2$ . | $\sigma ref2/\sigma alt2$ . |
---|---|---|---|---|

MC-MP2 | Methane | 0.191 | 0.535 | 2.80 |

Ammonia | 0.169 | 0.479 | 2.84 | |

Water | 0.130 | 0.466 | 3.57 | |

d-Leucine | 5.29 · 10^{2} | 1.10 · 10^{3} | 2.09 | |

MC-MP3 | Methane | 0.648 | 0.883 | 1.36 |

Ammonia | 0.427 | 0.546 | 1.28 | |

Water | 0.169 | 0.236 | 1.40 | |

d-Leucine | 2.96 · 10^{4} | 3.66 · 10^{4} | 1.24 |

The smaller factor of compression in MC-MP3 means that the Metropolis samples have a lesser degree of autocorrelation in MC-MP3 than in MC-MP2. This result parallels the foregoing observation that fewer blocking transformations are required in MC-MP3 (cf. Figures 1 and 2). This is likely because there is lesser a chance that all electron pairs’ moves are rejected in MC-MP3 than in MC-MP2, given a greater number of electron pairs in the former.

Table IV presents the MC efficiency increase brought to by the direct-sampling algorithm. These values are simply the products of the ratios given in Tables II and III as per Eq. (52). The MC-MP2 calculations are sped up by a factor ranging from 2.04 to 3.22, while the MC-MP3 simulations execute faster by a factor ranging from 1.22 to 1.38. Therefore, we have shown that the weight functions of the MC-MP2 and MC-MP3 methods can be directly sampled at an efficiency greater than that of the Metropolis algorithm.

## V. CONCLUSIONS

Here, we have proposed, implemented, and tested an algorithm to directly sample the six-dimensional distribution for an electron pair appearing in the MC-MP2/MP3 calculations. At the heart of the proposed algorithm are the several carefully chosen transformations that decouple the distribution into a product of one- and two-dimensional distributions. The one-dimensional distributions turn out to be the normal and uniform distributions, which can thus be sampled directly by well-known procedures. The two-dimensional distribution is tackled by a combination of analytical and numerical inverse transform sampling.

Unlike the Metropolis algorithm used in previous implementations of the MC-MP/GF methods, the proposed direct-sampling algorithm has no autocorrelation of samples, burn-in period, or rejected moves. It generates a random sequence that samples uniformly from a given distribution from the outset and is free from hysteresis or initial-value dependence. Our numerical tests have shown that the direct-sampling algorithm adds negligible cost, while eliminating the autocorrelation and thus reducing the variances by 19%–72%. Overall, it accelerates the MC-MP2 and MC-MP3 calculations by a factor of as much as 3.22 and 1.38, respectively, relative to the Metropolis algorithm.

While these accelerations may appear rather modest, given that they universally reduce the large prefactors on the cost functions of MC-MP methods irrespective of the molecular size, apply generally to higher-order MC-MP and MC-GF methods alike, and have no recognizable drawbacks, the direct-sampling algorithm should replace the Metropolis algorithm in all MC-MP/GF implementations. The Metropolis algorithm may continue to play a role in the development of MC-MP/MC-GF methods if the use of alternate electron-pair distributions is proposed, as would be the case if the use of other basis types, such as plane waves or Slater-type orbitals, is to be explored.

## SUPPLEMENTARY MATERIAL

See the supplementary material for the definition of *f*^{(2)} from Eq. (1) and the geometry of d-leucine.

## DATA AVAILABILITY

The data that support the findings of this study are available within the article and its supplementary material.

## ACKNOWLEDGMENTS

This work was supported by the Center for Scalable, Predictive methods for Excitation and Correlated phenomena (SPEC), which is funded by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division, as a part of the Computational Chemical Sciences Program and also by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Grant No. DE-SC0006028. It is also part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (Award Nos. OCI-0725070 and ACI-1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.