A stochastic algorithm is proposed and implemented that computes a basis-set-incompleteness (F12) correction to an *ab initio* second-order many-body perturbation energy as a short sum of 6- to 15-dimensional integrals of Gaussian-type orbitals, an explicit function of the electron-electron distance (geminal), and its associated excitation amplitudes held fixed at the values suggested by Ten-no. The integrals are directly evaluated (without a resolution-of-the-identity approximation or an auxiliary basis set) by the Metropolis Monte Carlo method. Applications of this method to 17 molecular correlation energies and 12 gas-phase reaction energies reveal that both the nonvariational and variational formulas for the correction give reliable correlation energies (98% or higher) and reaction energies (within 2 kJ mol^{−1} with a smaller statistical uncertainty) near the complete-basis-set limits by using just the aug-cc-pVDZ basis set. The nonvariational formula is found to be 2–10 times less expensive to evaluate than the variational one, though the latter yields energies that are bounded from below and is, therefore, slightly but systematically more accurate for energy differences. Being capable of using virtually any geminal form, the method confirms the best overall performance of the Slater-type geminal among 6 forms satisfying the same cusp conditions. Not having to precompute lower-dimensional integrals analytically, to store them on disk, or to transform them in a nonscalable dense-matrix-multiplication algorithm, the method scales favorably with both system size and computer size; the cost increases only as *O*(*n*^{4}) with the number of orbitals (*n*), and its parallel efficiency reaches 99.9% of the ideal case on going from 16 to 4096 computer processors.

## I. INTRODUCTION

Slow convergence^{1} of electron-correlation energies with respect to the size of atomic-orbital (AO) basis set is one of the most serious weaknesses of *ab initio* molecular orbital (MO) theory.^{2–5} Its cause is well recognized: the ability of the products of the AO basis functions to describe the exact wave functions is rather poor at short inter-electronic distances, or even completely lacking in cases where they are cusped at electron-electron coalescence.^{1,6,7} The most direct and effective remedy^{2,3,5,8} is to use a two-electron basis function that depends explicitly on the inter-electronic distance, *r*_{12}, which is called a correlation factor or geminal.

Slater^{9,10} and Hylleraas^{11,12} were the first to report the results of such calculations for the helium atom, which were much closer to the exactness than those without an explicit function of *r*_{12}. Kutzelnigg^{8} developed a general framework in which to include the simplest geminal—a linear function of *r*_{12}—in Gaussian-basis-set second-order many-body perturbation (MP2) theory,^{13,14} pioneering the explicitly correlated MP2 (MP2-R12) method. In this method, a basis-set-incompleteness (R12) correction to reach the complete-basis-set (CBS) limit is written as a short sum of two-electron integrals over a geminal, an orthogonality projector, and occupied orbitals only. Here, the orthogonality projector is there to prevent a double counting of the same correlation-energy contribution between MP2 and R12.

One may foresee at least three technical challenges in implementing MP2-R12: (1) Integrals of a geminal multiplying Gaussian-type orbitals (GTOs) need to be evaluated preferably analytically, limiting the forms of such factors to a handful of the simplest ones.^{15–21} (2) The excitation amplitudes multiplying these integrals have to be determined through an *O*(*n*^{6}) process (where *n* is the number of orbitals),^{22} making MP2-R12 less scalable with system size than *O*(*n*^{5}) of its parent MP2. (3) When the orthogonal projector is expanded, numerous three-electron (9-dimensional) and four-electron (12-dimensional) integrals involving both occupied and virtual orbitals emerge. Computer and human costs of their evaluation would be prohibitive.

Combining with their Gaussian-type geminal method, Persson and Taylor^{23} alleviated problems (1) and (3). In their approach, a geminal of any form can be expanded by a small number of Gaussian-type geminals (GTGs), and the resulting many-electron integrals over GTOs and GTGs are directly and analytically evaluated by the McMurchie–Davidson algorithm^{24–26} or by the Rys quadrature.^{27–29} Although a GTG does not satisfy the cusp conditions,^{1,6,7} a geminal expanded by GTGs is still found to capture a majority of the R12 correction, suggesting^{30} that an accurate description of a correlation hole at intermediate *r*_{12} is more important than that at *r*_{12} = 0. Furthermore, Szalewicz *et al.*^{31,32} showed that approximating the orthogonal projector by the so-called weak-orthogonality projector makes higher-than-three-electron integrals disappear in the formalism. However, such simplification comes at a price of somewhat poorer performance.^{4,33}

Klopper and Kutzelnigg^{34,35} proposed a practical and now-widely used method that resolved problem (3), in which all three- and four-electron integrals are approximated as linear combinations of two-electron integrals by the resolution-of-the-identity (RI) insertion with an auxiliary basis set (ABS).^{34–37} Also, problem (1) was partially addressed by Ten-no, who introduced^{16,38} a nonlinear function of *r*_{12} known as the Slater-type geminal, which was found to accelerate the basis-set convergence more than linear *r*_{12} or a GTG. Ten-no thus established the MP2-F12 method. However, the true merit of the Slater-type geminal may be that its functional form resembles the correlation hole in a wide range of *r*_{12} so closely that the excitation amplitudes multiplying its geminal integrals can be held fixed at values dictated by the cusp conditions.^{1,6,7} Hence, Ten-no’s fixed-amplitude method,^{39} combined with his Slater-type geminal,^{16,38} brings the cost scaling of MP2-F12 down to the same *O*(*n*^{5}) of MP2, resolving problem (2), because the excitation amplitudes no longer have to be determined through the *O*(*n*^{6}) process.

In this work, we explore a rather different approach—the use of a stochastic algorithm—to resolving all three problems mentioned above as well as the fourth problem: (4) In deterministic implementations, a large number of two-electron integrals are generated from far fewer, higher-dimensional integrals by the RI insertions, which, together with the ordinary two-electron integrals for MP2, need to be pre-calculated, stored (on disks), and used in a series of dense matrix multiplications; such computational steps are nonscalable with system size (in the sense that the cost increases as the fifth power of the number of orbitals) or computer size (because dense matrix multiplications^{40} tend to involve large and frequent inter-processor communications). Here, we present a novel, stochastic MP2-F12 method, which is based on Ten-no’s fixed-amplitude method, but otherwise differs from the existing MP2-F12 methods in that it can use virtually any geminal, does not require the RI approximation or an ABS, is not predicated upon numerous molecular integrals precomputed or stored on disks, and thus scales favorably with both system and computer sizes.

It is an extension of the Monte Carlo MP2 (MC-MP2) method,^{41} which is a member of the Brueckner–Goldstone quantum Monte Carlo (BGQMC) family^{14} of methods for both electrons^{41–46} and vibrations.^{47–50} It weds diagrammatic many-body perturbation and Green’s function theories with the Metropolis Monte Carlo (MC) algorithm,^{51} in the same spirit as other recent studies combining *ab initio* MO theory with quantum Monte Carlo (QMC).^{52–62} Unlike more conventional QMC methods,^{63–67} MC-MP2 can compute energy differences (such as correlation energies, correlated ionization potentials and electron affinities, and quasiparticle energy bands) directly and not as small differences of noisy total energies, does not suffer from any sign problem or fixed-node error, and is systematically convergent at the exact solutions of the Schrödinger equations as the perturbation rank and basis-set size are increased. It is also rigorously (diagrammatically) size-consistent and thus free from a finite-size error. Unlike deterministic MP2, MC-MP2 does not need two-electron integrals precomputed or stored either in the AO or MO basis and is, therefore, more scalable. The operation cost per MC step of MC-MP2 is shown to be linear^{43} with system size and the cost to achieve a given relative statistical uncertainty is found to be cubic.^{68} It can be easily and efficiently parallelized on many central processing units (CPUs)^{44} or on many graphical processing units (GPUs),^{68} sometimes achieving scalability^{68} unprecedented for *ab initio* electron-correlation theories.

In the explicitly correlated extension called MC-MP2-F12 presented here, we exploit another important advantage of MC-MP2, which is its flexibility with various mathematical forms of basis functions. MC-MP2-F12 can use virtually any geminal forms as it evaluates necessary integrals numerically by the Metropolis MC method. Furthermore, being a sparse integration method, the MC method’s relative superiority over quadrature grows with dimension; it was argued that the former is more efficient than the latter when the dimension exceeds eight.^{69} In this sense, three-electron (9-dimensional) and higher-dimensional integrals or even two-electron (6-dimensional) integrals may be more suitably handled by MC integrations than by analytical integrations.^{25,26,28,29} The latter are also more expensive to develop and may have a hard ceiling of applicability when the problem size is too large. The former, in contrast, would execute for a far larger problem, only taking longer to converge and give meaningful results.

Also, the MC method makes it unnecessary to factor high-dimensional integrals into lower-dimensional ones in MC-MP2-F12; it can directly evaluate a short sum of high-dimensional integrals, neither requiring the RI approximation and a large ABS nor involving many dense matrix multiplications, which tend not to be scalable. Consequently, the cost of MC-MP2-F12 to achieve a given relative statistical error is found to scale more favorably [*O*(*n*^{4})] than MP2 or MP2-F12 with Ten-no’s fixed amplitudes. Its parallel algorithm of MC-MP2-F12 is also found to exhibit near-perfect scalability up to thousands of processing cores and is essentially free of disk I/O.

In a previous Communication in this journal,^{41} Willow and three of the present authors reported a pilot implementation of the MC-MP2-F12 method using the nonvariational *V* formula (see below) and obtained *total* correlation energies near their CBS limits using Ten-no’s fixed-amplitude ansatz^{39} with the Slater-type geminal.^{16} Since this (*V*) formula is not variational with respect to the form of the geminal or the excitation amplitude multiplying its integrals (which are held fixed^{39}), whether it gives reliable *relative* energies near the CBS limits was not clear. While the cost per MC step for MC-MP2-F12 with the nonvariational (*V*) formula was shown^{41} to be quadratic with the number of orbitals, the more meaningful cost, i.e., that required to reach a given relative statistical uncertainty, was unknown. Parallel scalability was strongly inferred but never demonstrated, either. Furthermore, the ease of use of any geminal was touted but never exploited. In this paper, we fully develop MC-MP2-F12 using both of the nonvariational (*V*) and variational (*VBX*) formulas and answer all of these unanswered questions.

## II. THEORY

### A. *E*_{F12}

The correlation energy of MP2-F12 theory^{8,16,22,39,41} is written as

where *E*_{MP2} is the MP2 correlation energy in a finite (AO) basis set^{14} and *E*_{F12} is the correction for the basis-set-incompleteness error. The latter (the R12 or F12 correction) is derived from the Hylleraas functional^{3,5} and consists of three terms,

with

where *ϵ _{p}* is the

*p*th Hartree–Fock (HF) orbital energy, $tijmn$ is the so-called geminal amplitude, and all summations run over occupied orbitals spanned by the basis set. The other factors are molecular integrals of two electrons, which are written in the standard physicists’ notation as

where *f*_{12} is the geminal (an explicit function of *r*_{12}) and $F\u02c6n$ is the Fock operator of electron *n*, i.e.,

The right-hand side of this equation is the sum of the kinetic-energy operator, nuclear-attraction operator, Coulomb operator, and exchange operator, in this order. Operator $Q\u02c612$ is the strong-orthogonality projector^{70–73} defined as

with

where $P\u02c631$ replaces the coordinates of electron 1 by 3 in what follows. The summation in Eq. (12) runs over all virtual orbitals spanned by the basis set.

The F12 correction (*E*_{F12}) is variational with the geminal amplitudes, whose values are, therefore, to be determined by minimizing *E*_{F12}, through an *O*(*n*^{6}) process,^{22} where *n* is the number of orbitals. It can also be shown that at the minimum, $EF12B+EF12X=\u2212EF12V$ and hence,

Ten-no showed^{39} that an accurate F12 correction can be obtained by holding the geminal amplitudes fixed at the values dictated by the cusp conditions,^{1,6,7} i.e.,

insofar as an appropriate form of the geminal is used, such as Ten-no’s Slater-type geminal,

where *γ* is an adjustable parameter. With Ten-no’s fixed amplitudes, the F12 correction as defined by Eq. (2) (the variational *VBX* formula) is no longer a variational minimum with respect to the “size” ($tijmn$) of the geminal, but is still variationally bound from below with respect to its “shape” (*γ*). The nonvariational (*V*) formula^{41} [Eq. (13)] is not variational with either. Since the errors in a variational energy have the same sign and tend to also have similar magnitudes, they may cancel with each other to yield accurate energy differences. Below, we quantify the performance of the nonvariational (*V*) and variational (*VBX*) formulas for energy differences. We will also address another important question: What is special about the Slater-type geminal of Eq. (15) that makes the fixed-amplitude method work? Below, we explore several geminals that have the right asymptote,

Before converting them into forms suitable for MC integration, we make two additional simplifying approximations to the formalism.

First, we assume the so-called generalized Brillouin condition (GBC) and extended Brillouin condition (EBC),^{17,22} i.e.,

where *i* and *a* label an occupied and virtual orbital, respectively. This assumption leads to

which, in turn, simplifies the *B* integrals [Eq. (7)] into

and

where the idempotency of $Q\u02c612$ is used, i.e., $Q\u02c6122=Q\u02c612$. As will be discussed below, when expanded, each $Q\u02c612$ introduces a new electron and increases the dimension of an integral by three [see Eqs. (11) and (12)]. Therefore, Eq. (25) is far more computationally tractable than Eq. (23).

Second, using the commutator $[F\u02c61+F\u02c62,f12]$, we rewrite Eq. (25) as

and

Note that the GBC is invoked in Eq. (29).

These two approximations reduce *E*_{F12} to

with

and

The GBC is known to be an accurate approximation, causing errors that are no greater than a milli-Hartree in molecules ranging from H_{2} to O_{3}.^{17} The EBC is less accurate, but still tolerable.^{17,74} For a smaller basis set, in fact, it is recommended^{17,74} that the EBC be invoked when the GBC is used.

### B. $EF12V$

with

where the subscripts on the left-hand sides indicate the number of electrons involved and thus the dimension of integrals. They are rewritten in the MC-integrable forms as

with

where

The formalism of this term is unchanged from Ref. 41.

### C. $EF12BX$

The sum of the *B* and *X* terms, Eq. (33), is newly considered in this work. Expanding $Q\u02c612$ in this equation, we find

with

and

where we use the fact that only the kinetic-energy ($T\u02c6n$) and exchange ($K\u02c6n$) operators in the Fock operator [Eq. (9)] do not commute with *f*_{12} and, therefore,

The subscripts “2e,” “3e,” etc., again refer to the number of electrons that appear in the integrals. It is incremented in the exchange terms [ Eqs. (51)–(53)] because the exchange operator introduces one more electron, just like $Q\u02c612$ (see Appendix A).

After straightforward, but rather tedious algebra, which is partially computerized in this work, we arrive at the following MC-integrable expressions:

The integrals in the two-electron kinetic-energy contribution [Eq. (55)] are given by

with

See Appendix A for the derivation of Eqs. (61) and (62). The integrand of Eq. (55) is divided into two terms, $F2eT1$ and $F2eT2$, because they have rather different behavior as *r*_{12} → 0, thus requiring different weight functions in MC integrations (see below). The subscripts of *r*_{12} in Eq. (63) are always “12” and independent of *p* or *q*. This is related to the fact that, in Eqs. (61) and (62) and throughout our formalism, the singular operator is chosen (by coordinate interchanges) to be always of the form $r12\u22121$, so that the same weight function containing $r12\u22121$ can be consistently applied to these dimensions.

The factors of the geminal, *f*^{(a)}, *f*^{(b)}, *f*^{(c)}, and *f*^{(d)}, are defined as

whose actual forms are compiled in Table I for the 6 geminals^{75} considered in this work. This form is related to the so-called Kutzelnigg’s regularization operator.^{8} The cusp conditions^{6,7} reduce to

which is satisfied by all the geminals in Table I.

Geminal . | Name . | $f12(a)$ . | $f12(b)$ . | $f12(c)$ . | $f12(d)$ . |
---|---|---|---|---|---|

$f12(1)=(1\u2212e\u2212\gamma r12)/\gamma $ | Slater | −2e^{−γr12} | γe^{−γr12} | e^{−γr12} | 0 |

$f12(2)=(1\u2212e\u2212\gamma r122)/(\gamma r12)$ | Gauss | $\u22122e\u2212\gamma r122$ | $4\gamma r12e\u2212\gamma r122$ | $(e\u2212\gamma r122\u22121)/(\gamma r122)+2e\u2212\gamma r122$ | 0 |

$f12(3)=\gamma r12/(\gamma +r12)$ | Rational | −2γ^{3}/(r_{12} + γ)^{3} | 0 | γ/(γ + r_{12}) | −γ/(γ + r_{12})^{2} |

$f12(4)=ln(1+\gamma r12)/\gamma $ | Logarithm | −2/(1 + γr_{12})^{2} | −γ/(1 + γr_{12})^{2} | 1/(1 + γr_{12}) | 0 |

$f12(5)=arctan(\gamma r12)/\gamma $ | Arctangent | $\u22122/(1+\gamma 2r122)2$ | 0 | $1/(1+\gamma 2r122)$ | 0 |

$f12(6)=f12(1)/2+f12(3)/2$ | Hybrid | −e^{−γr12} − γ^{3}/(r_{12} + γ)^{3} | γe^{−γr12}/2 | e^{−γr12}/2 + γ/(2γ + 2r_{12}) | −γ/(γ + r_{12})^{2}/2 |

Geminal . | Name . | $f12(a)$ . | $f12(b)$ . | $f12(c)$ . | $f12(d)$ . |
---|---|---|---|---|---|

$f12(1)=(1\u2212e\u2212\gamma r12)/\gamma $ | Slater | −2e^{−γr12} | γe^{−γr12} | e^{−γr12} | 0 |

$f12(2)=(1\u2212e\u2212\gamma r122)/(\gamma r12)$ | Gauss | $\u22122e\u2212\gamma r122$ | $4\gamma r12e\u2212\gamma r122$ | $(e\u2212\gamma r122\u22121)/(\gamma r122)+2e\u2212\gamma r122$ | 0 |

$f12(3)=\gamma r12/(\gamma +r12)$ | Rational | −2γ^{3}/(r_{12} + γ)^{3} | 0 | γ/(γ + r_{12}) | −γ/(γ + r_{12})^{2} |

$f12(4)=ln(1+\gamma r12)/\gamma $ | Logarithm | −2/(1 + γr_{12})^{2} | −γ/(1 + γr_{12})^{2} | 1/(1 + γr_{12}) | 0 |

$f12(5)=arctan(\gamma r12)/\gamma $ | Arctangent | $\u22122/(1+\gamma 2r122)2$ | 0 | $1/(1+\gamma 2r122)$ | 0 |

$f12(6)=f12(1)/2+f12(3)/2$ | Hybrid | −e^{−γr12} − γ^{3}/(r_{12} + γ)^{3} | γe^{−γr12}/2 | e^{−γr12}/2 + γ/(2γ + 2r_{12}) | −γ/(γ + r_{12})^{2}/2 |

The integrands in the three-electron kinetic contribution [Eq. (56)] are given by

whereas those of the four-electron kinetic contribution [Eq. (57)] read

and

The integrands in the exchange contributions [Eqs. (58)–(60)] are found to be

and

See Appendix A for the derivation of Eq. (70) as an example.

## III. MONTE CARLO ALGORITHM

Each of the contributions to the F12 correction is evaluated by the Metropolis MC method with the redundant-walker convergence-acceleration technique.^{44} Identifying and using an appropriate weight function are essential for the viability (let alone efficiency) of any MC integration. A weight function should be positive everywhere, be analytically integrable, have the same singularity as the integrand, and generally behave like the integrand.

### A. $EF12V$

and

where *N* is the total number of MC steps and *m* is the number of “redundant walkers” (see below for the definition).

Electron pairs with coordinates ${r1[n],r2[n]|1\u2264n\u2264N}$ are distributed randomly but according to the normalized two-electron weight function of the form,

where *g*(** r**), in our implementation, is chosen to be a sum of two atom-centered

*s*-type GTOs per atom,

Here, *n*_{atom} is the number of atoms and *r*_{A} is the position of the *A*th atom. The normalization coefficient, *N*_{2e}, can be evaluated analytically.^{76} Although the $r12\u22121$ singularity in $F2eV$ [Eq. (42)] is analytically removed by *f*_{12} in its numerator, we elect to use the above weight function that makes the MC algorithm sample more heavily at short *r*_{12} distances. This is appropriate or even necessary because $F2eV$ is expected to vary more rapidly at short *r*_{12} distances, even after the singularity is removed.^{77–79} In $F3eV$ [Eq. (43)] and $F4eV$ [Eq. (44)], the $r12\u22121$ singularity remains and the use of the weight function that cancels it is essential. A random distribution according to *w*_{2e} is achieved by the Metropolis algorithm.^{44,80}

A distribution of one-electron coordinates ${r3k[n]|1\u2264n\u2264N}$ is generated randomly but according to the normalized one-electron weight function,

where *N*_{1e} is again analytically determined.^{76} Electron 3 or 4 is not strongly coupled with the others in integrand $F3eV$ or $F4eV$, and, therefore, one-electron “walkers” whose distributions resemble the molecule’s electron density are appropriate in these dimensions. In the redundant-walker algorithm,^{41,44}*m* such independent distributions (redundant walkers) are generated (1 ≤ *k* ≤ *m*), so that each of these *m* one-electron walkers can be used in Eq. (74) and *m*(*m* − 1)/2 distinct pairs of one-electron walkers in Eq. (75). Since generating *m* distributions increases the cost by no more than a factor of *m*, but it increases the number of distinct summands (samples) by *O*(*m*^{2}), this algorithm increases the sampling efficiency of Eq. (75) by *O*(*m*). However, Eq. (73) is unaffected by the algorithm. Therefore, the overall performance boost by the redundant-walker algorithm in MC-MP2-F12 is hard to predict. The *V* contribution implemented in this manner was reported in Ref. 41.

### B. $EF12BX$

The sum of the *B* and *X* terms [Eq. (47)] consists of two-, three-, and four-electron kinetic-energy contributions as well as three-, four-, and five-electron exchange contributions, which are as high as 15-dimensional. They are evaluated as

and

where *w*_{2e} is the weight function for an electron-pair walker for strongly coupled integration variables (*r*_{1} and *r*_{2}), while *w*_{1e} is the weight function for *m* independent one-electron walkers for relatively uncoupled variables. These weight functions are identical to Eqs. (76) and (78), respectively, and hence the same electron-pair and one-electron walkers for the *V* term can be reused for the *BX* term (as well as for MC-MP2). The sampling efficiency of the first term of Eq. (79) is unchanged by the redundant-walker algorithm and that of the second term is increased by a factor of *O*(*m*). The *m*-dependence of the number of samplings in individual terms in Eqs. (80) and (81) can be inferred similarly, but that of the overall performance is again hard to predict, but is expected to be small.

The exchange contributions are evaluated as

and

again reusing the same electron-pair walkers and *m* independent one-electron walkers. The use of *w*_{2e} for variables *r*_{1} and *r*_{2} and of *w*_{1e} for the other variables can be rationalized by the structure (occurrence of singularity) in the integrands [ Eqs. (70)–(72)].

### C. Statistical uncertainty

The MC integrals for both the nonvariational (*V*) [Eq. (13)] and variational (*VBX*) [Eq. (32)] formulas can be written in a unified form as

where *I*^{[n]} collects all summands of the outermost summations in Eqs. (73)–(75), (79)–(81), and/or (82)–(84). The statistical uncertainty *σ _{N}* in

*I*at the

_{N}*N*th MC step can be estimated as

However, this estimate is well known^{81} to be an underestimation of the true statistical uncertainty because the coordinates of a walker are correlated across several MC steps. A more accurate estimate is obtained by the blocking algorithm of Flyvbjerg and Petersen^{81} as

where *N*_{b} is the block size, which is to be gradually enlarged until *σ _{N}* plateaus. The relative error at the

*N*th MC step is, therefore,

## IV. RESULTS AND DISCUSSION

A massively parallel MC-MP2-F12 program was implemented using the redundant-walker algorithm. Both nonvariational *V* formula^{41} [Eq. (13)] and variational *VBX* formula [Eq. (32)] were used, and the notational distinction is made in this article in the parentheses following the method label as in MC-MP2-F12(*V*) and MC-MP2-F12(*VBX*).

The number of redundant walkers (*m* in various equations in Sec. III) was 40 and the block size [*N*_{b} in Eq. (88)] was 6. The frozen-core approximation was used in all cases. A F12 correction thus obtained was added to the corresponding MP2 energy obtained by the conventional deterministic algorithm in nwchem.^{83} This is merely to isolate the performance characteristics of MC-MP2-F12 from those of MC-MP2 because the result of the latter becomes immediately available from the former with no extra cost.

The CBS limits of the MP2 energy were extrapolated applying the *X*^{−3} formula^{84} to the deterministic values obtained with the aug-cc-pVQZ and aug-cc-pV5Z basis sets. Hereafter, we abbreviate the cc-pVXZ basis set as “XZ” and the aug-cc-pVXZ basis set as “AXZ,” where X = D, T, Q, or 5.

### A. Geminals and *V* versus *VBX* formulas

The MP2 correlation energies for H_{2}O and CH_{4} at the geometries of Bak *et al.*^{85} were computed with the MC-MP2-F12 method with the DZ and TZ basis sets. Six geminals^{75} listed in Table I were used (whose forms near *r*_{12} = 0 are visualized in Fig. 1). Recall that any differentiable function can be employed as a geminal in MC-MP2-F12(*VBX*) or any function (even a numerically defined one) in MC-MP2-F12(*V*). This is one of the unique advantages of MC-MP2-F12, shared by QMC, the latter using the Jastrow factor routinely,^{86–89} some even correlating two electrons and a nucleus.^{90} See also the work of Nooijen and Bartlett,^{75} who proposed many geminals including those used here, Monkhorst^{91} for a rational geminal whose form is dictated by its equation of motion in the first order and asymptotic regions, Grüneis *et al.*,^{92} who used a “Yukawa–Coulomb” geminal with their planewave MP2-F12, and Silkowski *et al.*^{93} for a range-separated geminal.

Figure 2 shows the *γ*-dependence of the MC-MP2-F12 energy of H_{2}O. The nonvariational *V* data are plotted as green curves, while the variational *VBX* ones as purple curves. The uncorrected MP2 energies with the DZ and TZ basis sets as well as the CBS limit are indicated as red lines. Figure 3 plots the same for CH_{4}.

First, focusing on the bounded *VBX* curves (purple), we find that all geminals work reasonably well with the exception of geminal 2; the curves from all geminals except 2 are not strongly dependent on *γ* and close to the CBS limit in a wide range of *γ*. Geminal 1 is the Slater-type geminal introduced by Ten-no^{16,38} and is widely regarded as one of the best-performing geminals. Geminal 2 is the Gauss-type geminal, which should be distinguished from the Gaussian geminal that does not satisfy the cusp conditions.^{16} Its performance is rather poor as its MC-MP2-F12(*VBX*)/DZ energy falls short of the MP2/TZ energy at any value of *γ*.

Comparing these performance data with Fig. 1, we notice that only geminal 2 has a rather different long-range behavior than the rest. Unlike the other geminals, which rise monotonically, geminal 2 increases and then turns to decrease with *r*_{12}, meaning that its correlation hole may be 2*s*-orbital-like, when the true correlation hole is 1*s*-orbital-like.^{79} It has been shown^{2,18} that the $r122$ component in the Taylor-series expansion of a geminal plays an important role in describing the correlation hole correctly, but among the 6 geminals studied here, only geminals 2 and 5 lack this component (the *VBX* curves of geminal 5 are equally unstable with *γ* as geminal 2, but as will be shown below the relative energies from geminal 5 seem reasonable). These may explain the poor performance of geminal 2.

Next, turning our attention to the nonvariational *V* curves (green), we observe that they strongly depend on *γ* and are usually unbounded. For instance, using geminal 5 with *γ* < 1.2 a.u., the F12 corrections overshoot the correct values in both H_{2}O and CH_{4}, rendering the MC-MP2-F12(*V*)/DZ energies more negative than the MP2/CBS values, a clear violation of Hylleraas’ variational principle. This may or may not become troublesome in chemical applications of MP2-F12, wherein relative energies are often sought; if the errors from the CBS limits have opposite signs let alone greatly different magnitudes across molecules, the errors in relative energies can be amplified, possibly undoing improvements made by F12 in the total correlation energies. In Sec. IV C, we will examine this numerically.

As will be shown in Sec. IV D, the nonvariational *V* formula has the advantage over the variational *VBX* formula of being 2–10 times faster. Can one then use the *V* formula and reliably determine the CBS limits? We have conceived of two ways to do this. The first is to combine two geminals with opposite *γ*-dependences, such that the resulting geminal is nearly independent of *γ*. Geminal 6 is indeed constructed in this way as an average of geminals 1 and 3, whose *V* curves have opposite *γ*-dependences. Consequently, the MC-MP2-F12(*V*) energies of geminal 6 are stable with variation in *γ* and also much closer to the CBS limits than those from geminal 1 or 3 alone at any *γ*. We will examine if this construction improves total correlation energies for a wider array of molecules in Sec. IV B and relative energies in Sec. IV C.

The second method attempts to locate the value of *γ* at which the nonvariational *V* formula yields the CBS limit. It does so by combining the results of two basis sets. At a given basis set, the *VBX* energy is closest to the CBS limit at *γ*_{min} that minimizes it,

Note the approximate equality in the above expression, which is because the left-hand side is a lower bound of the right-hand side. At the minimum, however, the right-hand side should be nearly equal to the *V* energy as per Eq. (13) insofar as Ten-no’s fixed amplitudes^{39} approximate well the true variationally optimized geminal amplitudes,

This seems, in fact, borne out in most of the plots of Figs. 2 and 3; the intersection of the *VBX* and *V* curves of the same basis set occurs at the minimum of the former (except for the curves of geminal 2). This once again attests to the excellent transferability of Ten-no’s fixed amplitudes across all geminals that have a physically reasonable functional form.

If we now assume the equality in Eq. (90) (instead of the approximate equality), we have

suggesting that *γ*_{min} is at the intersection of the *V* curves of two different basis sets, at which the energy is the CBS limit. Applying this method to the linearly interpolated *V* curves of geminal 1, 3, and 4, we recover, respectively, 100.2%, 99.7%, and 99.2% of the CBS limit of H_{2}O. They are closer to 100% than 94.8% or 98.0% recuperated, respectively, by the *VBX* method with the DZ and TZ basis sets. Likewise, this intersection method recovers 100.4%, 100.1%, and 100.1% of the CBS limit for CH_{4} using geminals 1, 3, and 4, respectively. Notice how closely the two green curves (of geminals 1, 3, and 4) and the MP2/CBS line (red) meet at one and the same point for both H_{2}O and CH_{4}. Although it is at present unclear whether the efficacy of the intersection method occurs universally across many molecules, we have a potentially practical way of capturing nearly 100% of the CBS limits inexpensively using only the nonvariational *V* formula.

### B. Correlation energies

The F12 corrections and statistical uncertainties were calculated for 17 molecules at the geometries of Bak *et al.*^{85} by MC-MP2-F12 with the ADZ and ATZ basis sets using the *V* or *VBX* formula. The MP2 energies were also computed for the same molecules with the AXZ basis set (X = D, T, Q, and 5) using nwchem.^{83}

Table II compiles the MP2 correlation energies in the increasing order of their magnitude. The MC-MP2-F12 calculations used the Slater-type geminal with *γ* = 1.1 a.u. With the ADZ basis set, the F12 correction brings an average error from the CBS limit from 107.1 m*E*_{h} to 6.6 m*E*_{h} (with the *V* formula) or 6.9 m*E*_{h} (*VBX*). The latter two errors are smaller than the average error of 9.9 m*E*_{h} in MP2/aug-cc-pV5Z. With the ATZ basis set, MP2 has an average error of 42.5 m*E*_{h}, whereas MC-MP2-F12(*V*) has only 3.0 m*E*_{h} and MC-MP2-F12(*VBX*) 3.5 m*E*_{h}. The statistical uncertainties are on the order of a few tenths of 1 m*E*_{h} with the largest being 1.2 m*E*_{h} after 1.44 × 10^{8} MC steps using 40 redundant walkers. They can furthermore be arbitrarily (albeit slowly) compressed by running a longer MC run.

. | MP2 . | MC-MP2-F12(V)^{b}
. | MC-MP2-F12(VBX)^{b}
. | ||||||
---|---|---|---|---|---|---|---|---|---|

Molecule^{a}
. | ADZ . | ATZ . | AQZ . | A5Z . | CBS^{c}
. | ADZ . | ATZ . | ADZ . | ATZ . |

H_{2} | −0.0273 | −0.0320 | −0.0333 | −0.0337 | −0.0343 | −0.0336(0) | −0.0341(0) | −0.0342(0) | −0.0342(0) |

CH_{2} | −0.1154 | −0.1410 | −0.1493 | −0.1524 | −0.1557 | −0.1508(1) | −0.1554(1) | −0.1535(1) | −0.1547(2) |

CH_{4} | −0.1677 | −0.2008 | −0.2111 | −0.2149 | −0.2188 | −0.2134(1) | −0.2189(2) | −0.2165(1) | −0.2186(4) |

NH_{3} | −0.1992 | −0.2401 | −0.2537 | −0.2588 | −0.2643 | −0.2601(1) | −0.2650(2) | −0.2603(2) | −0.2632(3) |

H_{2}O | −0.2193 | −0.2683 | −0.2859 | −0.2929 | −0.3002 | −0.3004(1) | −0.3023(1) | −0.2954(2) | −0.2979(3) |

C_{2}H_{2} | −0.2223 | −0.2798 | −0.3012 | −0.3100 | −0.3192 | −0.3263(1) | −0.3231(1) | −0.3129(2) | −0.3148(2) |

C_{2}H_{4} | −0.2628 | −0.3137 | −0.3311 | −0.3376 | −0.3445 | −0.3337(1) | −0.3444(2) | −0.3389(3) | −0.3438(5) |

HF | −0.2847 | −0.3399 | −0.3583 | −0.3651 | −0.3722 | −0.3622(2) | −0.3714(5) | −0.3673(4) | −0.3700(9) |

HNC | −0.2824 | −0.3377 | −0.3573 | −0.3650 | −0.3731 | −0.3660(3) | −0.3739(3) | −0.3677(7) | −0.3714(5) |

HCN | −0.2943 | −0.3502 | −0.3699 | −0.3777 | −0.3859 | −0.3774(2) | −0.3866(2) | −0.3789(4) | −0.3837(4) |

CO | −0.2992 | −0.3607 | −0.3837 | −0.3932 | −0.4031 | −0.4012(3) | −0.4056(2) | −0.3970(5) | −0.4001(5) |

N_{2} | −0.3173 | −0.3796 | −0.4018 | −0.4109 | −0.4203 | −0.4147(2) | −0.4218(2) | −0.4127(5) | −0.4170(4) |

CH_{2}O | −0.3327 | −0.4021 | −0.4271 | −0.4370 | −0.4474 | −0.4439(3) | −0.4492(4) | −0.4405(5) | −0.4436(7) |

HNO | −0.3686 | −0.4455 | −0.4729 | −0.4841 | −0.4957 | −0.4934(3) | −0.4987(3) | −0.4873(5) | −0.4916(7) |

H_{2}O_{2} | −0.4154 | −0.5079 | −0.5411 | −0.5545 | −0.5685 | −0.5700(4) | −0.5727(5) | −0.5601(7) | −0.5637(10) |

F_{2} | −0.4280 | −0.5361 | −0.5758 | −0.5927 | −0.6105 | −0.6253(3) | −0.6193(5) | −0.5978(7) | −0.6023(9) |

CO_{2} | −0.5053 | −0.6117 | −0.6514 | −0.6676 | −0.6846 | −0.6828(6) | −0.6890(6) | −0.6730(12) | −0.6791(12) |

Error^{d} | 0.1071 | 0.0425 | 0.0193 | 0.0099 | 0.0000 | 0.0066(3) | 0.0030(3) | 0.0069(5) | 0.0035(6) |

. | MP2 . | MC-MP2-F12(V)^{b}
. | MC-MP2-F12(VBX)^{b}
. | ||||||
---|---|---|---|---|---|---|---|---|---|

Molecule^{a}
. | ADZ . | ATZ . | AQZ . | A5Z . | CBS^{c}
. | ADZ . | ATZ . | ADZ . | ATZ . |

H_{2} | −0.0273 | −0.0320 | −0.0333 | −0.0337 | −0.0343 | −0.0336(0) | −0.0341(0) | −0.0342(0) | −0.0342(0) |

CH_{2} | −0.1154 | −0.1410 | −0.1493 | −0.1524 | −0.1557 | −0.1508(1) | −0.1554(1) | −0.1535(1) | −0.1547(2) |

CH_{4} | −0.1677 | −0.2008 | −0.2111 | −0.2149 | −0.2188 | −0.2134(1) | −0.2189(2) | −0.2165(1) | −0.2186(4) |

NH_{3} | −0.1992 | −0.2401 | −0.2537 | −0.2588 | −0.2643 | −0.2601(1) | −0.2650(2) | −0.2603(2) | −0.2632(3) |

H_{2}O | −0.2193 | −0.2683 | −0.2859 | −0.2929 | −0.3002 | −0.3004(1) | −0.3023(1) | −0.2954(2) | −0.2979(3) |

C_{2}H_{2} | −0.2223 | −0.2798 | −0.3012 | −0.3100 | −0.3192 | −0.3263(1) | −0.3231(1) | −0.3129(2) | −0.3148(2) |

C_{2}H_{4} | −0.2628 | −0.3137 | −0.3311 | −0.3376 | −0.3445 | −0.3337(1) | −0.3444(2) | −0.3389(3) | −0.3438(5) |

HF | −0.2847 | −0.3399 | −0.3583 | −0.3651 | −0.3722 | −0.3622(2) | −0.3714(5) | −0.3673(4) | −0.3700(9) |

HNC | −0.2824 | −0.3377 | −0.3573 | −0.3650 | −0.3731 | −0.3660(3) | −0.3739(3) | −0.3677(7) | −0.3714(5) |

HCN | −0.2943 | −0.3502 | −0.3699 | −0.3777 | −0.3859 | −0.3774(2) | −0.3866(2) | −0.3789(4) | −0.3837(4) |

CO | −0.2992 | −0.3607 | −0.3837 | −0.3932 | −0.4031 | −0.4012(3) | −0.4056(2) | −0.3970(5) | −0.4001(5) |

N_{2} | −0.3173 | −0.3796 | −0.4018 | −0.4109 | −0.4203 | −0.4147(2) | −0.4218(2) | −0.4127(5) | −0.4170(4) |

CH_{2}O | −0.3327 | −0.4021 | −0.4271 | −0.4370 | −0.4474 | −0.4439(3) | −0.4492(4) | −0.4405(5) | −0.4436(7) |

HNO | −0.3686 | −0.4455 | −0.4729 | −0.4841 | −0.4957 | −0.4934(3) | −0.4987(3) | −0.4873(5) | −0.4916(7) |

H_{2}O_{2} | −0.4154 | −0.5079 | −0.5411 | −0.5545 | −0.5685 | −0.5700(4) | −0.5727(5) | −0.5601(7) | −0.5637(10) |

F_{2} | −0.4280 | −0.5361 | −0.5758 | −0.5927 | −0.6105 | −0.6253(3) | −0.6193(5) | −0.5978(7) | −0.6023(9) |

CO_{2} | −0.5053 | −0.6117 | −0.6514 | −0.6676 | −0.6846 | −0.6828(6) | −0.6890(6) | −0.6730(12) | −0.6791(12) |

Error^{d} | 0.1071 | 0.0425 | 0.0193 | 0.0099 | 0.0000 | 0.0066(3) | 0.0030(3) | 0.0069(5) | 0.0035(6) |

^{a}

Geometries were taken from Ref. 85.

^{b}

Only the F12 corrections were calculated by the MC-MP2-F12 method using either the nonvariational *V* formula or the variational *VBX* formula and the Slater-type geminal (geminal 1) with *γ* = 1.1 a.u. The number of MC steps was 1.44 × 10^{8} and the number of the redundant walkers was 40.

^{c}

The *X*^{−3} extrapolation using the MP2/AQZ and MP2/A5Z correlation energies.

^{d}

The standard deviation from the MP2/CBS values (the standard deviation of the statistical uncertainties in parentheses).

Figure 4 plots the proportion of the CBS limits recovered by MP2 and MC-MP2-F12(*VBX*). MP2 recovers on average 74.6%, 90.2%, 95.6%, and 97.7% of the CBS limits with the ADZ, ATZ, AQZ, and A5Z basis sets, respectively. MC-MP2-F12(*VBX*)/ADZ captures 98.5% of the CBS limits, which is already greater than that of MP2/A5Z. The ratio goes up further to 99.3% with MC-MP2-F12(*VBX*)/ATZ.

Therefore, MC-MP2-F12 works exceedingly well for the total correlation energies, with either the *V* or *VBX* formula. Comparing *V* and *VBX*, the former performs slightly better than the latter. This is not surprising because errors from a variational formula are always positive and cannot be accidentally small, while the results from a nonvariational formula can scatter in both higher or lower sides of the correct values and can accidentally agree more accurately with the latter. As shown in Sec. IV C, *VBX* becomes more accurate than *V* for the relative energies because systematic cancellation of positive errors occurs in the former (but not in the latter) when energy differences are taken.

In Table III, we revisit the question of the relative performance of geminals for total correlation energies. With the ADZ basis set, Ten-no’s Slater-type geminal (geminal 1) performs the best by a large margin especially in the *V* formula with an average error of 6.6 m*E*_{h}. The hybrid geminal (geminal 6), designed to be insensitive to *γ* in the *V* formula, comes next in performance with an average error of 12.8 m*E*_{h}. Geminal 2 is particularly poorly performing. Switching from the nonvariational (*V*) to variational (*VBX*) formula, the average errors of some geminals decrease, while those of the others increase, reflecting the fact that the values of *γ* are not optimal for either formula. With the ATZ basis set, the hybrid geminal (geminal 6) becomes the most accurate in both the *V* and *VBX* formulas, perhaps supporting its design.

. | $f12(1)(\gamma =1.1a.u.)$^{a}
. | $f12(2)(\gamma =1.2a.u.)$^{a}
. | $f12(3)(\gamma =1.2a.u.)$^{a}
. | $f12(4)(\gamma =2.0a.u.)$^{a}
. | $f12(5)(\gamma =2.0a.u.)$^{a}
. | $f12(6)(\gamma =1.1a.u.)$^{a}
. | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

Basis set . | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. |

ADZ^{b} | 6.6 (0.3) | 6.9 (0.5) | 20.2 (0.1) | 63.4 (0.2) | 17.6 (0.3) | 8.7 (0.7) | 17.6 (0.3) | 12.3 (0.7) | 21.8 (0.2) | 12.1 (0.5) | 12.8 (0.3) | 7.2 (0.6) |

ATZ^{b} | 3.0 (0.3) | 3.5 (0.6) | 17.0 (0.2) | 21.9 (0.3) | 4.2 (0.3) | 4.0 (0.7) | 5.2 (0.4) | 4.6 (0.7) | 3.0 (0.3) | 3.2 (0.5) | 2.1 (0.3) | 3.2 (0.6) |

. | $f12(1)(\gamma =1.1a.u.)$^{a}
. | $f12(2)(\gamma =1.2a.u.)$^{a}
. | $f12(3)(\gamma =1.2a.u.)$^{a}
. | $f12(4)(\gamma =2.0a.u.)$^{a}
. | $f12(5)(\gamma =2.0a.u.)$^{a}
. | $f12(6)(\gamma =1.1a.u.)$^{a}
. | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

Basis set . | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. |

ADZ^{b} | 6.6 (0.3) | 6.9 (0.5) | 20.2 (0.1) | 63.4 (0.2) | 17.6 (0.3) | 8.7 (0.7) | 17.6 (0.3) | 12.3 (0.7) | 21.8 (0.2) | 12.1 (0.5) | 12.8 (0.3) | 7.2 (0.6) |

ATZ^{b} | 3.0 (0.3) | 3.5 (0.6) | 17.0 (0.2) | 21.9 (0.3) | 4.2 (0.3) | 4.0 (0.7) | 5.2 (0.4) | 4.6 (0.7) | 3.0 (0.3) | 3.2 (0.5) | 2.1 (0.3) | 3.2 (0.6) |

^{a}

Only the F12 corrections were calculated by the MC-MP2-F12 method using either the nonvariational *V* formula or the variational *VBX* formula. The number of MC steps was 1.44 × 10^{8} and the number of the redundant walkers was 40.

^{b}

The standard deviation from the MP2/CBS values (the standard deviation of the statistical uncertainties in parentheses).

### C. Reaction energies

Energies of 12 gas-phase reactions listed in Table IV were calculated by MP2, MP2-F12, and MC-MP2-F12 using geminal 1 (the Slater-type geminal) with *γ* = 1.1 a.u. The results are compiled in Table V. They were computed as the difference in the MP2 or MP2-F12 correlation energy between reactants and products, whose geometries were taken from the work of Bak *et al.*^{85} The MP2 part of the energy was obtained with the conventional deterministic algorithms in nwchem,^{83} whereas the F12 part was computed either by MC-MP2-F12(*V*), MC-MP2-F12(*VBX*), or deterministic MP2-F12(*VBX*), the latter implemented in mpqc,^{17,74,94,95} which uses neither the GBC nor EBC.

Reaction . | Formula^{a}
. |
---|---|

1 | CO + H_{2} → CH_{2}O |

2 | N_{2} + 3H_{2} → 2NH_{3} |

3 | C_{2}H_{2} + H_{2} → C_{2}H_{4} |

4 | CO_{2} + 4H_{2} → CH_{4} + 2H_{2}O |

5 | CH_{2}O + 2H_{2} → CH_{4} + H_{2}O |

6 | CO + 3H_{2} → CH_{4} + H_{2}O |

7 | HCN + 3H_{2} → CH_{4} + NH_{3} |

8 | HNO + 2H_{2} → NH_{3} + H_{2}O |

9 | C_{2}H_{2} + 3H_{2} → 2CH_{4} |

10 | CH_{2} + H_{2} → CH_{4} |

11 | F_{2} + H_{2} → 2HF |

12 | 2CH_{2} → C_{2}H_{4} |

Reaction . | Formula^{a}
. |
---|---|

1 | CO + H_{2} → CH_{2}O |

2 | N_{2} + 3H_{2} → 2NH_{3} |

3 | C_{2}H_{2} + H_{2} → C_{2}H_{4} |

4 | CO_{2} + 4H_{2} → CH_{4} + 2H_{2}O |

5 | CH_{2}O + 2H_{2} → CH_{4} + H_{2}O |

6 | CO + 3H_{2} → CH_{4} + H_{2}O |

7 | HCN + 3H_{2} → CH_{4} + NH_{3} |

8 | HNO + 2H_{2} → NH_{3} + H_{2}O |

9 | C_{2}H_{2} + 3H_{2} → 2CH_{4} |

10 | CH_{2} + H_{2} → CH_{4} |

11 | F_{2} + H_{2} → 2HF |

12 | 2CH_{2} → C_{2}H_{4} |

^{a}

Geometries were taken from Ref. 85.

. | ADZ . | ATZ . | CBS^{b}
. | |||||
---|---|---|---|---|---|---|---|---|

Reaction^{a}
. | MP2 . | F12(VBX)^{c}
. | MC-F12(V)^{d}
. | MC-F12(VBX)^{d}
. | MP2 . | MC-F12(V)^{d}
. | MC-F12(VBX)^{d}
. | MP2 . |

1 | −16.1 | −24.6 | −23.8 (1.0) | −24.2 (1.9) | −24.9 | −24.9 (1.2) | −24.5 (2.3) | −26.4 |

2 | 1.8 | −13.3 | −12.1 (0.8) | −13.5 (1.5) | −12.2 | −15.3 (1.0) | −18.2 (1.9) | −14.2 |

3 | 14.2 | 16.2 | 13.6 (0.7) | 15.4 (1.3) | 15.4 | 18.7 (1.4) | 20.8 (2.6) | 17.2 |

4 | 21.2 | 5.2 | 8.4 (1.7) | 7.3 (3.3) | 5.8 | 5.3 (1.9) | 3.5 (3.6) | 6.1 |

5 | 0.5 | −8.4 | −6.8 (0.8) | −7.5 (1.6) | −7.9 | −9.8 (1.2) | −11.9 (2.3) | −8.4 |

6 | −15.7 | −33.1 | −30.6 (0.7) | −31.7 (1.4) | −32.8 | −34.7 (0.9) | −36.5 (1.7) | −34.8 |

7 | 24.1 | 13.7 | 12.6 (0.6) | 12.7 (1.1) | 13.9 | 13.1 (0.9) | 11.4 (1.7) | 14.5 |

8 | 12.1 | −0.1 | 0.5 (0.8) | 0.3 (1.5) | 2.9 | −0.9 (1.1) | −3.0 (2.0) | −0.7 |

9 | 24.1 | 23.9 | 20.4 (0.6) | 22.8 (1.1) | 21.3 | 23.3 (1.2) | 23.5 (2.3) | 25.2 |

10 | −65.8 | −75.4 | −76.2 (0.3) | −75.4 (0.5) | −73.0 | −77.2 (0.6) | −78.2 (1.1) | −76.0 |

11 | 28.3 | 17.4 | 16.4 (1.0) | 16.1 (2.0) | 22.2 | 18.9 (1.4) | 18.3 (2.7) | 16.7 |

12 | −141.6 | −158.4 | −159.2 (0.6) | −158.3 (1.3) | −151.9 | −159.1 (1.3) | −159.1 (2.5) | −160.0 |

Error^{e} | 11.9 | 0.0 | 1.9 (0.9) | 1.0 (1.7) | 2.7 | 1.4 (1.2) | 2.8 (2.3) | 1.1 |

Error^{f} | 12.5 | 1.1 | 2.6 (0.9) | 1.7 (1.7) | 3.5 | 1.3 (1.2) | 2.6 (2.3) | 0.0 |

. | ADZ . | ATZ . | CBS^{b}
. | |||||
---|---|---|---|---|---|---|---|---|

Reaction^{a}
. | MP2 . | F12(VBX)^{c}
. | MC-F12(V)^{d}
. | MC-F12(VBX)^{d}
. | MP2 . | MC-F12(V)^{d}
. | MC-F12(VBX)^{d}
. | MP2 . |

1 | −16.1 | −24.6 | −23.8 (1.0) | −24.2 (1.9) | −24.9 | −24.9 (1.2) | −24.5 (2.3) | −26.4 |

2 | 1.8 | −13.3 | −12.1 (0.8) | −13.5 (1.5) | −12.2 | −15.3 (1.0) | −18.2 (1.9) | −14.2 |

3 | 14.2 | 16.2 | 13.6 (0.7) | 15.4 (1.3) | 15.4 | 18.7 (1.4) | 20.8 (2.6) | 17.2 |

4 | 21.2 | 5.2 | 8.4 (1.7) | 7.3 (3.3) | 5.8 | 5.3 (1.9) | 3.5 (3.6) | 6.1 |

5 | 0.5 | −8.4 | −6.8 (0.8) | −7.5 (1.6) | −7.9 | −9.8 (1.2) | −11.9 (2.3) | −8.4 |

6 | −15.7 | −33.1 | −30.6 (0.7) | −31.7 (1.4) | −32.8 | −34.7 (0.9) | −36.5 (1.7) | −34.8 |

7 | 24.1 | 13.7 | 12.6 (0.6) | 12.7 (1.1) | 13.9 | 13.1 (0.9) | 11.4 (1.7) | 14.5 |

8 | 12.1 | −0.1 | 0.5 (0.8) | 0.3 (1.5) | 2.9 | −0.9 (1.1) | −3.0 (2.0) | −0.7 |

9 | 24.1 | 23.9 | 20.4 (0.6) | 22.8 (1.1) | 21.3 | 23.3 (1.2) | 23.5 (2.3) | 25.2 |

10 | −65.8 | −75.4 | −76.2 (0.3) | −75.4 (0.5) | −73.0 | −77.2 (0.6) | −78.2 (1.1) | −76.0 |

11 | 28.3 | 17.4 | 16.4 (1.0) | 16.1 (2.0) | 22.2 | 18.9 (1.4) | 18.3 (2.7) | 16.7 |

12 | −141.6 | −158.4 | −159.2 (0.6) | −158.3 (1.3) | −151.9 | −159.1 (1.3) | −159.1 (2.5) | −160.0 |

Error^{e} | 11.9 | 0.0 | 1.9 (0.9) | 1.0 (1.7) | 2.7 | 1.4 (1.2) | 2.8 (2.3) | 1.1 |

Error^{f} | 12.5 | 1.1 | 2.6 (0.9) | 1.7 (1.7) | 3.5 | 1.3 (1.2) | 2.6 (2.3) | 0.0 |

^{a}

See Table IV.

^{b}

See footnote c of Table II.

^{c}

The deterministic MP2-F12 calculation using the variational *VBX* formula and the Slater-type geminal (geminal 1) with *γ* = 1.1 a.u. Unlike MC-MP2-F12, neither the GBC nor EBC was assumed.

^{d}

MC-MP2-F12 calculations. See footnote b of Table II.

^{e}

The standard deviation from the deterministic MP2-F12(*VBX*)/ADZ values (the standard deviation of the statistical uncertainties in parentheses).

^{f}

The standard deviation from the MP2/CBS values (the standard deviation of the statistical uncertainties in parentheses).

First, focusing on the ADZ results, we confirm that the correlation contributions to the reaction energies of MP2 suffer from excessively large errors from the CBS limits, which are, on average, 12.5 kJ mol^{−1}. They are so large that even the sign is incorrectly predicted (as compared with the CBS limits, if not with the reaction energies) for reactions 2, 5, and 8. Once the F12 corrections from MC-MP2-F12 are added, be they based on the *V* or *VBX* formula, the correlation contributions to the reaction energies are, on average, within 2.6 kJ mol^{−1} of the CBS limits. The statistical uncertainties are no more than 1.7 kJ mol^{−1} after 1.44 × 10^{8} MC steps and are comparable to the intrinsic errors in the F12 method itself, but can be arbitrarily reduced by running longer MC integrations. Therefore, MC-MP2-F12 with either the *V* or *VBX* formula does work well in practice for relative energies, consistently achieving the chemical accuracy from the CBS limits.

We also find that MC-MP2-F12(*VBX*) gives the results that are systematically closer [than MC-MP2-F12(*V*)] to deterministic MP2-F12(*VBX*) for all reactions barring one. Therefore, we argue that the MC-MP2-F12 method does not have a bias^{82} and is convergent at the correct limit, i.e., the result of the deterministic version of the corresponding method. The statistical uncertainties are also reasonable, but somewhat overestimated as the *VBX* results are within 1*σ* (statistical uncertainty) from the deterministic values. The *V* results are often outside 3*σ* from the deterministic results, but this does not mean an underestimation of *σ*; it simply means that MC-MP2-F12(*V*) and MP2-F12(*VBX*) are two different methods with slightly different correct results. Note that deterministic MP2-F12(*VBX*) used here does not invoke either the GBC or EBC, unlike MC-MP2-F12(*VBX*), which assumes both. However, we numerically confirmed (not shown) that the differences in the F12 corrections caused by these approximations are much smaller than the typical statistical uncertainties and negligible, except when *γ* is too small (*γ* ≤ 0.4 a.u. in the case of the Slater-type geminal).

Furthermore, the errors from the CBS limits are distinctly smaller in MC-MP2-F12(*VBX*) than in MC-MP2-F12(*V*) with the ADZ basis set. This may well be because the intrinsic errors in the F12 method, which are always positive with the variational *VBX* formula, cancel between reactants and products in MC-MP2-F12(*VBX*). However, the statistical uncertainties are nearly twice as large in the *VBX* results as in the *V* results, making it difficult to draw a definitive conclusion. The greater statistical uncertainties in the results of the *VBX* formula are due to the larger number and higher dimension of its integrals and are expected. Nonetheless, the difference in the overall accuracy between *V* and *VBX* is small, and given its much smaller computational cost (see Sec. IV D) and somewhat smaller statistical uncertainty, the nonvariational *V* formula may be preferred in most applications.

Turning to the results of the ATZ basis set, we find the benefit of the F12 method to diminish rather rapidly with a basis-set extension especially in the MC-MP2-F12 implementation. The average error from the CBS limits is compressed significantly from 12.5 to 3.5 kJ mol^{−1} going from MP2/ADZ to MP2/ATZ. In the meantime, the same quantity of MC-MP2-F12(*V*) decreases from 2.6 to 1.3 kJ mol^{−1}, while the average statistical uncertainty increases from 0.9 to 1.2 kJ mol^{−1}. With MC-MP-F12(*VBX*), the average error from the CBS limits *increases* from 1.7 kJ mol^{−1} (ADZ) to 2.6 kJ mol^{−1} (ATZ), probably owing to the increased statistical uncertainty from 1.7 to 2.3 kJ mol^{−1}. However, we do not consider this to be particularly troubling because it is with a small basis set that the F12 method is most needed. That MC-MP2-F12 is relatively more effective and efficient (in the sense of giving smaller statistical uncertainties) with ADZ than with ATZ which should indeed be considered as a practical advantage.

Table VI summarizes the geminal-dependence of the reaction energies calculated by MC-MP2-F12/ADZ with the *V* or *VBX* formula. Figure 5 plots the same using the *VBX* data and the values from the deterministic counterparts. The statistical uncertainties are comparable across different geminals, but the average errors from the CBS limits vary more greatly and are a better gauge of the geminals’ performance.

. | $f12(1)\u2009(\gamma =1.1a.u.)$^{b}
. | $f12(2)\u2009(\gamma =1.2a.u.)$^{b}
. | $f12(3)\u2009(\gamma =1.2a.u.)$^{b}
. | $f12(4)\u2009(\gamma =2.0a.u.)$^{b}
. | $f12(5)\u2009(\gamma =2.0a.u.)$^{b}
. | $f12(6)\u2009(\gamma =1.1a.u.)$^{b}
. | . | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Reaction^{a}
. | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. | CBS^{c}
. |

1 | −23.8 | −24.2 | −25.0 | −22.1 | −23.6 | −26.7 | −22.8 | −26.2 | −23.3 | −25.6 | −23.7 | −25.7 | −26.4 |

2 | −12.1 | −13.5 | −23.4 | −19.8 | −9.6 | −15.1 | −6.5 | −12.7 | −11.7 | −13.6 | −9.4 | −11.7 | −14.2 |

3 | 13.6 | 15.4 | 4.9 | 6.8 | 15.0 | 16.6 | 16.5 | 18.5 | 12.0 | 13.2 | 13.5 | 14.4 | 17.2 |

4 | 8.4 | 7.3 | 2.6 | −0.1 | 13.7 | 14.5 | 17.4 | 15.3 | 9.7 | 7.4 | 11.8 | 10.7 | 6.1 |

5 | −6.8 | −7.5 | −11.1 | −10.9 | −3.5 | −3.7 | −4.9 | −9.6 | −5.2 | −5.2 | −5.4 | −6.0 | −8.4 |

6 | −30.6 | −31.7 | −36.1 | −33.0 | −27.1 | −30.3 | −27.7 | −35.8 | −28.5 | −30.8 | −29.1 | −31.7 | −34.8 |

7 | 12.6 | 12.7 | 3.0 | 8.1 | 16.8 | 16.0 | 17.8 | 14.7 | 13.8 | 13.9 | 14.3 | 13.1 | 14.5 |

8 | 0.5 | 0.3 | −13.8 | −13.2 | 4.6 | 3.7 | 5.6 | 0.9 | 1.2 | 2.1 | 1.6 | −0.2 | −0.7 |

9 | 20.4 | 22.8 | 5.4 | 7.8 | 22.4 | 23.6 | 23.1 | 20.6 | 18.6 | 20.2 | 21.6 | 23.3 | 25.2 |

10 | −76.2 | −75.4 | −81.3 | −75.6 | −74.3 | −75.7 | −73.1 | −74.9 | −75.2 | −76.1 | −75.2 | −76.0 | −76.0 |

11 | 16.4 | 16.1 | −3.8 | −1.0 | 19.9 | 17.9 | 19.8 | 12.4 | 14.1 | 15.0 | 18.1 | 17.4 | 16.7 |

12 | −159.2 | −158.3 | −163.1 | −152.1 | −156.1 | −158.4 | −152.8 | −152.0 | −156.9 | −159.3 | −158.5 | −160.9 | −160.0 |

Error^{d} | 2.6 | 1.7 | 10.9 | 9.5 | 4.5 | 3.4 | 5.7 | 4.1 | 3.8 | 2.6 | 3.4 | 2.2 | 0.0 |

. | $f12(1)\u2009(\gamma =1.1a.u.)$^{b}
. | $f12(2)\u2009(\gamma =1.2a.u.)$^{b}
. | $f12(3)\u2009(\gamma =1.2a.u.)$^{b}
. | $f12(4)\u2009(\gamma =2.0a.u.)$^{b}
. | $f12(5)\u2009(\gamma =2.0a.u.)$^{b}
. | $f12(6)\u2009(\gamma =1.1a.u.)$^{b}
. | . | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Reaction^{a}
. | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. | V
. | VBX
. | CBS^{c}
. |

1 | −23.8 | −24.2 | −25.0 | −22.1 | −23.6 | −26.7 | −22.8 | −26.2 | −23.3 | −25.6 | −23.7 | −25.7 | −26.4 |

2 | −12.1 | −13.5 | −23.4 | −19.8 | −9.6 | −15.1 | −6.5 | −12.7 | −11.7 | −13.6 | −9.4 | −11.7 | −14.2 |

3 | 13.6 | 15.4 | 4.9 | 6.8 | 15.0 | 16.6 | 16.5 | 18.5 | 12.0 | 13.2 | 13.5 | 14.4 | 17.2 |

4 | 8.4 | 7.3 | 2.6 | −0.1 | 13.7 | 14.5 | 17.4 | 15.3 | 9.7 | 7.4 | 11.8 | 10.7 | 6.1 |

5 | −6.8 | −7.5 | −11.1 | −10.9 | −3.5 | −3.7 | −4.9 | −9.6 | −5.2 | −5.2 | −5.4 | −6.0 | −8.4 |

6 | −30.6 | −31.7 | −36.1 | −33.0 | −27.1 | −30.3 | −27.7 | −35.8 | −28.5 | −30.8 | −29.1 | −31.7 | −34.8 |

7 | 12.6 | 12.7 | 3.0 | 8.1 | 16.8 | 16.0 | 17.8 | 14.7 | 13.8 | 13.9 | 14.3 | 13.1 | 14.5 |

8 | 0.5 | 0.3 | −13.8 | −13.2 | 4.6 | 3.7 | 5.6 | 0.9 | 1.2 | 2.1 | 1.6 | −0.2 | −0.7 |

9 | 20.4 | 22.8 | 5.4 | 7.8 | 22.4 | 23.6 | 23.1 | 20.6 | 18.6 | 20.2 | 21.6 | 23.3 | 25.2 |

10 | −76.2 | −75.4 | −81.3 | −75.6 | −74.3 | −75.7 | −73.1 | −74.9 | −75.2 | −76.1 | −75.2 | −76.0 | −76.0 |

11 | 16.4 | 16.1 | −3.8 | −1.0 | 19.9 | 17.9 | 19.8 | 12.4 | 14.1 | 15.0 | 18.1 | 17.4 | 16.7 |

12 | −159.2 | −158.3 | −163.1 | −152.1 | −156.1 | −158.4 | −152.8 | −152.0 | −156.9 | −159.3 | −158.5 | −160.9 | −160.0 |

Error^{d} | 2.6 | 1.7 | 10.9 | 9.5 | 4.5 | 3.4 | 5.7 | 4.1 | 3.8 | 2.6 | 3.4 | 2.2 | 0.0 |

The Slater-type geminal (geminal 1) is confirmed to be the one with the smallest average errors, either with the *V* or *VBX* formula, and is, therefore, judged to be the most suitable for the fixed-amplitude MP2-F12 method (at least among the six studied). This may or may not be rationalized by the limiting behavior of the solution of the two-electron equation of motion pointed out by Monkhorst.^{91} However, insofar as the short-range *r*_{12}-dependence is similar to the Slater-type geminal (see Fig. 1), other geminals (barring geminal 2) work almost as effectively. For instance, geminal 3 (the rational geminal) has the average errors that are roughly twice those of geminal 1 (the Slater-type geminal). Geminal 6 (the hybrid geminal) is an average of geminals 1 and 3 and its errors also come in between those of its parent geminals. This geminal may be said to have the advantage that its F12 corrections are nearly independent of *γ* (see Figs. 2 and 3). If an optimal value of *γ* is unknown, one may use either the *VBX* formula or geminal 6.

### D. System-size scaling

Here, we determine the asymptotic functional dependence (scaling) of the cost of MC-MP2-F12 on the number of basis functions (*n*), i.e., the system’s spatial size.

Figure 6 plots the cost scaling per MC step. To understand the observation, we briefly review the algorithm first. In each MC step, an electron-pair walker and *m* one-electron walkers need to be propagated by the Metropolis algorithm. The cost scaling of this step is *O*(*mn*) because a processor must evaluate *O*(*n*) weight functions at *O*(*m*) coordinates. Next, *n* AO amplitudes at *m* accepted walker coordinates are computed at an *O*(*mn*) cost. Then, these AO amplitudes are transformed into MO amplitudes at an *O*(*mn*^{2}) cost because at each of *O*(*m*) walker coordinates, *O*(*n*^{2}) multiplications of MO coefficients and AO amplitudes occur in this step. Thereupon, *O _{pq}* [Eq. (45)],

*V*[Eq. (46)], and $Opq\u2032$ [Eq. (63)] arrays are constructed at an

_{pq}*O*(

*m*

^{2}

*n*) cost, for each array is a sum over

*O*(

*n*) orbitals evaluated at two walker coordinates, whose number grows as

*O*(

*m*

^{2}). Once these arrays are constructed, a processor accumulates the F12 correction. The cost of this step is cheap and independent of

*n*(but is heavily dependent on

*m*).

Figure 6 shows that the wall time required for 128 MC steps of the MC-MP2-F12 calculations using the *V* formula falls accurately on the *n*^{2} line in the large *n* limit. This is because the overall cost per MC step is dominated by the AO-to-MO transformation of orbital amplitudes, whose cost is an *O*(*mn*^{2}) quantity, when the number of redundant walkers (*m*) is as small as 40 in this case. For small *n*, the cost per MC step appeared linear with *n*,^{43} as the *O*(*mn*) AO amplitude calculation step is dominant with a large prefactor on the cost function caused by its many exponential evaluations. For larger *n*, as explored in this work, the *O*(*mn*^{2}) MO amplitude calculation step supersedes the former, despite its small prefactor. When a large value of *m* is used, the cost per MC step reverts back to linear scaling with *n*,^{68} since the *O*(*m*^{2}*n*) step of constructing *O _{pq}*,

*V*, and $Opq\u2032$ becomes the hotspot. The same logic applies to the cost scaling of MC-MP2. Therefore, the cost scaling of MC-MP2-F12 varies with the balance between

_{pq}*n*and

*m*, but is observed to be

*O*(

*n*

^{2}) per step in the present implementation and for a wide range of molecular sizes.

The cost function of the *VBX* formula may not have entered an asymptotic region by *n* = 472, but its functional form is almost certain to be *O*(*n*^{2}) because a *VBX* calculation includes all steps of the *V* calculation. For a small molecule, a *VBX* calculation is nearly an order of magnitude more expensive than a *V* calculation. The gap becomes smaller and is only a factor of 2.3 for the largest molecule in the plot with *n* = 472. Hence, in large molecules, the cost advantage of the *V* formula may be considered insignificant, rendering the *VBX* formula preferred for its variational stability; recall that MC-MP2 and MC-MP2-F12 are designed for large molecules as they are not competitive with their deterministic counterparts for smaller ones.

The scaling of cost per MC step is not a useful measure of a stochastic method’s efficiency. A more useful measure is the scaling of cost to reach a given accuracy. Figure 7 testifies that the relative statistical uncertainty (*σ*_{rel}) [as defined by Eq. (89) with *I _{N}* understood to include the MP2 correlation energy] after some MC steps (say,

*N*

_{rel}) is asymptotically proportional to

*n*,

As is well known, the statistical uncertainty in an MC integral tends to fall off accurately in proportion to *N*^{−1/2}, where *N* is the number of MC steps. That this is also the case with the MC-MP2-F12 results has been confirmed with the data presented in Fig. 8 and elsewhere. Therefore, the relative statistical uncertainty (*σ*_{fin}) after *N*_{fin} MC steps is accurately predicted to be

For *σ*_{fin} to become smaller than a given size-independent tolerance, *N*_{fin} has to be an *O*(*n*^{2}) quantity. The total cost is, therefore, *N*_{fin} times the *O*(*n*^{2}) cost per MC step, which is *O*(*n*^{4}). We conclude that the MC-MP2-F12 cost to achieve a given accuracy (as measured by a relative statistical uncertainty) increases as *O*(*n*^{4}). This may be compared with the *O*(*n*^{5}) scaling of deterministic MP2 or MP2-F12 with fixed amplitudes.

### E. Computer-size scaling

Figure 9 records the parallel scalability of MC-MP2-F12 on many CPUs. The speedup was measured as the rate of compression of the wall time spent in the last 25 000 MC steps of an MC-MP2-F12(*VBX*)/DZ calculation of the water molecule. The number of redundant walkers, *m*, was 20. The unit of speed was taken as that of an execution on 16 CPU cores or one XE node of Blue Waters at National Center for Supercomputing Applications of University of Illinois.

The MC-MP2-F12 algorithm is naturally parallel by design, in which each processor is tasked with its own Metropolis propagation of walkers and accumulation of the F12 correction with no mandatory interprocessor communications. All processors occasionally (every 128 MC steps) report their snapshot values of the F12 correction and statistical uncertainty to the master process by mpi_reduce, just to periodically protect the most current calculation results. As a result, it achieves scalability of 99.9% going from 16 CPU cores to 4096 CPU cores or a 255.8-fold speedup for the tiniest problem of the water molecule in the DZ basis set, attesting to its strong scaling. For a larger calculation for tetrahydrocannabinol in the DZ basis set (*n* = 472) shown in Fig. 8, a speedup by a factor of 7.84 (98% scalability) is observed from 512 to 4096 CPU cores (using the *VBX* formula). Furthermore, these calculations are indefinitely restartable in the sense that the F12 corrections and statistical uncertainties from two separate executions on different computers can be concatenated to produce the result for a longer continuous calculation.

## V. CONCLUSIONS

Dense matrix multiplications, which have dominated electronic-structure algorithms, may be fundamentally nonscalable with both system and computer sizes. One may, therefore, need to redesign new, scalable algorithms when applications are attempted to large molecules, solids, or even liquids, running on a modern supercomputer, which nowadays has up to hundreds of thousands or millions of processors. There are at least two such fundamentally scalable algorithms: local-basis (fragment or divide-and-conquer) algorithms^{96–99} and stochastic (Monte Carlo) algorithms.^{51,64,65}

The local-basis algorithms, introduced to *ab initio* electron-correlation theories by Saebø and Pulay,^{96,97} take advantage of a rapid decay of interparticle interactions in a chemical system^{100,101} and the resulting sparsity of the associated interaction matrices in a spatially local basis set. In the stochastic algorithms, the dimensional sparsity of interaction matrices is exploited by sampling increasingly less in higher dimensions. This is achieved by the Metropolis algorithm^{80} with an appropriate weight function that describes the desired dimension-biased distribution of samples.

In this work, we have fully developed a stochastic algorithm of MP2-F12 using both the variational *VBX* formula and nonvariational *V* formula, the latter on the basis of the pilot implementation reported as Ref. 41. On the basis of numerical experimentation, we have established the *O*(*n*^{4}) scaling of the MC-MP2-F12 method with the number of orbitals *n*, which is one-rank lower than the usual *O*(*n*^{5}) scaling of MP2 or MP2-F12. The former does not need to use the RI approximation with an ABS to lower the dimensions of integrals or to evaluate the resulting long sum-of-products of lower-dimensional integrals, which is not scalable. It does not have to precompute or store any integrals and is free of any significant disk I/O.

Since all integrals (except one-electron integrals available from the HF calculations) are evaluated numerically by the MC method, MC-MP2-F12 can use nearly any mathematical form of geminal. Taking advantage of this, we have quantified the relative performance of the 6 geminals that satisfy the cusp conditions. We have confirmed the overall best performance of the Slater-type geminal. The Gauss-type geminal, which has a noticeably different long-range behavior than the rest, is distinctly poorly performing to the extent that the reaction energies are hardly improved by its F12 correction.

Contrary to our concern that the nonvariational formula may be practically useless for relative energies because its F12 corrections are dependent on the shape of a geminal (*γ*, in particular) in a manner that is hard to predict, we found it to give reliable results at a computational cost that is 2–10 times smaller than the variational formula. Nevertheless, the relative energies from the variational formula are systematically more accurate because the errors caused by the suboptimal nature of the geminal shape cancel between the two total energies. For larger molecules, the cost differential between the two formulas becomes smaller, and it may thus be recommended to use the variational formula whenever possible. The statistical uncertainties in reaction energies are less than 2 kJ mol^{−1} after 1.44 × 10^{8} MC steps with the ADZ basis set, which are roughly half of the systematic errors of the F12 method itself (from the CBS limits) with the same basis set. The former can be made arbitrarily smaller by running longer MC calculations.

The parallel scalability is excellent by virtue of the embarrassingly parallel nature, by design, of the algorithm. Each processor carries out its own MC integrations with no mandatory, frequent, or large interprocessor communications. A speedup by a factor of 255.8 has been achieved upon increasing the number of processors from 16 to 4096, which is 99.9% of the perfect parallel efficiency, which may not be surprising when this method is viewed as a QMC variant. Our algorithm’s speed is likely limited only by the hardware size rather than by the software limitations.

MC-MP2-F12 and its sister methods are *not* intended to be a replacement of its deterministic counterparts and are slower and less precise than the latter for smaller problems. Rather, these two classes of implementations have completely different sets of merits and demerits, making them complementary in their applicabilities. MC-MP2-F12 is designed for grand-challenge applications on high-end massively parallel supercomputers, for which existing deterministic implementations may not even start owing to their more rigid resource requirements, higher cost scaling, and lower parallel scaling. Next, we will examine applicability and performance of MC-MP2-F12 for such large problems.

## Acknowledgments

C.M.J. and S.H. have been supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences through Scientific Discovery through Advanced Computing (SciDAC) program under Award No. DE-FG02-12ER46875. A.E.D. and S.H. have been supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences under Award No. DE-FG02-11ER16211. S.H. has been supported by CREST, Japan Science and Technology Agency. J.Z. and E.F.V. acknowledge the support by the U.S. National Science Foundation (Award No. CHE-1362655). It is also part of the Blue Waters sustained-petascale computing project, which is supported by the U.S. National Science Foundation (Award Nos. OCI-0725070 and ACI-1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications. The authors thank Dr. Soohaeng Y. Willow for providing the computer program used in Ref. 41 and Dr. Seiichiro Ten-no for an illuminating discussion on geminal forms.

### APPENDIX A: DERIVATIONS OF EQS. (61), (62), AND (70)

We show that the terms carrying the numerical factor of 7/32 in Eqs. (61) and (62) arise from the first term of Eq. (48). The other terms in Eqs. (61) and (62) as well as higher-order integrands are derived similarly.

where a change in the order of summations and integrations is made in Eq. (A2), an essential step for bringing the expression into a MC-integrable form.

Next, we show that the first term of Eq. (70) with the numerical factor of 7/16 comes from the first term of Eq. (51).

Substituting the definition of the exchange operator,

into the $K\u02c61$ contribution of the first term of Eq. (51), we obtain

in the last step of which a coordinate interchange of *r*_{2}↔*r*_{3} (additionally *r*_{1}↔*r*_{2} in the last term) is carried out to ensure that the singularity occurs only at *r*_{12} = 0, which is also essential for brining the final expression into a form most convenient for MC integration. Furthermore, using a coordinate interchange of *r*_{1}↔*r*_{2}, we can immediately show that the $K\u02c62$ contribution of the first term of Eq. (51) is

yielding

The other term of Eq. (51) and higher-rank integrands can be derived similarly.

### APPENDIX B: WEIGHT FUNCTIONS

Two types of weight function are used in MC-MP2-F12 calculations. One is the one-electron weight function, *w*_{1e}(** r**), in the form of Eq. (78), and the other is the two-electron weight function,

*w*

_{2e}(

*r*_{1},

*r*_{2}), of Eq. (76). They both are defined in terms of a linear combination,

*g*(

**), of two atom-centered**

*r**s*-type GTOs [Eq. (77)]. For each atom, the two exponents and expansion coefficients in

*g*(

**) are given in Tables VII and VIII. The only important consideration in defining**

*r**g*(

**) is that it should be more diffuse than the most diffuse of the integrands, so that the Metropolis algorithm over-samples rather than under-samples. An incidence of under-sampling can be detected as a vertical jump in the evolution of the statistical uncertainty.**

*r*Atom (A)
. | $cA(1)$ . | $\zeta A(1)$ . | $cA(2)$ . | $\zeta A(2)$ . |
---|---|---|---|---|

H | 0.5 | 0.6 | 0.05 | 0.15 |

C | 1.0 | 1.0 | 0.10 | 0.25 |

N | 2.5 | 1.4 | 0.25 | 0.30 |

O | 3.0 | 1.8 | 0.30 | 0.37 |

F | 4.5 | 1.8 | 0.45 | 0.35 |

Atom (A)
. | $cA(1)$ . | $\zeta A(1)$ . | $cA(2)$ . | $\zeta A(2)$ . |
---|---|---|---|---|

H | 0.5 | 0.6 | 0.05 | 0.15 |

C | 1.0 | 1.0 | 0.10 | 0.25 |

N | 2.5 | 1.4 | 0.25 | 0.30 |

O | 3.0 | 1.8 | 0.30 | 0.37 |

F | 4.5 | 1.8 | 0.45 | 0.35 |

Atom (A)
. | $cA(1)$ . | $\zeta A(1)$ . | $cA(2)$ . | $\zeta A(2)$ . |
---|---|---|---|---|

H | 0.5 | 0.6 | 0.05 | 0.10 |

C | 1.0 | 0.8 | 0.10 | 0.13 |

N | 2.5 | 1.0 | 0.25 | 0.19 |

O | 3.0 | 1.0 | 0.30 | 0.22 |

F | 4.5 | 1.2 | 0.45 | 0.28 |

Atom (A)
. | $cA(1)$ . | $\zeta A(1)$ . | $cA(2)$ . | $\zeta A(2)$ . |
---|---|---|---|---|

H | 0.5 | 0.6 | 0.05 | 0.10 |

C | 1.0 | 0.8 | 0.10 | 0.13 |

N | 2.5 | 1.0 | 0.25 | 0.19 |

O | 3.0 | 1.0 | 0.30 | 0.22 |

F | 4.5 | 1.2 | 0.45 | 0.28 |