A highly scalable stochastic algorithm is proposed and implemented for computing the basis-set-incompleteness correction to the diagonal, frequency-independent self-energy of the second-order many-body Green’s function (GF2) theory within the explicitly correlated (F12) formalism. The 6-, 9-, 12-, and 15-dimensional integrals comprising the F12 correction are directly evaluated by the Monte Carlo method using appropriate weight functions for importance sampling. The method is naturally and easily parallelized, involves minimal memory space and no disk I/O, and can use virtually any mathematical form of a correlation factor. Its computational cost to correct all ionization energies (IEs) is observed to increase as the fourth power of system size, as opposed to the fifth power in the case of the deterministic counterparts. The GF2 calculations and their F12 corrections for the first IEs of C_{60} and C_{70} were executed on 128 graphical processing units (GF2) and 896 central processing units (F12), respectively, to reach the results with statistical errors of 0.04 eV or less. They showed that the basis-set-incompleteness (from aug-cc-pVDZ) accounts for only 50%–60% of the deviations from experiments, suggesting the significance of higher-order perturbation corrections.

## I. INTRODUCTION

One-particle many-body Green’s function (MBGF) theory^{1–8} provides a converging series of perturbation approximations to electron-binding energies; the accurate determination of which is key to understanding many processes in chemistry, biology, and materials science. Green’s function also serves as the propagator in quantum field theory, underlying nearly all quantum many-body physics ranging from condensed matter physics to particle physics. Furthermore, it is the mathematical kernel of quantum transport, scattering, and embedding theories.

Two of the present authors with co-authors^{9} implemented general-order algorithms of the Feynman–Dyson perturbation series of MBGF and quantified its convergence rate and the accuracy of the diagonal and frequency-independent approximations to the self-energy. The study showed that the convergence is surprisingly slow, making higher-order corrections potentially much more important than in many-body perturbation theory (MBPT) for the ground-state correlation energy.

This slow perturbation-order convergence is particularly troublesome in view of the high-rank polynomial size dependence of the costs of these higher-order methods. The problem size is the product of the system’s spatial extent and basis-set size. This issue is, therefore, further compounded by the notoriously slow basis-set convergence^{10–16} of quantities obtained from any *ab initio* electron-correlation theory including MBGF.

The slow basis-set convergence, however, was tackled by one of the present authors with a co-author,^{17} who developed an explicitly correlated (F12) extension of the second-order MBGF (GF2) in the diagonal, frequency-independent approximation to the self-energy (GF2-F12). It allows the complete-basis-set (CBS) results to be obtained with a relatively small basis set on a massively parallel computer. The diagonal and frequency-independent approximations were later lifted in GF2-F12 by Pavešević *et al.*^{18}

In this article, we develop a highly scalable stochastic algorithm of the GF2-F12 method of Ohnishi and Ten-no,^{17} which we call the Monte Carlo GF2-F12 or MC-GF2-F12. It is naturally and easily parallelized and its computational cost increases less steeply with the number of basis functions. Its formalism and algorithm are a straightforward combination of the Monte Carlo second-order Green’s function (MC-GF2) method^{19} and Monte Carlo explicitly correlated second-order many-body perturbation (MC-MP2-F12) method^{20,21} reported earlier. We show that MC-GF2-F12 can compute ionization energies (IEs) near their CBS limits with a relatively small basis set for large conjugated molecules with statistical errors that are comparable or smaller than typical experimental errors. The method also allows the use of virtually any correlation factor (geminal). The details are given in the following.

## II. FORMALISMS

### A. GF2-F12

Following the derivation and notation of Ohnishi and Ten-no,^{17} we write the second-order perturbation approximation of the *x*th occupied-orbital energy, $\u03f5xGF2$, as a sum of three parts,

where $\u03f5xHF$ is the canonical Hartree–Fock (HF) orbital energy and $\Sigma xOR$ and $\Sigma xCD$ are the so-called orbital/pair-relaxation and correlation-difference contributions, respectively, to the second-order self-energy in the diagonal, frequency-independent approximation.^{9} We call this method GF2 in this article. Its contributions are given as

where *i*, *j*, *k*, *l*, *m*, *n*, and *x* denote an occupied spatial orbital in a closed-shell molecule, while *a* and *b* denote a virtual spatial orbital.

As shown numerically by Ohnishi and Ten-no,^{17} $\Sigma xOR$ converges rapidly toward the CBS limit, while the basis-set convergence of $\Sigma xCD$ is rather slow owing to its double virtual summation. Therefore, only $\Sigma xCD$ needs to be corrected for the basis-set incompleteness. Rewriting $\Sigma xCD$ as

with

we notice that second-order many-body perturbation (MP2) energy and its F12 correction,^{22–24} *E*^{MP2} and *E*^{F12}, are also written as

where *d*_{ij} is the correction to the second-order pair energy of the occupied orbital pair *ij*. Hence, the F12-corrected (GF2-F12) value of the GF2 orbital energy, $\u03f5xGF2-F12$, can simply be obtained^{17} as

where $\Sigma xF12$ is the F12 correction to $\Sigma xCD$.

The last equation suggests that, in GF2-F12, we can reuse much of the MP2-F12 formalisms and implementations. The correction, *d*_{xj}, is given by the established formalisms^{20,21,24–27} of MP2-F12 as

with

where the geminal amplitudes, $tjxmn$, are held fixed at the values dictated by the cusp conditions,^{22,28,29} as per Ten-no’s SP ansatz^{27}

Other factors are designated in the standard notations,^{21} whose definitions are written as

where *f*_{12} is the correlation factor or geminal (an explicit function of *r*_{12}), $F^n$ is the Fock operator of electron *n*, and $Q^12$ is the strong-orthogonality projector,^{30–33}

Here, *Ô*_{n} and $V^n$ are the projectors onto the occupied and virtual orbital spaces of electron *n*, respectively.

Utilizing the generalized and extended Brillouin conditions^{21,24,27,34} (which consolidate the *B* and *X* terms into the combined “*BX*” term) and explicitly symmetrizing each summation with respect to an interchange of *x* and *j* (a necessity in its Monte Carlo integration), we arrive at the working equation for the F12 correction

where

with

When the correlation factor, *f*_{12}, is optimized by minimizing this correction, the following identity holds (as in MP2-F12):^{21}

The same identity still holds approximately but accurately by merely optimizing a few parameters (*γ* in the following examples) in the correlation factor insofar as its shape can closely mimic the correlation hole.^{21,27,35} Such correlation factors include the Slater-type geminal (STG),^{26} Yukawa–Coulomb (YC) geminal,^{36,37} and Slater–Jastrow (SJ) factor^{38}

We call the GF2-F12 method based on Eq. (19) the *VBX* approximation, whereas the one using Eq. (23), the *V* approximation. The magnitude of the *VBX* correction serves as an upper bound for the true basis-set-incompleteness correction.

The foregoing derivation was taken from Ohnishi and Ten-no.^{17}

### B. MC-GF2-F12

Here, we illustrate the key derivation steps of the working equations of the Monte Carlo algorithm of GF2-F12 (MC-GF2-F12) (see Ref. 21 for the closely related MC-MP2-F12 formalism). Expanding $Q^12$, i.e., substituting Eq. (18) into Eq. (20), we have

with

where the superscript on Σ indicates the number of electrons involved in the integral (thus its dimension) and *x* denotes the occupied spatial orbital whose energy is being corrected by F12. These are then converted into an MC integrable form as

where

Here, *O*, *V*, and *X* are two-electron, six-dimensional, real-space pair functions given by

and

where *φ*_{p}(*r*_{q}) is the *p*th spatial orbital of electron *q*.

Likewise, Eq. (21) becomes a sum of six terms,

where “*T*” and “*K*” stand for the kinetic and exchange contributions, respectively, of the Fock operator in the *BX* integral, Eq. (22),

Here, $T^n$ and $K^n$ stand for the kinetic and exchange operators, respectively, for electron *n* (the other operators in $F^$ commute with *f*_{12}). Note that the dimensions of the exchange terms are one-electron higher than those of the kinetic terms. This is owing to the presence of a projector in the exchange operator, which, when expanded, increases the integral’s dimension (like $Q^12$). The MC-integrable expression then reads

The two-electron kinetic integrands are given by

the three-electron kinetic integrands by

and the four-electron kinetic integrands by

and

The primed real-space pair functions are defined by

where *r*_{12} = *r*_{1} − *r*_{2}. The kinetic-energy integrands such as of Eq. (42) are divided into two terms, $FxT1,ne$ and $FxT2,ne$ (*n* = 2, 3, 4), with the first having a singularity at *r*_{12} = |*r*_{12}| = 0 and the other not, thus requiring different weight functions in MC integrations (see below). They are defined with the quantities derivable from the correlation factor

The exchange integrands are

## III. MONTE CARLO ALGORITHM

Just like in MC-MP2-F12,^{19–21} the GF2-F12 corrections are evaluated by the Monte Carlo (MC) method^{39} utilizing the redundant-walker algorithm.^{40} In each MC step, the ratio between the integrand and a judiciously chosen weight function is calculated and summed at coordinates distributed randomly but according to the weight function. The weight function must cancel the singularities of the integrand, be always positive, be analytically integrable, and should generally behave like the integrand.^{41,42} The random distribution according to the weight function is attained by the Metropolis algorithm.^{43}

where *N* is the total number of MC steps and *m* is the number of “redundant walkers” (see below for the definition).

There are two different types of walkers used for these integrands. For strongly correlated coordinates, $(r1[n],r2[n])$, the weight function that guides their propagation is given by

with *g*(** r**) being a sum of atom-centered

*s*-type Gaussian-type orbitals,

where *n*_{atom} is the number of atoms, *r*_{A} is the position of the *A*th atom, and *n*_{λ} is the number of Gaussian functions per atom. For uncoupled coordinates, $r3k[n]$ or $r4l[n]$, the weight function is

With this choice of *g*(** r**), normalization coefficients,

*N*

_{2e}and

*N*

_{1e}, are determined analytically.

^{44}Usually two Gaussians per atom (

*n*

_{λ}= 2) suffice; one tight and one diffuse to expand the electron density. In the applications to the IEs of C

_{60}and C

_{70}(see below), a third is added to account for the large size of the salient molecular orbitals.

The reason for this separation of the walker coordinates (between *r*_{1}, *r*_{2} and *r*_{3} and *r*_{4}) can be understood by inspecting the structure of the $F3eV$ [Eq. (35)] and $F4eV$ [Eq. (36)] integrands. The coordinates of electrons 1 and 2 are strongly coupled through the singular $r12\u22121$ factor in these integrands. This factor should, therefore, be canceled by the same factor in the weight function. Electrons 3 and 4 have no such correlation and the corresponding weight function can be an independent function of each electron’s coordinate.

In the redundant-walker algorithm,^{20,40} *m* “redundant” one-electron walkers are propagated according to the weight function, $w1e$(** r**), at a cost that is

*m*times that of propagating the minimal two one-electron walkers in integrating $\Sigma xV,4e$ [Eq. (62)]. With

*m*walkers instead of two, there are

*m*(

*m*− 1)/2 distinct ways of choosing the two walker coordinates that can be substituted in the integrand of Eq. (62). This increases the sampling number by a factor of

*m*(

*m*− 1)/2 at an

*m*-fold increase in the sampling cost, a net increase in the sampling efficiency by (

*m*− 1)/2. However, Eqs. (60) and (61) are unaffected by the algorithm, making the overall efficiency boost difficult to predict.

Similarly, the *T* integrals [Eqs. (42)–(44)] are evaluated as

and

of which all factors are already defined.

The *K* integrals [Eqs. (45)–(47)] are computed as

and

## IV. RESULTS AND DISCUSSION

MC-GF2 (Refs. 19 and 45) and MC-GF2-F12 calculations were performed for the first (and occasionally the second) IEs of the molecules listed in Fig. 1 using the aug-cc-pVDZ basis set in the frozen core approximation. The structures of molecules **A**–**O** were taken from Ref. 17, whereas those of molecules **P** and **Q** came from Ref. 46. The MC-GF2 and F12 calculations were executed simultaneously; for every 8 processors, one was a graphical processing unit (GPU) running MC-GF2 and the other 7 were central processing units (CPUs) performing MC-GF2-F12. For molecules **A**–**O**, 64 GPUs (GF2) and 448 CPUs (F12) of the Blue Waters supercomputer were employed; for molecules **P** and **Q**, 128 GPUs (GF2) and 896 CPUs (F12) were used. 512 redundant walkers were propagated for MC-GF2, while 40 redundant one- and two-electron walkers were used for MC-GF2-F12. The statistical uncertainty was taken after 7 blocking transformations.^{47}

Table I compares the MC-GF2-F12 results for molecules **A**–**O** in the *V* and *VBX* approximations with the deterministic GF2-F12 results.^{17} The MC results were converged to a statistical error of 0.04 eV or less after the number of MC steps indicated in the table, which may be comparable to a typical experimental error (or “chemical accuracy” of 0.043 eV). The *VBX* data reproduce the CBS limits obtained by the deterministic GF2-F12 method^{17} even though the two methods differ slightly in detail as well as in the *γ* value and basis set used. The mean absolute deviation (MAE) of 0.018 eV between these two methods is, therefore, largely (if not entirely) due to the statistical error (0.017 eV) in the MC-GF2-F12 method. The MAE between the *V* approximation and deterministic GF2-F12 is distinctly greater (0.049 eV) and is likely a bias caused by the nonvariational nature of the F12 correction in the *V* approximation exacerbated by the small aug-cc-pVDZ basis set used [recall that the fidelity of the *V* approximation or the degree to which Eq. (23) is satisfied depends rather sensitively on the mathematical form of the correlation factor and on the value of *γ*]. It can be said that the MC algorithm works correctly with almost no bias, reproducing the deterministic counterparts within a sufficiently small statistical error, as long as the *VBX* formalism and an appropriate value of *γ* are used.

. | . | MC-GF2-F12 . | N/10^{9}
. | ||
---|---|---|---|---|---|

Molecule^{a}
. | GF2-F12^{b}
. | V
. | VBX
. | GF2 . | F12 . |

A | 9.235 | 9.174(01) | 9.217(01) | 0.51 | 0.43 |

B | 8.045 | 7.981(02) | 8.025(03) | 0.41 | 0.33 |

C | 7.289 | 7.231(04) | 7.278(05) | 0.33 | 0.22 |

D | 7.314 | 7.246(05) | 7.291(06) | 0.31 | 0.19 |

E | 7.185 | 7.131(10) | 7.177(13) | 0.21 | 0.08 |

F | 6.563 | 6.496(21) | 6.542(26) | 0.21 | 0.06 |

G (^{2}B_{3u}) | 6.718 | 6.671(19) | 6.713(22) | 0.20 | 0.07 |

G (^{2}A_{u}) | 7.095 | 7.019(09) | 7.066(11) | 0.39 | 0.13 |

H (^{2}B_{u}) | 6.155 | 6.125(27) | 6.172(38) | 1.48 | 0.23 |

H (^{2}A_{u}) | 6.754 | 6.653(21) | 6.691(30) | 1.86 | 0.29 |

I | 8.914 | 8.857(02) | 8.891(04) | 0.57 | 0.45 |

J | 7.747 | 7.720(07) | 7.748(11) | 0.42 | 0.33 |

K | 7.257 | 7.238(12) | 7.269(18) | 0.33 | 0.19 |

L | 7.026 | 7.001(13) | 7.039(21) | 0.96 | 0.44 |

M | 6.825 | 6.804(16) | 6.840(26) | 0.96 | 0.30 |

N | 6.767 | 6.732(18) | 6.775(29) | 2.00 | 0.53 |

O | 6.695 | 6.677(19) | 6.715(29) | 1.81 | 0.39 |

MAE^{c} | 0 | 0.049(12) | 0.018(17) | ⋯ | ⋯ |

. | . | MC-GF2-F12 . | N/10^{9}
. | ||
---|---|---|---|---|---|

Molecule^{a}
. | GF2-F12^{b}
. | V
. | VBX
. | GF2 . | F12 . |

A | 9.235 | 9.174(01) | 9.217(01) | 0.51 | 0.43 |

B | 8.045 | 7.981(02) | 8.025(03) | 0.41 | 0.33 |

C | 7.289 | 7.231(04) | 7.278(05) | 0.33 | 0.22 |

D | 7.314 | 7.246(05) | 7.291(06) | 0.31 | 0.19 |

E | 7.185 | 7.131(10) | 7.177(13) | 0.21 | 0.08 |

F | 6.563 | 6.496(21) | 6.542(26) | 0.21 | 0.06 |

G (^{2}B_{3u}) | 6.718 | 6.671(19) | 6.713(22) | 0.20 | 0.07 |

G (^{2}A_{u}) | 7.095 | 7.019(09) | 7.066(11) | 0.39 | 0.13 |

H (^{2}B_{u}) | 6.155 | 6.125(27) | 6.172(38) | 1.48 | 0.23 |

H (^{2}A_{u}) | 6.754 | 6.653(21) | 6.691(30) | 1.86 | 0.29 |

I | 8.914 | 8.857(02) | 8.891(04) | 0.57 | 0.45 |

J | 7.747 | 7.720(07) | 7.748(11) | 0.42 | 0.33 |

K | 7.257 | 7.238(12) | 7.269(18) | 0.33 | 0.19 |

L | 7.026 | 7.001(13) | 7.039(21) | 0.96 | 0.44 |

M | 6.825 | 6.804(16) | 6.840(26) | 0.96 | 0.30 |

N | 6.767 | 6.732(18) | 6.775(29) | 2.00 | 0.53 |

O | 6.695 | 6.677(19) | 6.715(29) | 1.81 | 0.39 |

MAE^{c} | 0 | 0.049(12) | 0.018(17) | ⋯ | ⋯ |

In our previous study,^{21} we established the *O*(*n*^{2}) scaling of the computational cost (wall time) per MC step with the number of basis functions (*n*) in an MC-MP2-F12 calculation. This combined with the observations that the relative statistical error increases as *O*(*n*) and that the same falls off as *N*^{−1/2} with the number of MC steps (*N*) led to the overall *O*(*n*^{4}) scaling of the MC-MP2-F12 method, which is one-rank lower than the *O*(*n*^{5}) scaling of deterministic MP2-F12 methods with the SP ansatz [the scaling would be even worse and *O*(*n*^{6}) for deterministic MP2-F12 methods that do not use the SP ansatz]. In this sense, the MC algorithm is guaranteed to outperform deterministic ones in the large *n* limit, though the crossover point seems rather far.^{15}

If we determine *O*(*n*) IEs simultaneously from a single GF2-F12 calculation (which is realistic for large molecules and band structures in solids in particular), the scaling of the cost per MC step in MC-GF2-F12 would be the same *O*(*n*^{2}) as in MC-MP2-F12 because of the algorithmic similarity illustrated by Eqs. (6)–(9). To know the scaling of MC-GF2-F12, we also need to know how the statistical error in an IE behaves with *n*. Unlike the total (correlation) energy computed in MC-MP2-F12, we demand that the bare statistical error (not the relative error) fall below a certain threshold (such as 0.04 eV) in the case of IEs. Figure 2 clearly shows the *O*(*n*) scaling of the bare statistical error, regardless of the details (*V* versus *VBX*) of the formalism. This and the *O*(*n*^{2}) scaling of the cost per MC step lead to the overall *O*(*n*^{4}) scaling of MC-GF2-F12 for calculating all IEs. This is one-rank lower than the *O*(*n*^{5}) scaling of deterministic GF2-F12 calculations.^{48}

A unique advantage of MC-GF2-F12 is its ability to use virtually any correlation factor.^{21,35} Table II compares the performance of three of high-performing correlation factors,^{21,35} STG, YC, and SJ, as defined by Eqs. (24)–(26), for the first IEs of the water, methane, and benzene molecules. These three correlation factors yield nearly interchangeable F12 corrections (within 0.03 eV of one another) but have vastly different statistical errors. The SJ factor, in particular, suffers from large statistical errors likely because of the greater complexity and number of derivatives appearing in the kinetic-energy operator commutator of Eq. (56). This leaves STG, which happens to be convenient for deterministic F12 algorithms,^{26} the most attractive choice for the MC algorithms also.

Correlation factor . | γ
. | Water . | Methane . | Benzene . |
---|---|---|---|---|

STG [Eq. (24)] | 1.1 | 0.578(01) | 0.331(01) | 0.402(03) |

YC [Eq. (25)] | 1.6 | 0.569(02) | 0.327(01) | 0.411(10) |

SJ [Eq. (26)] | 1.1 | 0.569(07) | 0.335(06) | 0.384(78) |

Table III lists the results of the MC-GF2 and MC-GF2-F12 calculations for the first IEs of C_{60} and C_{70} using the aug-cc-pVDZ basis set. It should be stressed that MC-GF2 and MC-GF2-F12 calculations are always executable no matter how large a system is insofar as the preceding HF calculation can be completed. This is because the MC algorithms use tiny memory footprint and no disk I/O, while being more demanding of CPU time, although the latter issue is mitigated by the high degree of parallelism and the effective use of GPUs.^{45} The MC algorithms are, therefore, an example of space-time trade-off^{50} (somewhat like Almlöf’s direct self-consistent field algorithm^{51}), which minimizes memory space at the expense of increased operation cost, thereby lifting a hardware-made limitation in applicability to large problems. Whether such calculations can be converged to within a reasonable statistical error with available resources is a different question, but the answer is affirmative for C_{60} and C_{70} using a small fraction of the Blue Waters supercomputer.

After about 4 × 10^{8} (GF2) and 6 × 10^{7} (F12) MC steps, the statistical errors in both molecules fall below 0.04 eV. The F12 part of the calculation is more rapidly convergent than the GF2 part likely because of the smallness of the integral value and the partial removal of singularities in the integrand in the former. This fact combined with the flexibility in the mathematical form of the correlation factor make it most advantageous to evaluate the F12 corrections stochastically. Figures 3 and 4 show their convergence with MC steps for C_{60} and C_{70}, respectively. These plots (displaying only small initial portions of the whole calculations) suggest that the statistical errors in MC-GF2-F12 largely originate from those in MC-GF2, which nevertheless shows stable convergence at reasonably reliable values already after 5 × 10^{6} steps.

The MC-GF2/aug-cc-pVDZ values of the IEs (6.70 eV and 6.73 eV for C_{60} and C_{70}, respectively) have substantial errors of 0.94 and 0.74 eV from the experimental values. The F12 method improves these results by reducing the errors by 0.47 eV and 0.44 eV, respectively, to 0.47 eV and 0.30 eV from the experimental values. Therefore, the basis-set incompleteness (in GF2/aug-cc-pVDZ) is responsible for 50%–60% of the errors. The remaining errors must be ascribed to the deficiency of the GF2 method itself such as the lack of higher-order perturbation corrections, the diagonal approximation, and/or the frequency-independent approximation, with the first-listed likely being the primary source.^{9} Work to extend MC-GF2 to higher perturbation orders is underway in our laboratory.

## V. CONCLUSIONS

We have implemented the MC-GF2-F12 method capable of computing correlated IEs near the CBS limit in a highly scalable manner. It is naturally and easily parallelized with the operation cost for computing all IEs scaling as *O*(*n*^{4}) with the number (*n*) of basis functions as opposed to *O*(*n*^{5}) of its deterministic counterparts. The largest calculation performed in this study (C_{70}) executed the GF2 part on 128 GPUs and the F12 part on 896 CPUs, yielding its first IE near the CBS limit with a statistical error of 0.04 eV. The applicability of these MC calculations is presently only limited by the feasibility of the preceding HF calculations. The three correlation factors that were shown to perform well in MC-MP2-F12 also work equally well in MC-GF2-F12 except that the relative simplicity of STG seems to minimize the statistical errors and is preferred over the YC or SJ factor.

## ACKNOWLEDGMENTS

This work was supported by the Center for Scalable, Predictive methods for Excitation and Correlated phenomena (SPEC), which is funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division, as a part of the Computational Chemical Sciences Program. It is also part of the Blue Waters sustained-petascale computing project, which is supported by the U.S. National Science Foundation (Award Nos. OCI-0725070 and ACI-1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications. This work was also supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) as Priority Issue on Post-K computer (Development of new fundamental technologies for high-efficiency energy creation, conversion/storage and use) and JSPS Grant-in-Aids for Scientific Research (A) (Grant No. JP18H03900).