A highly scalable stochastic algorithm is proposed and implemented for computing the basis-set-incompleteness correction to the diagonal, frequency-independent self-energy of the second-order many-body Green’s function (GF2) theory within the explicitly correlated (F12) formalism. The 6-, 9-, 12-, and 15-dimensional integrals comprising the F12 correction are directly evaluated by the Monte Carlo method using appropriate weight functions for importance sampling. The method is naturally and easily parallelized, involves minimal memory space and no disk I/O, and can use virtually any mathematical form of a correlation factor. Its computational cost to correct all ionization energies (IEs) is observed to increase as the fourth power of system size, as opposed to the fifth power in the case of the deterministic counterparts. The GF2 calculations and their F12 corrections for the first IEs of C60 and C70 were executed on 128 graphical processing units (GF2) and 896 central processing units (F12), respectively, to reach the results with statistical errors of 0.04 eV or less. They showed that the basis-set-incompleteness (from aug-cc-pVDZ) accounts for only 50%–60% of the deviations from experiments, suggesting the significance of higher-order perturbation corrections.
I. INTRODUCTION
One-particle many-body Green’s function (MBGF) theory1–8 provides a converging series of perturbation approximations to electron-binding energies; the accurate determination of which is key to understanding many processes in chemistry, biology, and materials science. Green’s function also serves as the propagator in quantum field theory, underlying nearly all quantum many-body physics ranging from condensed matter physics to particle physics. Furthermore, it is the mathematical kernel of quantum transport, scattering, and embedding theories.
Two of the present authors with co-authors9 implemented general-order algorithms of the Feynman–Dyson perturbation series of MBGF and quantified its convergence rate and the accuracy of the diagonal and frequency-independent approximations to the self-energy. The study showed that the convergence is surprisingly slow, making higher-order corrections potentially much more important than in many-body perturbation theory (MBPT) for the ground-state correlation energy.
This slow perturbation-order convergence is particularly troublesome in view of the high-rank polynomial size dependence of the costs of these higher-order methods. The problem size is the product of the system’s spatial extent and basis-set size. This issue is, therefore, further compounded by the notoriously slow basis-set convergence10–16 of quantities obtained from any ab initio electron-correlation theory including MBGF.
The slow basis-set convergence, however, was tackled by one of the present authors with a co-author,17 who developed an explicitly correlated (F12) extension of the second-order MBGF (GF2) in the diagonal, frequency-independent approximation to the self-energy (GF2-F12). It allows the complete-basis-set (CBS) results to be obtained with a relatively small basis set on a massively parallel computer. The diagonal and frequency-independent approximations were later lifted in GF2-F12 by Pavešević et al.18
In this article, we develop a highly scalable stochastic algorithm of the GF2-F12 method of Ohnishi and Ten-no,17 which we call the Monte Carlo GF2-F12 or MC-GF2-F12. It is naturally and easily parallelized and its computational cost increases less steeply with the number of basis functions. Its formalism and algorithm are a straightforward combination of the Monte Carlo second-order Green’s function (MC-GF2) method19 and Monte Carlo explicitly correlated second-order many-body perturbation (MC-MP2-F12) method20,21 reported earlier. We show that MC-GF2-F12 can compute ionization energies (IEs) near their CBS limits with a relatively small basis set for large conjugated molecules with statistical errors that are comparable or smaller than typical experimental errors. The method also allows the use of virtually any correlation factor (geminal). The details are given in the following.
II. FORMALISMS
A. GF2-F12
Following the derivation and notation of Ohnishi and Ten-no,17 we write the second-order perturbation approximation of the xth occupied-orbital energy, , as a sum of three parts,
where is the canonical Hartree–Fock (HF) orbital energy and and are the so-called orbital/pair-relaxation and correlation-difference contributions, respectively, to the second-order self-energy in the diagonal, frequency-independent approximation.9 We call this method GF2 in this article. Its contributions are given as
where i, j, k, l, m, n, and x denote an occupied spatial orbital in a closed-shell molecule, while a and b denote a virtual spatial orbital.
As shown numerically by Ohnishi and Ten-no,17 converges rapidly toward the CBS limit, while the basis-set convergence of is rather slow owing to its double virtual summation. Therefore, only needs to be corrected for the basis-set incompleteness. Rewriting as
with
we notice that second-order many-body perturbation (MP2) energy and its F12 correction,22–24 EMP2 and EF12, are also written as
where dij is the correction to the second-order pair energy of the occupied orbital pair ij. Hence, the F12-corrected (GF2-F12) value of the GF2 orbital energy, , can simply be obtained17 as
where is the F12 correction to .
The last equation suggests that, in GF2-F12, we can reuse much of the MP2-F12 formalisms and implementations. The correction, dxj, is given by the established formalisms20,21,24–27 of MP2-F12 as
with
where the geminal amplitudes, , are held fixed at the values dictated by the cusp conditions,22,28,29 as per Ten-no’s SP ansatz27
Other factors are designated in the standard notations,21 whose definitions are written as
where f12 is the correlation factor or geminal (an explicit function of r12), is the Fock operator of electron n, and is the strong-orthogonality projector,30–33
Here, Ôn and are the projectors onto the occupied and virtual orbital spaces of electron n, respectively.
Utilizing the generalized and extended Brillouin conditions21,24,27,34 (which consolidate the B and X terms into the combined “BX” term) and explicitly symmetrizing each summation with respect to an interchange of x and j (a necessity in its Monte Carlo integration), we arrive at the working equation for the F12 correction
where
with
When the correlation factor, f12, is optimized by minimizing this correction, the following identity holds (as in MP2-F12):21
The same identity still holds approximately but accurately by merely optimizing a few parameters (γ in the following examples) in the correlation factor insofar as its shape can closely mimic the correlation hole.21,27,35 Such correlation factors include the Slater-type geminal (STG),26 Yukawa–Coulomb (YC) geminal,36,37 and Slater–Jastrow (SJ) factor38
We call the GF2-F12 method based on Eq. (19) the VBX approximation, whereas the one using Eq. (23), the V approximation. The magnitude of the VBX correction serves as an upper bound for the true basis-set-incompleteness correction.
The foregoing derivation was taken from Ohnishi and Ten-no.17
B. MC-GF2-F12
Here, we illustrate the key derivation steps of the working equations of the Monte Carlo algorithm of GF2-F12 (MC-GF2-F12) (see Ref. 21 for the closely related MC-MP2-F12 formalism). Expanding , i.e., substituting Eq. (18) into Eq. (20), we have
with
where the superscript on Σ indicates the number of electrons involved in the integral (thus its dimension) and x denotes the occupied spatial orbital whose energy is being corrected by F12. These are then converted into an MC integrable form as
where
Here, O, V, and X are two-electron, six-dimensional, real-space pair functions given by
and
where φp(rq) is the pth spatial orbital of electron q.
Likewise, Eq. (21) becomes a sum of six terms,
where “T” and “K” stand for the kinetic and exchange contributions, respectively, of the Fock operator in the BX integral, Eq. (22),
Here, and stand for the kinetic and exchange operators, respectively, for electron n (the other operators in commute with f12). Note that the dimensions of the exchange terms are one-electron higher than those of the kinetic terms. This is owing to the presence of a projector in the exchange operator, which, when expanded, increases the integral’s dimension (like ). The MC-integrable expression then reads
The two-electron kinetic integrands are given by
the three-electron kinetic integrands by
and the four-electron kinetic integrands by
and
The primed real-space pair functions are defined by
where r12 = r1 − r2. The kinetic-energy integrands such as of Eq. (42) are divided into two terms, and (n = 2, 3, 4), with the first having a singularity at r12 = |r12| = 0 and the other not, thus requiring different weight functions in MC integrations (see below). They are defined with the quantities derivable from the correlation factor
The exchange integrands are
III. MONTE CARLO ALGORITHM
Just like in MC-MP2-F12,19–21 the GF2-F12 corrections are evaluated by the Monte Carlo (MC) method39 utilizing the redundant-walker algorithm.40 In each MC step, the ratio between the integrand and a judiciously chosen weight function is calculated and summed at coordinates distributed randomly but according to the weight function. The weight function must cancel the singularities of the integrand, be always positive, be analytically integrable, and should generally behave like the integrand.41,42 The random distribution according to the weight function is attained by the Metropolis algorithm.43
where N is the total number of MC steps and m is the number of “redundant walkers” (see below for the definition).
There are two different types of walkers used for these integrands. For strongly correlated coordinates, , the weight function that guides their propagation is given by
with g(r) being a sum of atom-centered s-type Gaussian-type orbitals,
where natom is the number of atoms, rA is the position of the Ath atom, and nλ is the number of Gaussian functions per atom. For uncoupled coordinates, or , the weight function is
With this choice of g(r), normalization coefficients, N2e and N1e, are determined analytically.44 Usually two Gaussians per atom (nλ = 2) suffice; one tight and one diffuse to expand the electron density. In the applications to the IEs of C60 and C70 (see below), a third is added to account for the large size of the salient molecular orbitals.
The reason for this separation of the walker coordinates (between r1, r2 and r3 and r4) can be understood by inspecting the structure of the [Eq. (35)] and [Eq. (36)] integrands. The coordinates of electrons 1 and 2 are strongly coupled through the singular factor in these integrands. This factor should, therefore, be canceled by the same factor in the weight function. Electrons 3 and 4 have no such correlation and the corresponding weight function can be an independent function of each electron’s coordinate.
In the redundant-walker algorithm,20,40 m “redundant” one-electron walkers are propagated according to the weight function, (r), at a cost that is m times that of propagating the minimal two one-electron walkers in integrating [Eq. (62)]. With m walkers instead of two, there are m(m − 1)/2 distinct ways of choosing the two walker coordinates that can be substituted in the integrand of Eq. (62). This increases the sampling number by a factor of m(m − 1)/2 at an m-fold increase in the sampling cost, a net increase in the sampling efficiency by (m − 1)/2. However, Eqs. (60) and (61) are unaffected by the algorithm, making the overall efficiency boost difficult to predict.
Similarly, the T integrals [Eqs. (42)–(44)] are evaluated as
and
of which all factors are already defined.
The K integrals [Eqs. (45)–(47)] are computed as
and
IV. RESULTS AND DISCUSSION
MC-GF2 (Refs. 19 and 45) and MC-GF2-F12 calculations were performed for the first (and occasionally the second) IEs of the molecules listed in Fig. 1 using the aug-cc-pVDZ basis set in the frozen core approximation. The structures of molecules A–O were taken from Ref. 17, whereas those of molecules P and Q came from Ref. 46. The MC-GF2 and F12 calculations were executed simultaneously; for every 8 processors, one was a graphical processing unit (GPU) running MC-GF2 and the other 7 were central processing units (CPUs) performing MC-GF2-F12. For molecules A–O, 64 GPUs (GF2) and 448 CPUs (F12) of the Blue Waters supercomputer were employed; for molecules P and Q, 128 GPUs (GF2) and 896 CPUs (F12) were used. 512 redundant walkers were propagated for MC-GF2, while 40 redundant one- and two-electron walkers were used for MC-GF2-F12. The statistical uncertainty was taken after 7 blocking transformations.47
A: benzene; B: naphthalene; C: anthracene; D: pyrene; E: coronene; F: ovalene; G: porphyrin; H: tetraphenylporphyrin; I: thiophene; J: bithiophene; K: terthiophene; L: quaterthiophene; M: quinquethiophene; N: sexithiophene; O: septithiophene; P: C60; Q: C70.
A: benzene; B: naphthalene; C: anthracene; D: pyrene; E: coronene; F: ovalene; G: porphyrin; H: tetraphenylporphyrin; I: thiophene; J: bithiophene; K: terthiophene; L: quaterthiophene; M: quinquethiophene; N: sexithiophene; O: septithiophene; P: C60; Q: C70.
Table I compares the MC-GF2-F12 results for molecules A–O in the V and VBX approximations with the deterministic GF2-F12 results.17 The MC results were converged to a statistical error of 0.04 eV or less after the number of MC steps indicated in the table, which may be comparable to a typical experimental error (or “chemical accuracy” of 0.043 eV). The VBX data reproduce the CBS limits obtained by the deterministic GF2-F12 method17 even though the two methods differ slightly in detail as well as in the γ value and basis set used. The mean absolute deviation (MAE) of 0.018 eV between these two methods is, therefore, largely (if not entirely) due to the statistical error (0.017 eV) in the MC-GF2-F12 method. The MAE between the V approximation and deterministic GF2-F12 is distinctly greater (0.049 eV) and is likely a bias caused by the nonvariational nature of the F12 correction in the V approximation exacerbated by the small aug-cc-pVDZ basis set used [recall that the fidelity of the V approximation or the degree to which Eq. (23) is satisfied depends rather sensitively on the mathematical form of the correlation factor and on the value of γ]. It can be said that the MC algorithm works correctly with almost no bias, reproducing the deterministic counterparts within a sufficiently small statistical error, as long as the VBX formalism and an appropriate value of γ are used.
The first (and second) IEs (in eV) calculated by the deterministic GF2-F12/aug-cc-pVTZ (Ref. 17) and MC-GF2-F12/aug-cc-pVDZ (in either the V or VBX approximation). The values in parentheses are the statistical errors from both MC-GF2 and MC-GF2-F12 steps. A Slater-type geminal with γ = 1.1 was used in the MC-GF2-F12 calculations. N stands for the number of MC steps in the MC-GF2 or MC-GF2-F12 calculation running on 64 GPUs or on 448 CPUs, respectively, of the Blue Waters supercomputer.
. | . | MC-GF2-F12 . | N/109 . | ||
---|---|---|---|---|---|
Moleculea . | GF2-F12b . | V . | VBX . | GF2 . | F12 . |
A | 9.235 | 9.174(01) | 9.217(01) | 0.51 | 0.43 |
B | 8.045 | 7.981(02) | 8.025(03) | 0.41 | 0.33 |
C | 7.289 | 7.231(04) | 7.278(05) | 0.33 | 0.22 |
D | 7.314 | 7.246(05) | 7.291(06) | 0.31 | 0.19 |
E | 7.185 | 7.131(10) | 7.177(13) | 0.21 | 0.08 |
F | 6.563 | 6.496(21) | 6.542(26) | 0.21 | 0.06 |
G (2B3u) | 6.718 | 6.671(19) | 6.713(22) | 0.20 | 0.07 |
G (2Au) | 7.095 | 7.019(09) | 7.066(11) | 0.39 | 0.13 |
H (2Bu) | 6.155 | 6.125(27) | 6.172(38) | 1.48 | 0.23 |
H (2Au) | 6.754 | 6.653(21) | 6.691(30) | 1.86 | 0.29 |
I | 8.914 | 8.857(02) | 8.891(04) | 0.57 | 0.45 |
J | 7.747 | 7.720(07) | 7.748(11) | 0.42 | 0.33 |
K | 7.257 | 7.238(12) | 7.269(18) | 0.33 | 0.19 |
L | 7.026 | 7.001(13) | 7.039(21) | 0.96 | 0.44 |
M | 6.825 | 6.804(16) | 6.840(26) | 0.96 | 0.30 |
N | 6.767 | 6.732(18) | 6.775(29) | 2.00 | 0.53 |
O | 6.695 | 6.677(19) | 6.715(29) | 1.81 | 0.39 |
MAEc | 0 | 0.049(12) | 0.018(17) | ⋯ | ⋯ |
. | . | MC-GF2-F12 . | N/109 . | ||
---|---|---|---|---|---|
Moleculea . | GF2-F12b . | V . | VBX . | GF2 . | F12 . |
A | 9.235 | 9.174(01) | 9.217(01) | 0.51 | 0.43 |
B | 8.045 | 7.981(02) | 8.025(03) | 0.41 | 0.33 |
C | 7.289 | 7.231(04) | 7.278(05) | 0.33 | 0.22 |
D | 7.314 | 7.246(05) | 7.291(06) | 0.31 | 0.19 |
E | 7.185 | 7.131(10) | 7.177(13) | 0.21 | 0.08 |
F | 6.563 | 6.496(21) | 6.542(26) | 0.21 | 0.06 |
G (2B3u) | 6.718 | 6.671(19) | 6.713(22) | 0.20 | 0.07 |
G (2Au) | 7.095 | 7.019(09) | 7.066(11) | 0.39 | 0.13 |
H (2Bu) | 6.155 | 6.125(27) | 6.172(38) | 1.48 | 0.23 |
H (2Au) | 6.754 | 6.653(21) | 6.691(30) | 1.86 | 0.29 |
I | 8.914 | 8.857(02) | 8.891(04) | 0.57 | 0.45 |
J | 7.747 | 7.720(07) | 7.748(11) | 0.42 | 0.33 |
K | 7.257 | 7.238(12) | 7.269(18) | 0.33 | 0.19 |
L | 7.026 | 7.001(13) | 7.039(21) | 0.96 | 0.44 |
M | 6.825 | 6.804(16) | 6.840(26) | 0.96 | 0.30 |
N | 6.767 | 6.732(18) | 6.775(29) | 2.00 | 0.53 |
O | 6.695 | 6.677(19) | 6.715(29) | 1.81 | 0.39 |
MAEc | 0 | 0.049(12) | 0.018(17) | ⋯ | ⋯ |
In our previous study,21 we established the O(n2) scaling of the computational cost (wall time) per MC step with the number of basis functions (n) in an MC-MP2-F12 calculation. This combined with the observations that the relative statistical error increases as O(n) and that the same falls off as N−1/2 with the number of MC steps (N) led to the overall O(n4) scaling of the MC-MP2-F12 method, which is one-rank lower than the O(n5) scaling of deterministic MP2-F12 methods with the SP ansatz [the scaling would be even worse and O(n6) for deterministic MP2-F12 methods that do not use the SP ansatz]. In this sense, the MC algorithm is guaranteed to outperform deterministic ones in the large n limit, though the crossover point seems rather far.15
If we determine O(n) IEs simultaneously from a single GF2-F12 calculation (which is realistic for large molecules and band structures in solids in particular), the scaling of the cost per MC step in MC-GF2-F12 would be the same O(n2) as in MC-MP2-F12 because of the algorithmic similarity illustrated by Eqs. (6)–(9). To know the scaling of MC-GF2-F12, we also need to know how the statistical error in an IE behaves with n. Unlike the total (correlation) energy computed in MC-MP2-F12, we demand that the bare statistical error (not the relative error) fall below a certain threshold (such as 0.04 eV) in the case of IEs. Figure 2 clearly shows the O(n) scaling of the bare statistical error, regardless of the details (V versus VBX) of the formalism. This and the O(n2) scaling of the cost per MC step lead to the overall O(n4) scaling of MC-GF2-F12 for calculating all IEs. This is one-rank lower than the O(n5) scaling of deterministic GF2-F12 calculations.48
The statistical error, σ (in Eh), of IEs calculated by MC-GF2-F12 (in the V and VBX approximations) as a function of the number of basis functions n. A linear function of n is overlaid in blue.
The statistical error, σ (in Eh), of IEs calculated by MC-GF2-F12 (in the V and VBX approximations) as a function of the number of basis functions n. A linear function of n is overlaid in blue.
A unique advantage of MC-GF2-F12 is its ability to use virtually any correlation factor.21,35 Table II compares the performance of three of high-performing correlation factors,21,35 STG, YC, and SJ, as defined by Eqs. (24)–(26), for the first IEs of the water, methane, and benzene molecules. These three correlation factors yield nearly interchangeable F12 corrections (within 0.03 eV of one another) but have vastly different statistical errors. The SJ factor, in particular, suffers from large statistical errors likely because of the greater complexity and number of derivatives appearing in the kinetic-energy operator commutator of Eq. (56). This leaves STG, which happens to be convenient for deterministic F12 algorithms,26 the most attractive choice for the MC algorithms also.
The F12 corrections (in eV) to the first IEs of the water, methane, and benzene molecules obtained by MC-GF2-F12 in the VBX approximation using the Slater-type geminal (STG), Yukawa–Coulomb (YC) geminal, or Slater–Jastrow (SJ) factor with the γ value given in the corresponding row. The statistical errors taken after 4.59 × 107 MC steps are indicated in parentheses.
Correlation factor . | γ . | Water . | Methane . | Benzene . |
---|---|---|---|---|
STG [Eq. (24)] | 1.1 | 0.578(01) | 0.331(01) | 0.402(03) |
YC [Eq. (25)] | 1.6 | 0.569(02) | 0.327(01) | 0.411(10) |
SJ [Eq. (26)] | 1.1 | 0.569(07) | 0.335(06) | 0.384(78) |
Table III lists the results of the MC-GF2 and MC-GF2-F12 calculations for the first IEs of C60 and C70 using the aug-cc-pVDZ basis set. It should be stressed that MC-GF2 and MC-GF2-F12 calculations are always executable no matter how large a system is insofar as the preceding HF calculation can be completed. This is because the MC algorithms use tiny memory footprint and no disk I/O, while being more demanding of CPU time, although the latter issue is mitigated by the high degree of parallelism and the effective use of GPUs.45 The MC algorithms are, therefore, an example of space-time trade-off50 (somewhat like Almlöf’s direct self-consistent field algorithm51), which minimizes memory space at the expense of increased operation cost, thereby lifting a hardware-made limitation in applicability to large problems. Whether such calculations can be converged to within a reasonable statistical error with available resources is a different question, but the answer is affirmative for C60 and C70 using a small fraction of the Blue Waters supercomputer.
The first IEs (in eV) of C60 and C70 calculated by MC-GF2 and MC-GF2-F12 (in the VBX approximation) with the aug-cc-pVDZ basis set. The values in parentheses are the statistical errors. A Slater-type geminal with γ = 1.1 was used in the MC-GF2-F12 calculations. Calculations were performed on 128 GPUs (GF2) or on 896 CPUs (F12) of the Blue Waters supercomputer. The numbers of MC steps taken were 3.67 × 108 (GF2) and 6.15 × 107 (F12) for C60 and 3.14 × 108 (GF2) and 4.54 × 107 (F12) for C70.
Moleculea . | MC-GF2 . | MC-GF2-F12 . | Experimentb . |
---|---|---|---|
P (C60) | 6.70(2) | 7.17(3) | 7.64(2) |
Q (C70) | 6.73(2) | 7.17(4) | 7.47(2) |
After about 4 × 108 (GF2) and 6 × 107 (F12) MC steps, the statistical errors in both molecules fall below 0.04 eV. The F12 part of the calculation is more rapidly convergent than the GF2 part likely because of the smallness of the integral value and the partial removal of singularities in the integrand in the former. This fact combined with the flexibility in the mathematical form of the correlation factor make it most advantageous to evaluate the F12 corrections stochastically. Figures 3 and 4 show their convergence with MC steps for C60 and C70, respectively. These plots (displaying only small initial portions of the whole calculations) suggest that the statistical errors in MC-GF2-F12 largely originate from those in MC-GF2, which nevertheless shows stable convergence at reasonably reliable values already after 5 × 106 steps.
The MC-GF2 and MC-GF2-F12 results for the first IE of C60 as a function of the number of MC steps.
The MC-GF2 and MC-GF2-F12 results for the first IE of C60 as a function of the number of MC steps.
The MC-GF2 and MC-GF2-F12 results for the first IE of C70 as a function of the number of MC steps.
The MC-GF2 and MC-GF2-F12 results for the first IE of C70 as a function of the number of MC steps.
The MC-GF2/aug-cc-pVDZ values of the IEs (6.70 eV and 6.73 eV for C60 and C70, respectively) have substantial errors of 0.94 and 0.74 eV from the experimental values. The F12 method improves these results by reducing the errors by 0.47 eV and 0.44 eV, respectively, to 0.47 eV and 0.30 eV from the experimental values. Therefore, the basis-set incompleteness (in GF2/aug-cc-pVDZ) is responsible for 50%–60% of the errors. The remaining errors must be ascribed to the deficiency of the GF2 method itself such as the lack of higher-order perturbation corrections, the diagonal approximation, and/or the frequency-independent approximation, with the first-listed likely being the primary source.9 Work to extend MC-GF2 to higher perturbation orders is underway in our laboratory.
V. CONCLUSIONS
We have implemented the MC-GF2-F12 method capable of computing correlated IEs near the CBS limit in a highly scalable manner. It is naturally and easily parallelized with the operation cost for computing all IEs scaling as O(n4) with the number (n) of basis functions as opposed to O(n5) of its deterministic counterparts. The largest calculation performed in this study (C70) executed the GF2 part on 128 GPUs and the F12 part on 896 CPUs, yielding its first IE near the CBS limit with a statistical error of 0.04 eV. The applicability of these MC calculations is presently only limited by the feasibility of the preceding HF calculations. The three correlation factors that were shown to perform well in MC-MP2-F12 also work equally well in MC-GF2-F12 except that the relative simplicity of STG seems to minimize the statistical errors and is preferred over the YC or SJ factor.
ACKNOWLEDGMENTS
This work was supported by the Center for Scalable, Predictive methods for Excitation and Correlated phenomena (SPEC), which is funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division, as a part of the Computational Chemical Sciences Program. It is also part of the Blue Waters sustained-petascale computing project, which is supported by the U.S. National Science Foundation (Award Nos. OCI-0725070 and ACI-1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications. This work was also supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) as Priority Issue on Post-K computer (Development of new fundamental technologies for high-efficiency energy creation, conversion/storage and use) and JSPS Grant-in-Aids for Scientific Research (A) (Grant No. JP18H03900).