We develop a stochastic resolution of identity representation to the second-order Matsubara Green’s function (sRI-GF2) theory. Using a stochastic resolution of the Coulomb integrals, the second order Born self-energy in GF2 is decoupled and reduced to matrix products/contractions, which reduces the computational cost from *O*(*N*^{5}) to *O*(*N*^{3}) (with *N* being the number of atomic orbitals). The current approach can be viewed as an extension to our previous work on stochastic resolution of identity second order Møller-Plesset perturbation theory [T. Y. Takeshita *et al.*, J. Chem. Theory Comput. **13**, 4605 (2017)] and offers an alternative to previous stochastic GF2 formulations [D. Neuhauser *et al.*, J. Chem. Theory Comput. **13**, 5396 (2017)]. We show that sRI-GF2 recovers the deterministic GF2 results for small systems, is computationally faster than deterministic GF2 for *N* > 80, and is a practical approach to describe weak correlations in systems with 10^{3} electrons and more.

## I. INTRODUCTION

To understand the fundamental chemistry and physics that drive emergent phenomena in complex chemical systems under real-world conditions requires an accurate description of the ground and excited electronic states. However, such a task poses unique challenges. First, quantitative electronic structure calculations for realistic chemical systems can be prohibitively expensive due to the system size and poor scaling of current methods. Second, complex chemical systems with multiple components lead to situations where electron correlation and excited states must be accurately described throughout multiple phases and at their interfaces. Finally, it is insufficient to study these systems at equilibrium as a great deal of important chemistry emerges under nonequilibrium conditions.

For weakly correlated systems, the many-body perturbation theory (MBPT) within the Green’s function (GF) method provides an accurate framework to describe ground state properties beyond mean field density functional theory (DFT)^{1–5} and Hartree-Fock (HF)^{6–8} and has been proven extremely fruitful.^{9–21} These post DFT/HF techniques allow for the inclusion of electron correlation in a controlled and systematic fashion through approximations of the self-energy and have been used to accurately describe total ground state energies, quasiparticle excitations, lifetimes, etc.^{22–30} However, their steep computational scaling and costs restrict their use to small molecular systems or to bulk materials with a small unit cell.

A particularly interesting implementation is the self-consistent GF approach based on the second-order approximation to the electron self-energy (GF2), which is experiencing a renaissance.^{30–36} Modern implementations of the finite temperature GF2 (FT-GF2) method have calculated accurate ionization potentials^{35–38} and total energies for organic molecules as large as C_{6}H_{6} as well as transition metal containing triatomics.^{39,40} Recently, GF2 has been applied to capture static correlation, producing potential energy curves of a symmetrically stretched H_{6} ring that are in qualitative agreement with truncated configuration interaction results.^{41}

Although the results of recent studies are extremely promising, the GF2 approach suffers from a prohibitively high computational cost, scaling as *O*(*N*^{5}) (*N* is the number of orbitals used). Thus, GF2 methods have yet to be applied to molecular systems of experimentally relevant sizes. It would be extremely advantageous if methods with inherent and systematic structure, e.g., GF methods, were to become a viable option and can be performed routinely for complex chemical systems. This is especially true of the largely unexplored set of chemical systems that lie in the intermediate regime between small molecules and the bulk limit.

Recently, a stochastic method was developed that reduces the scaling of the ground state and thermal GF2 using a stochastic decomposition of the imaginary time Green’s function. The method was tested successfully on long linear hydrogen chains.^{42}

In this work, we introduce an alternative stochastic orbital approach to reduce the scaling of the GF2 methods that would lend itself very simply to higher-order approximations to the self-energy and for going beyond limited to the ground state and thermal calculations. Unlike the stochastic GF2 method developed in Ref. 42 which uses random variables to represent the GF, the current approach builds upon our recently developed stochastic orbital electron repulsion integral (ERI) method^{43} that decouples the 4-index ERIs.

Decoupling the indices of a “core element” in electronic structure theory enables the derivation of new systems of equations that can be solved at a fraction of the computational cost of the deterministic methods. As a result, large and accurate basis sets, traditionally considered impractical, can be used to perform electronic structure calculations within the framework of GF2 at a similar formal scaling to DFT and HF for systems that are impractical in deterministic calculations by introducing a controlled error.

The manuscript is structured as follows: Sec. II describes the GF2 approach and provides two alternative implementations, one based on going between imaginary time and frequency and the other solely in the imaginary time domain. We also review the stochastic representation of the resolution of identity and apply the formalism to the GF2 self-energy. In Sec. III, we compare the frequency and imaginary time domain approaches for a hydrogen chain as well as provide results for the total correlation energy and the scaling of the stochastic vs deterministic approaches. Section IV concludes.

## II. THEORY

Matsubara, or finite-temperature imaginary time, Green’s function methods are well suited for the study of complex chemical systems as they enable us to evaluate equilibrium thermodynamic properties. In practice, the temperature is selected such that the Matsubara GF technique is better thought of as a ground state electronic structure method. In the following subsections, we outline the Matsubara GF method and the corresponding stochastic orbital implementation.

### A. Matsubara Green’s functions

The Matsubara GF method provides a self-consistent solution to the Dyson equation in imaginary frequency (we use atomic units where *ℏ* = 1 throughout the paper, unless otherwise noted),

where $i\omega n=i(2n+1)\pi \beta $ is the Matsubara frequency (*β* the inverse temperature), $G\u0303(i\omega n)$, $G\u03030(i\omega n)$, and $\Sigma \u0303(i\omega n)$ are the single-particle Green’s function of the interacting system, the single-particle Green’s function of the Hartree-Fock reference system, and the self-energy, respectively. In the second order approximation, i.e., GF2, the self-energy is given by

Here, **Σ**(*τ*) and **G**(*τ*) are the imaginary time self-energy and Green’s function, respectively. Here, we have restricted ourselves to the closed shell case. $vijmn$ is the 4-index electron repulsion integrals (ERIs),

where *χ*_{i} is the atomic orbital. Equations (1) and (2) need to be solved self-consistently since the self-energy depends on Green’s function.

We present two methods to obtain the self-consistent solution to Eq. (1), the first follows the procedure of Ref. 42 which builds on the techniques developed in Refs. 34, 40, and 39, and the second, developed in this work, is based on an imaginary time propagation technique. The self-consistent procedure of Ref. 42 may be summarized in the following steps:

Initialize $G\u0303(i\omega n)=G\u03030(i\omega n)=[(\mu +i\omega n)S\u2212F]\u22121$, where

**S**is the overlap matrix and**F**is the Fock matrix from a converged Hartree-Fock calculation, and*μ*is the chemical potential, initialized at the middle point between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO).Perform a discrete Fourier transform (FT) to obtain Green’s function in imaginary time, $G(\tau )=1\beta \u2211ne\u2212i\omega n\tau G\u0303(i\omega n)$. To reduce the number of time and frequency points, we use the approach developed in Ref. 42.

Construct

**Σ**(*τ*) from**G**(*τ*), and then perform a FT to the imaginary frequency space to obtain**Σ**(*iω*_{n}).Solve the Dyson equation $G\u0303(i\omega n)=[(G\u03030(i\omega n)\u22121)\u2212\Sigma \u0303(i\omega n)]\u22121$.

Update the Fock matrix, $Fij=hij+\u2211abPab(vijab\u221212vibaj)$, using the relation $G(\tau =\beta )=\u221212P$, where

**P**is the density matrix.Tune the chemical potential,

*μ*, so that the total number of electrons,*N*_{e}= Tr[**PS**], is conserved.Repeat steps 2–6 until self-consistency is reached.

Step 3 in the above algorithm is the computational bottleneck, i.e., the evaluation of **Σ**(*τ*). The computational cost of the second-order Born approximation to the self-energy, **Σ**^{(2)}(*τ*) [see definition in Eq. (2)], scales as *O*(*N*^{5}) (*N* is the number of basis functions) and must be done for all *N*_{τ} imaginary time points.

Alternatively, one can solve the Dyson equations in the time domain, rather than going back and forth between imaginary time and Matsubara frequency domains. In this case, the complication associated with the FT is replaced with an imaginary time integration and imposing the boundary condition such that **P** = −2**G**(*τ* = *β*):

Initialize $G(\tau )=G0(\tau )=Xe\u2212\tau (F\xaf\u2212\mu I)\theta (\u2212\tau )1+e\beta (F\xaf\u2212\mu I)\u2212\theta (\tau )1+e\u2212\beta (F\xaf\u2212\mu I)XT$, where

**X**satisfies**XX**^{T}=**S**^{−1}and $F\xaf=XTFX$ is the Fock matrix in orthogonal basis. Again,**F**is initialized from a converged Fockian, and*μ*is initialized at the middle point of the HOMO-LUMO gap.Construct

**Σ**(*τ*) from**G**(*τ*) using Eq. (2).Solve the Dyson equation in the imaginary time to obtain new $Gnew(\tau )$,

Imaginary time integral gets solved using a proper quadrature.^{30} To be specific, we use 1000 Legendre polynomials. The Legendre quadrature points are cubic splined from 256 Chebyshev grid points of imaginary time. To reduce the computational cost, the double integral is decoupled into two separated integrals. Furthermore, to make the most use of the spline, the integral is evaluated for the spline coefficient, such that we only do the quadrature integral once.

Update the Fock matrix $Fij=hij+\u2211mnPmn(vijmn\u221212vinmj)$ using the new Green’s function, $Gnew(\tau =\beta )=\u221212P$.

Tune the chemical potential,

*μ*, so that the total number of electrons,*N*_{e}= Tr[**PS**], is conserved.Use the new

**G**_{new}(*τ*) to calculate the self-energy and repeat steps 2–5 until self-consistency is reached.

Again, the computational bottleneck is in Step 2 of calculating the self-energy. In Sec. III, we will compare the two algorithms, i.e., imaginary time and imaginary frequency Green’s function methods and discuss the computational complexity of each approach. Before this is done, we first turn to describe a stochastic formulation for the self-energy that reduces the scaling of obtaining the self-energy to *O*(*N*^{3}).

### B. Stochastic resolution of identity

To address the high computational cost of Matsubara GF2 calculations, we begin by noting that a significant reduction in the computational prefactor may be achieved by approximating the 4-index electron repulsion integrals (ERIs), $vijmn$, in Eq. (3) in terms of the 2- and 3-index integrals. This is the resolution of the identity (RI) or density fitting approximation.^{43} The 3- and 2-index ERIs are defined as

and

In the above equations, we use the usual notation: the AO Gaussian basis functions are represented by *χ*_{α}(*r*) and Greek indices *α*, *β*, *γ*, *δ*, …, while the auxiliary basis functions are represented by the indices *A*, *B*, …. Finally, the total number of AO basis functions, auxiliary basis functions, are *N*_{AO} and *N*_{aux}, respectively. Furthermore, both *N*_{aux} and *N*_{AO} are proportional to the system size with *N*_{aux} typically 3–6 times *N*_{AO}.

The approximate 4-index RI-ERIs are then expressed symmetrically in terms of the lower-rank integrals according to

Defining

yields

The stochastic orbital implementation of the RI approximation^{43} utilizes the same set of 2- and 3-index ERIs while introducing an additional set of *N*_{s} *stochastic orbitals*, $\theta \xi $, *ξ* = 1, 2, …, *N*_{s}. Stochastic orbitals are arrays of length *N*_{aux} with randomly selected elements $\theta A\xi =\xb11$ and have the following property:^{43}

where we have denoted the stochastic average over *N*_{s} stochastic orbitals by ⟨⟩_{ξ}. Inserting the stochastic identity matrix into the deterministic RI-ERIs in Eq. (6), we obtain the expression for the sRI-ERIs,

where $\theta \u2297\theta T\xi PQ$ is the *PQ*^{th} element of the stochastic identity matrix. We now define the *ξ*^{th} elements of the stochastic average as

With this definition, the ERI in the AO basis [Eq. (10)] is now given by a stochastic average, an $O(NsNAO4)$ step,

The introduction of the stochastic identity matrix decouples the index *Q* in Eq. (6). Previous work has shown that the trade-off between *decoupling* this index and the introduction of stochastic noise allows for a reduced overall scaling and computational time.^{43} It was also shown that the stochastic error of the elements of the identity matrix and therefore the error of the ERIs is governed by the number of stochastic orbitals, *N*_{s}. This is due to the fact that it is the *length* of stochastic arrays, *N*_{aux}, that increases with the system size, not the number of stochastic orbitals.

### C. Stochastic resolution of identity second-order Born self-energy

In this section, we demonstrate how the sRI-ERIs may be applied to the second-order approximation to the self-energy to reduce the overall scaling of computing **Σ**(*τ*) [cf., Eq. (2)]. Using the stochastic resolution of identity of ERIs [Eq. (12)], the second-order Born self-energy in Eq. (3) can be represented by

We note that, in the above equations, *R* and *R*′ matrices use different sets of stochastic orbitals. In the above equations, we have also defined

In the above equations, the evaluation of $\Sigma ijdir(\tau )$ and $\Sigma ijex(\tau )$ is done now simply by matrix products (or matrix contractions), which scale as *O*(*N*^{3}), as long as the number of stochastic orbitals does not change with the size of the system. It should also be noted that sRI-ERIs can also be used to reduce the scaling to *O*(*N*^{3}) when evaluating the third order or higher self-energy for all contractions in self-energy are now matrix products. That being said, for now, we limit ourselves to the second order Born self-energy.

## III. RESULTS AND DISCUSSION

To study the observed scaling, stochastic errors, and the impact of the prefactors, *N*_{s} and *N*_{t}, on the sRI-GF2 method, we test the approach on a hydrogen dimer chain *H*_{n} with sto-3g basis, with *n* being the number of H atoms. In such a hydrogen dimer chain, the distance between two hydrogen atoms within a dimer is 0.74 Å, and the distance between two H atoms over the nearest hydrogen dimers is 1.26 Å. For such a system, we use cc-pvdz as a fitting basis set (we use cc-pvdz-jkfit for Fock Matrix elements and cc-pvdz-ri for self-energy). In addition, we restrict ourselves to closed shell calculations. This system served as a test bed for other GF2 implementations as well as for the stochastic approach of Ref. 42.

We do not take advantage of the locality of the atomic basis and the resulting sparsity of the ERIs which makes the current applications more realistic for 3D structures, where sparsity often occurs at much larger system sizes. We note in passing that using the sparsity of the 4-index ERIs would further reduce the cost of computing the self-energy.

### A. Comparison of GF2 in frequency and imaginary time domain

We first compare the imaginary time and imaginary frequency sRI-GF2 methods. One of the most important properties we can gain from Matsubara GF is the correlation energy, which is given by

We generally used *N*_{τ} = 256 imaginary time points for the integrals. In Table I, we compare correlation energy from imaginary time and imaginary frequency sRI-GF2 methods on *H*_{2}, *H*_{10}, *H*_{20}, and *H*_{40}. We find excellent agreement between the two sRI-GF2 methods, up to an error of 6 meV for the total correlation energy (about 0.2 meV for correlation energy per electron). The computational cost of the two methods is comparable and is mainly determined by the evaluation of the self-energy. The results shown in the remainder of the paper were obtained with the frequency domain calculation, but the analysis applies to both methods.

. | E_{corr} from time
. | E_{corr} frequency
. | CPU time . |
---|---|---|---|

. | sRI-GF2 . | sRI-GF2 . | ratio (%) . |

H_{2} | −0.6546 | −0.6548 | 47.9 |

H_{10} | −3.7541 | −3.7565 | 37.7 |

H_{20} | −6.3811 | −6.3835 | 63.1 |

H_{40} | −13.2038 | −13.2092 | 61.1 |

. | E_{corr} from time
. | E_{corr} frequency
. | CPU time . |
---|---|---|---|

. | sRI-GF2 . | sRI-GF2 . | ratio (%) . |

H_{2} | −0.6546 | −0.6548 | 47.9 |

H_{10} | −3.7541 | −3.7565 | 37.7 |

H_{20} | −6.3811 | −6.3835 | 63.1 |

H_{40} | −13.2038 | −13.2092 | 61.1 |

### B. Correlation energy

In Fig. 1, we plot the correlation energy for the hydrogen dimer chain as a function of number of hydrogen atoms. We use *N*_{s} = 800 stochastic orbitals for each sRI-GF2 calculation. The final results are then averaged over 10 independent runs. We find that sRI-GF2 results provide a very good agreement with the deterministic GF2 results. Note also that the correlation energy grows nearly linearly with the system size, which signifies weak correlations in the hydrogen dimer chain. We also note that the difference in correlation energy per electron [Fig. 1(b)] between deterministic and stochastic GF2 is within the statistical error.

Note also that from Fig. 1(b), the statistical error bar for energy per electron does not increase with the system size. In other words, to obtain constant accuracy for energy per electron for different system sizes, one does not need to increase the number of stochastic orbitals; therefore, our sRI-GF2 approach scales as *N*^{3} with the system size or number of electrons. However, if one does want to obtain the same accuracy for the total energy (instead of energy per electron), since the stochastic error is proportional to the inverse square root of the sampling size (central limit theorem), our sRI-GF2 approach will scale as *N*^{5} with system size, which is the same scaling as the deterministic implementation of GF2. That being said, for extended systems, e.g., bulk materials, the meaningful quantities are energy per particle or per unit cell instead of the total energy. We expect our sRI-GF2 method will be very useful for such extended systems.

### C. Error estimation

In Fig. 2, we plot the average correlation energy per electron as a function of the number of stochastic orbitals. The stochastic error is the standard deviation of the 10 independent sRI-GF2 runs, which is roughly 0.02 eV. Clearly, the error bars decrease with the increasing number of stochastic orbitals and do not depend on the system size. Note also that the average correlation energy does not show a monotonic behavior with the number of stochastic orbitals. In Ref. 42, the authors report a linear dependence between the average correlation energy and the number of stochastic orbitals due to a systematic bias which is observable due to the small statistical errors. Here, we do not observe a clean linear dependence, suggesting that our bias, if there is any, is much smaller than our statistical error. This also implies that in order to converge the results to a given statistical error, the number of stochastic orbitals does not need to increase with the system size (at least for the system size we are testing here, under 1000 electrons).

### D. Timing

In Fig. 3, we show the overall computational scaling of our sRI-GF2 method and compare it to the deterministic version. In all cases, we used, as mentioned, *N*_{τ} = 256, and roughly 20 SCF cycles were required for convergence. Furthermore, *N*_{ω} = 20 000 frequency points (cubic splined from 200 frequency points^{42}) were used. All calculations used the 32-core Intel-Xeon processor E5-2698 v3 (“Haswell”) at 2.3 GHz on a single node. The wall time for sRI-GF2 is tested for a single run with 800 stochastic orbitals.

In our calculations, we do not rely on the sparsity of the 4-index ERIs nor on the sparsity of the overlap matrix or the Fockian. Using the sparsity of the 4-index ERIs would further reduce the computational cost for both the deterministic and stochastic GF2 implementations. For larger systems, we find that the computational cost scales as *N*^{3}, with *N* being the number of hydrogen atoms. As stated above, the most time consuming part is calculating the self-energy. Using the stochastic resolution of identity, calculating the self-energy is simply matrix multiplication, which scales *N*^{3}. The deterministic GF2 scales as *N*^{5}. We find that the crossover between the deterministic and stochastic GF2 is around 80 hydrogen atoms for the accuracy of 0.01 eV in correlation energy per particle (as estimated by the standard error of the mean value over 10 independent runs, $\sigma 10$) given by 800 stochastic orbitals.

## IV. CONCLUSIONS

We have presented two stochastic versions of the Matsubara Green’s function method to calculate ground states properties of molecular systems. Using the resolution of identity, the second order Born self-energy is represented in a matrix product form, which reduces the scaling from the original *N*^{5} to *N*^{3}. Our sRI-GF2 can be seen as an extension of the stochastic MP2 results as presented in Ref. 43 and offers an alternative to the one as presented in Ref. 42. We have shown that our sRI-GF2 reproduces deterministic GF2 up to the stochastic error. In addition, we do not see systemic bias as we increase the number of stochastic orbitals. Future work will investigate how our stochastic resolution of identity can be implemented in the real time GF method. Such work is ongoing.

## ACKNOWLEDGMENTS

The authors would like to thank Dr. Ming Chen for fruitful discussions. T.Y.T. is supported by a fellowship from the Molecular Sciences Software Institute (MolSSI) under NSF Grant No. ACI-1547580. D.N. and E.R. are grateful for support by the Center for Computational Study of Excited State Phenomena in Energy Materials (C2SEPEM) at the Lawrence Berkeley National Laboratory, which is funded by the U.S. Department of Energy, Office of Science, Basic energy Sciences, Materials Sciences and Engineering Division under Contract No. DEAC02-05CH11231 as part of the Computational Materials Sciences Program. R.B. is grateful for support by Binational US-Israel Science Foundation Grant No. BSF 2015687. Resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231, are greatly acknowledged.