We develop a range-separated stochastic resolution of identity (RS-SRI) approach for the four-index electron repulsion integrals, where the larger terms (above a predefined threshold) are treated using a deterministic RI and the remaining terms are treated using a SRI. The approach is implemented within a second-order Green’s function formalism with an improved O(N3) scaling with the size of the basis set, N. Moreover, the RS approach greatly reduces the statistical error compared to the full stochastic version [T. Y. Takeshita et al., J. Chem. Phys. 151, 044114 (2019)], resulting in computational speedups of ground and excited state energies of nearly two orders of magnitude, as demonstrated for hydrogen dimer chains and water clusters.

Many-body perturbation theory (MBPT) based on Green’s function (GF) approaches [e.g., the Møller–Plesset (MP) perturbation theory,1 the second-order Green’s function (GF2) approach,2 and the GW3 approximation] have proven very useful in predicting the ground state properties beyond the limitations of density functional theory (DFT) and the Hartree–Fock (HF) method as well as in predicting quasi-particle and neutral excitation. In these methods, correlations are treated systemically by expanding the self-energy (which contains the information of correlations) in the Coulomb2,4 or screened Coulomb5–7 interactions. MBPT has been applied to a variety of molecular and bulk systems in predicting, e.g., correlation energies, ionization potentials and electron affinities,8–22 and excited states.7,15,23–26 Excluding several recent applications,27–33 MBPT has been limited to relatively small systems due to the steep computational scaling with the system size.

A particularly interesting implementation of MBPT, relevant to the applications reported below, is based on a second-order approximation to the electron self-energy,2,4,34 which has received increasing attention in recent years.8–10,35 In contrast to the GW approximation,1 dynamical exchange correlations are included explicitly in the GF2 self-energy to the second order in Coulomb interactions, providing accurate ground state energies36,37 and quasi-particle energies.9,10,38,39 Although the results of recent studies are extremely promising, the GF2 approach suffers from a high computational cost [O(N5)], limiting its application to relatively small system sizes.

To overcome this limitation, two stochastic formulations were recently introduced to reduce the computational scaling. Neuhauser et al.40 developed a stochastic decomposition of the imaginary time GF to reduce the overall scaling of GF2 to O(N3). Takeshita et al.41 and Dou et al.42 proposed an approach that builds on the stochastic resolution of identity (SRI) for the electron repulsion integrals (ERIs)43 to describe both ground and quasi-particle excited states. Similar to the deterministic resolution of identity (RI),44–48 the SRI decouples the four-index ERIs. While the number of auxiliary bases increases with the system size for the RI, the number of stochastic orbitals in the SRI is independent of the system size, resulting in an overall O(N3) scaling. However, the SRI technique comes at the cost of introducing a statistical error in the energy and nuclear forces,43,49–52 which can be controlled by increasing the number of stochastic realizations, Ns. While the overall scaling of the stochastic formulations of GF2 is similar to DFT and HF, achieving chemical accuracy requires a large number of stochastic realizations, resulting in increasingly longer computational time, even for small systems.41,42

In this work, we develop a range-separated stochastic resolution of identity (RS-SRI) approach to decouple the four-index ERIs, where the short-range ERIs (larger values) are treated deterministically using the resolution of identity (RI)44–48 and the remaining terms are treated using the stochastic resolution of identity (SRI).43 The RS-SRI approach allows for a significant reduction in the statistical error without the need to increase the number of stochastic realizations while maintaining the overall O(N3) scaling. We apply the RS-SRI technique to GF2 theory and demonstrate its ability to reduce the overall computational scaling from O(N5) to O(N3) as well as increase the sampling efficiency by nearly two orders of magnitude as compared to the SRI technique.

Consider a generic many-body electronic Hamiltonian in the second-quantization representation,

Ĥ=ijhijâiâj+12ijklvijklâiâkâlâj,
(1)

where âi and âi are the Fermionic creation and annihilation operators, respectively, for an electron in orbital χi(r). In the applications below, χi(r) is chosen to be an atomic orbital, but we do not use the locality of the basis to reduce the scaling or do we introduce a cutoff to compute the ERIs [see Eq. (2)] or the overlap matrix [see Eq. (3)]. Therefore, the formalism and the resulting scaling reported below are general for any choice of basis. The creation and annihilation operators obey the following anti-commutation relationship:

âi,âj=Sij,
(2)

where Sij = ∫χi(r)χj(r)dr is the matrix element of the overlap matrix S. In Eq. (1), hij is the matrix element of the one-body Hamiltonian and vijkl is the four-index ERI (vijkl ≡ (ij|kl)),

vijkl=dr1dr2χi(r1)χj(r1)χk(r2)χl(r2)r1r2.
(3)

Describing correlations within a many-body perturbation technique beyond the mean-field approximation relies on the contraction of vijkl (or powers of vijkl), a task that becomes computationally intractable with increasing levels of accuracy. A common approach to reduce the computational complexity is based on the resolution of identity (RI), where the four-index ERIs in Eq. (3) are approximated by products of three-index ERIs and two-index ERIs,44,46–48

vijklABNauxij|AVAB1B|kl.
(4)

Here, χA(r) and χB(r) are the auxiliary orbitals, and (ij|A) and VAB are the three-index and two-index ERIs respectively,

(ij|A)=dr1dr2χi(r1)χj(r1)χA(r2)r1r2,
(5)
VAB=dr1dr2χA(r1)χB(r2)r1r2.
(6)

For convenience, we define a new set of three-index ERIs KijQ,

KijQ=ANaux(ij|A)(V)AQ12,
(7)

such that the four-index ERI can be expressed in terms of three-index ERIs only,

vijkl=QNauxKijQKklQ.
(8)

The advantage of the above decomposition is that the resolution of identity reduces the number of two-body ERIs from O(N4) to O(N2Naux), where N is the size of the atomic basis and Naux is the size of the auxiliary basis. However, since Naux increases nearly linearly with the size of the atomic basis N and since the calculation of KijQ scales as O(N4), the approach does not always reduce the computational scaling of the correlation energy for, e.g., MP2 and GF2.41–43 

Recently, we have introduced a stochastic version of the resolution of identity, which provides a framework to reduce the scaling for contraction within many-body perturbation techniques at the account of introducing a controlled statistical error in the calculated observables (e.g., the forces on the nuclei and the energy per electron). The balance between accuracy and efficiency is controlled by the number of stochastic realizations (Ns) according to the central limit theorem. The stochastic RI approach utilizes the same set of three-index ERIs (ij|A) while circumventing the need to directly compute KijQ by introducing a set of Nsstochastic orbitals, {θξ}, ξ = 1, 2, …, Ns. The stochastic orbitals are defined as arrays of length Naux with randomly selected elements 1 or −1, i.e., θAξ=±1. Defining

Rijξ=AQNaux(ij|A)(V)AQ12θQξ=ANaux(ij|A)QNaux(V)AQ12θQξ,
(9)

the expression for vijkl can be reduced to

vijkl1NsξRijξRklξRijRklθ,
(10)

where θ implies a statistical average over the stochastic orbitals, {θ}. The overall computational scaling of the Rijξ matrices is O(NsN3), but Ns is found to be independent of the system size for different applications.25,29,41–43,49,53 The SRI technique has been successfully used to reduce the scaling of the correlation energy within MP2 and GF2 theories, from O(N5) to O(N3).

The above approach has been implemented for simple molecules and for hydrogen chains of different lengths in order to assess its accuracy for large systems.41–43 To converge the results to chemical accuracy required a rather large number of stochastic orbitals (Ns ≈ 1000), which limits the application of the SRI technique to relatively small systems (due to the large “prefactor”), with N → 1000, still exceedingly larger than the deterministic approach.41–43 In order to reduce the number of stochastic orbitals and to allow for a smaller statistical error, we first sort the ERIs (ij|A) according to their magnitude and keep only those that are larger than a threshold,

(ij|A)L=(ij|A) if|(ij|A)|ϵN{|(ij|A)|}jmax0 otherwise.
(11)

Here, {|(ij|A)|}jmax is the maximal value of |(ij|A)| for each j and ϵ′ is a predefined parameter. The superscript L (or S) denotes large (or small) values. By setting the cutoff threshold to depend on ϵN, the number of nonzero elements in (ij|A)L for each j scales as O(N) [rather than O(N2) if no threshold is used or O(1) if a fixed threshold ϵ′ is used]. This implies that the total non-vanishing elements in (ij|A)L scale as O(N2). We then define [KijQ]L as

[KijQ]L=ANaux(ij|A)L(V)AQ12
(12)

and keep only the terms that are larger than a predefined threshold, namely, we set [KijQ]L=0 for values below the threshold according to

[KijQ]L=[KijQ]L if|[KijQ]L|ϵ{|[KijQ]L|}max0 otherwise.
(13)

The calculation of [KijQ]L using the above procedure scales as O(N3). We proceed by defining

[Rijξ]L=QNaux[KijQ]LθQξ,
(14)
[Rijξ]S=Rijξ[Rijξ]L,
(15)

where Rijξ is defined above in Eq. (9) and the computational scaling for both terms, [Rijξ]L and [Rijξ]S, is O(N3). Using these definitions, the four-index tensor vijkl can be rewritten as

vijkl=QNaux[KijQ]L[KklQ]L+RijLRklSθ+RijSRklLθ+RijSRklSθ.
(16)

Equation (16) is referred to as the range-separated stochastic resolution of identity (RS-SRI). The RS-SRI reduces to the SRI for ϵ = 1 and to the deterministic RI for ϵ = 0. This suggests that ϵ can be used as a control parameter balancing the computational efficiency and statistical errors. For optimal choices of ϵ, the contribution of QNaux[KijQ]L[KklQ]L in Eq. (16) must be larger than the other terms.

We now apply the above formalism to the second-order Matsubara Green’s function (GF2) theory.40–42 The main entity in the GF2 theory is the Matsubara single-particle, finite temperature, Green’s function given by (we set = 1 unless otherwise stated)

Gij(τ)=Tcâi(τ)âj,
(17)

where âi and âj are defined above in Sec. II, Tc is a time ordering operator, and τ is an imaginary time point along τ ∈ (0, −β). In the above, we have used the Heisenberg picture for the operators: âi(τ)=e(ĤμN^)τâie(ĤμN^)τ, where N^=ijSijâiâj is the number operator and Ĥ is the many-body Hamiltonian defined in Eq. (1). The average is taken with respect to the grand canonical partition function =Z1Tr()eβ(ĤμN^), where Z=Treβ(ĤμN^) is the normalization factor, β = 1/kBT is the inverse temperature, and μ is the chemical potential.

The Matsubara GF obeys the following Dyson equation:

SτG(τ)=δ(τ)+(FμS)G(τ)+0βdτ1Σ(ττ1)G(τ1),
(18)

where F is the Fock matrix given by

Fij=hij2klGkl(β)(vijkl12vilkj)   
(19)

and Σ is the self-energy. In the second-order Born approximation, the self-energy (in the closed shell case) is given by

Σij(τ)=klmnpqvimqk(2vlpnjvnplj)Gkl(τ)Gmn(τ)Gpq(βτ).
(20)

The above form scales as O(N5) using the appropriate contraction.

The Matsubara Green’s function for the Fermionic systems obeys the following anti-symmetric relationship: G(τ) = −G(τ + β). The anti-symmetry feature allows for a Fourier representation of G(τ) in imaginary frequency,

G̃(iωn)=0βeiωnτG(τ).
(21)

Here, iωn=i(2n+1)πβ are the Matsubara frequencies, and the inverse Fourier transform is defined by

G(τ)=1βneiωnτG̃(iωn).
(22)

The Dyson equation [cf. Eq. (18)] can then be solved in the frequency domain,

G̃(iωn)=1[G̃0(iωn)]1Σ̃(iωn),
(23)

where Σ̃(iωn) is the Fourier transform of the self-energy [Eq. (20)] and G̃0(iωn) is the non-interacting GF,

G0̃(iωn)=[(μ+iωn)SF]1.
(24)

Since the self-energy Σ̃(iωn) depends on G̃(iωn) itself, Eq. (23) as well as Eq. (20) must be solved self-consistently. This is done by first performing a Hartree–Fock calculation to obtain the overlap matrix S, the Fock matrix F, and the chemical potential μ. The Fock matrix can then be used for constructing the non-interacting GF [cf. Eq. (24)] that serves as our initial guess of G̃(iωn)=G̃0(iωn). The next step involves the calculation of the self-energy, which is preformed in the imaginary time domain [Eq. (20)]. The self-energy is then used to update the GF in Eq. (23), and the latter is used to update the Fock matrix in Eq. (19). It is often necessary to conserve the number of particles Ne = −2ijGij(τ = β)Sij. This can be achieved by tuning the chemical potential μ.

The computational bottleneck in GF2 is the calculation is the self-energy, which scales formally as O(N5). Using the RS-SRI representation for vijkl given by Eq. (16), the self-energy can be written as

Σij(τ)=klmnpqGkl(τ)Gmn(τ)Gpq(βτ)×QNaux[KimQ]L[KqkQ]L+RimLRqkSθ+RimSRqkLθ+RimSRqkSθ×2QNaux[KlpQ]L[KnjQ]L+RlpLRnjSθ+RlpSRnjLθ+RlpSRnjSθQNaux[KnpQ]L[KljQ]L+RnpLRljSθ+RnpSRljLθ+RnpSRljSθ.
(25)

In Sec. IV, we apply the RS-SRI to a series of hydrogen chain molecules and compare the results to deterministic RI as well as to SRI. We find, in practice, that the RS-SRI scales even better than the upper theoretical limit of O(N3) and at the same time reduces the statistical error by about an order of magnitude as shown below.

In this section, we assess the performance of the RS-SRI-GF2 approach and compare the results to deterministic GF2 and SRI-GF2 for hydrogen dimer chains HNH of length NH and on a series of water clusters. We begin with the hydrogen chain that has been the canonical model system to assess different electronic structure methods. The distance between strongly bonded hydrogen atoms was set to 0.74 Å, and the distance between weakly bonded hydrogen atoms was set to 1.26 Å. For each hydrogen, we used the STO-3G basis and the CC-pVDZ-RI fitting basis for the resolution of identity in evaluating the self-energy as well as the CC-pVDZ-JKFIT fitting basis in evaluating the Fock matrix in Eq. (19). The inverse temperature used for the calculation of the GFs was set to β = 50 inverse Hartree, sufficient to converge the results due to the large quasi-particle gap. We used the approach developed in Ref. 40 to perform the discrete Fourier transform with 20 000 Matsubara frequencies and 300 imaginary-time points.

We first test how ϵ′ and ϵ [in Eqs. (11) and (13)] affect the final results. In Fig. 1, we plot the correlation energy per electron, defined as40 

Ecorr=1Ne0βdτTr(Σ(τ)G(βτ))
(26)

for H10 as a function of ϵ for two values of ϵ′. The statistical errors were estimated from the standard deviation σ over Nrun = 10 independent RS-SRI-GF2 runs. The RS-SRI-GF2 results agree with the deterministic GF2 results within the statistical error for all values of ϵ and ϵ′. In general, we find that the statistical errors decrease with the decreasing values of ϵ and ϵ′, consistent with the deterministic limit defined by ϵ → 0 and ϵ′ → 0, respectively. However, for the larger value of ϵ′, we find an optimal value of ϵ, suggesting that the approach to the deterministic limit may not be monotonic. Clearly, one needs to be careful when choosing the optimal thresholds for range separation. Below, for hydrogen molecule chains, we will set ϵ′ = 0.02 and ϵ = 0.1 as our thresholds for RS-SRI calculations unless otherwise noted.

FIG. 1.

Correlation energy per electron for H10 as a function of ϵ for two values of ϵ′ from RS-SRI-GF2 calculations (blue symbols and lines). Orange lines denote the deterministic results that are independent of ϵ and ϵ′. Note that the RS-SRI-GF2 results agree with the deterministic GF2 results within the statistical error bar. The statistical errors were estimated from the standard deviation σ over Nrun = 10 independent RS-SRI-GF2 runs.

FIG. 1.

Correlation energy per electron for H10 as a function of ϵ for two values of ϵ′ from RS-SRI-GF2 calculations (blue symbols and lines). Orange lines denote the deterministic results that are independent of ϵ and ϵ′. Note that the RS-SRI-GF2 results agree with the deterministic GF2 results within the statistical error bar. The statistical errors were estimated from the standard deviation σ over Nrun = 10 independent RS-SRI-GF2 runs.

Close modal

In Fig. 2, we plot the correlation energy per electron for a series of hydrogen dimer chains. We compare the results obtained using the RS-SRI-GF2 with SRI-GF2 and for small systems, with deterministic calculations. We find, as expected, that the correlation energy per electron is roughly independent of the length of the chain. Furthermore, both RS-SRI-GF2 with SRI-GF2 agree with the deterministic results within their statistical error. However, the statistical error for the same number of stochastic orbitals (Ns) is significantly smaller (by nearly an order of magnitude) for RS-SRI-GF2 compared to SRI-GF2 for the entire range of system sizes. The error bar was estimated as the standard deviation σ, where we have used Nrun = 10 runs to estimate the statistical fluctuations.

FIG. 2.

Correlation energy per electron [cf. Eq. (26)] for a series of hydrogen dimer chains of different lengths (NH is the number of hydrogen atoms). The error bar is estimated by the standard deviation σ over Nrun = 10 independent runs. We have used Ns = 800 stochastic orbitals for both the RS-SRI-GF2 and the SRI-GF2 calculations. Note that both stochastic approaches agree with the deterministic approach (calculated only for the smaller system sizes) within the statistical error.54 

FIG. 2.

Correlation energy per electron [cf. Eq. (26)] for a series of hydrogen dimer chains of different lengths (NH is the number of hydrogen atoms). The error bar is estimated by the standard deviation σ over Nrun = 10 independent runs. We have used Ns = 800 stochastic orbitals for both the RS-SRI-GF2 and the SRI-GF2 calculations. Note that both stochastic approaches agree with the deterministic approach (calculated only for the smaller system sizes) within the statistical error.54 

Close modal

In Fig. 3, we plot the correlation energy per electron as a function of the inverse of the number of stochastic orbitals (1Ns) for H80, H200, and H500. We find that the statistical fluctuations decrease as 1Ns, indicated by the decrease in the magnitude of the error bars. For Ns = 800, we compare the RS-SRI-GF2 with the SRI-GF2 (red symbol, Fig. 3) for H500. Clearly, the statistical noise is much larger (by about a factor of 10) compared to the RS-SRI-GF2 result (green symbols). We also find that the statistical fluctuations in the correlation energy per electron are independent of the system size, consistent with Ns being independent on N, similar to other implementation of stochastic orbital techniques.25,29,41–43,49,53

FIG. 3.

The correlation energy per electron as a function of 1/Ns for H80, H200, and H500 obtained using the RS-SRI-GF2. For Ns = 800, we also show the result for NH = 500 using the SRI-GF2 approach (red symbol). Note that, for clarity, we have shifted slightly the values of the x axis for different system sizes.

FIG. 3.

The correlation energy per electron as a function of 1/Ns for H80, H200, and H500 obtained using the RS-SRI-GF2. For Ns = 800, we also show the result for NH = 500 using the SRI-GF2 approach (red symbol). Note that, for clarity, we have shifted slightly the values of the x axis for different system sizes.

Close modal

A closer examination of the results shown in Fig. 3 suggests that for H500, the correlation energy per electron depends linearly on 1/Ns. This dependence has been discussed and analyzed in Ref. 40 and was attributed to the existence of a bias resulting from the nonlinear dependence of the correlation energy on the stochastic process (see the  Appendix for a complete derivation of the bias). In comparison to a previous work,40 we find that the bias is rather small, well within the statistical errors, and thus, its existence is questionable.

In Fig. 4, we plot the computational wall time of the different GF2 approaches (deterministic GF2, RS-SRI-GF2, and SRI-GF2) as a function of the length of the hydrogen atom chain, NH. All calculations are performed on a single node with the 32-core Intel-Xeon processor E5-2698 v3 (“Haswell”) at 2.3 GHz. The deterministic GF2 scales as O(N5.1), the SRI-GF2 scales as O(N3.1), and the current approach, for the same level of accuracy as in the SRI-GF2, scales as O(N2.2), slightly better than the theoretical limit of O(N3). Note that the RS-SRI-GF2 approach has a much smaller total wall time compared to the other approaches, across the entire system range studied. As additional checks, the inset of Fig. 4 shows the scaling of computing [KijQ]L as well as the scaling of the deterministic portion of the self-energy (terms that only involve [KijQ]L but not RijL or RijS). The former scales as O(N3), and the latter is found to scale as O(N2).

FIG. 4.

Computational wall time of the different GF2 approaches (deterministic GF2, RS-SRI-GF2, and SRI-GF2) as a function of NH. Computational scalings are O(N5.1), O(N2.2), and O(N3.1) from fitting (solid lines). The inset shows the scaling of computing [KijQ]L (black symbols), which is fitted by O(N3) (black solid line), as well as the scaling of the deterministic portion of the self-energy (terms that only involve [KijQ]L but not RijL or RijS (cyan symbols), which is fitted by O(N2) (cyan solid line).

FIG. 4.

Computational wall time of the different GF2 approaches (deterministic GF2, RS-SRI-GF2, and SRI-GF2) as a function of NH. Computational scalings are O(N5.1), O(N2.2), and O(N3.1) from fitting (solid lines). The inset shows the scaling of computing [KijQ]L (black symbols), which is fitted by O(N3) (black solid line), as well as the scaling of the deterministic portion of the self-energy (terms that only involve [KijQ]L but not RijL or RijS (cyan symbols), which is fitted by O(N2) (cyan solid line).

Close modal

Finally, we demonstrate that the range-separated stochastic resolution of identity method is also useful for three dimensional systems. In Fig. 5, we plot the correction function per electron for a series of water clusters. We set ϵ′ = 0.005 and ϵ = 0.025 in the RS-SRI-GF2 calculations. Note that, again, the SRI-GF2 as well as the RS-SRI-GF2 results agree with the deterministic GF2 results within the statistical errors, which are significantly smaller for the RS-SRI-GF2 approach. The statistical error is estimated by the standard deviation over Nrun = 10 runs. The computational scaling for the water cluster is found numerically to be O(N2.4) (not shown), slightly better than the theoretical limit of (N3). This implies that the range-separated stochastic resolution of identity approach is also useful to reduce the computational scaling as well as statistical error for 3D structures.

FIG. 5.

Correlation energy per electron for the water molecule cluster. The error bars were estimated from the standard deviation σ over Nrun = 10 runs. We have used Ns = 800 stochastic orbitals for both RS-SRI-GF2 and SRI-GF2 calculations. Note that both stochastic approaches agree with the deterministic approach within the statistical error. We have set ϵ′ = 0.005 and ϵ = 0.025 in the RS-SRI-GF2 calculations.

FIG. 5.

Correlation energy per electron for the water molecule cluster. The error bars were estimated from the standard deviation σ over Nrun = 10 runs. We have used Ns = 800 stochastic orbitals for both RS-SRI-GF2 and SRI-GF2 calculations. Note that both stochastic approaches agree with the deterministic approach within the statistical error. We have set ϵ′ = 0.005 and ϵ = 0.025 in the RS-SRI-GF2 calculations.

Close modal

We have developed a range-separated stochastic resolution of identity approach to decouple the four-index electron repulsion integrals and implemented the approach within the second-order Green’s function formalism, GF2. The RS-SRI technique can be viewed as a hybridization of the RI and SRI techniques, leveraging from both the accuracy of the RI and the reduced computational complexity of the SRI approaches. Results calculated for hydrogen dimer chains of varying lengths and for water clusters show an improved scaling of O(N2.2) with the size of the basis, N. In comparison to our previous fully stochastic approach, the RS-SRI-GF2 approach significantly reduces the statistical error, resulting in computational wall times that are nearly two orders of magnitude shorter compared to the SRI-GF2. While we focused in this work on the specific implementation of the RS-SRI, the approach lends itself to higher-order approximations to the self-energy and for going beyond ground state properties. Future work should assess the performance of this RS-SRI technique for a wider range of geometries as well as its applicability to the calculation of excited state properties.

W.D. and M.C. contributed equally to this work.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

D.N. and E.R. are grateful for support from the Center for Computational Study of Excited State Phenomena in Energy Materials (C2SEPEM) at the Lawrence Berkeley National Laboratory, which is funded by the U.S. Department of Energy, Office of Science, Basic energy Sciences, Materials Sciences and Engineering Division under Contract No. DEAC02-05CH11231 as part of the Computational Materials Sciences Program. R.B. is grateful for support from the U.S.–Israel Binational Science Foundation (Grant No. BSF-2020602). The resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231, are greatly acknowledged.

In this appendix, we provide a short analysis of the origin of the bias and its interplay with the statistical error. Consider the general case where g is the observable of interest given by

g=G(x),
(A1)

where G is some smooth function and ⟨x⟩ is the expected value of a random variable x, which also has the variance σ2 = ⟨(x − ⟨x⟩)2⟩,

Using Ns independent samples, xn, n = 1, …, Ns, we define yNs as

yNs=1Nsn=1Nsxn,
(A2)

which has the following properties:

yNsx=0,
(A3)
(yNsx)2=σ2Ns.
(A4)

We now Taylor expand G(yNs) near ⟨x⟩,

G(yNs)=G(x)+G(x)(yNsx)
(A5)
+12G(x)(yNsx)2
(A6)
+O((yNsx)3).
(A7)

According to the central limit theorem, the expectation value of g = G(⟨x⟩) is then given by

gNsNrun=G(x)+12G(x)σ2Ns
(A8)
±G(x)σNsNrun.
(A9)

The second term on the right-hand side of the above equation is what we refer to as bias, and the last term is the statistical error. Note that bias depends linearly on 1/Ns, whereas the statistical error depends on 1/Ns.

1.
C.
Møller
and
M. S.
Plesset
,
Phys. Rev.
46
,
618
(
1934
).
2.
L. S.
Cederbaum
,
J. Phys. B: At. Mol. Phys.
8
,
290
(
1975
).
3.
L.
Hedin
,
Phys. Rev.
139
,
A796
(
1965
).
4.
L. J.
Holleboom
and
J. G.
Snijders
,
J. Chem. Phys.
93
,
5826
(
1990
).
5.
M. S.
Hybertsen
and
S. G.
Louie
,
Phys. Rev. Lett.
55
,
1418
(
1985
).
6.
M. M.
Rieger
,
L.
Steinbeck
,
I. D.
White
,
H. N.
Rojas
, and
R. W.
Godby
,
Comput. Phys. Commun.
117
,
211
(
1999
).
7.
G.
Onida
,
L.
Reining
, and
A.
Rubio
,
Rev. Mod. Phys.
74
,
601
(
2002
).
8.
N. E.
Dahlen
,
R.
van Leeuwen
, and
U.
von Barth
,
Int. J. Quantum Chem.
101
,
512
(
2005
).
9.
Y.-y.
Ohnishi
and
S.
Ten-no
,
J. Comput. Chem.
37
,
2447
(
2016
).
10.
F.
Pavošević
,
C.
Peng
,
J.
Ortiz
, and
E. F.
Valeev
,
J. Chem. Phys.
147
,
121101
(
2017
).
11.
M. S.
Hybertsen
and
S. G.
Louie
,
Phys. Rev. B
34
,
5390
(
1986
).
12.
P.
Rinke
,
A.
Qteish
,
J.
Neugebauer
,
C.
Freysoldt
, and
M.
Scheffler
,
New J. Phys.
7
,
126
(
2005
).
13.
P.
Liao
and
E. A.
Carter
,
Phys. Chem. Chem. Phys.
13
,
15189
(
2011
).
14.
J. B.
Neaton
,
M. S.
Hybertsen
, and
S. G.
Louie
,
Phys. Rev. Lett.
97
,
216405
(
2006
).
15.
M. L.
Tiago
and
J. R.
Chelikowsky
,
Phys. Rev. B
73
,
205334
(
2006
).
16.
C.
Friedrich
,
A.
Schindlmayr
,
S.
Blügel
, and
T.
Kotani
,
Phys. Rev. B
74
,
045104
(
2006
).
17.
M.
Grüning
,
A.
Marini
, and
A.
Rubio
,
J. Chem. Phys.
124
,
154108
(
2006
).
18.
C.
Rostgaard
,
K. W.
Jacobsen
, and
K. S.
Thygesen
,
Phys. Rev. B
81
,
085103
(
2010
).
19.
P.
Koval
,
D.
Foerster
, and
D.
Sánchez-Portal
,
Phys. Rev. B
89
,
155417
(
2014
).
20.
I.
Tamblyn
,
P.
Darancet
,
S. Y.
Quek
,
S. A.
Bonev
, and
J. B.
Neaton
,
Phys. Rev. B
84
,
201402
(
2011
).
21.
N.
Marom
,
F.
Caruso
,
X.
Ren
,
O. T.
Hofmann
,
T.
Körzdörfer
,
J. R.
Chelikowsky
,
A.
Rubio
,
M.
Scheffler
, and
P.
Rinke
,
Phys. Rev. B
86
,
245127
(
2012
).
22.
M. J.
van Setten
,
F.
Caruso
,
S.
Sharifzadeh
,
X.
Ren
,
M.
Scheffler
,
F.
Liu
,
J.
Lischner
,
L.
Lin
,
J. R.
Deslippe
,
S. G.
Louie
 et al,
J. Chem. Theory Comput.
11
,
5665
(
2015
).
23.
M.
Rohlfing
and
S. G.
Louie
,
Phys. Rev. B
62
,
4927
(
2000
).
24.
L. X.
Benedict
,
A.
Puzder
,
A. J.
Williamson
,
J. C.
Grossman
,
G.
Galli
,
J. E.
Klepeis
,
J.-Y.
Raty
, and
O.
Pankratov
,
Phys. Rev. B
68
,
085310
(
2003
).
25.
E.
Rabani
,
R.
Baer
, and
D.
Neuhauser
,
Phys. Rev. B
91
,
235302
(
2015
).
26.
S.
Refaely-Abramson
,
R.
Baer
, and
L.
Kronik
,
Phys. Rev. B
84
,
075144
(
2011
).
27.
M.
Shishkin
and
G.
Kresse
,
Phys. Rev. B
75
,
235102
(
2007
).
28.
F.
Caruso
,
P.
Rinke
,
X.
Ren
,
A.
Rubio
, and
M.
Scheffler
,
Phys. Rev. B
88
,
075105
(
2013
).
29.
D.
Neuhauser
,
Y.
Gao
,
C.
Arntsen
,
C.
Karshenas
,
E.
Rabani
, and
R.
Baer
,
Phys. Rev. Lett.
113
,
076402
(
2014
).
30.
H.-V.
Nguyen
,
T. A.
Pham
,
D.
Rocca
, and
G.
Galli
,
Phys. Rev. B
85
,
081101
(
2012
).
31.
J.
Deslippe
,
G.
Samsonidze
,
D. A.
Strubbe
,
M.
Jain
,
M. L.
Cohen
, and
S. G.
Louie
,
Comput. Phys. Commun.
183
,
1269
(
2012
).
32.
D.
Foerster
,
P.
Koval
, and
D.
Sánchez-Portal
,
J. Chem. Phys.
135
,
074105
(
2011
).
33.
X.
Gonze
,
B.
Amadon
,
P.-M.
Anglade
,
J.-M.
Beuken
,
F.
Bottin
,
P.
Boulanger
,
F.
Bruneval
,
D.
Caliste
,
R.
Caracas
,
M.
Côté
 et al,
Comput. Phys. Commun.
180
,
2582
(
2009
).
34.
G.
Stefanucci
and
R.
van Leeuwen
,
Nonequilibrium Many-Body Theory of Quantum Systems: A Modern Introduction
(
Cambridge University Press
,
2013
).
35.
J. J.
Phillips
and
D.
Zgid
,
J. Chem. Phys.
140
,
241101
(
2014
).
36.
A. A.
Kananenka
,
J. J.
Phillips
, and
D.
Zgid
,
J. Chem. Theory Comput.
12
,
564
(
2016
).
37.
A. A.
Rusakov
and
D.
Zgid
,
J. Chem. Phys.
144
,
054106
(
2016
).
38.
N. E.
Dahlen
and
R.
van Leeuwen
,
J. Chem. Phys.
122
,
164102
(
2005
).
39.
A. R.
Welden
,
J. J.
Phillips
, and
D.
Zgid
, arXiv:1505.05575 (
2015
).
40.
D.
Neuhauser
,
R.
Baer
, and
D.
Zgid
,
J. Chem. Theory Comput.
13
,
5396
(
2017
).
41.
T. Y.
Takeshita
,
W.
Dou
,
D. G. A.
Smith
,
W. A.
de Jong
,
R.
Baer
,
D.
Neuhauser
, and
E.
Rabani
,
J. Chem. Phys.
151
,
044114
(
2019
).
42.
W.
Dou
,
T. Y.
Takeshita
,
M.
Chen
,
R.
Baer
,
D.
Neuhauser
, and
E.
Rabani
,
J. Chem. Theory Comput.
15
,
6703
(
2019
).
43.
T. Y.
Takeshita
,
W. A.
de Jong
,
D.
Neuhauser
,
R.
Baer
, and
E.
Rabani
,
J. Chem. Theory Comput.
13
,
4605
(
2017
).
44.
J. L.
Whitten
,
J. Chem. Phys.
58
,
4496
(
1973
).
45.
B. I.
Dunlap
,
J. Chem. Phys.
78
,
3140
(
1983
).
46.
B. I.
Dunlap
,
J. W. D.
Connolly
, and
J. R.
Sabin
,
J. Chem. Phys.
71
,
3396
(
1979
).
47.
O.
Vahtras
,
J.
Almlöf
, and
M. W.
Feyereisen
,
Chem. Phys. Lett.
213
,
514
(
1993
).
48.
M.
Feyereisen
,
G.
Fitzgerald
, and
A.
Komornicki
,
Chem. Phys. Lett.
208
,
359
(
1993
).
49.
M.
Chen
,
R.
Baer
,
D.
Neuhauser
, and
E.
Rabani
,
J. Chem. Phys.
150
,
034106
(
2019
).
50.
E.
Arnon
,
E.
Rabani
,
D.
Neuhauser
, and
R.
Baer
,
J. Chem. Phys.
152
,
161103
(
2020
).
51.
Y.
Cytter
,
E.
Rabani
,
D.
Neuhauser
, and
R.
Baer
,
Phys. Rev. B
97
,
115207
(
2018
).
52.
Q.
Ge
,
Y.
Gao
,
R.
Baer
,
E.
Rabani
, and
D.
Neuhauser
,
J. Phys. Chem. Lett.
5
,
185
(
2013
).
53.
Y.
Gao
,
D.
Neuhauser
,
R.
Baer
, and
E.
Rabani
,
J. Chem. Phys.
142
,
034106
(
2015
).
54.

Due to the steep scaling of O(N5), we were reluctant to compute the correlation energies deterministically for N > 100.