We explore the use of the stochastic resolution-of-the-identity (sRI) with the phaseless auxiliary-field quantum Monte Carlo (ph-AFQMC) method. sRI is combined with four existing local energy evaluation strategies in ph-AFQMC, namely, (1) the half-rotated electron repulsion integral tensor (HR), (2) Cholesky decomposition (CD), (3) tensor hypercontraction (THC), or (4) low-rank factorization (LR). We demonstrate that HR–sRI achieves no scaling reduction, CD–sRI scales as O(N3), and THC–sRI and LR–sRI scale as O(N2), albeit with a potentially large prefactor. Furthermore, the walker-specific extra memory requirement in CD is reduced from O(N3) to O(N2) with sRI, while sRI-based THC and LR algorithms lead to a reduction from O(N2) extra memory to O(N). Based on numerical results for one-dimensional hydrogen chains and water clusters, we demonstrated that, along with the use of a variance reduction technique, CD–sRI achieves cubic-scaling without overhead. In particular, we find that for the systems studied, the observed scaling of standard CD is O(N34), while for CD–sRI, it is reduced to O(N23). Once a memory bottleneck is reached, we expect THC–sRI and LR–sRI to be preferred methods due to their quadratic-scaling memory requirements and their quadratic-scaling of the local energy evaluation (with a potentially large prefactor). The theoretical framework developed here should facilitate large-scale ph-AFQMC applications that were previously difficult or impossible to carry out with standard computational resources.

The accurate ab initio simulation of the ground state properties of molecules and solids is essential for the understanding of major swaths of chemistry and physics. All known techniques for the reliable calculation of electronic structure are challenged by large systems containing strongly interacting electrons. Here, mean field approaches fail, and the exponential scaling of a brute force solution of the Schrödinger equation renders the description of all but the smallest systems impossible. A large array of powerful methods have been developed, which have been successfully employed in the study of large correlated systems.1–4 Among the many available electronic structure methodologies with which to attack such systems, quantum Monte Carlo (QMC) stands out as a unique tool for the calculation of ground state energies due to its attractive combination of scalability and accuracy.2 

While there are many flavors of QMC,2,5–7 our focus in this work centers on the auxiliary-field QMC (AFQMC) approach.8–10 AFQMC is a variant of projector QMC, which is formulated in a second-quantized determinant space. The unbiased version of AFQMC, often referred to as free-projection AFQMC, is exact, in principle, but has an exponential scaling with system size due to noise growth caused by the fermionic sign (or phase) problem.10 Zhang and co-workers developed the phaseless approximation to AFQMC (ph-AFQMC), which has become a practical and accurate tool for the ab initio simulation of molecules and materials. In ph-AFQMC, the phase of walker wavefunctions during imaginary-time propagation is constrained by the a priori chosen trial wavefunction, |ΨT⟩. While this approach is biased, it can yield answers that systematically approach the exact solution either by the release of the constraint11 or via the choice of progressively more sophisticated trial functions.12 

In standard AFQMC simulations, walker propagation scales as O(OM2) where O is the number of electrons and M is the number of single-particle basis functions. Such a cubic-scaling propagation per sample makes ph-AFQMC attractive for general ab initio problems. While there have been many successful ph-AFQMC applications to ab initio problems,13–27 there are several remaining challenges that still need to be addressed before ph-AFQMC can become a universal tool for the study of large scale correlated electronic structure problems. The challenge that we address in this work is the steep cost of the local energy evaluation that is necessary for estimating the ph-AFQMC energy. The local energy of a walker with a wavefunction |ψ⟩ is given by

EL[ψ]=ΨT|Ĥ|ψΨT|ψ.
(1)

The cubic-scaling of the walker propagation mentioned above is asymptotically irrelevant in standard ph-AFQMC calculations because the local energy evaluation scales quadratically with system size without exploiting the sparsity or low-rank (LR) structure. This quartic complexity arises from the two-electron repulsion integral (ERI) tensor,

(μν|λσ)=dr1dr2ϕμ*(r1)ϕν(r1)ϕλ*(r2)ϕσ(r2)|r1r2|,
(2)

where {ϕμ} are the underlying single-particle basis functions. In ph-AFQMC, this integral is most commonly factorized via the Cholesky decomposition (CD) into

(μν|λσ)=PLμνP(LσλP)*,
(3)

where L denotes a Cholesky vector. Even with CD, the evaluation of the local energy remains a quartic-scaling task in ph-AFQMC. For systems with translational symmetry, it is possible to work with plane waves and achieve cubic-scaling with15 or without21 fast Fourier transform. In this work, however, we focus on general basis functions that do not exploit such symmetry.

We mention two notable previous approaches that may be used to address this problem. The first is the tensor hypercontraction decomposition (THC)28–30 proposed by Martínez, Sherill, and co-workers to factorize the ERI tensor into

(μν|λσ)=P^Q^(ημP^)*ηνP^MP^Q^(ηλQ^)*ησQ^.
(4)

Malone et al. successfully applied the THC factorization of the ERI tensor to ph-AFQMC simulations and showed that the local energy evaluation can be brought down to O(cTHC2OM2) where the value of cTHC was found to be ∼8 for accurate local energy evaluation.26 While the improved asymptotic scaling is satisfying, the resulting algorithm has a steep overhead such that the actual crossover between the conventional algorithm and the THC variant occurs for very large system sizes in practice.26,31

An alternative approach has been proposed by Motta et al.32 In this formulation, the nested diagonalization of Cholesky vectors proposed by Peng and Kowalski33 is used to exploit the underlying low-rank structure of the ERI tensor. We refer to this factorization as the low-rank (LR) factorization. In the LR factorization, one writes

(μν|λσ)=αβ(XμαP)*UναP(XλβP)*UσβP,
(5)

which arises from

LμνP=α(XμαP)*UναP.
(6)

This factorization achieves an asymptotically cubic-scaling local energy evaluation due to the fact that the number of terms in the summation over α in Eq. (6) is limited to O(logN), where N is the system size. Again, due to a large overhead, this asymptotic scaling will generally not be achieved until the system reaches ≳1000 electrons.32 

The THC and LR factorization schemes enable a cubic-scaling local energy evaluation algorithm, but the overhead associated with both approaches is large enough that the actual efficiency crossover will generally not be observed for medium-sized molecules without obvious sparsity or low-rank structure. We note, however, that these algorithms offer quadratic scaling memory requirements, which results in a reduction of the usual cubic to quartic memory requirement of standard ph-AFQMC. In fact, such a reduction in the memory cost may be the biggest benefit gained from using the THC and LR factorizations. We will return to a discussion of the memory reduction afforded by these techniques before concluding.

Given the above facts, it is clear that there is a need for an overhead-free cubic-scaling algorithm, which can accelerate ab initio ph-AFQMC simulations for medium-sized systems. Based on the earlier work of Baer and Neuhauser,34 Neuhauser and co-workers developed the use of stochastic orbitals as a means of reducing scaling in a variety of electronic structure methodologies, including density functional theory (DFT),35–37 time-dependent DFT (TDDFT),38 second-order Møller–Plesset perturbation theory (MP2),39–41 second-order perturbative Green’s functions (GF2),42–44 the random phase approximation,45 GW,46 and the Bethe–Salpeter equation.46 

Drawing from this body of work, we explore the combined use of the stochastic resolution-of-the-identity (sRI) formulation of Takeshita et al.41 with ph-AFQMC. The central idea underlying sRI is a stochastic resolution of the Coulomb operator, which was first developed in Ref. 36. We will demonstrate that sRI can be naturally incorporated into ph-AFQMC and used to reduce the formal scaling of the local energy evaluation to cubic-scaling when combined with CD and to quadratic-scaling when combined with THC or LR factorization. Most notably, we will numerically demonstrate that the combined use of Cholesky decomposition and the stochastic resolution of the identity (CD–sRI) with a simple variance reduction (VR) technique allows for a cubic-scaling algorithm without overhead. Given this attractive feature, we expect the CD–sRI approach to become the standard method for the computation of such quantities for medium-sized systems and THC–sRI and LC–sRI formulations to be the preferred approach for very large ones.

This paper is organized as follows: (1) we give a brief review of the ph-AFQMC algorithm, (2) we analyze the formal scaling of the existing local energy evaluation strategies, (3) we examine scaling reduction in these strategies when combined with sRI, and (4) we present numerical examples (hydrogen chains and water clusters) to show speed-up and scaling reduction of the CD approach with sRI.

To set precise notation for our discussion of scaling, in the following, we use O to denote the number of occupied orbitals, M to denote the number of single-particle basis functions, X to denote the number of Cholesky vectors (or auxiliary basis functions), and N to denote the system size in general.

We briefly review the basic algorithmic details of AFQMC here and refer interested readers to Ref. 47 for a more extended review of modern AFQMC development. We start from an ab initio Hamiltonian Ĥ given by

Ĥ=pqhpqâpâq+12pqrs(pr|qs)âpâqâsârĤ1+Ĥ2,
(7)

where Ĥ1 and Ĥ2 are the one-body and two-body part of Ĥ, respectively. As with other projector QMC methods, AFQMC starts from

|Ψ0limτeτĤ|Φ0,
(8)

where Ĥ is the Hamiltonian, τ denotes imaginary time, |Ψ0⟩ is the exact ground state, and |Φ0⟩ can be any wavefunction with a non-zero overlap with the true ground state. A key feature of AFQMC is that one employs the Hubbard–Stratonovich transformation along with the Trotter approximation to simplify the many-body propagator in Eq. (8) and reduce the problem to that of a series of one-body problems in a fluctuating auxiliary field. With the Trotter approximation for a given time step Δτ, the propagator is expressed as

eΔτĤ=eΔτ2Ĥ1eΔτĤ2eΔτ2Ĥ1+O(Δτ2).
(9)

The Hubbard–Stratonovich transformation allows for the rewriting of the two-body propagator as an integration over auxiliary fields x,

eΔτĤ2=dxp(x)eΔτxv^,
(10)

where the one-body operator v^ is obtained from

Ĥ2=12PXv^P2.
(11)

Equation (11) is usually achieved via the CD of the ERI tensor. Alternatively, density-fitting can be used. With this decomposition, the total propagator for given fields x and a given time step Δτ is written as

B^(Δτ,x)=eΔτ2Ĥ1eΔτxv^eΔτ2Ĥ1.
(12)

By the virtue of the Thouless theorem,48,49 the application of this propagator to a single determinant remains a single determinant. This allows for an efficient walker propagation in AFQMC.

One can reduce statistical fluctuations greatly by employing a mean-field subtraction,47 which is closely related to the shifted contour technique developed by Rom and co-workers.50 The mean-field subtraction technique redefines the one-body and two-body using the expectation value of v^P with a trial wavefunction,

v^PTΨT|v^P|ΨT.
(13)

We then define

Ĥ1=Ĥ1PXv^Pv^PT+12PXv^PT2,
(14)
Ĥ2=12PX(v^Pv^PT)2,
(15)

which maintains Ĥ = Ĥ1 + Ĥ2. With this, the propagator in Eq. (12) uses Ĥ1 and v^v^T instead of Ĥ1 and v^. This simple subtraction has been shown to be effective in reducing statistical fluctuations,47,50 and this is what is used throughout this work.

The imaginary-time equation-of-motion is dictated by the propagator in Eq. (12). With importance sampling via a trial wavefunction |ΨT⟩, we write the global AFQMC wavefunction at imaginary-time τ as

|Ψ(τ)=iNwwi(τ)|ψi(τ)ΨT|ψi(τ),
(16)

where Nw is the number of walkers. Within the phaseless approximation,10 we update the ith walker weight and determinant via

|ψi(τ+Δτ)=B^(Δτ,xix¯i)|ψi(τ),
(17)
wi(τ+Δτ)=Iph(xi,x¯i,τ,Δτ)×wi(τ),
(18)

where the force-bias, x¯i, is defined as

x¯i(Δτ,τ)=ΔτΨT|v^v^T|ψi(τ)ΨT|ψi(τ).
(19)

The phaseless importance function is defined as

Iph(xi,x¯i,τ,Δτ)=|I(xi,x¯i,τ,Δτ)|×max(0,cos(θi(τ))),
(20)

with the hybrid importance function given by

I(xi,x¯i,τ,Δτ)=Si(τ,Δτ)exix¯ix¯ix¯i/2,
(21)

the overlap ratio of the ith walker Si given by

Si(τ,Δτ)=ΨT|B^(Δτ,xix¯i)|ψi(τ)ΨT|ψi(τ),
(22)

and the phase θi(τ) given by

θi(τ)=argSi(τ,Δτ).
(23)

This completes our compact description of the ph-AFQMC algorithm.

In practical AFQMC calculations, one usually targets the ground state energy estimated via

E=iwiEL[ψi]iwi,
(24)

where i indexes the ith walker, wi is the weight of the ith walker, and EL[ψi] is the local energy of the ith walker, as defined in Eq. (1). Without any approximations, the two-body contribution to the local energy is, via Wick’s theorem, given by

EL[2][ψ]=12pqrs(pr|qs)(GprGqsGpsGqr),
(25)

where the Green’s function G is a walker-dependent quantity defined as

Gpr=ΨT|âpâr|ψΨT|ψ=(Θ(CT))rp.
(26)

Here, CT is the molecular orbital (MO) coefficient matrix for the trial wavefunction and Θ is defined as

Θ=C(CTC)1,
(27)

with C being the MO coefficient matrix for the walker determinant |ψ⟩.

Without exploiting any structure in the ERI tensor, the local energy evaluation scales quadratically with system size. For instance, a standard way to evaluate Eq. (1) is to use the so-called “half-rotated” (HR) ERI tensor,14,26

EL,HR[2][ψ]=12ijrs(ir|js)(ΘriΘsjΘsiΘrj)
(28)

while storing (ir| js) in memory. This algorithm requires O(O2M2) memory and scales as O(O2M2). Each walker additionally requires O(O2M) memory to save intermediates that appear when evaluating Eq. (28). Furthermore, the formation of (ir| js) at the beginning of the QMC run scales as O(OM4), which is more expensive than the local energy evaluation, although it needs to be performed only once. Despite these relatively steep scaling behaviors, HR is perhaps the most widely used local energy algorithm when the memory cost is affordable. We refer this algorithm to as the “half-rotated” (HR) local energy evaluation.

Another standard evaluation procedure is to work with density-fitted integrals or Cholesky vectors directly. This may be achieved by using Eq. (3) within the local energy expression, yielding

EL,CD[2][ψ]=12ijrsP(LirP(LsjP)*)(ΘriΘsjΘsiΘrj),
(29)

which scales as O(O2MX). In general, X is 4–5 times larger than M, and therefore, Eq. (29) is slower than Eq. (28). However, Eq. (29) requires only O(OMX) memory storage, which makes it better suited for large-scale simulations. To save intermediates, each walker additionally requires O(O2X) memory. We refer this algorithm to as the “Cholesky decomposition” (CD) local energy evaluation.

Finally, we note that some recent advances in integral factorization techniques have allowed for a cubic-scaling local energy evaluation with a quadratic memory storage.26 We first discuss the THC strategy28–30 invented by Sherill, Martínez and, co-workers, which was applied to AFQMC by Malone et al.26 Specifically, we use Eq. (4) within the local energy evaluation, yielding

EL,THC[2][ψ]=12ijrsP^Q^(ηiP^)*ηrP^MP^Q^(ηjQ^)*ηsQ^(ΘriΘsjΘsiΘrj).
(30)

All of the tensor contractions in Eq. (30) are simple matrix–matrix multiplications, leading to cubic scaling with system size. More precisely, the local energy calculation scales as O(cTHC2OM2), where we define the THC rank to be cTHCM. The memory requirement is set by storing ηpP^ and M that have only a quadratic number of non-zero values. Furthermore, each walker additionally requires O(cTHC2M2) memory to save intermediates that naturally appear in Eq. (30). While these asymptotic properties are highly attractive, the THC algorithm has been shown to have a large prefactor compared to Eqs. (28) and (29) because cTHC can be quite large in practice.26 For example, cTHC was found to be ∼8 for the accurate calculation of the local energy of diamond within the double-zeta basis.

Similarly, for the LR factorization of Motta et al.,32 we use Eq. (6) with the AFQMC local energy expression, yielding

EL,LR[2][ψ]=12ijrsαβ(XiαP)*UrαP(XjβP)*UsβP(ΘriΘsjΘsiΘrj),
(31)

which scales as O(nrOMX) where the rank nr sets the upper limit on the summation over α and β and scales like log(N). Note that this approach also effectively achieves a cubic-scaling local energy evaluation algorithm. The memory requirement is O(nrOX+nrMX), and each walker additionally requires O(nrOX) to save intermediates. However, as with the use of THC, nr is usually large, and therefore, for practical applications, useful cubic-scaling may not be observed,32 and only memory saving may be practically useful.

We follow the explication of the stochastic resolution-of-the-identity (sRI) technique proposed by Takeshita et al.,41 which has been shown to lower the scaling of MP2 (RI-MP2)41 and the second order Green’s function method (RI-GF2).43,44 sRI is based on the simple mathematical observation that one may represent the Kronecker delta function with stochastic functions as

δαβ=limNξ1Nξξ=1Nξθαξθβξ,
(32)

where Nξ is the number of stochastic samples and θξ is an sRI basis function whose entry is randomly chosen to be ±1. In practice, we will limit the number of samples to a finite number, which does not scale with system size. This restriction has been shown to be sufficient to achieve a fixed statistical error per particle in MP2 and GF2. The key feature of this approach is that it does not assume any structure (either sparsity or low-rank) in the underlying ERI tensor while still reducing the cost. Due to this fact, the overhead of this approach is almost negligible, especially when looking at size-intensive quantities for which ph-AFQMC applications are also well-suited.

We emphasize that the scaling of a QMC algorithm should not be discussed without the assessment of the underlying statistical error. For instance, one may argue that propagation within ph-AFQMC scales cubically, but the number of samples associated with a fixed statistical error scales linearly with system size. Then, this makes the ph-AFQMC propagation scale quadratically with system size for a fixed statistical error. The usual cubic scaling of the ph-AFQMC propagation quoted in the literature21 is the scaling for a fixed statistical error per particle. Similarly, the lower scaling of sRI-MP2 and sRI-GF2 approaches cited above is only justified when considering observables for a fixed statistical error per particle. This makes the combination of these two methods very natural. We also note that it is still possible to obtain the total energies for a fixed statistical error in both the ph-AFQMC and sRI methods, albeit with an increased cost.

1. The half-rotated sRI (HR–sRI) algorithm

We first re-write Eq. (28) with two Kronecker deltas,

EL,HR[2][ψ]=12ijrspq(ir|js)δrpδsq(ΘpiΘqjΘqiΘpj).
(33)

Next, one may employ two sets of stochastic orbitals to represent these Kronecker deltas and write the local energy as

EL,HR–sRI[2][ψ],=12Nξ2ijξξ(iξ|jξ)ΘξiΘξiΘξiΘξi,
(34)

where

(iξ|jξ)=rsθrξθsξ(ir|js)
(35)

and

Θξi=rθrξΘri.
(36)

The formation of Eq. (35) is the bottleneck, scaling as O(NξO2M2). The additional memory usage of each walker scales as O(NξO2M). Thus, there is no reason to employ this algorithm as the asymptotic scaling is not improved by sRI and no memory saving is obtained.

2. The Cholesky decomposition sRI (CD–sRI) algorithm

We next insert Eq. (32) into Eq. (29) and show that the scaling of the local energy evaluation is reduced from quartic to cubic. The combination of the sRI expression to the CD-based local energy algorithm is most natural in the auxiliary basis function space. Specifically,

P(LirP(LsjP)*)=PQ(LirPδPQ(LsjQ)*)=1NξξPθPξLirPQθQξ(LsjQ)*.
(37)

We further define an intermediate tensor R,

RirξPθPξLirP.
(38)

The formation of R scales as O(NξOMX), which is cubic-scaling for a size-intensive quantity Nξ. The Coulomb term in Eq. (29) is thus expressed as

EJ,CD[ψ]=12ijrsP(LirP(LsjP)*)ΘriΘsj,
(39)

which scales as O(OMX). The extra memory cost for storing walker-specific intermediates scales as O(X). Using R,

EJ,CD–sRI[ψ]=12Nξijrsξ(Rirξ(Rsjξ)*)ΘriΘsj=1NξξEJ,CD–sRIξ[ψ],
(40)

where

EJ,CD–sRIξ[ψ]12ijrs(Rirξ(Rsjξ)*)ΘriΘsj.
(41)

The summation over P is the term now replaced by the summation over ξ, which lowers the scaling from O(OMX) to O(OMNξ). The walker-specific extra memory cost is reduced from O(X) to O(Nξ). The exchange term in Eq. (29) is

EK,CD[ψ]=12ijrsP(LirP(LsjP)*)ΘsiΘrj
(42)

and scales as O(O2MX) with O(O2X) walker-specific extra memory cost. This term now becomes

EK,CD–sRI[ψ]=12Nξijrsξ(Rirξ(Rsjξ)*)ΘsiΘrj=1NξξEK,CD–sRIξ[ψ],
(43)

where

EK,CD–sRIξ[ψ]12ijrs(Rirξ(Rsjξ)*)ΘsiΘrj,
(44)

which scales as O(O2MNξ). Its walker-specific memory cost scales as O(NξO2). This completes the demonstration of a scaling reduction of Eq. (29) to cubic scaling. In summary, the asymptotic scaling of the CD–sRI algorithm is O(NξOMX+NξO2M), which includes the cost of the formation of R as well. The additional memory cost due to storing R and other intermediates scales as O(NξOM+NξO2). This is also an improvement over the conventional CD algorithm.

3. Tensor hypercontraction sRI (THC–sRI) algorithm

We next apply sRI to Eq. (30) and show that the overall scaling can be reduced to quadratic. With the use of two sets of sRI insertions, we write

EL,THC–sRI[2][ψ]=1Nξ2ξξEL,THC–sRIξξ[ψ],
(45)

with

EL,THC–sRIξξ[ψ]=12ijξξP^Q^(ηiP^)*ηξP^MP^Q^(ηjQ^)*ηξQ^×(ΘξiΘξjΘξiΘξj),
(46)

where we define

ηξP^=rηrP^θrξ.
(47)

With a proper set of contractions, one can show that this expression scales as O(NξcTHCM2+NξOM+NξcTHC2M2). The additional walker memory usage due to storing intermediates comes at a cost of O(NξcTHCM+Nξ2cTHCM). THC–sRI improves both scaling and memory usage over the conventional THC algorithm.

4. Low-rank factorization sRI (LR–sRI) algorithm

The same conclusion can be deduced for the LR–sRI algorithm by employing two sets of sRI expressions. With sRI, one may write Eq. (31) as

EL,LR–sRI[2][ψ]=1Nξ2ξξEL,LR–sRIξξ[ψ],
(48)

with

EL,LR–sRIξξ[ψ]=12ijαβ(XiαP)*UξαP(XjβP)*UξβP×(ΘξiΘξjΘξiΘξj).
(49)

Here, we have defined

UξαP=rUrαPθrξ.
(50)

With an appropriate series of contractions, it can be shown that this local energy evaluation scales as O(NξnrMX+NξOM+NξnrOX). The extra memory required for storing walker-specific intermediates is O(NξnrX). As with THC–sRI, we observe an improvement using LR–sRI in both scaling and memory usage over the LR algorithm.

The implementation of the sRI local energy estimator can be achieved with only minor modifications to the existing AFQMC programs. There are two stochastic samplings to complete the evaluation of the sRI global energy: one is the standard AFQMC walker local energy sampling for each walker in Eq. (24) and the other is the sRI stochastic orbital sampling in Eq. (32). A simple algorithm for this is as follows:

  1. Each walker samples a set (or two sets depending on the choice of algorithms) of Nξ sRI orbitals θ.

  2. Evaluate the corresponding local energy expression derived in Sec. II C.

  3. The global mixed estimator energy is then estimated in the standard manner as in Eq. (24) as before.

Minimizing Nξ is critical in order to achieve the aforementioned lower scaling with as little overhead as possible. Fortunately, within the QMC setup, it is natural to set Nξ = 1 per sample because the representation power of sampled sRI orbitals increases as stochastic samples are accumulated throughout the imaginary-time propagation. In other words, at each time step, each walker samples one sRI orbital, and the global estimator is obtained by stochastically averaging over many such samples. In the limit of infinite statistical sampling, this algorithm converges to the exact ph-AFQMC energy (i.e., the one without sRI).

The use of a variance reduction (VR) technique51,52 for sRI-ph-AFQMC has been found to be very effective. For any of the algorithms outlined in this work, we write

EL[2][ψ]=EL[2][ψT]+ξξ(ELξξ[ψ]ELξξ[ψT]),
(51)

where ψT is the trial wavefunction. As suggested by this equation, we compute the “correlation” contribution to the local energy via sRI and the mean-field energy is reused. By expressing the energy in this form, the statistical fluctuations are found to be greatly reduced. This technique is often referred to as the “control variate” approach in Monte Carlo.51,52 We note that the efficacy of the control variate approach in this work will ultimately depend on the quality of the trial wavefunction as well. As long as the trial wavefunction is of good quality, the variance reduction will be effective.

We have discussed a total of four local energy evaluation strategies and how the sRI can reduce the scaling of three of them. We summarize this in Table I. With the sRI, we have formulated one cubic-scaling (CD–sRI) and two quadratic-scaling (THC–sRI and LR–sRI) algorithms.

TABLE I.

Computational scaling of different algorithms for evaluating the two-body contribution to the energy as in Eq. (25). Used acronyms are HR, half-rotated scheme; CD, Cholesky decomposition scheme; THC, tensor hypercontraction approach; and LR, low-rank factorization approach. This scaling does not include computations that occur only once at the beginning of a QMC run.

HRCDTHCLR
Conventional O(O2M2) O(O2MX) vb O(cTHC2OM2) O(nrOMX) 
sRI O(NξO2M2) ONξOMX+NξO2M ONξcTHCM2+NξOM+NξcTHC2M2 ONξnrMX+NξOM+NξnrOX 
Leading speedup None O/Nξ O/Nξ O/Nξ 
HRCDTHCLR
Conventional O(O2M2) O(O2MX) vb O(cTHC2OM2) O(nrOMX) 
sRI O(NξO2M2) ONξOMX+NξO2M ONξcTHCM2+NξOM+NξcTHC2M2 ONξnrMX+NξOM+NξnrOX 
Leading speedup None O/Nξ O/Nξ O/Nξ 

For each algorithm, we have also discussed the additional memory requirement of each walker for storing intermediates. This is summarized in Table II. With sRI, we have one algorithm (CD–sRI) that requires additional quadratic-scaling memory and two algorithms that require additional linear-scaling (THC–sRI and LR–sRI) memory. These are all improvements over their conventional counterparts.

TABLE II.

Additional walker-specific memory requirement of different algorithms for evaluating the two-body contribution to the energy as in Eq. (25). Used acronyms are HR, half-rotated scheme; CD, Cholesky decomposition scheme; THC, tensor hypercontraction approach; and LR, low-rank factorization approach. This does not include memory usage for storing shared tensors across all of the walkers.

HRCDTHCLR
Conventional O(O2M) O(O2X) O(cTHC2M2) O(nrOX+nrMX) 
sRI O(NξO2M) O(NξOM+NξO2) ONξcTHCM+Nξ2cTHCM O(NξnrX) 
Leading saving None OX/(NξMcTHCM/Nξ2 M/Nξ 
HRCDTHCLR
Conventional O(O2M) O(O2X) O(cTHC2M2) O(nrOX+nrMX) 
sRI O(NξO2M) O(NξOM+NξO2) ONξcTHCM+Nξ2cTHCM O(NξnrX) 
Leading saving None OX/(NξMcTHCM/Nξ2 M/Nξ 

Among these algorithms, we choose CD–sRI to demonstrate the behavior of sRI-ph-AFQMC methods due to its simple implementation, reasonable leading speedup of O/Nξ, and leading walker-specific memory saving of O(OX/(NξM)). The implementation of THC–sRI and LR–sRI is straightforward if THC and LR are already implemented. However, in PAUXY,53 these are currently unavailable, which led us to choosing CD–sRI for the purpose of demonstration. Moreover, we apply the sRI only to the exchange contribution (i.e., EK) because the Coulomb contribution (i.e., EJ) can be evaluated at cubic cost within the standard algorithm. More specifically, we write (with the VR technique)

EL,CD–sRI[2][ψ]=EJ,CD[ψ]+EK,CD[ψT]+(EK,CD–sRI[ψ]EK,CD–sRI[ψT]),
(52)

where EK,CD[ψT] is evaluated only once in the beginning and used throughout the simulation. Since EK,CD–sRI is evaluated twice (once for each of ψ and ψT), this adds an additional prefactor of 2. This prefactor increase is negligible compared to the variance reduction that we gain, as we shall see. The resulting algorithm is overall cubic-scaling per sample.

All calculations were performed with an open-source python-based AFQMC program called PAUXY.53 The pair-branching algorithm was employed for the population control.54 The time step of 0.005 a.u. was used in all computations. We set Nξ = 1 throughout all sRI calculations. Increasing Nξ provided almost no improvement in statistical efficiency for systems considered in this work. This enables a speedup of the order of O (the number of electrons) compared to the conventional CD algorithm. We use the cc-pVDZ55 basis set in all examples presented below. We used restricted Hartree–Fock as a trial wavefunction throughout because systems considered in this work are dominated mainly by dynamical correlation. The truncation threshold for the Cholesky decomposition was set to 10−5. A total of 160 walkers were used in every calculation presented below. QMCPACK56,57 was used to cross-check the total energies reported in this work along with the underlying population bias associated with the pair-branching algorithm. The reported total energies have negligible population bias and time step error.

For numerical examples, we picked a series of one-dimensional (1D) hydrogen chains (H-chains) and water clusters, which clearly illustrates the following:

  1. The VR technique discussed in Sec. II D is very effective. The VR was found to be so effective with restricted Hartree–Fock trial wavefunctions for systems considered here, which allowed for a very small number of stochastic orbitals to sample (i.e., Nξ = 1).

  2. The overhead of using CD–sRI with VR for computing ph-AFQMC total energies as opposed to using CD is negligible, unlike for the THC and LR approaches.

  3. CD–sRI is a cubic-scaling local energy algorithm if implemented correctly (i.e., with a proper tensor contraction ordering).

(1) and (2) are rather difficult to show analytically, and therefore, we provide numerical examples to support our claims. While (3) was formally shown in Sec. II C 2, we provide timing benchmarks to bolster support for this analysis. For 1D H-chains, the inter-hydrogen distance was fixed to be 1.6 bohr, and the water cluster geometries were taken from Ref. 31.

We first focus on the effect of the VR technique discussed in Eq. (52). In Fig. 1, we present the energy per H atom as a function of imaginary time. In both H10 and H40, we observe the same trend. Without VR, CD–sRI fluctuates significantly around the CD results, while CD–sRI with VR behaves statistically very similarly to CD. These results indicate that the overhead of CD–sRI with VR is almost negligible, which becomes more evident when looking at statistically averaged total energies.

FIG. 1.

An example of imaginary-time propagation trajectories (energy per H atom as a function of imaginary time) for (a) H10 and (b) H40 with the cc-pVDZ basis. The energy estimator was evaluated via three different algorithms.

FIG. 1.

An example of imaginary-time propagation trajectories (energy per H atom as a function of imaginary time) for (a) H10 and (b) H40 with the cc-pVDZ basis. The energy estimator was evaluated via three different algorithms.

Close modal

In Table III, we compare CD and CD–sRI with VR for the total energies of H-chains. The difference in total energies between two algorithms is statistically insignificant since they are all within error bars. This is not surprising since CD–sRI is not expected to introduce any biases into the estimator and should recover the CD result with enough samples. Furthermore, the magnitude of statistical error is similar when comparing the two algorithms for a similar number of statistical samples.

TABLE III.

The total energy (Eh) of H-chains for N = 10, 20, 40, 80 using CD and CD–sRI with VR. The number of statistical samples Nsample is also given.

CDCD–sRI with VR
NENsampleENsample
10 −5.571(1) 4 000 −5.5696(9) 4 000 
20 −11.0990(6) 21 506 −11.0985(6) 22 191 
40 −22.1612(7) 48 127 −22.1610(9) 27 942 
80 −44.2895(7) 138 429 −44.2890(8) 91 234 
CDCD–sRI with VR
NENsampleENsample
10 −5.571(1) 4 000 −5.5696(9) 4 000 
20 −11.0990(6) 21 506 −11.0985(6) 22 191 
40 −22.1612(7) 48 127 −22.1610(9) 27 942 
80 −44.2895(7) 138 429 −44.2890(8) 91 234 

We have also carried out similar numerical experiments for water clusters, (H2O)N, with N = 2, 4, 8. A representative imaginary-time propagation trajectory is given in Fig. 2. The conclusions drawn from the H-chain results hold for these cases as well. With VR, the statistical fluctuations of CD–sRI are close enough to CD. This is true for all values of N = 2, 4, 8, and we expect this to hold for larger clusters. Finally, we present the absolute energies of (H2O)N computed via CD and CD–sRI in Table IV. The total energies from two different algorithms lie within their respective error bars. Similarly to Table III, two methods exhibit comparable magnitude of error bars, given a similar number of statistical samples. This strongly suggests that CD–sRI with VR performs as well as CD without overhead.

FIG. 2.

An example of imaginary-time propagation trajectories (energy per H2O as a function of imaginary time) for (a) (H2O)2 and (b) (H2O)8 with the cc-pVDZ basis. The energy estimator was evaluated via three different algorithms. The number of walkers was 160 in all data.

FIG. 2.

An example of imaginary-time propagation trajectories (energy per H2O as a function of imaginary time) for (a) (H2O)2 and (b) (H2O)8 with the cc-pVDZ basis. The energy estimator was evaluated via three different algorithms. The number of walkers was 160 in all data.

Close modal
TABLE IV.

The total energy (Eh) of water clusters (H2O)N for N = 2, 4, 8 using CD and CD–sRI with VR. The number of statistical samples Nsample is also given.

CDCD–sRI with VR
NENsampleENsample
−152.4987(9) 16 000 −152.4976(9) 16 000 
−305.033(1) 16 000 −305.033(1) 16 000 
−609.989(1) 49 855 −609.9905(9) 69 596 
CDCD–sRI with VR
NENsampleENsample
−152.4987(9) 16 000 −152.4976(9) 16 000 
−305.033(1) 16 000 −305.033(1) 16 000 
−609.989(1) 49 855 −609.9905(9) 69 596 

We finish our discussion with timing benchmarks of walker memory saving of H-chains and water clusters. For a consistent increase in the number of Cholesky vectors, we employ the density-fitting approximation instead. The auxiliary basis set used here is that of Weigend et al.58 The timing benchmark here was obtained from an Intel(R) Xeon(R) Platinum 8268 central processing unit (CPU) 2.90 GHz processor, and only a single thread was used. Furthermore, a single local energy evaluation of a single walker is considered for the purpose of demonstration. The timing results are reported in Fig. 3. Panel (a) shows the results for 1D H-chains up to N = 300. The observed scaling for CD is O(N3.13), whereas it is O(N2.12) for CD–sRI (estimated with R2 greater than 0.99). For the obtained data points, there is no crossover between CD and CD–sRI. CD–sRI is always faster than CD, and there is no overhead for using CD–sRI compared to CD for a single sample. Similarly, panel (b) shows the timing benchmark for water clusters up to N = 64. The observed scaling for CD is O(N3.63), whereas it is O(N2.94) for CD–sRI (estimated with R2 greater than 0.999). The CD algorithm does not exhibit asymptotic scaling for these system sizes, which could be due to the efficient use of Basic Linear Algebra Subroutines (BLAS). CD–sRI appears to be quite close to its theoretically predicted asymptotic scaling. The memory results are obtained based on Table II and are shown in Fig. 4. Even for the smallest systems examined, CD–sRI requires an order of magnitude less memory for each walker. Due to the scaling differences between CD and CD–sRI, the difference in the memory usage becomes only larger as the system size increases. This highlights the utility of CD–sRI in reducing the additional memory usage for each walker.

FIG. 3.

Measured CPU timing for a single local energy evaluation of a single walker using CD and CD–sRI with VR in the case of (a) 1D H-chains (N = 10, 20, 40, 50, 80, 100, 150, 200, 300) and (b) water clusters (N = 2, 4, 8, 12, 16, 20, 32, 64). The slope of CD is 3.13 for (a) and 3.63 for (b), while the slope of CD–sRI is 2.12 for (a) and 2.94 for (b).

FIG. 3.

Measured CPU timing for a single local energy evaluation of a single walker using CD and CD–sRI with VR in the case of (a) 1D H-chains (N = 10, 20, 40, 50, 80, 100, 150, 200, 300) and (b) water clusters (N = 2, 4, 8, 12, 16, 20, 32, 64). The slope of CD is 3.13 for (a) and 3.63 for (b), while the slope of CD–sRI is 2.12 for (a) and 2.94 for (b).

Close modal
FIG. 4.

Measured additional walker-specific memory usage for a single local energy evaluation of a single walker using CD and CD–sRI with VR in the case of (a) 1D H-chains (N = 10, 20, 40, 50, 80, 100, 150, 200, 300) and (b) water clusters (N = 2, 4, 8, 12, 16, 20, 32, 64). The slope of CD is 3.0 for both (a) and (b), while the slope of CD–sRI is 2.0 for both (a) and (b).

FIG. 4.

Measured additional walker-specific memory usage for a single local energy evaluation of a single walker using CD and CD–sRI with VR in the case of (a) 1D H-chains (N = 10, 20, 40, 50, 80, 100, 150, 200, 300) and (b) water clusters (N = 2, 4, 8, 12, 16, 20, 32, 64). The slope of CD is 3.0 for both (a) and (b), while the slope of CD–sRI is 2.0 for both (a) and (b).

Close modal

In summary, based on simple benchmarks on 1D H-chains and water clusters, we have successfully demonstrated the utility of the CD–sRI algorithm. With the variance reduction technique introduced here, the CD–sRI algorithm achieves a cubic-scaling local energy evaluation without overhead.

In this work, we have shown how the stochastic resolution-of-the-identity (sRI) technique developed by Takeshita et al. can be seamlessly integrated with the local energy evaluation of phaseless auxiliary-field quantum Monte Carlo (ph-AFQMC). We considered four different existing local energy evaluation strategies: a half-rotated (HR) electron repulsion integral tensor approach, a Cholesky decomposition-based approach (CD), tensor hypercontraction (THC), and low-rank (LR) factorization approaches. We have carefully analyzed their formal scalings and discussed possible scaling reduction as well as walker-specific memory reduction when combined with sRI.

We found that HR–sRI achieves neither scaling reduction nor walker-specific memory reduction. It scales quadratically with system size like the original HR approach with the same memory usage. On the other hand, the CD approach, which formally scales quadratically with system size, can be reduced to cubic-scaling when combined with sRI (CD–sRI). Furthermore, the walker-specific memory usage is reduced to quadratic from cubic. Similarly, the cubic-scaling approaches, THC and LR, can be reduced to quadratic-scaling with sRI. The additional walker-specific memory requirement is also reduced from quadratic to linear. Without sRI, previously available algorithms using THC and LR achieve cubic scaling, but the overhead associated with them limits the applicability of these algorithms. Therefore, we have examined a new cubic-scaling algorithm, CD–sRI, with a particular focus on characterizing the overhead associated with the algorithm.

We applied the CD–sRI algorithm to one-dimensional hydrogen chains (1D H-chains) and water clusters. With a variance reduction (VR) technique developed for CD–sRI, CD–sRI with VR exhibits no overhead compared to CD up to H80 and up to (H2O)8. By no overhead, we mean that CD–sRI and CD exhibit a similarly large error bar for a similar number of statistical samples. Therefore, CD–sRI adds no additional cost for computing the total ph-AFQMC energy for a fixed statistical error. Furthermore, we have performed a timing benchmark for H-chains up to H300 and water clusters up to (H2O)64. We observed reduced scaling from CD–sRI compared to CD. Specifically, the observed scaling of CD was O(N3.13) for H-chains and O(N3.63) for water clusters, while we observed O(N2.12) for H-chains and O(N2.94) for water clusters in the case of CD–sRI. Based on these observations, we argue that the CD–sRI algorithm should be the standard cubic-scaling local energy algorithm with an expected generic speed-up compared to the conventional CD algorithm.

The remaining challenges for large-scale ph-AFQMC simulations are the memory bottleneck and the cubic-scaling walker propagation that occurs every time step. Both THC and LR approaches have successfully addressed the memory bottleneck26,32 as they require only quadratic-scaling storage for each walker. More theoretical development is needed for accelerating the walker propagation. With THC–sRI or LR–sRI, the local energy evaluation scales quadratically with system size. In these cases, the bottleneck for ph-AFQMC is the walker propagation. It has been shown that a significant speed-up of the propagation is possible by utilizing graphical processing units (GPUs).59,60 Furthermore, the significant reduction in the requirements for walker-specific extra memory will be particularly useful with GPUs since in this case, the available memory is highly limited. With sRI, the propagation is the bottleneck in AFQMC for all system sizes. Therefore, it will be interesting to explore ideas such as sRI to design new algorithms for walker propagation, which scale quadratically (or less) with system size. We reserve this and the exploration of additional projects centered on THC–sRI or LR–sRI with GPUs for large-scale ph-AFQMC applications for future work.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

We are indebted to Fionn Malone for stimulating discussions and pointing out the potential usefulness of control variate in reducing the variance. We also thank Fionn Malone for help with QMCPACK,56,57 which was used to verify the lack of population bias of reported numbers in this work. We thank Evgeny Epifanovsky for useful discussions on efficient implementations of the local energy evaluation. We thank Miguel Morales and Shiwei Zhang for helpful comments. D.R.R. acknowledges support from Grant No. NSF-CHE 1954791.

1.
A.
Georges
,
G.
Kotliar
,
W.
Krauth
, and
M. J.
Rozenberg
, “
Dynamical mean-field theory of strongly correlated fermion systems and the limit of infinite dimensions
,”
Rev. Mod. Phys.
68
,
13
(
1996
).
2.
W. M. C.
Foulkes
,
L.
Mitas
,
R. J.
Needs
, and
G.
Rajagopal
, “
Quantum Monte Carlo simulations of solids
,”
Rev. Mod. Phys.
73
,
33
(
2001
).
3.
R. J.
Bartlett
and
M.
Musiał
, “
Coupled-cluster theory in quantum chemistry
,”
Rev. Mod. Phys.
79
,
291
(
2007
).
4.
Q.
Sun
and
G. K.-L.
Chan
, “
Quantum embedding theories
,”
Acc. Chem. Res.
49
,
2705
2712
(
2016
).
5.
J. A.
Barker
, “
A quantum-statistical Monte Carlo method; path integrals with boundary conditions
,”
J. Chem. Phys.
70
,
2914
2918
(
1979
).
6.
C. J.
Umrigar
,
M. P.
Nightingale
, and
K. J.
Runge
, “
A diffusion Monte Carlo algorithm with very small time-step errors
,”
J. Chem. Phys.
99
,
2865
2890
(
1993
).
7.
D. M.
Ceperley
, “
Path integrals in the theory of condensed helium
,”
Rev. Mod. Phys.
67
,
279
(
1995
).
8.
S.
Zhang
,
J.
Carlson
, and
J. E.
Gubernatis
, “
Constrained path quantum Monte Carlo method for fermion ground states
,”
Phys. Rev. Lett.
74
,
3652
(
1995
).
9.
S.
Zhang
,
J.
Carlson
, and
J. E.
Gubernatis
, “
Constrained path Monte Carlo method for fermion ground states
,”
Phys. Rev. B
55
,
7464
(
1997
).
10.
S.
Zhang
and
H.
Krakauer
, “
Quantum Monte Carlo method using phase-free random walks with Slater determinants
,”
Phys. Rev. Lett.
90
,
136401
(
2003
).
11.
H.
Shi
and
S.
Zhang
, “
Symmetry in auxiliary-field quantum Monte Carlo calculations
,”
Phys. Rev. B
88
,
125132
(
2013
).
12.
E. J. L.
Borda
,
J.
Gomez
, and
M. A.
Morales
, “
Non-orthogonal multi-Slater determinant expansions in auxiliary field quantum Monte Carlo
,”
J. Chem. Phys.
150
,
074105
(
2019
).
13.
W. A.
Al-Saidi
,
H.
Krakauer
, and
S.
Zhang
, “
Auxiliary-field quantum Monte Carlo study of TiO and MnO molecules
,”
Phys. Rev. B
73
,
075103
(
2006
).
14.
W. A.
Al-Saidi
,
S.
Zhang
, and
H.
Krakauer
, “
Auxiliary-field quantum Monte Carlo calculations of molecular systems with a Gaussian basis
,”
J. Chem. Phys.
124
,
224101
(
2006
).
15.
M.
Suewattana
,
W.
Purwanto
,
S.
Zhang
,
H.
Krakauer
, and
E. J.
Walter
, “
Phaseless auxiliary-field quantum Monte Carlo calculations with plane waves and pseudopotentials: Applications to atoms and molecules
,”
Phys. Rev. B
75
,
245123
(
2007
).
16.
W.
Purwanto
,
W. A.
Al-Saidi
,
H.
Krakauer
, and
S.
Zhang
, “
Eliminating spin contamination in auxiliary-field quantum Monte Carlo: Realistic potential energy curve of F2
,”
J. Chem. Phys.
128
,
114309
(
2008
).
17.
W.
Purwanto
,
S.
Zhang
, and
H.
Krakauer
, “
An auxiliary-field quantum Monte Carlo study of the chromium dimer
,”
J. Chem. Phys.
142
,
064302
(
2015
).
18.
H.
Hao
,
J.
Shee
,
S.
Upadhyay
,
C.
Ataca
,
K. D.
Jordan
, and
B. M.
Rubenstein
, “
Accurate predictions of electron binding energies of dipole-bound anions via quantum Monte Carlo methods
,”
J. Phys. Chem. Lett.
9
,
6185
6190
(
2018
).
19.
J.
Shee
,
E. J.
Arthur
,
S.
Zhang
,
D. R.
Reichman
, and
R. A.
Friesner
, “
Singlet–triplet energy gaps of organic biradicals and polyacenes with auxiliary-field quantum Monte Carlo
,”
J. Chem. Theory Comput.
15
,
4924
4932
(
2019
).
20.
J.
Shee
,
B.
Rudshteyn
,
E. J.
Arthur
,
S.
Zhang
,
D. R.
Reichman
, and
R. A.
Friesner
, “
On achieving high accuracy in quantum chemical calculations of 3d transition metal-containing systems: A comparison of auxiliary-field quantum Monte Carlo with coupled cluster, density functional theory, and experiment for diatomic molecules
,”
J. Chem. Theory Comput.
15
,
2346
2358
(
2019
).
21.
J.
Lee
,
F. D.
Malone
, and
M. A.
Morales
, “
An auxiliary-field quantum Monte Carlo perspective on the ground state of the dense uniform electron gas: An investigation with Hartree–Fock trial wavefunctions
,”
J. Chem. Phys.
151
,
064122
(
2019
).
22.
M.
Motta
,
S.
Zhang
, and
G. K.-L.
Chan
, “
Hamiltonian symmetries in auxiliary-field quantum Monte Carlo calculations for electronic structure
,”
Phys. Rev. B
100
,
045127
(
2019
).
23.
M.
Motta
and
S.
Zhang
, “
Communication: Calculation of interatomic forces and optimization of molecular geometry with auxiliary-field quantum Monte Carlo
,”
J. Chem. Phys.
148
,
181101
(
2018
).
24.
M.
Motta
and
S.
Zhang
, “
Computation of ground-state properties in molecular systems: Back-propagation with auxiliary-field quantum Monte Carlo
,”
J. Chem. Theory Comput.
13
,
5367
(
2017
).
25.
S.
Zhang
,
F. D.
Malone
, and
M. A.
Morales
, “
Auxiliary-field quantum Monte Carlo calculations of the structural properties of nickel oxide
,”
J. Chem. Phys.
149
,
164102
(
2018
).
26.
F. D.
Malone
,
S.
Zhang
, and
M. A.
Morales
, “
Overcoming the memory bottleneck in auxiliary field quantum Monte Carlo simulations with interpolative separable density fitting
,”
J. Chem. Theory Comput.
15
,
256
(
2019
).
27.
J.
Lee
,
F. D.
Malone
, and
M. A.
Morales
, “
Utilizing essential symmetry breaking in auxiliary-field quantum Monte Carlo: Application to the spin gaps of the C36 fullerene and an iron porphyrin model complex
,”
J. Chem. Theory Comput.
16
,
3019
3027
(
2020
).
28.
E. G.
Hohenstein
,
R. M.
Parrish
, and
T. J.
Martínez
, “
Tensor hypercontraction density fitting. I. Quartic scaling second- and third-order Møller–Plesset perturbation theory
,”
J. Chem. Phys.
137
,
044103
(
2012
).
29.
R. M.
Parrish
,
E. G.
Hohenstein
,
T. J.
Martínez
, and
C. D.
Sherrill
, “
Tensor hypercontraction. II. Least-squares renormalization
,”
J. Chem. Phys.
137
,
224106
(
2012
).
30.
E. G.
Hohenstein
,
R. M.
Parrish
,
C. D.
Sherrill
, and
T. J.
Martínez
, “
Communication: Tensor hypercontraction. III. Least-squares tensor hypercontraction for the determination of correlated wavefunctions
,”
J. Chem. Phys.
137
,
221101
(
2012
).
31.
J.
Lee
,
L.
Lin
, and
M.
Head-Gordon
, “
Systematically improvable tensor hypercontraction: Interpolative separable density-fitting for molecules applied to exact exchange, second- and third-order Møller–Plesset perturbation theory
,”
J. Chem. Theory Comput.
16
,
243
263
(
2019
).
32.
M.
Motta
,
J.
Shee
,
S.
Zhang
, and
G. K.-L.
Chan
, “
Efficient ab initio auxiliary-field quantum Monte Carlo calculations in Gaussian bases via low-rank tensor decomposition
,”
J. Chem. Theory Comput.
15
,
3510
3521
(
2019
).
33.
B.
Peng
and
K.
Kowalski
, “
Highly efficient and scalable compound decomposition of two-electron integral tensor and its application in coupled cluster calculations
,”
J. Chem. Theory Comput.
13
,
4179
4192
(
2017
).
34.
R.
Baer
and
D.
Neuhauser
, “
Communication: Monte Carlo calculation of the exchange energy
,”
J. Chem. Phys.
137
,
051103
(
2012
).
35.
R.
Baer
,
D.
Neuhauser
, and
E.
Rabani
, “
Self-averaging stochastic Kohn–Sham density-functional theory
,”
Phys. Rev. Lett.
111
,
106402
(
2013
).
36.
D.
Neuhauser
,
E.
Rabani
,
Y.
Cytter
, and
R.
Baer
, “
Stochastic optimally tuned range-separated hybrid density functional theory
,”
J. Phys. Chem. A
120
,
3071
3078
(
2016
).
37.
D.
Neuhauser
,
R.
Baer
, and
E.
Rabani
, “
Communication: Embedded fragment stochastic density functional theory
,”
J. Chem. Phys.
141
,
041102
(
2014
).
38.
Y.
Gao
,
D.
Neuhauser
,
R.
Baer
, and
E.
Rabani
, “
Sublinear scaling for time-dependent stochastic density functional theory
,”
J. Chem. Phys.
142
,
034106
(
2015
).
39.
D.
Neuhauser
,
E.
Rabani
, and
R.
Baer
, “
Expeditious stochastic approach for MP2 energies in large electronic systems
,”
J. Chem. Theory Comput.
9
,
24
27
(
2013
).
40.
Q.
Ge
,
Y.
Gao
,
R.
Baer
,
E.
Rabani
, and
D.
Neuhauser
, “
A guided stochastic energy-domain formulation of the second order Møller–Plesset perturbation theory
,”
J. Phys. Chem. Lett.
5
,
185
189
(
2014
).
41.
T. Y.
Takeshita
,
W. A.
de Jong
,
D.
Neuhauser
,
R.
Baer
, and
E.
Rabani
, “
Stochastic formulation of the resolution of identity: Application to second order Møller–Plesset perturbation theory
,”
J. Chem. Theory Comput.
13
,
4605
4610
(
2017
).
42.
D.
Neuhauser
,
R.
Baer
, and
D.
Zgid
, “
Stochastic self-consistent second-order Green’s function method for correlation energies of large electronic systems
,”
J. Chem. Theory Comput.
13
,
5396
5403
(
2017
).
43.
T. Y.
Takeshita
,
W.
Dou
,
D. G. A.
Smith
,
W. A.
de Jong
,
R.
Baer
,
D.
Neuhauser
, and
E.
Rabani
, “
Stochastic resolution of identity second-order Matsubara Green’s function theory
,”
J. Chem. Phys.
151
,
044114
(
2019
).
44.
W.
Dou
,
T. Y.
Takeshita
,
M.
Chen
,
R.
Baer
,
D.
Neuhauser
, and
E.
Rabani
, “
Stochastic resolution of identity for real-time second-order Green’s function: Ionization potential and quasi-particle spectrum
,”
J. Chem. Theory Comput.
15
,
6703
6711
(
2019
).
45.
D.
Neuhauser
,
E.
Rabani
, and
R.
Baer
, “
Expeditious stochastic calculation of random-phase approximation energies for thousands of electrons in three dimensions
,”
J. Phys. Chem. Lett.
4
,
1172
1176
(
2013
).
46.
E.
Rabani
,
R.
Baer
, and
D.
Neuhauser
, “
Time-dependent stochastic Bethe–Salpeter approach
,”
Phys. Rev. B
91
,
235302
(
2015
).
47.
M.
Motta
and
S.
Zhang
, “
Ab initio computations of molecular systems by the auxiliary-field quantum Monte Carlo method
,”
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
8
,
e1364
(
2018
).
48.
D. J.
Thouless
, “
Stability conditions and nuclear rotations in the Hartree–Fock theory
,”
Nucl. Phys.
21
,
225
232
(
1960
).
49.
D. J.
Thouless
, “
Vibrational states of nuclei in the random phase approximation
,”
Nucl. Phys.
22
,
78
95
(
1961
).
50.
N.
Rom
,
D. M.
Charutz
, and
D.
Neuhauser
, “
Shifted-contour auxiliary-field Monte Carlo: Circumventing the sign difficulty for electronic-structure calculations
,”
Chem. Phys. Lett.
270
,
382
386
(
1997
).
51.
G.
Spink
,
R.
Needs
, and
N.
Drummond
, “
Quantum Monte Carlo study of the three-dimensional spin-polarized homogeneous electron gas
,”
Phys. Rev. B
88
,
085121
(
2013
).
52.
C.
Lemieux
, “
Control variates
,” in , edited by
N.
Balakrishnan
,
T.
Colton
,
B.
Everitt
,
W.
Piegorsch
,
F.
Ruggeri
, and
J. L.
Teugels
(
Wiley
,
2017
).
53.
See https://github.com/pauxy-qmc/pauxy for details on how to obtain the source code.
54.
L. K.
Wagner
,
M.
Bajdich
, and
L.
Mitas
, “
QWalk: A quantum Monte Carlo program for electronic structure
,”
J. Comput. Phys.
228
,
3390
(
2009
).
55.
T. H.
Dunning
, “
Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen
,”
J. Chem. Phys.
90
,
1007
1023
(
1989
).
56.
J.
Kim
 et al, “
QMCPACK: An open source ab initio quantum Monte Carlo package for the electronic structure of atoms, molecules and solids
,”
J. Phys.: Condens. Matter
30
,
195901
(
2018
).
57.
P. R. C.
Kent
et al., “
QMCPACK: Advances in the development, efficiency, and application of auxiliary field and real-space variational and diffusion quantum Monte Carlo
,”
J. Chem. Phys.
152
,
174105
(
2020
).
58.
F.
Weigend
,
A.
Köhn
, and
C.
Hättig
, “
Efficient use of the correlation consistent basis sets in resolution of the identity MP2 calculations
,”
J. Chem. Phys.
116
,
3175
3183
(
2002
).
59.
J.
Shee
,
E. J.
Arthur
,
S.
Zhang
,
D. R.
Reichman
, and
R. A.
Friesner
, “
Phaseless auxiliary-field quantum Monte Carlo on graphical processing units
,”
J. Chem. Theory Comput.
14
,
4109
4121
(
2018
).
60.
F. D.
Malone
,
S.
Zhang
, and
M. A.
Morales
, “
Accelerating auxiliary-field quantum Monte Carlo simulations of solids with graphical processing unit
,”
J. Chem. Theory Comput.
16
(
7
),
4286
4297
(
2020
).