Linear scaling density functional theory is important for understanding electronic structure properties of nanometer scale systems. Recently developed stochastic density functional theory can achieve linear or even sublinear scaling for various electronic properties without relying on the sparsity of the density matrix. The basic idea relies on projecting stochastic orbitals onto the occupied space by expanding the Fermi-Dirac operator and repeating this for *N*_{χ} stochastic orbitals. Often, a large number of stochastic orbitals are required to reduce the statistical fluctuations (which scale as $N\chi \u22121/2$) below a tolerable threshold. In this work, we introduce a new stochastic density functional theory that can efficiently reduce the statistical fluctuations for certain observable and can also be integrated with an embedded fragmentation scheme. The approach is based on dividing the occupied space into energy windows and projecting the stochastic orbitals with a single expansion onto all windows simultaneously. This allows for a significant reduction of the noise as illustrated for bulk silicon with a large supercell. We also provide theoretical analysis to rationalize why the noise can be reduced only for a certain class of ground state properties, such as the forces and electron density.

## I. INTRODUCTION

Accurate modeling of the electronic structure is important for understanding the thermal, optical, and magnetic properties of molecules and condensed matters. Density functional theory (DFT)^{1} within the Kohn-Sham (KS) scheme^{2} is often the method of choice due to its relatively low computational cost, which formally scales as $Ne3$, where *N*_{e} is the number of electrons in the system. However, even this mild power-law scaling limits the applicability of DFT to study complex systems that require modeling of more than 10^{4} electrons.^{3–5} In order to reduce the scaling, much effort has been devoted to the development of linear scaling DFT algorithms.^{6–17} The majority of these algorithms either divide a system into subsystems^{14–17} and attempt to solve the electronic structure problem in each subsystem or rely on the sparsity of the density matrix.^{6,10,12} Although linear scaling DFT has achieved many successes, difficulties exist in both approaches. Algorithms based on solving subsystem such as like embedding methods require a proper treatment of boundaries between subsystems^{16,18–22} while methods based on the locality of the density matrix often suffer from slow convergence when the fundamental band gap is small.

Recently, we have introduced an alternative scheme for linear-scaling DFT based on the notion of stochastic orbitals, termed stochastic DFT (sDFT).^{23} The idea is to compute the electron density and ground state electronic properties with stochastic orbitals that are random linear combinations of the deterministic occupied orbitals. Importantly, the sDFT approach depends on neither the sparsity of density matrix nor how the boundaries between fragments in an embedding scheme are treated. This has been shown for a variety of systems, including insulators and semiconductors.^{23–28} Although sDFT is very efficient for large systems, lowering the statistical noise of ground state electronic properties to below tolerable thresholds often requires hundreds to thousands of stochastic orbitals, leading to significant computational costs.^{23} Various techniques based on embedded fragmentation algorithms have been successfully introduced to reduce the noise in the electron density for specific systems, such as molecular systems^{24} and covalently bonded materials,^{25,26} but more robust noise reduction schemes are needed, particularly for systems for which it is nontrivial to construct embedded fragments.

Here, we propose a general improvement to sDFT that reduces the noise in the electron density and nuclear forces by breaking the electron density into components corresponding to orbital energy windows and using sDFT to sample each window. We refer to this new method as “energy window stochastic density functional theory” (ew-sDFT). We find that ew-sDFT successfully reduces the statistical noise in the electron density and in certain other ground state electronic structure properties (e.g., forces).

This paper is organized as follows: We briefly review the sDFT approach and then present the energy-window modification followed by a numerical illustration for bulk silicon with a large supercell. Finally, we provide a discussion of the noise reduction mechanism in ew-sDFT.

## II. STOCHASTIC DENSITY FUNCTIONAL THEORY

We begin by briefly reviewing the stochastic density functional theory (for more details, consult Refs. 23–26 and 28). Without loss of generality, we limit the discussion to a system in a box of volume *V* with periodic boundary condition. Electronic properties are calculated by KS-DFT with Hamiltonian *ĥ*_{KS} given by

where $t^$, $v^nl$, $v^loc$, $v^h[\rho ]$, and $v^xc[\rho ]$ are kinetic, nonlocal pseudopotential, local pseudopotential, Hartree, and exchange correlation terms. In KS-DFT, the Hartree and exchange correlation potentials explicitly depend on the electron density $\rho r$, which is given by $\rho (r)=2\u2211i=1Nocc|\psi i(r)|2$ on *N*_{g} real-space grid points, where *ψ*_{i}(** r**) are the KS orbitals of

*ĥ*

_{KS}and

*N*

_{occ}is the number of occupied orbitals.

*ρ*(

**) can also be expressed as the trace of an operator, i.e.,**

*r*where $\rho ^=lim\beta \u2192\u221e\theta \beta (\u0125KS;\mu )$ is the one-body density matrix, *θ*_{β}(*x*; *μ*) = 1/(1 + *e*^{β(x−μ)}) is the Fermi-Dirac distribution function with inverse temperature *β* and chemical potential *μ*, and $\delta (r\u2212r^)$ is the Dirac-delta function. In practice, *β* is chosen to be large enough to converge the ground state properties and *μ* is tuned such that $Tr\u2009\rho ^=Ne/2$ is equal to the number of electron pairs.

In sDFT, *ρ*(** r**) in Eq. (2) is determined by using a set of

*N*

_{χ}stochastic orbitals $\chi $ (rather than finding all KS orbitals),

where $\xi =\rho ^\chi $ and ⟨⋯⟩_{χ} imply averaging over an ensemble of stochastic orbitals. One possible choice of $\chi $ in real space is $\chi (r)=\xb11/\Delta V$ with equal random probabilities of positive or negative values for each grid point, and Δ*V* = *V*/*N*_{g} represents the volume element. In practice, a set of *N*_{χ} stochastic orbitals ${\chi i}$ is used in sDFT and a sample average leads to $\rho \xaf\chi (r)=2N\chi \u2211i=1N\chi |\xi (r)|2$. $\rho \xaf\chi (r)$ is then used to evaluate quantities that are functionals of the density, such as the local pseudopotential energy, Hartree energy, exchange-correlation energy, and the local pseudopotential nuclei force. A ground state quantity corresponding to a one-body operator, *Ô*, can also be obtained via a trace formula,

Equation (4) is used to evaluate kinetic energy (*E*_{K}) and nonlocal pseudopotential energy (*E*_{nl}) and nonlocal pseudopotential nuclei force. Applying Eq. (4) with a finite number of samples results in a statistical approximation of the expectation value of *Ô*.

Achieving linear (or sublinear) scaling in sDFT relies on two requirements. First, applying $\rho ^=\theta \beta (\u0125KS;\mu )$ to a stochastic orbital $\chi $ should scale linearly with respect to system size. This is achieved by expanding the Fermi-Dirac operator in a Chebyshev polynomial,^{29,30}

where *a*_{n}(*μ*, *β*) are the expansion coefficients of the Fermi-Dirac operator^{31} and *N*_{c} is the highest degree of polynomials. Linear scaling then relies on the sparsity of the KS Hamiltonian and not on the density matrix. Second, for many local observables like the energy per particle and the forces on the atoms, the statistical noise does not increase with increasing system size and often decreases with the system size, leading to sublinear scaling.^{23–26,28}

Within the sDFT formulation, the exact electron density *ρ*(** r**) can only be recovered by averaging infinitely many stochastic orbitals. Thus, for a finite set of stochastic orbitals, $\rho \xaf\chi (r)$ contains a statistical noise. With the choice of random state $\chi $ described above, the variance of density calculated from one stochastic orbital, $\rho \chi r=2\chi \theta \beta \u0125KS;\mu \delta r\u2212r^\chi $, is approximately given by

^{25}

where Var_{χ}(⋯) represents variance in *ρ*(** r**). From the central limit theorem, the variance of the electron density $\rho \xaf\chi (r)$ evaluated with

*N*

_{χ}stochastic orbitals is 2

*ρ*

^{2}(

**)/**

*r**N*

_{χ}. Equation (6) indicates that the standard deviation of the density is proportional to the density itself. Thus, by dividing

*ρ*(

**) into multiple components and using sDFT to sample each of them, it is possible to reduce the statistical noise, as discussed in Sec. III.**

*r*## III. ENERGY WINDOW STOCHASTIC DENSITY FUNCTIONAL THEORY

The central idea behind the energy window stochastic density functional theory (ew-sDFT) is to divide the space spanned by the occupied orbitals into subspaces and perform sDFT calculation in each subspace before combining contributions from all subspaces. We will illustrate ew-sDFT with two energy windows; however, generalizing the results to multiple windows is straightforward. First, we divide the occupied orbitals, {*ψ*_{i}}, into two subsets: one with orbital energy (*ε*_{i}) smaller than a given energy *ε*^{*} while orbitals in the other subset have orbital energies higher than *ε*^{*}, where *ε*^{*} < *μ*. The orbital indexes in the two subsets are $w1$ = {*i*|*ε*_{i} ≤ *ε*^{*}} and $w2$ = {*i*|*ε*^{*} < *ε*_{i} < *μ*}. The corresponding projection operators onto each energy window are $\rho ^1=\theta \beta (\u0125KS;\u03f5*)$ and $\rho ^2=\theta \beta (\u0125KS;\mu )\u2212\theta \beta (\u0125KS;\u03f5*)$. Similar to sDFT, applying $\rho ^1$ and $\rho ^2$ to a stochastic orbital $\chi $ generates filtered stochastic orbitals $\xi (1)=\rho ^1\chi $ and $\xi (2)=\rho ^2\chi $. The deterministic density *ρ*(** r**) can be recovered by averaging 2|

*ξ*

^{(1)}(

**)|**

*r*^{2}+ 2|

*ξ*

^{(2)}(

**)|**

*r*^{2}, i.e.,

We evaluate both $\rho 1^$ and $\rho 2^$ using a Chebyshev polynomial expansion given by Eq. (5). We emphasize that ew-sDFT barely increases the computational wall time compared to sDFT because the same Chebyshev polynomials of *ĥ*_{KS} can be used to simultaneously project a random stochastic orbital on both energy windows, generating |*ξ*^{(1)}⟩ and |*ξ*^{(2)}⟩ with a similar effort to one window sDFT.

Within ew-sDFT, a general ground state observable can be evaluated according to

and by *ρ*_{χ}(** r**) = 2|

*ξ*

^{(1)}(

**)|**

*r*^{2}+ 2|

*ξ*

^{(2)}(

**)|**

*r*^{2}, the variance of

*ρ*

_{χ}(

**) in ew-sDFT is given by**

*r*where $\rho 1r=2\u2211i\u2208w1|\psi ir2|$ and $\rho 2r=2\u2211i\u2208w2|\psi ir2|$ are densities corresponding to the two energy windows. Recalling that the variance of $\rho \xaf\chi (r)$ from sDFT is $2\rho 2(r)=2(\rho 1(r)+\rho 2(r))2$, the variance is thus reduced by 4*ρ*_{1}(** r**)

*ρ*

_{2}(

**). Therefore, with similar computational wall time, ew-sDFT achieves smaller fluctuations in the electron density compared to sDFT.**

*r*The implementation of ew-sDFT is similar to sDFT and can be implemented in plane waves, real space grids, or Gaussian basis sets. Starting from the initial guess of the electron density, ew-sDFT first generates *N*_{χ} random vectors in real space as described in Sec. II. Next, the chemical potential *μ* is determined by solving the following equation:

Note that Eq. (10) is the same as the one used in sDFT since $\u27e8\chi i|\rho ^1|\chi i\u27e9+\u27e8\chi i|\rho ^2|\chi i\u27e9=\u27e8\chi i|\rho ^|\chi i\u27e9$ for each |*χ*_{i}⟩. After determining *μ*, each |*χ*_{i}⟩ is filtered by $\rho ^1$ and $\rho ^2$ using a Chebyshev polynomial expansion as described by Eq. (5) to generate $|\xi i(1)$ and $|\xi i(2)$. The filtered stochastic orbitals are then used to calculate the density, $\rho \xaf\chi (r)$, and other electronic properties via Eqs. (7) and (8). The KS Hamiltonian is then updated with $\rho \xaf\chi (r)$, and the above calculations repeat until $\rho \xaf\chi (r)$ converges. The same random number seed is used throughout the self-consistency iterations to ensure convergence.

## IV. RESULTS AND DISCUSSION

We assessed the performance of ew-sDFT on silicon crystal with various supercell size. Silicon crystal is a challenging system for linear scaling DFT algorithms due to its small band gap within the local density approximation (LDA) to DFT (fundamental gap of ∼0.6 eV). We used *β* ≈ 600 in inverse Hartree, sufficient to converge the ground state properties with respect to the electron temperature. The same value of *β* ≈ 600 was also sufficient to converge the ground state properties within ew-sDFT, regardless of the number of energy windows. The wave function cutoff was 30 Ry and the density cutoff was 60 Ry, which corresponds to real space grid spacing of 0.22 Å. Troullier-Martins norm conserving pseudopotentials^{32} were used with Kleinman-Bylander separable form.^{33} The application of the nonlocal pseudopotential was performed in real space to reduce the computational costs.^{34} We first tested a rather small supercell containing 64 silicon atoms as an illustrative example of ew-sDFT. 512 stochastic orbitals were used in both sDFT and ew-sDFT and 10 energy windows were used in ew-sDFT. All statistical quantities were evaluated from 10 independent runs.

In addition to the perfect crystalline silicon structures, we have also compared the performance of sDFT and ew-sDFT on a thermally equilibrated silicon structure. The structure was obtained by equilibrating Si_{512} crystal at 1000 K and ambient pressure with molecular dynamics simulation using the Stillinger and Weber force-field.^{35} Due to smaller fundamental gap of thermal equilibrium silicon, we used *β* ≈ 800 in inverse Hartree to converge the ground state properties in both sDFT and ew-sDFT calculations. All the other computational details were not changed, and all statistical quantities were averaged over 5 independent runs.

In Fig. 1, we plot the standard deviation of $\rho \xaf\chi (r)$ for sDFT (blue curves) and ew-sDFT (red curves). In addition, we plot the estimate of the standard deviation given by Eqs. (6) and (9) for sDFT and ew-sDFT (dashed lines), respectively. We find that the standard deviation of the electron density is roughly proportional to the value of *ρ*(** r**), consistent with the theoretical predictions. Both theory and numerical calculations demonstrate that ew-sDFT reduces the statistical noise in the density. With 10 windows, the density fluctuation from ew-sDFT is reduced by nearly a factor of 3 compared to sDFT. Since the standard deviation scales as $N\chi \u22121/2$, it implies that the sDFT requires about 10 times more stochastic orbitals to achieve the same level of statistical noise in the density.

A similar picture emerges for the density of states (DOS), as shown in Fig. 2. We constructed the KS Hamiltonian $\u0125KS[\rho \xaf\chi (r)]$ from $\rho \xaf\chi (r)$ and diagonalized it in the subspace of occupied orbitals. The DOS of $\u0125KS[\rho \xaf\chi (r)]$ with$\rho \xaf\chi (r)$ from ew-sDFT is in better agreement with the deterministic result (upper panel of Fig. 2) and the fluctuations in the DOS are smaller (lower panel of Fig. 2).

Reducing the noise in the electron density should lead to a reduction of the noise of *all* ground state properties. To test this hypothesis, we calculated the energy per electron and nuclei forces for Si_{64} and Si_{512} supercells at *T* = 0 K and for Si_{512} supercell at *T* = 1000 K and compared the results of ew-sDFT to sDFT. 10 energy windows were used for all systems. In addition, to explore the effect of the number of windows, we have used 100 energy windows for the crystal Si_{512}. The energies per electron calculated by sDFT and ew-sDFT agree well with deterministic results, as presented in Table I. Surprisingly, the standard deviations are similar comparing sDFT and ew-sDFT. Increasing the number of windows to 100 does not change this picture.

System . | Method . | N_{w}
. | E_{k}
. | E_{nl}
. | E_{loc}
. | E_{h} + E_{xc}
. | E_{tot}
. |
---|---|---|---|---|---|---|---|

Crystal Si_{64} | sDFT | 10.450(40) | 6.081(8) | −8.633(22) | −6.270(6) | −26.937(37) | |

ew-sDFT | 10 | 10.450(40) | 6.081(8) | −8.641(19) | −6.271(6) | −26.946(36) | |

Deterministic | 10.450 | 6.083 | −8.660 | −6.266 | −26.958 | ||

Crystal Si_{512} | sDFT | 10.364(7) | 6.105(3) | −8.581(6) | −6.291(2) | −26.969(7) | |

ew-sDFT | 10 | 10.363(6) | 6.104(3) | −8.587(5) | −6.293(2) | −26.978(7) | |

ew-sDFT | 100 | 10.363(7) | 6.104(3) | −8.589(5) | −6.293(2) | −26.980(7) | |

Deterministic | 10.360 | 6.105 | −8.594 | −6.292 | −26.985 | ||

Thermal Si_{512} | sDFT | 10.301(5) | 6.104(3) | −8.790(5) | −6.190(1) | −26.942(8) | |

ew-sDFT | 10 | 10.298(6) | 6.104(2) | −8.800(2) | −6.190(1) | −26.956(6) | |

Deterministic | 10.298 | 6.104 | −8.804 | −6.190 | −26.959 |

System . | Method . | N_{w}
. | E_{k}
. | E_{nl}
. | E_{loc}
. | E_{h} + E_{xc}
. | E_{tot}
. |
---|---|---|---|---|---|---|---|

Crystal Si_{64} | sDFT | 10.450(40) | 6.081(8) | −8.633(22) | −6.270(6) | −26.937(37) | |

ew-sDFT | 10 | 10.450(40) | 6.081(8) | −8.641(19) | −6.271(6) | −26.946(36) | |

Deterministic | 10.450 | 6.083 | −8.660 | −6.266 | −26.958 | ||

Crystal Si_{512} | sDFT | 10.364(7) | 6.105(3) | −8.581(6) | −6.291(2) | −26.969(7) | |

ew-sDFT | 10 | 10.363(6) | 6.104(3) | −8.587(5) | −6.293(2) | −26.978(7) | |

ew-sDFT | 100 | 10.363(7) | 6.104(3) | −8.589(5) | −6.293(2) | −26.980(7) | |

Deterministic | 10.360 | 6.105 | −8.594 | −6.292 | −26.985 | ||

Thermal Si_{512} | sDFT | 10.301(5) | 6.104(3) | −8.790(5) | −6.190(1) | −26.942(8) | |

ew-sDFT | 10 | 10.298(6) | 6.104(2) | −8.800(2) | −6.190(1) | −26.956(6) | |

Deterministic | 10.298 | 6.104 | −8.804 | −6.190 | −26.959 |

Contrary to the case of the total energy, the standard deviation in the nuclei forces is significantly reduced by the introduction of energy windows, as shown in Fig. 3. Increasing the number of windows in ew-sDFT to 100 further reduces the noise in the nuclei forces [Fig. 3 panel (d)], however, the standard deviation seems to converge to a lower limit. This suggests that there should exist an optimal window size (close to 10) balancing between the small additional computational overhead associated with increasing the number of windows and the reduction in the statistical noise. In addition to the role of the number of energy windows, Fig. 3 panel (d) clearly demonstrates that the noise in the nuclei forces does not depend on the system size, implying that ew-sDFT scales linearly with the system size for nuclei forces.

Turning to discuss the role of thermal fluctuations and the introduction of disorder, comparing the standard deviation of the nuclei forces in Fig. 3 panels (c) and (d), we find that the reduction in the noise in the nuclei forces using 10 energy windows is not limited to a perfectly ordered structure at *T* = 0 and a similar reductions are observed for the thermally equilibrated structure at *T* = 1000 K. This signifies that the sort of self-averaging observed here is not a consequence of the perfectly ordered structure.

## V. FLUCTUATIONS OF OBSERVABLES AND DENSITY FUNCTIONALS

In this section, we analyze the fluctuations in observables described by the trace over a generic operator *Ô* and rationalize the behavior depicted in Table I and Fig. 3, where reduction in the fluctuations was observed for the forces but not the total energies per particle. We begin by focusing on a simple case in which the operator *Ô* does not depend on the electron density, $\rho \xaf\chi (r)$. In the stochastic formulation [cf. Eq. (4)], the expectation value is approximated by *O*_{χ} = 2⟨*ξ*|*Ô*|*ξ*⟩. The variance of *O*_{χ} can be calculated by the following equation:

where *ψ*_{i} and *ψ*_{j} are deterministic KS occupied orbitals. We need to generate the KS orbital for the analysis, but of course, ew-sDFT does not require any knowledge of the KS orbitals. The variance given by Eq. (11) is determined by the Frobenius norm of the matrix representation of *Ô* in the occupied subspace. In the case of ew-sDFT, the variance is given by the sum of variances for each window (for simplicity we take only two windows)

Recalling that $w1$ ($w2$) is the set of orbital indices for the first (second) energy window, Eq. (12) suggests that the variance of *O*_{χ} is propositional to the Frobenius norms of diagonal blocks of ⟨*ψ*_{i}|*Ô*|*ψ*_{j}⟩ (see Fig. 4). If ⟨*ψ*_{i}|*Ô*|*ψ*_{j}⟩ is diagonal dominant, the variance calculated from Eqs. (11) and (12) is nearly identical. In this case, ew-sDFT is less efficient at reducing the statistical fluctuations in *O*_{χ}. In the limiting case in which *ψ*_{i} are eigenfunctions of *Ô*, the noise level of *O*_{χ} calculated from ew-sDFT is identical to that of sDFT (a single window ew-sDFT). On the hand, if ⟨*ψ*_{i}|*Ô*|*ψ*_{j}⟩ is off-diagonal dominant, significant noise reduction can be achieved by ew-sDFT, as suggested by comparing Eqs. (11) and (12).

To illustrate this, consider the stochastic fluctuations in the kinetic energy, nonlocal pseudopotential energy, and local pseudopotential energy, which are described by the operators $t^$, $v^nl$, and $v^loc$ [independent on the electron density, $\rho \xaf\chi (r)$]. The corresponding matrix elements calculated in the KS eigenstates representation are shown in Fig. 5 panels (a)–(c), respectively. The matrices are clearly diagonal dominant for all three cases [panels (a)–(c)]. Thus, it is not surprising that ew-sDFT does not reduce the noise for the corresponding observables, *E*_{k}, *E*_{nl}, and *E*_{loc} (as discussed above, see also Table I). On the contrary, matrices of nuclei forces are not diagonal dominant, as illustrated in Fig. 5 panels (d)–(f), consistent with the ew-sDFT results shown in Fig. 3. We note in passing that for amorphous materials, off-diagonal terms in the matrices of $t^$, $v^nl$, and $v^loc$ become important, as shown in the supplementary material. However, these matrices are still diagonal dominant and thus noise reduction in energy terms are limited, as illustrated in the thermal equilibrium Si_{512} example in Table I. On the other hand, for these amorphous materials, the structure of the corresponding matrices for the force remain nondiagonal dominant and ew-sDFT can achieve excellent noise reduction in atom forces, which is also demonstrated by the thermal equilibrium Si_{512} example in Fig. 3.

Next, we consider the case where the operator *Ô* depends on the electron density, $\rho \xaf\chi (r)$. This is important for observables like the Hartree energy and the exchange-correlation energy required to fully understand the behavior with respect to the number of energy windows observed for the total energy. Consider the variance of a density functional $f[\rho \xaf\chi (r)]$, where $\rho \xaf\chi (r)$ is calculated from *N*_{χ} stochastic orbitals in sDFT or ew-sDFT. Approximating $\rho \xaf\chi (r)$ as $\rho \xaf\chi (r)\u2248\rho (r)(1+\alpha (r))$, where *α*(** r**) is a small random perturbation of order $\u03f5$ where

*ϵ*= 1/

*N*

_{χ}, and retaining terms to order

*ϵ*leads to

where, as before, $\rho (r,r\u2032)=2\u27e8r|\rho ^|r\u2032\u27e9$ is the single particle density matrix. For linear functional like the local pseudopotential energy, $\u27e8\u2009f[\rho \xaf\chi (r)]\u27e9=f[\rho r]$ and the functional variance given by Eq. (14) is identical to applying the central limit theorem to the variance given by Eq. (11) with *Ô* replaced by a local *effective* operator $\delta f\delta \rho [\rho r](r)$. For a nonlinear functional, $\u222c\delta 2f\delta \rho (r)\delta \rho (r\u2032)\rho 2(r,r\u2032)drdr\u2032$ is nonzero, corresponding to “bias” which decays as 1/*N*_{χ} as recently discussed by Cytter *et al.*^{27} and Fabian *et al.*^{28} In ew-sDFT, the average and variance of $f[\rho \xaf\chi (r)]$ becomes

Using the above relations in Eqs. (11)–(16) results in the following expression for the variance of total energy:

for sDFT and

for ew-sDFT. The expressions for the total energy variance given by Eqs. (17) and (18) are identical, suggesting that the statistical noise in ew-sDFT is identical to that of sDFT, consistent with the results shown above. We wish to emphasize that the results in Eqs. (17) and (18) also support the observed sublinear scaling of the energy per electron with both sDFT and ew-sDFT, since the occupied orbital energies are bounded, and thus, the variance of the total energy is proportional to the system size. This implies that the standard deviation of the energy per electron is inverse proportional to the square root of the system size, explaining our previous observations.^{23,24}

## VI. CONCLUSION

In this study, we developed a stochastic density functional theory scheme which allows us to significantly reduce the statistical noise in certain ground state observable with DFT. This was achieved by dividing the occupied space into energy windows and using correlated sampling in each window. Our new scheme (ew-sDFT) is able to reduce the statistical noise in both the electron density and nuclei forces without significantly increasing the computational time, but not for the total energy or total energy per electron. We showed that noise reduction of an observable in ew-sDFT correlates with whether or not the corresponding operator is diagonal dominant in KS eigenstate representation. The total energy is diagonal dominant, and no noise reduction was observed. However, for the forces and electron density, this was not the case and we observed a significant reduction in the noise that results in nearly an order of magnitude reduction in the computational effort compared to sDFT (since the noise is proportional to $1/N\chi $). This reduction allows for a more rapid sampling of the canonical distribution using methods that do not rely on the total energy, such as Langevin dynamics.^{26} The developed ew-sDFT provides an additional approach to noise reduction within the family of sDFT methods and can be combined with embedded fragmentation schemes to further improve the accuracy and efficiency of stochastic density functional theories.

## SUPPLEMENTARY MATERIAL

Detailed derivations for variance of ground state properties in sDFT and ew-sDFT are provided in the supplementary material. The supplementary material also includes operator matrices for disordered Si_{64} system.

## ACKNOWLEDGMENTS

R.B. gratefully acknowledges support from the Israel Science Foundation, Grant No. 800/19. D.N. and E.R. are grateful for support by the Center for Computational Study of Excited State Phenomena in Energy Materials (C2SEPEM) at the Lawrence Berkeley National Laboratory, which is funded by the U.S. Department of Energy, Office of Science, Basic energy Sciences, Materials Sciences and Engineering Division under Contract No. DEAC02-05CH11231 as part of the Computational Materials Sciences Program. Resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231 are greatly acknowledged.