The stochastic density functional theory (DFT) [R. Baer *et al.*, Phys. Rev. Lett. **111**, 106402 (2013)] is a valuable linear-scaling approach to Kohn-Sham DFT that does not rely on the sparsity of the density matrix. Linear (and often sub-linear) scaling is achieved by introducing a controlled statistical error in the density, energy, and forces. The statistical error (noise) is proportional to the inverse square root of the number of stochastic orbitals and thus decreases slowly; however, by dividing the system into fragments that are embedded stochastically, the statistical error can be reduced significantly. This has been shown to provide remarkable results for non-covalently-bonded systems; however, the application to covalently bonded systems had limited success, particularly for delocalized electrons. Here, we show that the statistical error in the density correlates with both the density and the density matrix of the system and propose a new fragmentation scheme that elegantly interpolates between overlapped fragments. We assess the performance of the approach for bulk silicon of varying supercell sizes (up to *N*_{e} = 16 384 electrons) and show that overlapped fragments reduce significantly the statistical noise even for systems with a delocalized density matrix.

## I. INTRODUCTION

An accurate description of the electronic properties is often a prerequisite for understanding the behavior of complex materials. Density functional theory (DFT)^{1} within the Kohn-Sham (KS) formulation^{2} provides an excellent framework that balances computational complexity and accuracy and thus has been the method of choice for extended, large-scale systems. Solving the KS equations scales formally as $ONe3$, where *N*_{e} is the number of electrons in the system. Traditional implementations of KS-DFT are often limited to relative systems containing *N*_{e} < 10^{4} electrons,^{3–5} even with the high-performance computer architectures of today. Clearly, there is a need for low-scaling KS-DFT approaches.

A natural approach to linear-scaling KS-DFT relies on the “locality” of the density matrix, that is, the elements of the density matrix $\rho r,r\u2032$ decay with increasing ∥** r** −

**′∥.**

*r*^{6–14}The reliance on Kohn’s “nearsightedness” principle

^{10}makes these approaches sensitive to the dimensionality and the character of the system. They work extremely well for low dimensional structures or systems with a large fundamental gap,

^{11}but in 3-dimensions (3D), linear-scaling is achieved only for very large systems, typically for

*N*

_{e}> 10

^{4}. An alternative to Kohn’s “nearsightedness” principle is based on “divide” and “conquer,”

^{15–20}where the density or the density matrix is partitioned into fragments and the KS equations are solved for each fragment separately using localized basis sets.

^{15,17,18}The complexity of solving the KS equations is shifted to computing the interactions between the fragments,

^{17,21–25}and the accuracy (systematic errors that decrease with the fragment size) and scaling depend on the specific implementation.

Recently, we have proposed a new scheme for linear-scaling DFT based approach.^{26} Stochastic DFT (sDFT) sidesteps the calculation of the density matrix and is therefore not directly sensitive to its evasive sparseness. Instead, the density is given in terms of a trace formula and is evaluated using stochastic occupied orbitals generated by a Chebyshev or Newton expansion of the occupation operator.^{26} Random fluctuations of local properties, e.g., energy per atom, forces, and density of states, are controlled by the number of stochastic orbitals and are often independent of the system size,^{27} leading to a computational cost that scales linearly (and sometimes sub-linearly).

The statistical error is substantially reduced when, instead of sampling the full density, one samples the difference between the full system density and that of reference fragments.^{28} Indeed, embedded fragment stochastic DFT (efsDFT) has been extremely successful in reducing the noise level for non-covalently bonded fragments.^{28} However, for covalently bonded fragments, efsDFT needs to be structured appropriately,^{29} particularly for periodic boundary conditions (PBCs).

In this manuscript, we have developed a new fragmentation approach suitable for covalently bonded systems with open or periodic boundary conditions (PBCs). Similar to efsDFT, we divide the system into *core* fragments, but in addition, for each core fragment, we add a *buffer* zone allowing for fragments to overlap in the embedding procedure. Similar in spirit to an earlier work on divide and conquer,^{17} we demonstrate that allowing the fragments to overlap results in significant improvements of the reference density matrix, reducing significantly the level of statistical noise. This manuscript is organized as follows: In Sec. II, we briefly review the theory of stochastic orbital DFT and its embedded-fragmented version. In Sec. III, we outline the new overlapped embedded fragmented stochastic DFT (o-efsDFT). Section V presents the results of numerical calculations on a set of silicon crystals of varying super-cell size, up to 4096 atoms, and discusses the significance of the overlapped fragments on reducing the statistical error.

## II. STOCHASTIC DENSITY FUNCTIONAL THEORY AND EMBEDDED FRAGMENT STOCHASTIC DENSITY FUNCTIONAL THEORY

We consider a system with periodic boundary conditions in a super-cell of volume *V* with an electron density *ρ*(** r**) represented on a real-space grid with

*N*

_{G}grid points. The electronic properties are described within Kohn-Sham (KS) density functional theory,

^{2}where the density is given by $\rho r=\u2211i=1Nocc\u27e8r|\psi i\u27e9\u27e8\psi i|r\u27e9$. Here, ⟨

**|**

*r**ψ*

_{i}⟩ are the KS orbitals and

*N*

_{occ}is the number of occupied states. The KS density can also be expressed as a trace over the density matrix, $\rho ^=lim\beta \u2192\u221e\theta \beta (\mu \u2212\u0125KS)$,

where $\delta r\u2212r^$ is Dirac’s delta function and $\theta \beta x^=1+e\u2212\beta x^\u22121$ is a step function in the limit *β* → *∞* (in practice, *β* is chosen to be large enough to converge the ground state properties). The chemical potential, *μ*, is determined by imposing the relation $Ne\mu =Tr\theta \beta (\mu \u2212\u0125KS)$, where $Ne\mu $ is the total number of electrons in the system. In Eq. (1), *ĥ*_{KS} is the KS Hamiltonian, given by

where $t^$ is the kinetic energy, $v^loc$ and $v^nl$ are the local and nonlocal pseudopotentials, respectively, $v^H$ is the Hartree potential, and $v^XC$ is the exchange-correlation potential.

Equation (1) is the starting point for the derivation of the stochastic DFT.^{26} The trace is performed using *N*_{χ} stochastic orbitals, |*χ*⟩, leading to

where $|\xi =\theta \beta (\mu \u2212\u0125KS)|\chi $ is a projected stochastic orbital. In Eq. (3), $\chi r=\xb1\Delta V\u22121/2$, where Δ*V* = *V*/*N*_{G} is the volume element of the real-space grid and ± indicates that we randomly, uniformly, and independently select the sign of *χ* for each grid point ** r**. The additional brackets ($\cdots \u2009\chi $) in Eq. (3) represent an average over all stochastic realizations, namely, $\cdots \u2009\chi =1N\chi \u2211\chi \cdots \u2009$. Equation (3) becomes exact only in the limit that

*N*

_{χ}→

*∞*, and otherwise it is a stochastic approximation to the electron density, with a random error whose magnitude decays as $N\chi $. Equation (3) is solved self-consistently, since the KS Hamiltonian depends on $\rho r$ itself.

Obtaining the total energy within the sDFT formalism is tricky. The Hartree, local pseudopotential, and local exchange-correlation energy terms can be obtained directly from the stochastic density, similar to deterministic DFT. The calculation of the kinetic energy and nonlocal pseudopotential energy requires the Kohn-Sham orbitals in deterministic DFT. In sDFT, these can be evaluated using the relations^{26}

The advantage of the stochastic DFT formalism is that the density can be calculated with linear (and often sub-linear) scaling at the cost of introducing a controlled statistical error, which is controlled by $N\chi $ where $Nx$ is the number of stochastic orbitals. Linear-scaling is achieved by taking advantage of the sparsity of *ĥ*_{KS} in real- and reciprocal-space representations and approximating $\xi r=\theta \beta (\mu \u2212\u0125KS)\u2009\chi r$ in Eq. (3) using Chebyshev or Newton interpolation polynomials,^{30,31} cf. $\theta \beta (\mu \u2212\u0125KS)\u2248\u2211n=0Ncan(\mu )Tn(\u0125KS)$, where *N*_{c} is the length of the polynomial chain and *T*_{n}(*ĥ*_{KS})’s are the Chebyshev polynomials. Instead of directly evaluating *T*_{n}(*ĥ*_{KS}) from powers of *ĥ*_{KS}, sDFT only requires the application of *T*_{n}(*ĥ*_{KS}) to |*χ*⟩ via the recurrence relation of the Chebyshev polynomial, i.e., |*χ*^{(n+1)}⟩ ≡ *T*_{n+1}(*ĥ*_{KS})|*χ*⟩ = 2*ĥ*_{KS}|*χ*^{(n)}⟩ − |*χ*^{(n−1)}⟩. In situations where the statistical noise does not increase with the system size, sDFT scales linearly with the system size. Often, for certain local observables, we find that sDFT scales better than linear (sub-linear) due to self-averaging. This has been recently discussed in detail in Ref. 27.

The decay of the magnitude of the statistical error with $N\chi $ implies that in order to reduce the error by an *order* of magnitude, one has to increase the number of stochastic orbitals by *two orders* of magnitude. Thus, small statistical errors required to converge local and single particle observables often require a huge set of stochastic orbitals, making sDFT impractical. Since the standard deviation of the density is proportional to $\rho r$ itself (as shown in the supplementary material), the statistical noise can be reduced by decomposing the density into a reference density, $\rho refr$, and correction terms $\Delta \rho r\u226a\rho r$ such that $\rho r=\rho refr+\Delta \rho r$. This is the central idea behind embedded-fragmented stochastic DFT (efsDFT).^{27–29} The system is divided into non-overlapping fragments, and the reference density is decomposed as a sum of the fragment densities. This has led to a significant reduction of the noise in efsDFT compared to sDFT when fragments are closed-shell molecules in non-covalently bonded systems.^{28} However, for covalently bonded materials, it is necessary to break covalent bonds when fragmenting the system. This leads to an increasingly large statistical error of the density at the boundaries between the fragments (see Fig. 1 for an illustration of this effect for Si_{128} with PBCs where the system was divided into two non-overlapping Si_{64} fragments), leading to small improvements of efsDFT with respect to sDFT.^{27,29}

## III. OVERLAPPED EMBEDDED FRAGMENTED STOCHASTIC DFT

To overcome the limitation of the efsDFT in covalently bonded fragments, we propose to use overlapping fragments, as sketched in Fig. 2. The basic idea behind the proposed overlapped embedded fragmented stochastic DFT (o-efsDFT) is that each fragment density and density matrix is calculated with a buffer region, but the projection of the fragment density is limited to the core region only. This allows for continuous densities and density matrices across the boundaries between the core fragments, thereby resolving the issue discussed above and allowing for a significant reduction of the statistical error, even for systems with periodic boundary conditions and a delocalized density matrix.

As sketched in Fig. 2, the system is divided into non-overlapped fragments that are composed of unit cells and/or small super cells referred to as “core regions,” labeled *C*_{f} for fragment *f*. Each core region is then wrapped with a “buffer” region (*B*_{f}), and a dressed fragment (labeled *D*_{f} = *C*_{f} ∪ *B*_{f}) is defined as the sum of core and buffered regions. For any ** r** ∈

*C*

_{f}, the dressed fragment density matrix ($\u27e8r|\rho ^f|r\u2032\u27e9$) is given by

In Eq. (6), $\phi if(r)$ are KS occupied orbitals for the dressed fragment *f* obtained by a deterministic DFT in region *D*_{f}, where we impose periodic boundary conditions for each dressed region separately. The closest distance between the boundaries *C*_{f} and *D*_{f} (Δ*r*, see Fig. 2) is chosen so that periodic boundary conditions can be imposed on region *D*_{f}. Note that the above definition of $\u27e8r|\rho ^f|r\u2032\u27e9$ is not necessarily hermitian and idempotent. However, this turns out to be insignificant, since the density matrix of the entire system remains hermitian regardless of how $\u27e8r|\rho ^f|r\u2032\u27e9$ behaves (see more detail below). In the limit where Δ*r* = 0, o-efsDFT is identical to efsDFT.^{28}

With the definition of $\u27e8r|\rho ^f|r\u2032\u27e9$ in Eq. (6), the o-efsDFT density at position ** r** ∈

*C*

_{f}is given by

where $\xi f(r)=\u2211i=1Noccf\phi if(r)\u27e8\phi if|\chi \u27e9Df$ and $\u27e8f|g\u27e9Df=\u222bDfdrf*(r)g(r)$, i.e., integration over the dressed fragment region. It is easy to verify that the density in Eq. (7) satisfies the following requirements:

In the infinite sampling limit, the properties calculated from o-efsDFT converge to deterministic DFT results.

If

*C*_{f}is the system itself, the properties calculated from o-efsDFT are equal to those from the deterministic DFT calculation regardless of the number of stochastic orbitals.Given any partitioning of the system into core regions, in the limit where the buffered zone grows such that

*D*_{f}represents the entire system, the properties calculated from o-efsDFT will again be equivalent to the deterministic results with any number of stochastic orbitals.

We now turn to discuss the calculations of kinetic energy, the nonlocal pseudopotential energy, and the nonlocal force using the above formalism with overlapped fragments. The calculation of the local pseudopotential energy, the Hartree term, the exchange correlation energies, and the corresponding forces can be obtained directly using the density in Eq. (7), similar to sDFT^{26} and efsDFT.^{28,29} The kinetic energy is evaluated in real-space by integrating the kinetic energy density over the core region of each fragment

For an infinite sample set (*N*_{χ} → *∞*), $\u27e8\chi |\phi if\u27e9Df\u27e8\phi jf|\chi \u27e9Df\chi =\u27e8\phi jf|\phi if\u27e9Df=\delta ij$. Therefore, the first and third terms in Eq. (8) cancel each other. Similarly, the non-local pseudopotential energy (*E*_{nl}) can be calculated as follows:

where $v^nlatom$ is the non-local pseudopotential operator for each atom and the sum $\u2211atom\u2208Cf$ refers to the summation over all atoms that are in region *C*_{f}. The non-local pseudopotential nuclear force for each atom with a position *R*_{atom} ∈ *C*_{f} is given by

It is straightforward to show that Eqs. (9) and (10) satisfy all three requirements above. Even with overlapped fragmentation, $Fnlatom$ is still a random variable with a finite number of stochastic orbitals. Random noise in $Fnlatom$ can be designed as a part of the temperature control algorithm in *ab initio* Langevin dynamics, as described in Ref. 29.

## IV. IMPLEMENTATION

The o-efsDFT approach can be implemented for any planewave or real-space based approach. The necessary steps to complete the self-consistent iteration to converge the density, energy, and forces can be summarized as follows (for a planewave based approach):

Perform a deterministic DFT calculation in region

*D*_{f}for each fragment*f*using a real-space grid spacing that equals the grid spacing of the full system.Using the fragment density as an initial guess, generate the corresponding Kohn-Sham potential.

Generate a set of stochastic orbitals $|\chi 1\cdots |\chi N\chi $ on a real-space grid and then transform to reciprocal space (it requires ≈

*N*_{g}/2 grid points in reciprocal space to store an*N*_{g}real-space orbital due to the isotropic kinetic energy cutoff).For all stochastic orbitals |

*χ*⟩, expand the action of the density matrix in terms of Chebyshev polynomials and tune the chemical potential to satisfy

where, as before, $Ne(\mu )=\chi \theta \beta (\mu \u2212\u0125KS)\chi \chi $, *ρ*_{f}(** r**) is defined in Eq. (7), $\xi fr=\u2211i=1Noccf\phi if(r)\u27e8\phi if|\chi \u27e9Df$, and

*N*

_{e}is the number of electrons in the system.

- (e)
Use the chosen

*μ*from the previous step to generate a set of stochastic projected orbitals, $|\xi =\theta \beta (\mu \u2212\u0125KS)|\chi $, using a Chebyshev or Newton interpolation polynomial to act with $\theta \beta (\mu \u2212\u0125KS)$ on |*χ*⟩. - (f)
Calculate the density

*ρ*() using Eq. (7) and the corresponding energy with Eqs. (8) and (9). The other energy terms can be obtained directly from the density, similarly to the deterministic DFT.*r* - (g)
Use the density as input for the next iteration and go to step (d). Repeat until convergence is achieved.

Throughout the self-consistent iterations, the same random number seed was used to generate the stochastic orbitals. This is necessary to converge the self-consistent procedure to a tolerance similar to deterministic DFT. This also reduces the computational effort associated with projecting the stochastic orbitals onto the fragments so that the first and last terms in Eqs. (7)–(10) were calculated prior to the self-consistent loop.

## V. RESULTS AND DISCUSSION

To assess the accuracy of the o-efsDFT method and its limitations, we performed Γ point calculations on a set of silicon crystals with varying super-cell size. Bulk silicon is rather challenging for linear-scaling DFT due to its particularly small local-density approximation (LDA) bandgap.^{32} A grid spacing of 0.41*a*_{0} corresponding to a planewave cutoff of 60 Ry was used to converge the energy and force to within an acceptable tolerance. To reduce the energy range of the KS-Hamiltonian, we used an “early truncation” scheme for the kinetic energy operator in which the magnitude of the reciprocal space vector ** k** (used in evaluating the kinetic energy operator acting on a wave function) was replaced by min{∥

**∥,**

*k**k*

_{ke}}, where $kke=2Ecutke$, with $Ecutke=20$ Ry for all 3 systems studied here. The Troullier-Martins norm-conserving pseudopotentials

^{33}were used together with the Kleinman-Bylander separable form.

^{34}The pseudopotentials were evaluated in real-space to reduce the computational scaling.

^{35}In the Fermi function,

*β*was set to ≈600 inverse Hartree, which is sufficient to converge the ground state properties to within an error much smaller than the statistical error.

We studied bulk silicon with 3 super cells of size 4 × 4 × 4 unit cells (*N*_{atom} = 512), 6 × 6 × 6 unit cells (*N*_{atom} = 1728), and 8 × 8 × 8 unit cells (*N*_{atom} = 4096). Core fragments of Si_{8} were used to partition all systems, while three different buffer regions were adopted for forming the dressed fragments with 8, 64, and 216 Si atoms. *N*_{χ} = 80 stochastic orbitals were used in each calculation, but for the non-overlapped Si_{8} fragments, we used *N*_{χ} = 800 stochastic orbitals. It should be noted that calculations with *N*_{χ} = 80 stochastic orbitals failed to converge without using overlapped fragments. The self-consistent iterations were converged using DIIS^{36–38} with a convergence criterion set to 10^{−8} hartree per electron.

Table I summarizes the results for the kinetic, nonlocal, local, and total energies per electron for all three system sizes studied and for all dressed fragments used. The external local potential energy per electron, *E*_{loc}/*N*_{e}, depends linearly on the density and provides a reliable measure of the accuracy of the stochastic density. As observed in Table I, for Si_{512}, *E*_{loc}/*N*_{e} with dressed fragments of sizes Si_{64} and Si_{216}, the external potential energy per electron agrees well with the deterministic DFT result, where the differences between the two are well within the error bars of 5 independent o-efsDFT runs. Although *E*_{tot}/*N*_{e} of the Si_{512} system with Si_{64} as fragments agrees better with the deterministic result, this is simply a matter of luck and the relevant information regarding the converging with the fragment size is contained in the statistical error. Reductions in the standard deviation of *E*_{loc}/*N*_{e} with respect to the dressed fragment size were observed in all 3 system sizes.

System . | Fragment . | N_{χ}
. | E_{K}/N_{e}
. | E_{nl}/N_{e}
. | E_{loc}/N_{e}
. | E_{tot}/N_{e}
. |
---|---|---|---|---|---|---|

Si_{512} | Deterministic | 10.3577 | 6.1274 | −8.6267 | −26.9954 | |

Si_{8} | 800 | 10.3596(88) | 6.1269(29) | −8.6215(41) | −26.9881(77) | |

Si_{64} | 80 | 10.3523(21) | 6.1277(59) | −8.6223(61) | −26.9955(12) | |

Si_{216} | 80 | 10.3560(13) | 6.1284(15) | −8.6265(27) | −26.9959(5) | |

Si_{1728} | Si_{64} | 80 | 10.3507(14) | 6.1287(25) | −8.6207(31) | −26.9953(11) |

Si_{216} | 80 | 10.3527(5) | 6.1286(7) | −8.6227(6) | −26.9964(2) | |

Si_{4096} | Si_{64} | 80 | 10.3526(23) | 6.1286(5) | −8.6219(13) | −26.9944(12) |

Si_{216} | 80 | 10.3539(7) | 6.1296(7) | −8.6243(11) | −26.9957(5) |

System . | Fragment . | N_{χ}
. | E_{K}/N_{e}
. | E_{nl}/N_{e}
. | E_{loc}/N_{e}
. | E_{tot}/N_{e}
. |
---|---|---|---|---|---|---|

Si_{512} | Deterministic | 10.3577 | 6.1274 | −8.6267 | −26.9954 | |

Si_{8} | 800 | 10.3596(88) | 6.1269(29) | −8.6215(41) | −26.9881(77) | |

Si_{64} | 80 | 10.3523(21) | 6.1277(59) | −8.6223(61) | −26.9955(12) | |

Si_{216} | 80 | 10.3560(13) | 6.1284(15) | −8.6265(27) | −26.9959(5) | |

Si_{1728} | Si_{64} | 80 | 10.3507(14) | 6.1287(25) | −8.6207(31) | −26.9953(11) |

Si_{216} | 80 | 10.3527(5) | 6.1286(7) | −8.6227(6) | −26.9964(2) | |

Si_{4096} | Si_{64} | 80 | 10.3526(23) | 6.1286(5) | −8.6219(13) | −26.9944(12) |

Si_{216} | 80 | 10.3539(7) | 6.1296(7) | −8.6243(11) | −26.9957(5) |

This is also illustrated in Fig. 3(a), where we plot the density along the *x*-axis (at fixed *y* and *z*) for Si_{512} for all three dressed fragment sizes. The non-overlapping fragments required 10 times more stochastic orbitals (*N*_{χ} = 800) in order to reduce the noise in the density to similar levels as the overlapping Si_{64} dressed fragments. For Si_{216} dressed fragments, the noise in the density dropped by a factor of ≈2, suggesting that only *N*_{χ} = 20 stochastic orbitals are needed to converge the density to the same level of noise as that of Si_{64} dressed fragments.

The decrease in the noise with increasing buffer sizes (increase in *B*_{f} region) is not limited to the density itself or to local observables that depends directly on the density, like *E*_{loc}/*N*_{e}. The kinetic energy per electron (*E*_{K}/*N*_{e}) and non-local pseudopotential energy per electron (*E*_{nl}/*N*_{e}) also show a significant reduction in their variance with increasing *B*_{f}, as can be seen in Table I. Compared to the deterministic results for Si_{512}, we find that o-efsDFT provides energies per electron that are within 10^{−3} eV from the deterministic values and that the agreement improves with the size of the buffer regions. This is also illustrated in panels (b) and (c) of Fig. 3, where we plot the total energy per electron (relative to a reference energy of the deterministic result for Si_{512}) for the 3 system sizes and for 3 different buffer regions (Si_{8}, Si_{64}, and Si_{216}) studied in this work. As before, we use *N*_{χ} = 80 stochastic orbitals except for the Si_{8} buffer region, where *N*_{χ} = 800. The total energy per electron contains much larger noise when the fragments do not overlap, even when the number of stochastic orbitals was 10 larger. Replacing the Si_{8} fragment with a dressed Si_{64} fragment significantly reduces the noise in the total energy per electron, and further reduction in the noise was achieved by enlarging the buffer region, i.e., using Si_{216} as a dressed fragment.

Within the accuracy of the current calculations, we find that the statistical error in the total energy per electron is similar for all 3 system sizes studied here (for a fixed buffer size), implying that self-averaging is not significant.^{26} This suggests that o-efsDFT scales nearly linearly with the system size, as indeed is observed for the scaling of the energy per electron, shown in Fig. 4. We focused on the computational cost of the self-consistent iterations which dominate the overall central processing unit (CPU) time. Since parallelism over stochastic orbitals is straightforward and the communication time is negligible, we report the CPU time per stochastic orbital (each orbital was distributed on a different node). Our results show that the time used in each self-consistent iteration scales as $ONe0.77$. The net CPU time per stochastic orbital over all self-consistent iterations scales as $ONe0.71$, which suggests that the number of self-consistent iterations does not significantly increase with increasing system size. We note that the practical scaling is somewhat better than linear ($ONe$) mainly due to improved multithreading timings for larger grids. This however may depend on the computational architecture used and on the range of systems studied. Generally, we expect the approach to scale as $ONe$ for large systems.

In addition to computing the individual contribution to the total energy per electron, o-efsDFT also offers an efficient and accurate approach to compute the forces on each atom. Fig. 5(a) shows the average and standard deviation of nuclear forces from 5 independent o-efsDFT runs for 50 representative atoms in Si_{512}. The forces are computed for the equilibrium structure and thus should fluctuate about 0. Forces from o-efsDFT with non-overlapped fragments and *N*_{χ} = 800 stochastic orbitals fluctuate with comparable standard deviations as those from o-efsDFT with Si_{64} dressed fragments and *N*_{χ} = 80 stochastic orbitals, yet the computational effort is nearly 10 times smaller in the latter. Smaller force fluctuations were observed with growing buffer region, similar to the reduction in the statistical noise in the total energy per electron.

A summary of the statistical error in the forces as a function of the size of the dressed fragment and as a function of the system size are shown in panels (b) and (c) of Fig. 5, respectively. Within the central limit theorem, we assume that the standard deviation of force on the nuclei decays as $\sigma 0/N\chi $, where *σ*_{0} is the standard deviation with only 1 stochastic orbital. In panel (b), we plot *σ*_{0} as a function of the dressed fragment size, where a significant reduction is clearly observed. Panel (c) shows that *σ*_{0} does not change significantly with the system size, suggesting that the number of stochastic orbitals does not increase with the system size for a given statistical error, suggesting that the scaling for the force follows that of the energy per electron shown in Fig. 4.

## VI. OVERLAPPED VERSUS NON-OVERLAPPED FRAGMENTS

To further understand the o-efsDFT results presented in Sec. V, we examine the role played by the overlap of fragments. In the supplementary material, we derive an expression for the variance of the density in terms of the density itself and the density matrix

where $\Delta \rho ^=\u2211f\rho ^f\u2212\rho ^$ and $\rho ^f$ and *ρ* are defined in Eq. (6) and Eq. (1), respectively. It is clear that the noise in the density is not just correlated with the density, but also with the density matrix. Thus, fragmentation schemes to reduce the noise in $\rho r$ must provide a reasonable approximation for the density matrix across different fragments. This is not the case for the non-overlap fragments, where off-diagonal matrix elements of the full reference density matrix ($\u2211f\rho ^f$) for different fragments vanish, leading to a significant level of noise. In o-efsDFT with non-vanishing buffer zones, the reference density matrix is allowed to decay at least within distance Δ*r* before it is truncated to zero [see Fig. (2)]. This improves the reference density matrix and the resulting noise is significantly reduced as the size of the buffer zone increases.

In Fig. 6, we show the reference density matrix for three different fragmentation schemes. Panels (a) and (b) correspond to non-overlapped fragments with Si_{8} and Si_{64} fragments, respectively, and panel (c) shows the reference density matrix for Si_{8} with a dressed fragment of Si_{64}. In the eigenfunction representation, the density matrix should equal the identity matrix. This is clearly not the case for panels (a) and (b), but is rather close to the unit matrix for panel (c). This is rather surprising since even with fragments that equal the size of the dressed fragment, the density matrix does not improve. Obviously, the discontinuity in the density matrix at the interfaces between fragments does not depends on the size of the fragment for the non-overlapped case, leading to significant noise in the density even for large fragments. However, it only takes a small fragment, Si_{8}, with a buffer zone of Si_{64} to significantly reduce the noise.

## VII. CONCLUSION

Motivated by the need to reduce significantly the noise level in linear-scaling sDFT, we have proposed a new fragmentation scheme that circumvents some of the limitations of our original embedded-fragmented stochastic DFT approach, mainly for covalently bonded systems. While previous attempts to develop a fragmentation scheme were based solely on providing a good reference system for the density alone, here, we showed that this is not a sufficient criterion. Based on the detailed analysis of the variance in the density, we developed a new overlapped fragmentation scheme, which provides an excellent reference for both the *density matrix* and the *density*. By dividing the system into non-overlapped core fragments and overlapped dressed fragments, we demonstrated that the existence of a buffer zone is crucial in the reduction of the stochastic noise in the energy per electron and nuclear forces. Moreover, we showed that the stochastic noise does not increase with respect to system size when using a fixed fragment sizes nor does the number of self-consistent iterations, leading to a practical scaling that is slightly better than the expected linear-scaling. While all applications in this study were limited to bulk silicon with periodic boundary conditions, we wish to iterate that the approach is also valid for open boundary conditions, where we expect it to work equally well.

## SUPPLEMENTARY MATERIAL

See supplementary material for derivations of convergence and stochastic fluctuations of density in o-efsDFT.

## ACKNOWLEDGMENTS

R.B. gratefully acknowledges support from the Israel Science Foundation (Grant No. 189/14). D.N. and E.R. are grateful for support by the Center for Computational Study of Excited State Phenomena in Energy Materials (C2SEPEM) at the Lawrence Berkeley National Laboratory, which is funded by the U.S. Department of Energy, Office of Science, Basic energy Sciences, Materials Sciences and Engineering Division under Contract No. DEAC02-05CH11231 as part of the Computational Materials Sciences Program. Resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231, are greatly acknowledged.