The stochastic density functional theory (DFT) [R. Baer et al., Phys. Rev. Lett. 111, 106402 (2013)] is a valuable linear-scaling approach to Kohn-Sham DFT that does not rely on the sparsity of the density matrix. Linear (and often sub-linear) scaling is achieved by introducing a controlled statistical error in the density, energy, and forces. The statistical error (noise) is proportional to the inverse square root of the number of stochastic orbitals and thus decreases slowly; however, by dividing the system into fragments that are embedded stochastically, the statistical error can be reduced significantly. This has been shown to provide remarkable results for non-covalently-bonded systems; however, the application to covalently bonded systems had limited success, particularly for delocalized electrons. Here, we show that the statistical error in the density correlates with both the density and the density matrix of the system and propose a new fragmentation scheme that elegantly interpolates between overlapped fragments. We assess the performance of the approach for bulk silicon of varying supercell sizes (up to Ne = 16 384 electrons) and show that overlapped fragments reduce significantly the statistical noise even for systems with a delocalized density matrix.

An accurate description of the electronic properties is often a prerequisite for understanding the behavior of complex materials. Density functional theory (DFT)1 within the Kohn-Sham (KS) formulation2 provides an excellent framework that balances computational complexity and accuracy and thus has been the method of choice for extended, large-scale systems. Solving the KS equations scales formally as ONe3, where Ne is the number of electrons in the system. Traditional implementations of KS-DFT are often limited to relative systems containing Ne < 104 electrons,3–5 even with the high-performance computer architectures of today. Clearly, there is a need for low-scaling KS-DFT approaches.

A natural approach to linear-scaling KS-DFT relies on the “locality” of the density matrix, that is, the elements of the density matrix ρr,r decay with increasing ∥rr′∥.6–14 The reliance on Kohn’s “nearsightedness” principle10 makes these approaches sensitive to the dimensionality and the character of the system. They work extremely well for low dimensional structures or systems with a large fundamental gap,11 but in 3-dimensions (3D), linear-scaling is achieved only for very large systems, typically for Ne > 104. An alternative to Kohn’s “nearsightedness” principle is based on “divide” and “conquer,”15–20 where the density or the density matrix is partitioned into fragments and the KS equations are solved for each fragment separately using localized basis sets.15,17,18 The complexity of solving the KS equations is shifted to computing the interactions between the fragments,17,21–25 and the accuracy (systematic errors that decrease with the fragment size) and scaling depend on the specific implementation.

Recently, we have proposed a new scheme for linear-scaling DFT based approach.26 Stochastic DFT (sDFT) sidesteps the calculation of the density matrix and is therefore not directly sensitive to its evasive sparseness. Instead, the density is given in terms of a trace formula and is evaluated using stochastic occupied orbitals generated by a Chebyshev or Newton expansion of the occupation operator.26 Random fluctuations of local properties, e.g., energy per atom, forces, and density of states, are controlled by the number of stochastic orbitals and are often independent of the system size,27 leading to a computational cost that scales linearly (and sometimes sub-linearly).

The statistical error is substantially reduced when, instead of sampling the full density, one samples the difference between the full system density and that of reference fragments.28 Indeed, embedded fragment stochastic DFT (efsDFT) has been extremely successful in reducing the noise level for non-covalently bonded fragments.28 However, for covalently bonded fragments, efsDFT needs to be structured appropriately,29 particularly for periodic boundary conditions (PBCs).

In this manuscript, we have developed a new fragmentation approach suitable for covalently bonded systems with open or periodic boundary conditions (PBCs). Similar to efsDFT, we divide the system into core fragments, but in addition, for each core fragment, we add a buffer zone allowing for fragments to overlap in the embedding procedure. Similar in spirit to an earlier work on divide and conquer,17 we demonstrate that allowing the fragments to overlap results in significant improvements of the reference density matrix, reducing significantly the level of statistical noise. This manuscript is organized as follows: In Sec. II, we briefly review the theory of stochastic orbital DFT and its embedded-fragmented version. In Sec. III, we outline the new overlapped embedded fragmented stochastic DFT (o-efsDFT). Section V presents the results of numerical calculations on a set of silicon crystals of varying super-cell size, up to 4096 atoms, and discusses the significance of the overlapped fragments on reducing the statistical error.

We consider a system with periodic boundary conditions in a super-cell of volume V with an electron density ρ(r) represented on a real-space grid with NG grid points. The electronic properties are described within Kohn-Sham (KS) density functional theory,2 where the density is given by ρr=i=1Noccr|ψiψi|r. Here, ⟨r|ψi⟩ are the KS orbitals and Nocc is the number of occupied states. The KS density can also be expressed as a trace over the density matrix, ρ^=limβθβ(μĥKS),

ρr=Tr θβ(μĥKS)δrr^,
(1)

where δrr^ is Dirac’s delta function and θβx^=1+eβx^1 is a step function in the limit β (in practice, β is chosen to be large enough to converge the ground state properties). The chemical potential, μ, is determined by imposing the relation Neμ=Trθβ(μĥKS), where Neμ is the total number of electrons in the system. In Eq. (1), ĥKS is the KS Hamiltonian, given by

ĥKS=t^+v^loc+v^nl+v^H+v^XC,
(2)

where t^ is the kinetic energy, v^loc and v^nl are the local and nonlocal pseudopotentials, respectively, v^H is the Hartree potential, and v^XC is the exchange-correlation potential.

Equation (1) is the starting point for the derivation of the stochastic DFT.26 The trace is performed using Nχ stochastic orbitals, |χ⟩, leading to

ρr=χθβ(μĥKS)δrr^χχ=ξδrr^ξχξr2χ,
(3)

where |ξ=θβ(μĥKS)|χ is a projected stochastic orbital. In Eq. (3), χr=±ΔV1/2, where ΔV = V/NG is the volume element of the real-space grid and ± indicates that we randomly, uniformly, and independently select the sign of χ for each grid point r. The additional brackets (χ) in Eq. (3) represent an average over all stochastic realizations, namely, χ=1Nχχ. Equation (3) becomes exact only in the limit that Nχ, and otherwise it is a stochastic approximation to the electron density, with a random error whose magnitude decays as Nχ. Equation (3) is solved self-consistently, since the KS Hamiltonian depends on ρr itself.

Obtaining the total energy within the sDFT formalism is tricky. The Hartree, local pseudopotential, and local exchange-correlation energy terms can be obtained directly from the stochastic density, similar to deterministic DFT. The calculation of the kinetic energy and nonlocal pseudopotential energy requires the Kohn-Sham orbitals in deterministic DFT. In sDFT, these can be evaluated using the relations26 

EK=Tr ρ^t^=ξt^ξχ,
(4)
Enl=Tr ρ^v^nl=ξv^nlξχ.
(5)

The advantage of the stochastic DFT formalism is that the density can be calculated with linear (and often sub-linear) scaling at the cost of introducing a controlled statistical error, which is controlled by Nχ where Nx is the number of stochastic orbitals. Linear-scaling is achieved by taking advantage of the sparsity of ĥKS in real- and reciprocal-space representations and approximating ξr=θβ(μĥKS)χr in Eq. (3) using Chebyshev or Newton interpolation polynomials,30,31 cf. θβ(μĥKS)n=0Ncan(μ)Tn(ĥKS), where Nc is the length of the polynomial chain and Tn(ĥKS)’s are the Chebyshev polynomials. Instead of directly evaluating Tn(ĥKS) from powers of ĥKS, sDFT only requires the application of Tn(ĥKS) to |χ⟩ via the recurrence relation of the Chebyshev polynomial, i.e., |χ(n+1)⟩ ≡ Tn+1(ĥKS)|χ⟩ = 2ĥKS|χ(n)⟩ − |χ(n−1)⟩. In situations where the statistical noise does not increase with the system size, sDFT scales linearly with the system size. Often, for certain local observables, we find that sDFT scales better than linear (sub-linear) due to self-averaging. This has been recently discussed in detail in Ref. 27.

The decay of the magnitude of the statistical error with Nχ implies that in order to reduce the error by an order of magnitude, one has to increase the number of stochastic orbitals by two orders of magnitude. Thus, small statistical errors required to converge local and single particle observables often require a huge set of stochastic orbitals, making sDFT impractical. Since the standard deviation of the density is proportional to ρr itself (as shown in the supplementary material), the statistical noise can be reduced by decomposing the density into a reference density, ρrefr, and correction terms Δρrρr such that ρr=ρrefr+Δρr. This is the central idea behind embedded-fragmented stochastic DFT (efsDFT).27–29 The system is divided into non-overlapping fragments, and the reference density is decomposed as a sum of the fragment densities. This has led to a significant reduction of the noise in efsDFT compared to sDFT when fragments are closed-shell molecules in non-covalently bonded systems.28 However, for covalently bonded materials, it is necessary to break covalent bonds when fragmenting the system. This leads to an increasingly large statistical error of the density at the boundaries between the fragments (see Fig. 1 for an illustration of this effect for Si128 with PBCs where the system was divided into two non-overlapping Si64 fragments), leading to small improvements of efsDFT with respect to sDFT.27,29

FIG. 1.

The average relative deviation of the stochastic density from the deterministic density along the c-axis of a silicon super cell with 128 atoms (2 × 2 × 4) (left panel). The stochastic density was calculated within efsDFT with two non-overlapped Si64 fragments (2 × 2 × 2). The right panel shows that this relative density difference is projected onto 2D (we only plot deviations above 10%). The largest deviation is at the interface between the two fragments (c ≈ 0, 10, 20 Å).

FIG. 1.

The average relative deviation of the stochastic density from the deterministic density along the c-axis of a silicon super cell with 128 atoms (2 × 2 × 4) (left panel). The stochastic density was calculated within efsDFT with two non-overlapped Si64 fragments (2 × 2 × 2). The right panel shows that this relative density difference is projected onto 2D (we only plot deviations above 10%). The largest deviation is at the interface between the two fragments (c ≈ 0, 10, 20 Å).

Close modal

To overcome the limitation of the efsDFT in covalently bonded fragments, we propose to use overlapping fragments, as sketched in Fig. 2. The basic idea behind the proposed overlapped embedded fragmented stochastic DFT (o-efsDFT) is that each fragment density and density matrix is calculated with a buffer region, but the projection of the fragment density is limited to the core region only. This allows for continuous densities and density matrices across the boundaries between the core fragments, thereby resolving the issue discussed above and allowing for a significant reduction of the statistical error, even for systems with periodic boundary conditions and a delocalized density matrix.

FIG. 2.

A sketch of the overlapped fragmentation scheme. The system is first divided into non-overlapping core fragments (solid orange and blue regions) of size Cf for each fragment f. For each core fragment f (orange core), we define a buffer region (dashed red region) of size Bf. The fragment density is then computed for the core and buffer region together (dressed fragment, Df) using periodic boundary conditions for Df and projected onto the core, as described in the main text.

FIG. 2.

A sketch of the overlapped fragmentation scheme. The system is first divided into non-overlapping core fragments (solid orange and blue regions) of size Cf for each fragment f. For each core fragment f (orange core), we define a buffer region (dashed red region) of size Bf. The fragment density is then computed for the core and buffer region together (dressed fragment, Df) using periodic boundary conditions for Df and projected onto the core, as described in the main text.

Close modal

As sketched in Fig. 2, the system is divided into non-overlapped fragments that are composed of unit cells and/or small super cells referred to as “core regions,” labeled Cf for fragment f. Each core region is then wrapped with a “buffer” region (Bf), and a dressed fragment (labeled Df = CfBf) is defined as the sum of core and buffered regions. For any rCf, the dressed fragment density matrix (r|ρ^f|r) is given by

r|ρ^f|r=i=1Noccfr|φifφif|rrDf0rDf  .
(6)

In Eq. (6), φif(r) are KS occupied orbitals for the dressed fragment f obtained by a deterministic DFT in region Df, where we impose periodic boundary conditions for each dressed region separately. The closest distance between the boundaries Cf and Dfr, see Fig. 2) is chosen so that periodic boundary conditions can be imposed on region Df. Note that the above definition of r|ρ^f|r is not necessarily hermitian and idempotent. However, this turns out to be insignificant, since the density matrix of the entire system remains hermitian regardless of how r|ρ^f|r behaves (see more detail below). In the limit where Δr = 0, o-efsDFT is identical to efsDFT.28 

With the definition of r|ρ^f|r in Eq. (6), the o-efsDFT density at position rCf is given by

ρ(r)=r|ρ^fρ^f|r+|ξ(r)|2χr|ρ^f|χχ|ρ^f|rχ=ρf(r)+|ξ(r)|2χ|ξf(r)|2χ,
(7)

where ξf(r)=i=1Noccfφif(r)φif|χDf and f|gDf=Dfdrf*(r)g(r), i.e., integration over the dressed fragment region. It is easy to verify that the density in Eq. (7) satisfies the following requirements:

  • In the infinite sampling limit, the properties calculated from o-efsDFT converge to deterministic DFT results.

  • If Cf is the system itself, the properties calculated from o-efsDFT are equal to those from the deterministic DFT calculation regardless of the number of stochastic orbitals.

  • Given any partitioning of the system into core regions, in the limit where the buffered zone grows such that Df represents the entire system, the properties calculated from o-efsDFT will again be equivalent to the deterministic results with any number of stochastic orbitals.

We now turn to discuss the calculations of kinetic energy, the nonlocal pseudopotential energy, and the nonlocal force using the above formalism with overlapped fragments. The calculation of the local pseudopotential energy, the Hartree term, the exchange correlation energies, and the corresponding forces can be obtained directly using the density in Eq. (7), similar to sDFT26 and efsDFT.28,29 The kinetic energy is evaluated in real-space by integrating the kinetic energy density over the core region of each fragment

EK=fi=1Noccfφif|t^|φifCf+ξ|t^|ξχfξf|t^|ξfCfχ.
(8)

For an infinite sample set (Nχ), χ|φifDfφjf|χDfχ=φjf|φifDf=δij. Therefore, the first and third terms in Eq. (8) cancel each other. Similarly, the non-local pseudopotential energy (Enl) can be calculated as follows:

Enl=fatomCfi=1Noccfφif|v^nlatom|φifDf+ξ|v^nl|ξχfatomCfξf|v^nlatom|ξfDfχ,
(9)

where v^nlatom is the non-local pseudopotential operator for each atom and the sum atomCf refers to the summation over all atoms that are in region Cf. The non-local pseudopotential nuclear force for each atom with a position RatomCf is given by

Fnlatom=i=1Noccfφifv^nlatomRatomφifDf+ξv^nlatomRatomξχξfv^nlatomRatomξfDfχ.
(10)

It is straightforward to show that Eqs. (9) and (10) satisfy all three requirements above. Even with overlapped fragmentation, Fnlatom is still a random variable with a finite number of stochastic orbitals. Random noise in Fnlatom can be designed as a part of the temperature control algorithm in ab initio Langevin dynamics, as described in Ref. 29.

The o-efsDFT approach can be implemented for any planewave or real-space based approach. The necessary steps to complete the self-consistent iteration to converge the density, energy, and forces can be summarized as follows (for a planewave based approach):

  • Perform a deterministic DFT calculation in region Df for each fragment f using a real-space grid spacing that equals the grid spacing of the full system.

  • Using the fragment density as an initial guess, generate the corresponding Kohn-Sham potential.

  • Generate a set of stochastic orbitals |χ1|χNχ on a real-space grid and then transform to reciprocal space (it requires ≈Ng/2 grid points in reciprocal space to store an Ng real-space orbital due to the isotropic kinetic energy cutoff).

  • For all stochastic orbitals |χ⟩, expand the action of the density matrix in terms of Chebyshev polynomials and tune the chemical potential to satisfy

Ne(μ)+fCfdrρf(r)fCfdr|ξf(r)|2χ=Ne,
(11)

where, as before, Ne(μ)=χθβ(μĥKS)χχ, ρf(r) is defined in Eq. (7), ξfr=i=1Noccfφif(r)φif|χDf, and Ne is the number of electrons in the system.

  • (e)

    Use the chosen μ from the previous step to generate a set of stochastic projected orbitals, |ξ=θβ(μĥKS)|χ, using a Chebyshev or Newton interpolation polynomial to act with θβ(μĥKS) on |χ⟩.

  • (f)

    Calculate the density ρ(r) using Eq. (7) and the corresponding energy with Eqs. (8) and (9). The other energy terms can be obtained directly from the density, similarly to the deterministic DFT.

  • (g)

    Use the density as input for the next iteration and go to step (d). Repeat until convergence is achieved.

Throughout the self-consistent iterations, the same random number seed was used to generate the stochastic orbitals. This is necessary to converge the self-consistent procedure to a tolerance similar to deterministic DFT. This also reduces the computational effort associated with projecting the stochastic orbitals onto the fragments so that the first and last terms in Eqs. (7)–(10) were calculated prior to the self-consistent loop.

To assess the accuracy of the o-efsDFT method and its limitations, we performed Γ point calculations on a set of silicon crystals with varying super-cell size. Bulk silicon is rather challenging for linear-scaling DFT due to its particularly small local-density approximation (LDA) bandgap.32 A grid spacing of 0.41a0 corresponding to a planewave cutoff of 60 Ry was used to converge the energy and force to within an acceptable tolerance. To reduce the energy range of the KS-Hamiltonian, we used an “early truncation” scheme for the kinetic energy operator in which the magnitude of the reciprocal space vector k (used in evaluating the kinetic energy operator acting on a wave function) was replaced by min{∥k∥, kke}, where kke=2Ecutke, with Ecutke=20 Ry for all 3 systems studied here. The Troullier-Martins norm-conserving pseudopotentials33 were used together with the Kleinman-Bylander separable form.34 The pseudopotentials were evaluated in real-space to reduce the computational scaling.35 In the Fermi function, β was set to ≈600 inverse Hartree, which is sufficient to converge the ground state properties to within an error much smaller than the statistical error.

We studied bulk silicon with 3 super cells of size 4 × 4 × 4 unit cells (Natom = 512), 6 × 6 × 6 unit cells (Natom = 1728), and 8 × 8 × 8 unit cells (Natom = 4096). Core fragments of Si8 were used to partition all systems, while three different buffer regions were adopted for forming the dressed fragments with 8, 64, and 216 Si atoms. Nχ = 80 stochastic orbitals were used in each calculation, but for the non-overlapped Si8 fragments, we used Nχ = 800 stochastic orbitals. It should be noted that calculations with Nχ = 80 stochastic orbitals failed to converge without using overlapped fragments. The self-consistent iterations were converged using DIIS36–38 with a convergence criterion set to 10−8 hartree per electron.

Table I summarizes the results for the kinetic, nonlocal, local, and total energies per electron for all three system sizes studied and for all dressed fragments used. The external local potential energy per electron, Eloc/Ne, depends linearly on the density and provides a reliable measure of the accuracy of the stochastic density. As observed in Table I, for Si512, Eloc/Ne with dressed fragments of sizes Si64 and Si216, the external potential energy per electron agrees well with the deterministic DFT result, where the differences between the two are well within the error bars of 5 independent o-efsDFT runs. Although Etot/Ne of the Si512 system with Si64 as fragments agrees better with the deterministic result, this is simply a matter of luck and the relevant information regarding the converging with the fragment size is contained in the statistical error. Reductions in the standard deviation of Eloc/Ne with respect to the dressed fragment size were observed in all 3 system sizes.

TABLE I.

Si512, Si1728, and Si4096 were calculated by o-efsDFT with Si64/Si216 as fragments. Kinetic energy per electron, non-local pseudo-potential energy per electron, and total energy per electron (in eV) are presented in the table. The standard deviation in the last digits of energies is listed in parentheses. Deterministic DFT calculation of Si512 is also presented.

SystemFragmentNχEK/NeEnl/NeEloc/NeEtot/Ne
Si512 Deterministic  10.3577 6.1274 −8.6267 −26.9954 
 Si8 800 10.3596(88) 6.1269(29) −8.6215(41) −26.9881(77) 
 Si64 80 10.3523(21) 6.1277(59) −8.6223(61) −26.9955(12) 
 Si216 80 10.3560(13) 6.1284(15) −8.6265(27) −26.9959(5) 
Si1728 Si64 80 10.3507(14) 6.1287(25) −8.6207(31) −26.9953(11) 
 Si216 80 10.3527(5) 6.1286(7) −8.6227(6) −26.9964(2) 
Si4096 Si64 80 10.3526(23) 6.1286(5) −8.6219(13) −26.9944(12) 
 Si216 80 10.3539(7) 6.1296(7) −8.6243(11) −26.9957(5) 
SystemFragmentNχEK/NeEnl/NeEloc/NeEtot/Ne
Si512 Deterministic  10.3577 6.1274 −8.6267 −26.9954 
 Si8 800 10.3596(88) 6.1269(29) −8.6215(41) −26.9881(77) 
 Si64 80 10.3523(21) 6.1277(59) −8.6223(61) −26.9955(12) 
 Si216 80 10.3560(13) 6.1284(15) −8.6265(27) −26.9959(5) 
Si1728 Si64 80 10.3507(14) 6.1287(25) −8.6207(31) −26.9953(11) 
 Si216 80 10.3527(5) 6.1286(7) −8.6227(6) −26.9964(2) 
Si4096 Si64 80 10.3526(23) 6.1286(5) −8.6219(13) −26.9944(12) 
 Si216 80 10.3539(7) 6.1296(7) −8.6243(11) −26.9957(5) 

This is also illustrated in Fig. 3(a), where we plot the density along the x-axis (at fixed y and z) for Si512 for all three dressed fragment sizes. The non-overlapping fragments required 10 times more stochastic orbitals (Nχ = 800) in order to reduce the noise in the density to similar levels as the overlapping Si64 dressed fragments. For Si216 dressed fragments, the noise in the density dropped by a factor of ≈2, suggesting that only Nχ = 20 stochastic orbitals are needed to converge the density to the same level of noise as that of Si64 dressed fragments.

FIG. 3.

Panel (a): The standard deviation of the density along the x-axis for y = z = 0 for Si512 with three different dressed fragment sizes (all for Si8 core fragment), as specified in the legend. Panels (b) and (c): The energy per electron (relative to that of a deterministic DFT calculation for Si512) as a function of the system size for dressed fragment of Si64 [panel (b)] and Si216 [panel (c)] with Nχ = 80. In panel (b), we also show the result with the Si8 dressed fragment (namely, no overlap between the fragments) with Nχ = 800, where we also observe a large bias27 and find that the energy is larger by 4 meV per electron compared to the deterministic value.

FIG. 3.

Panel (a): The standard deviation of the density along the x-axis for y = z = 0 for Si512 with three different dressed fragment sizes (all for Si8 core fragment), as specified in the legend. Panels (b) and (c): The energy per electron (relative to that of a deterministic DFT calculation for Si512) as a function of the system size for dressed fragment of Si64 [panel (b)] and Si216 [panel (c)] with Nχ = 80. In panel (b), we also show the result with the Si8 dressed fragment (namely, no overlap between the fragments) with Nχ = 800, where we also observe a large bias27 and find that the energy is larger by 4 meV per electron compared to the deterministic value.

Close modal

The decrease in the noise with increasing buffer sizes (increase in Bf region) is not limited to the density itself or to local observables that depends directly on the density, like Eloc/Ne. The kinetic energy per electron (EK/Ne) and non-local pseudopotential energy per electron (Enl/Ne) also show a significant reduction in their variance with increasing Bf, as can be seen in Table I. Compared to the deterministic results for Si512, we find that o-efsDFT provides energies per electron that are within 10−3 eV from the deterministic values and that the agreement improves with the size of the buffer regions. This is also illustrated in panels (b) and (c) of Fig. 3, where we plot the total energy per electron (relative to a reference energy of the deterministic result for Si512) for the 3 system sizes and for 3 different buffer regions (Si8, Si64, and Si216) studied in this work. As before, we use Nχ = 80 stochastic orbitals except for the Si8 buffer region, where Nχ = 800. The total energy per electron contains much larger noise when the fragments do not overlap, even when the number of stochastic orbitals was 10 larger. Replacing the Si8 fragment with a dressed Si64 fragment significantly reduces the noise in the total energy per electron, and further reduction in the noise was achieved by enlarging the buffer region, i.e., using Si216 as a dressed fragment.

Within the accuracy of the current calculations, we find that the statistical error in the total energy per electron is similar for all 3 system sizes studied here (for a fixed buffer size), implying that self-averaging is not significant.26 This suggests that o-efsDFT scales nearly linearly with the system size, as indeed is observed for the scaling of the energy per electron, shown in Fig. 4. We focused on the computational cost of the self-consistent iterations which dominate the overall central processing unit (CPU) time. Since parallelism over stochastic orbitals is straightforward and the communication time is negligible, we report the CPU time per stochastic orbital (each orbital was distributed on a different node). Our results show that the time used in each self-consistent iteration scales as ONe0.77. The net CPU time per stochastic orbital over all self-consistent iterations scales as ONe0.71, which suggests that the number of self-consistent iterations does not significantly increase with increasing system size. We note that the practical scaling is somewhat better than linear (ONe) mainly due to improved multithreading timings for larger grids. This however may depend on the computational architecture used and on the range of systems studied. Generally, we expect the approach to scale as ONe for large systems.

FIG. 4.

A log-log plot of the CPU time per core versus the number of electrons. The green and blue symbols are for the total wall time of all self-consistent iterations and for a single self-consistent iteration, respectively. The straight lines are fits to a power law with exponents of Ne0.71 and Ne0.77, respectively.

FIG. 4.

A log-log plot of the CPU time per core versus the number of electrons. The green and blue symbols are for the total wall time of all self-consistent iterations and for a single self-consistent iteration, respectively. The straight lines are fits to a power law with exponents of Ne0.71 and Ne0.77, respectively.

Close modal

In addition to computing the individual contribution to the total energy per electron, o-efsDFT also offers an efficient and accurate approach to compute the forces on each atom. Fig. 5(a) shows the average and standard deviation of nuclear forces from 5 independent o-efsDFT runs for 50 representative atoms in Si512. The forces are computed for the equilibrium structure and thus should fluctuate about 0. Forces from o-efsDFT with non-overlapped fragments and Nχ = 800 stochastic orbitals fluctuate with comparable standard deviations as those from o-efsDFT with Si64 dressed fragments and Nχ = 80 stochastic orbitals, yet the computational effort is nearly 10 times smaller in the latter. Smaller force fluctuations were observed with growing buffer region, similar to the reduction in the statistical noise in the total energy per electron.

FIG. 5.

Panel (a): Forces on the nuclei along the x-axis for 50 representative atoms in Si512 with Si64 (blue) and Si216 (red) dressed fragments and for Nχ = 80. The dots and error bars are averages and standard deviations of the forces from 5 independent runs. Panels (b) and (c) show the value of σ0 (see the main text for definition) as a function of the buffer size and the number of electrons, respectively.

FIG. 5.

Panel (a): Forces on the nuclei along the x-axis for 50 representative atoms in Si512 with Si64 (blue) and Si216 (red) dressed fragments and for Nχ = 80. The dots and error bars are averages and standard deviations of the forces from 5 independent runs. Panels (b) and (c) show the value of σ0 (see the main text for definition) as a function of the buffer size and the number of electrons, respectively.

Close modal

A summary of the statistical error in the forces as a function of the size of the dressed fragment and as a function of the system size are shown in panels (b) and (c) of Fig. 5, respectively. Within the central limit theorem, we assume that the standard deviation of force on the nuclei decays as σ0/Nχ, where σ0 is the standard deviation with only 1 stochastic orbital. In panel (b), we plot σ0 as a function of the dressed fragment size, where a significant reduction is clearly observed. Panel (c) shows that σ0 does not change significantly with the system size, suggesting that the number of stochastic orbitals does not increase with the system size for a given statistical error, suggesting that the scaling for the force follows that of the energy per electron shown in Fig. 4.

To further understand the o-efsDFT results presented in Sec. V, we examine the role played by the overlap of fragments. In the supplementary material, we derive an expression for the variance of the density in terms of the density itself and the density matrix

Varρr2Vmaxrr|2ρ^+Δρ^|r2+4ρ(r)×drr|Δρ^|r2,
(12)

where Δρ^=fρ^fρ^ and ρ^f and ρ are defined in Eq. (6) and Eq. (1), respectively. It is clear that the noise in the density is not just correlated with the density, but also with the density matrix. Thus, fragmentation schemes to reduce the noise in ρr must provide a reasonable approximation for the density matrix across different fragments. This is not the case for the non-overlap fragments, where off-diagonal matrix elements of the full reference density matrix (fρ^f) for different fragments vanish, leading to a significant level of noise. In o-efsDFT with non-vanishing buffer zones, the reference density matrix is allowed to decay at least within distance Δr before it is truncated to zero [see Fig. (2)]. This improves the reference density matrix and the resulting noise is significantly reduced as the size of the buffer zone increases.

In Fig. 6, we show the reference density matrix for three different fragmentation schemes. Panels (a) and (b) correspond to non-overlapped fragments with Si8 and Si64 fragments, respectively, and panel (c) shows the reference density matrix for Si8 with a dressed fragment of Si64. In the eigenfunction representation, the density matrix should equal the identity matrix. This is clearly not the case for panels (a) and (b), but is rather close to the unit matrix for panel (c). This is rather surprising since even with fragments that equal the size of the dressed fragment, the density matrix does not improve. Obviously, the discontinuity in the density matrix at the interfaces between fragments does not depends on the size of the fragment for the non-overlapped case, leading to significant noise in the density even for large fragments. However, it only takes a small fragment, Si8, with a buffer zone of Si64 to significantly reduce the noise.

FIG. 6.

Fragment density matrix difference Δρ^=fρ^fρ^ evaluated for Si512 with the (a) non-overlapped Si8 fragment, (b) non-overlapped Si64 fragment, and (c) overlapped Si64 fragments. Fragment density matrices are represented in the subspace spanned by occupied orbitals of Si512. In this representation, the system density matrix ρ^ becomes the identity matrix. To span several orders of magnitude in the values Δρ^, we use a logarithmic color scale.

FIG. 6.

Fragment density matrix difference Δρ^=fρ^fρ^ evaluated for Si512 with the (a) non-overlapped Si8 fragment, (b) non-overlapped Si64 fragment, and (c) overlapped Si64 fragments. Fragment density matrices are represented in the subspace spanned by occupied orbitals of Si512. In this representation, the system density matrix ρ^ becomes the identity matrix. To span several orders of magnitude in the values Δρ^, we use a logarithmic color scale.

Close modal

Motivated by the need to reduce significantly the noise level in linear-scaling sDFT, we have proposed a new fragmentation scheme that circumvents some of the limitations of our original embedded-fragmented stochastic DFT approach, mainly for covalently bonded systems. While previous attempts to develop a fragmentation scheme were based solely on providing a good reference system for the density alone, here, we showed that this is not a sufficient criterion. Based on the detailed analysis of the variance in the density, we developed a new overlapped fragmentation scheme, which provides an excellent reference for both the density matrix and the density. By dividing the system into non-overlapped core fragments and overlapped dressed fragments, we demonstrated that the existence of a buffer zone is crucial in the reduction of the stochastic noise in the energy per electron and nuclear forces. Moreover, we showed that the stochastic noise does not increase with respect to system size when using a fixed fragment sizes nor does the number of self-consistent iterations, leading to a practical scaling that is slightly better than the expected linear-scaling. While all applications in this study were limited to bulk silicon with periodic boundary conditions, we wish to iterate that the approach is also valid for open boundary conditions, where we expect it to work equally well.

See supplementary material for derivations of convergence and stochastic fluctuations of density in o-efsDFT.

R.B. gratefully acknowledges support from the Israel Science Foundation (Grant No. 189/14). D.N. and E.R. are grateful for support by the Center for Computational Study of Excited State Phenomena in Energy Materials (C2SEPEM) at the Lawrence Berkeley National Laboratory, which is funded by the U.S. Department of Energy, Office of Science, Basic energy Sciences, Materials Sciences and Engineering Division under Contract No. DEAC02-05CH11231 as part of the Computational Materials Sciences Program. Resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231, are greatly acknowledged.

1.
P.
Hohenberg
and
W.
Kohn
,
Phys. Rev.
136
,
B864
(
1964
).
2.
W.
Kohn
and
L. J.
Sham
,
Phys. Rev.
140
,
A1133
(
1965
).
3.
S.
Baroni
,
S.
de Gironcoli
,
A.
Dal Corso
, and
P.
Giannozzi
,
Rev. Mod. Phys.
73
,
515
(
2001
).
4.
T.
Frauenheim
,
G.
Seifert
,
M.
Elstner
,
T.
Niehaus
,
C.
Kohler
,
M.
Amkreutz
,
M.
Sternberg
,
Z.
Hajnal
,
A. D.
Carlo
, and
S.
Suhai
,
J. Phys.: Condens. Matter
14
,
3015
(
2002
).
5.
C.
Freysoldt
,
B.
Grabowski
,
T.
Hickel
,
J.
Neugebauer
,
G.
Kresse
,
A.
Janotti
, and
C. G.
Van de Walle
,
Rev. Mod. Phys.
86
,
253
(
2014
).
6.
F.
Mauri
,
G.
Galli
, and
R.
Car
,
Phys. Rev. B
47
,
9973
(
1993
).
7.
P.
Ordejón
,
D. A.
Drabold
,
M. P.
Grumbach
, and
R. M.
Martin
,
Phys. Rev. B
48
,
14646
(
1993
).
8.
S.
Goedecker
,
J. Comput. Phys.
118
,
261
(
1995
).
9.
E.
Hernández
and
M. J.
Gillan
,
Phys. Rev. B
51
,
10157
(
1995
).
11.
R.
Baer
and
M.
Head-Gordon
,
Phys. Rev. Lett.
79
,
3962
(
1997
).
12.
A. H. R.
Palser
and
D. E.
Manolopoulos
,
Phys. Rev. B
58
,
12704
(
1998
).
13.
S.
Goedecker
,
Rev. Mod. Phys.
71
,
1085
(
1999
).
14.
W.
Liang
,
C.
Saravanan
,
Y.
Shao
,
R.
Baer
,
A. T.
Bell
, and
M.
Head-Gordon
,
J. Chem. Phys.
119
,
4117
(
2003
).
16.
17.
T.
Zhu
,
W.
Pan
, and
W.
Yang
,
Phys. Rev. B
53
,
12713
(
1996
).
18.
F.
Shimojo
,
R. K.
Kalia
,
A.
Nakano
, and
P.
Vashishta
,
Phys. Rev. B
77
,
085103
(
2008
).
19.
C. R.
Jacob
and
J.
Neugebauer
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
4
,
325
(
2014
).
20.
T. A.
Wesolowski
,
S.
Shedge
, and
X.
Zhou
,
Chem. Rev.
115
,
5891
(
2015
).
21.
P.
Elliott
,
M. H.
Cohen
,
A.
Wasserman
, and
K.
Burke
,
J. Chem. Theory Comput.
5
,
827
(
2009
).
22.
A. W.
Götz
,
S. M.
Beyhan
, and
L.
Visscher
,
J. Chem. Theory Comput.
5
,
3161
(
2009
).
23.
C.
Huang
and
E. A.
Carter
,
J. Chem. Phys.
135
,
194104
(
2011
).
24.
J. D.
Goodpaster
,
T. A.
Barnes
, and
T. F.
Miller
,
J. Chem. Phys.
134
,
164108
(
2011
).
25.
K.
Yu
and
E. A.
Carter
,
Proc. Natl. Acad. Sci. U. S. A.
114
,
E10861
(
2017
).
26.
R.
Baer
,
D.
Neuhauser
, and
E.
Rabani
,
Phys. Rev. Lett.
111
,
106402
(
2013
).
27.
M.
Fabian
,
B.
Shpiro
,
E.
Rabani
,
D.
Neuhauser
, and
R.
Baer
, e-print arXiv:1809.08307 (
2018
).
28.
D.
Neuhauser
,
R.
Baer
, and
E.
Rabani
,
J. Chem. Phys.
141
,
041102
(
2014
).
29.
E.
Arnon
,
E.
Rabani
,
D.
Neuhauser
, and
R.
Baer
,
J. Chem. Phys.
146
,
224111
(
2017
).
30.
R.
Kosloff
,
J. Phys. Chem.
92
,
2087
(
1988
).
31.
R.
Kosloff
,
Annu. Rev. Phys. Chem.
45
,
145
(
1994
).
32.
C.-K.
Skylaris
and
P. D.
Haynes
,
J. Chem. Phys.
127
,
164712
(
2007
).
33.
N.
Troullier
and
J. L.
Martins
,
Phys. Rev. B
43
,
1993
(
1991
).
34.
L.
Kleinman
and
D. M.
Bylander
,
Phys. Rev. Lett.
48
,
1425
(
1982
).
35.
R. D.
King-Smith
,
M. C.
Payne
, and
J. S.
Lin
,
Phys. Rev. B
44
,
13063
(
1991
).
36.
37.
P.
Pulay
,
J. Comput. Chem.
3
,
556
(
1982
).
38.
G.
Kresse
and
J.
Furthmuller
,
Phys. Rev. B
54
,
11169
(
1996
).

Supplementary Material