An *ab initio* Langevin dynamics approach is developed based on stochastic density functional theory (sDFT) within a new *embedded saturated fragment* formalism, applicable to covalently bonded systems. The forces on the nuclei generated by sDFT contain a random component natural to Langevin dynamics, and its standard deviation is used to estimate the friction term on each atom by satisfying the fluctuation–dissipation relation. The overall approach scales linearly with the system size even if the density matrix is not local and is thus applicable to ordered as well as disordered extended systems. We implement the approach for a series of silicon nanocrystals (NCs) of varying size with a diameter of up to 3 nm corresponding to *N*_{e} = 3000 electrons and generate a set of configurations that are distributed canonically at a fixed temperature, ranging from cryogenic to room temperature. We also analyze the structure properties of the NCs and discuss the reconstruction of the surface geometry.

## I. INTRODUCTION

*Ab initio* molecular dynamics based on density functional theory (DFT) is becoming an important tool for studying the plethora of structural and dynamical processes in a broad range of systems in material science, chemistry, biology, and physics.^{1–11} The application of this approach to very large systems is still limited by the computational scaling of the electronic structure portion of the calculation, regardless of whether one uses a Lagrangian-based or Born-Oppenheimer-based method. This is because of the cubic scaling involved in solving the Kohn-Sham equations coupled with the need to iterate to self-consistency or to propagate the Kohn-Sham (KS) orbitals, as both of these options further increase the computational times by an order of magnitude.

Significant advances in these respects have been made along two major directions. One primary direction is based on a Lagrangian formulation of density functional theory^{1,11} and circumvents the need for SCF iterations by the propagation of the KS orbitals. This venue does not eliminate the cubic scaling and is therefore limited to relatively small systems. Another approach is based on linear-scaling techniques,^{12–15} which reduces the algorithmic complexity by finding the density matrix directly, relying on its asymptotic sparseness in real-space. However, sparsity sets in only for very large systems, limiting the applicability sparse-matrix methods, especially in 3D.

In a recent set of papers, we have introduced the stochastic DFT (sDFT) methods^{16–19} which scale linearly (or even sublinearly) with the system size and do not rely on the sparsity of the density matrix. sDFT is a general approach to the electronic structure based on a stochastic process and is applicable to extended ordered as well as disordered materials. Some of the techniques we use, based on the stochastic trace formula,^{20} have been developed for the tight-binding electronic structure,^{21–23} for molecular electronics^{24} and for multi-exciton generation in nanocrsytals.^{25} The success of sDFT in reducing the scaling comes at a price of introducing a stochastic error in all its predictions, including forces, and that precludes application to *ab initio* molecular dynamics.

In this paper, we show that sDFT can be used to study equilibrium structural properties of large nanocrystals (NCs), despite the statistical fluctuations in the force estimates. For this, we invoke the Langevin equation following the work of Attaccalite and Sorella,^{26} and generate a sequence of configurations distributed according to the canonical ensemble. These configurations can be used in a variety of applications for studying the structural, electronic and optical properties of NCs. Here we demonstrate their use for studying the structural properties of silicon nanocrystals (NCs) with a diameter of up to 3 nm, and *N*_{e} = 3000 electrons.

The paper includes development of the embedded saturated fragments method which allows reducing the statistical errors in sDFT. This new method is inspired by, but more general than, the embedded fragments method developed in Ref. 17. It uses small saturated fragments of the system and carves out the relevant part of the density to be embedded in the system. Hence, it is applicable not only to clusters of molecules, like Ref. 17, but also to covalently bonded systems, such as silicon NCs. The method is described in detail in Appendix A.

## II. METHODS

### A. Stochastic DFT

Kohn-Sham density functional theory^{27,28} maps a system of *N*_{e} interacting electrons in an external electron-nucleus potential $veNr\u2009=\u2212e4\pi \mathit{\epsilon}0\u2211NZNer\u2212RN$, where **R**_{N} ($N\u2009=\u20091,2,\u2026$) are the nuclei positions and *Z*_{N}*e* are their charge (*e* is the electron charge), onto a system of *non-interacting* electrons (the KS system), having the same ground-state density $nr$. This mapping is performed by solving the KS equations^{27,28}

where the KS Hamiltonian is

and the KS potential $vKSr$ is the sum of the external electron-nuclear potential $veNr$, the density-dependent Hartree potential $vHr\u2009=\u2009e24\pi \mathit{\epsilon}0\u222bnr\u2032r\u2212r\u2032dr\u2032$, and the exchange-correlation potential $vxcr$,

In the KS system, the density is expressed in terms of the normalized single electron KS eigenstates $\varphi nr$ and eigenvalues $\mathit{\epsilon}n$,

where $\mathit{\theta}x$ is the Heaviside function and *μ* is the chemical potential chosen so that $2\u2211n\mathit{\theta}\mu \u2212\mathit{\epsilon}n=Ne$. Equations (1)–(4) must be solved self-consistently since $h^KS$ depends on the density. While the entire scheme is a significant simplification over the original many-electron problem, it remains a challenge for large systems since the computational effort scales as $ONe3$.

An important step towards reducing the computational scaling of KS-DFT was recently proposed by Baer, Neuhauser, and Rabani (BNR),^{16} where the density of Eq. (4) was expressed as a trace over the projected density operator,^{16}

The problem now shifts into calculating self-consistently the trace in Eq. (5) [since $h^KS$ depends on $nr$] rather than solving the KS equations by brute-force diagonalization. When the trace is performed using the KS eigenstates, the computational cost remains $ONe3$ similar to the traditional approach. However, since the trace is invariant to the basis, alternative schemes that potentially lead to improved scaling can be used. One such scheme is based on the concept of a stochastic trace formula, which reduces the scaling of the trace operation by introducing a controlled statistical error.^{20}

Using the stochastic trace formula, the density can be estimated as a symmetrized stochastic trace formula, given by^{16}

where $\cdots \u2009\chi $ denotes an average over *I* stochastic orbitals $|\chi $, defined as

for each grid point $r$, the parameter *h* (not to be confused with the KS Hamiltonian $h^KS$ operator) is the grid spacing, and $\phi r$ are statistically independent random variables in the range $[0,2\pi ]$ ($ei\phi re\u2212i\phi r\u2032\phi \u2009=\u2009\delta r\u2009r\u2032$). The density $nr$ is, strictly speaking, given by the limit $nr\u2009=\u2009limI\u2192\u221enIr$ and we approximate it with a finite *I*. The Heaviside function in Eq. (6) is smoothed by the function $\mathit{\theta}\beta \mathit{\epsilon}\u2009\u2261\u200912erfc\beta \mathit{\epsilon}$, where *β* is a large constant satisfying $\beta Eg\u2009\u226b\u20091$, where *E*_{g} is the KS-DFT fundamental gap. Throughout this paper we set the value of *β* to $100Eh\u22121$. The action of $\mathit{\theta}\beta \mu \u2212h^KS$ on $|\chi $ is evaluated by a Chebyshev expansion in powers of the sparse KS Hamiltonian, $h^KS$.^{29} The length of the Chebyshev series is determined by the value of $\mu \u2212Emin\u2215\Delta E$ and $\beta \Delta E$, where $\Delta E=Emax\u2212Emin\u22152$ and *E*_{min/max} are the minimal and maximal eigenvalues of $h^KS$. Under the conditions of the present systems, the length of the series is ^{∼}3000 terms.

The stochastic trace evaluation [Eq. (6)] reduces the computational scaling of KS-DFT to $ONe$ and for certain properties even to a sub-linear scaling.^{16} Linear-scaling complexity is achieved due to the following facts: (1) the application of a Hamiltonian to a stochastic orbital, $h^KS|\chi $, requires a linear scaling effort (irrespective of the structure of the orbital); (2) the length of the Chebyshev series is independent (or at most weakly dependent) of the system size; and (3) only a system-size independent number of stochastic orbitals are required. This type of assumptions is different from the linear-scaling approaches depending on density matrix sparsity,^{12,30} which assume that locality of orbitals is not completely destroyed by the repeated operation of the Hamiltonian.

A converged self-consistent solution of Eq. (6) provides an estimate of the electron density and in addition can be used to generate other quantities, such as the density of states (DOS), the total energy per electron, and the forces acting on the nuclei. All estimates contain a statistical error that can be controlled by increasing the number of stochastic orbitals (*I*) used to evaluate the trace in Eq. (6). Of particular relevance to this work are the Cartesian forces exerted by the electrons on *N* nuclei ($\alpha \u2009=\u20091,\u2026,N$), which can be evaluated through the Hellmann-Feynman theorem,^{31,32}

It should be stressed that for finite sampling, these forces are only approximately commensurate with the stochastic estimate of the energy (which is not used in the sampling procedure at all), as discussed in Appendix B. The stochastic estimate of the Hellmann-Feynman forces is an excellent estimator of the deterministic forces, as can be seen in Fig. 1, where the mean absolute deviation is dominated by the fluctuations and not by additional bias terms.

These sDFT forces can be expressed as

where $\u2009f\alpha det$ is the deterministic (generally unknown) force, $f\alpha fluc$ is the pure fluctuating term, and $f\alpha bias$ is the bias expected to be proportional to $1I$ in leading order. The choice of *I* should be large enough to reduce $f\alpha bias$ to negligible values, and the only source of error in the procedure is then the statistical fluctuations proportional to $1I$ with vanishing mean $f\alpha fluc\u2009=\u20090$.

### B. Embedded saturated fragments sDFT

The reduction of the scaling in sDFT is achieved by replacing the deterministic, numerically exact, trace evaluation with a stochastic sampling of the density. In return, this leads to statistical errors in the computed observables. To reduce the size of the statistical fluctuations, an embedded saturated fragments method is introduced inspired by (but different from) the method of Ref. 17. The latter approach was suitable mainly for systems composed of proximate but chemically separated molecules (like clusters of water molecules, for examples). The present method is applicable for fragmenting covalently bonded systems, like silicon NCs.

In this approach, the system is divided into *F* small fragments that are possibly overlapping. The division to fragments is flexible, and any desired physically motivated fragmentation can be used. The density is then a sum of the fragment density and a small correction term,

where $nFr\u2009=\u2009\u2211f=1Fnfr$ is the density generated by the individual fragments obtained from a deterministic KS-DFT (dDFT) calculation for each fragment and $\Delta nr=\u2009nIr\u2212nFIr$ is a correction term evaluated using stochastic orbitals. Here, $nIr$ is given by Eq. (6) and $nFIr\u2009=\u2009\u2211f=1FnfIr$ is a sum over a *stochastic* estimate of the fragments density. In the limit $I\u2009\u2192\u2009\u221e$, Eqs. (6) and (10) are identical and equal to the deterministic density. For finite values of *I*, the size of the statistical fluctuations of the two approaches are quite different. Since the deterministic fragmented density, $nFr$, provides a reasonable approximation for the full density $nr$, the correction term, $\Delta nr$, which is evaluated stochastically, is rather small, leading to a reduced variance in the relevant observables (forces, DOS, total energy per electron, etc.) compared to the direct stochastic approach of Eq. (6). An equivalent viewpoint is that the fragmentation is a device for reducing the variance in the stochastic evaluation of the density. This is evident by rewriting Eq. (10) in the following form:

and the implementation of this form is described in Appendix A.

To assess the accuracy of the embedded saturated fragmented sDFT (efsDFT), we calculated the standard deviations (STDs) and mean absolute deviations (MADs) with respect the deterministic DFT of the atomic forces in an $Si35H36$ NC using hydrogen passivated $Si5$ fragments. The results are shown in Fig. 1.^{33,43–45} The STDs and MADs decrease as $1\u2215I$, indicating that the bias in the force estimation is negligible. The standard deviations in the sDFT forces are larger by a factor of $\u22483$ compared to those of efsDFT. This implies that the required number of stochastic orbitals in efsDFT is nearly an order of magnitude smaller than in sDFT for similar STDs. The STDs can be further reduced by using larger fragments as discussed below (cf. Fig. 7).

### C. Langevin dynamics based on efsDFT

The standard approach to generate canonically distributed configurations *using ab initio* techniques is based on molecular dynamics, which requires as input accurate force estimates for each atomic degree of freedom. Since the forces generated by efsDFT contain a stochastic component, we use Langevin dynamics (LD) instead of molecular dynamics to sample configurations according to the Boltzmann distribution. A LD trajectory^{34–37} is a sequence of configurations $p,qm\u2009=\u2009ptm,qtm$ at discrete “times” $tm\u2009=\u2009m\Delta t$, where $\Delta t$ is the time step, and $q\u2009\u2261\u2009q1,\u2026,qN$ and $p\u2009\u2261\u2009p1,\u2026,pN$ are the Cartesian coordinates and conjugate momenta, respectively, for the *N* atoms. The trajectory is a solution of the Langevin equation (LE) of motion,^{38}

where $\mu \alpha $ is the mass of the atom *α*, $\gamma \alpha $ is its friction constant, and $f\alpha \u2009=\u2009\u2009f\alpha det+f\alpha fluc$ is the total efsDFT force acting on it, including deterministic and fluctuating parts [see Eq. (9)]. The bias is assumed negligible, so that $f\alpha \u2009=\u2009\u2009f\alpha det$. In Eq. (12)$\eta \alpha $ is an additional uncorrelated white-noise force introduced so as to satisfy the fluctuation-dissipation (FD) relation. We require that the total random fluctuation on each atom obeys

and

where $\alpha ,\alpha \u2032\u2009=\u20091,\u2026N$ are atom indices, $\cdots \u2009$ designates average over the atomic force distribution, $I3\xd73$ is the $3\u2009\xd7\u20093$ unit matrix, and $\sigma \alpha $ is the atomic force STD of atom *α*, which is taken to satisfy the fluctuation-dissipation relation,

We use the Verlet-like algorithm^{39} for numerically integrating the LE of motion at a fixed temperature *T* and a predefined time step $\Delta t$. The positions and momenta in time step *m* + 1 depend on the positions and momenta in time step *m* as well as on the forces in time step *m* and the additional white noise $\eta \alpha m$ is sampled from a Gaussian distribution such that the discretized version of Eq. (13) holds: $\eta \alpha m+\u2009f\alpha fluc\u2297\eta \alpha \u2032n+\u2009f\alpha \u2032fluc\Delta t=I\sigma \alpha 2\delta \alpha \alpha \u2032\delta mn$,

where $a\alpha \u2009=\u2009b\alpha 1\u221212\gamma \alpha \Delta t$ and $b\alpha \u22121\u2009=\u20091+12\gamma \alpha \Delta t$. The algorithm allows for stable and accurate solutions of the LE with time step comparable to that used in molecular dynamics simulations for similar systems. It treats the additional white noise component $\eta \alpha $ of the force differently from the force $f\alpha \u2009=\u2009\u2009f\alpha det+\u2009f\alpha fluc$ that result from the efsDFT calculation containing deterministic and fluctuating components that cannot be separated.

In Fig. 2, we plot for $Si35H36$ the running average of the transient temperature, *T*^{m}, calculated from the kinetic energy

and from the virial estimator,

In the above, $q\alpha $ is the time average of the coordinate of atom *α*. The initial positions of the Si atoms were taken from the bulk values. All surface Si atoms with more than two dangling bonds were removed, and the remaining surface Si atoms were passivated using one or two H atoms placed in a tetrahedral geometry at the Si–H distance of 1.47 Å. The momenta were sampled from a Boltzmann distribution at *T* = 300 K. This non-equilibrium initial configuration relaxes towards equilibrium.

The agreement in Fig. 2 between the two temperature estimators is consistent with a proper sampling of the canonical distribution of both positions and velocities. The small discrepancies at the longest averaging time are due to the large fluctuations of the transient temperature, particularly when using the virial estimator. We have also calculated the fluctuations in the kinetic energy and found good agreement with the corresponding analytical value (see caption of the figure). The two atomic species have a slightly $\xb15%$ deviations in the temperatures. These deviations are slightly inconsistent with equipartition theorem and may result from several factors, such as the finite time step of the Langevin propagator,^{39} incomplete SCF convergence,^{40} and insufficiently accurate estimate of the amount of white noise $\eta \alpha m$ in Eq. (13) required to fulfill the fluctuation-dissipation relation.

### D. Determining the optimal friction

The effect of $\gamma Si$ on the configurational relaxation and on the velocity autocorrelation decay is illustrated in Fig. 3 for $Si36H35$. In order to decrease the number of unknown parameters, we set the values of $\gamma \alpha $ and $\sigma \alpha $ to be identical for all atoms of the same type (i.e., Si or H in the systems studied here). To achieve such a uniform value of $\sigma $, we introduced white noise $\eta \alpha i\u2009=\u2009\sigma \alpha i2\u2212f\alpha i,fluc2$ for each degree of freedom [see Eq. (13)], where $f\alpha i,fluc2$ is estimated by a separate set of runs on the initial NC configuration using several independent sets of stochastic orbitals. Note that we have tested that the magnitude of the sDFT force fluctuation $ffluc2$ is not sensitive to the particular configuration used.

As expected, the configurational relaxation time increases with increasing values of $\gamma Si$ with the opposite trend for the decay time of the velocity autocorrelation function. Based on the results of $Si36H35$ presented in Fig. 3, we conclude that a friction coefficient of $0.04\u2009fs\u22121$ is sufficiently small for this system, with respect to minimizing both velocity and pair distance autocorrelation times. Although lower values of the friction coefficients could decrease the correlation time further, they would require reducing the statistical noise, which would be expensive to achieve using sDFT. Thus we chose the friction constants, $\gamma Si\u2009=\u20090.04\u2009fs\u22121$ for Silicon and $\gamma H\u2009=\u20090.12\u2009fs\u22121$ for the lighter H atoms. These values were used for the larger systems described in Sec. III (see Table I). Note that the results shown in Fig. 3, which were generated using LD under dDFT, could have been equally well generated under efsDFT. This is shown explicitly for $\gamma Si\u2009=\u20090.04\u2009fs\u22121$ (dotted red line) proving that the relaxation times are similar to those of the dDFT based LD calculation with the same value of $\gamma Si$.

. | . | $\gamma fs\u22121$ . | . | . | . | |
---|---|---|---|---|---|---|

NC . | T(K) . | H . | Si . | I
. | $\Delta tfs\u22121$ . | $titermin$ . |

$Si35H36$ | 30 | 0.12 | 0.04 | 120 | 1.2 | 1 |

300 | 0.04 | 0.04 | 30 | 1.2 | 1 | |

$Si147H100$ | 30 | 0.12 | 0.04 | 120 | 1.2 | 2 |

300 | 0.12 | 0.04 | 30 | 1.2 | 2 | |

$Si705H300$ | 30 | 0.12 | 0.04 | 120 | 1.2 | 10 |

300 | 0.12 | 0.04 | 92 | 1.2 | 10 |

. | . | $\gamma fs\u22121$ . | . | . | . | |
---|---|---|---|---|---|---|

NC . | T(K) . | H . | Si . | I
. | $\Delta tfs\u22121$ . | $titermin$ . |

$Si35H36$ | 30 | 0.12 | 0.04 | 120 | 1.2 | 1 |

300 | 0.04 | 0.04 | 30 | 1.2 | 1 | |

$Si147H100$ | 30 | 0.12 | 0.04 | 120 | 1.2 | 2 |

300 | 0.12 | 0.04 | 30 | 1.2 | 2 | |

$Si705H300$ | 30 | 0.12 | 0.04 | 120 | 1.2 | 10 |

300 | 0.12 | 0.04 | 92 | 1.2 | 10 |

### E. Validation of LD within efsDFT

Validation of the structure obtained using efsDFT based LD is demonstrated using the pair distribution function $gr$.^{34} For finite size NCs, the average number of neighbors at a distance *r* is expected to be smaller than the bulk value due to surface atoms with a smaller number of neighbors. Figure 4 shows a close agreement between the dDFT and efsDFT based LD estimates of $gr$ of the $Si35H36$ NC at two temperatures. The inset focuses on the first peak in $gr$, comparing the efsDFT to dDFT at *T* = 30 and $300\u2009K$.

## III. RESULTS

In Secs. I and II, we presented the methods and assessed the accuracy and validity of the efsDFT based LD. Here we apply the method to study the structural properties of larger NCs exceeding *N*_{e} = 3000 electrons. The Si–Si pair distribution functions $gr$ at two temperatures *T* (30 and $300\u2009K$) are displayed in the upper panel of Fig. 5 for $Si147H100$ and $Si705H300$. Temperature broadens the peaks by a factor of 2–3 without significantly changing the peak position.

In the lower panel of Fig. 5, we plot the transient and relaxed $gr$ at $30\u2009K$ for the two systems, focusing on the first, nearest neighbor peak. As described also for $Si35H36$, the initial positions of the Si atoms for both systems were taken from the experimental bulk values, and all surface Si atoms with more than two dangling bonds were removed. The remaining surface Si atoms was then passivated using one or two H atoms placed in a tetrahedral position at the Si–H distance of 1.47 Å. The initially sharp peak broadens and shifts to longer Si–Si bond lengths as the system relaxes towards thermal equilibrium. For $30\u2009K$, the relaxation times are 180 and $650\u2009$ fs for $Si147H100$ and $Si705H300$, respectively. For $300\u2009K$, they are 180 and $250\u2009$ fs, respectively.

The relaxation transient is studied in greater detail in Fig. 6, where the average nearest-neighbor bond lengths are shown for $Si147H100$ (spherical shells A-B) and $Si705H300$ (shells A-E); see Table II for the definition and properties of the shells. In $Si705H300$ the deep layer shells (A-D) relax slower than those near the surface showing that relaxation progresses from the surface inwards. The difference between the relaxation times of the two systems is correlated with the smaller frequency, *ω*, of the breathing mode of the larger NC. In the limit of an overdamped motion (as is the case here since $\gamma 2\u2009\u226b\u2009\omega 2$), the relaxation is dominated by two time scales proportional to $\gamma \u22121$ and $(\omega 2\u2215\gamma )\u22121$. The former leads to a fast relaxation, while the latter is slower and depends on the value of $\omega \u22122$. The ratio of the breathing mode frequency for the two particles is $\omega L2\omega S2\u2009\u2248\u20092.8$ (L/S for large/small) assuming that the breathing mode frequency scales linearly with the NC diameter.^{41} This is similar to the ratio of the relaxation times (650/180 = 3.6) for the lower temperature. At the higher temperature, one needs to consider anharmonic effects which are more pronounced in the large NC with lower acoustic phonons. Another noticeable feature in Fig. 6 is that the Si–Si bonds seem slightly shorter in $Si147H100$ than in $Si705H300$. This results from the difference in the bond distance of atoms in the outer shell, while the inner shell atoms have similar bond distances.

## IV. CONCLUSIONS

In this paper, we developed an *ab initio* Langevin dynamics approach based on a new embedded saturated fragment stochastic DFT method. We showed how the noisy forces resulting from the efsDFT calculation are used to generate a set of configurations that are distributed canonically at cryogenic and room temperatures. Choosing the friction coefficients and the number of stochastic orbitals according to the criteria presented in Sec. II D, thermalization is reached within $\u2248100$ time steps for these materials, since the method is trivially parallelizable, larger computer resources would allow to easily reduce the friction coefficients thus greatly improving the sampling efficiency. While the methods presented here have already allowed impressive achievements, such as determining the structural properties of silicon NCs of $3\u2009nm$ diameter containing more than 3000 electrons, larger systems still, of unprecedented size, are now coming within our grasp due to the linear-scaling highly parallelizable features of sDFT.

## ACKNOWLEDGMENTS

E.R. acknowledges support from the Physical Chemistry of Inorganic Nanostructures Program, KC3103, Office of Basic Energy Sciences of the United States Department of Energy under Contract No. DE-AC02-05CH11232. D.N. acknowledges support from the NSF Grant No. DMR/BSF-1611382. R.B. acknowledges support from the US-Israel Binational Science foundation under the BSF-NSF program, Grant No. 2015687.

### APPENDIX A: THE EMBEDDED SATURATED FRAGMENTS APPROACH

Here, we provide the technical details for the embedded fragment method described in Sec. II. The method corrects the stochastic estimate $\u27e8\xc2\u27e9I$ for the expectation value $\u27e8\xc2\u27e9$ of a one-body operator $\xc2$, using calculations performed on *F* separate fragments [see Eq. (11)],

where the stochastic correction due to fragment *f* is

and the deterministic and the stochastic estimates, $\u27e8A^f\u27e9$ and $\u27e8A^f\u27e9I$, are calculated directly on the fragment itself.

Previous implementations of embedded fragment sDFT were applied to systems of many weakly interacting molecules where the selection of fragments or clusters of such molecules was natural.^{17} We now describe a new method for defining and carrying calculations with fragments which can break up covalently bonded systems, such as silicon NCs. The large system is divided into *F* small fragments composed of one or more bonded atoms each. The surface dangling bonds of the fragment are passivated using an H atom placed in 1.46 Å from the Si atom, in the direction of the neighboring atom which is not included in the fragment. This forms a saturated fragment. For a saturated fragment *f*, the deterministic KS-DFT method is applied to determine the KS eigenvalues $\mathit{\epsilon}nf$ and eigenfunctions $\psi nfr$. Further, occupation numbers $pnf2\u2009=\u200912erfc\beta \mathit{\epsilon}nf\u2212\mu f$ are introduced for determining the saturated fragment density $nsfr\u2009=\u2009\u2211npnf2\psi nfr2$. The fragment density $nfr=cfr2nsfr$ is “carved out” of $nsfr$ using a carving function $cfr2$. Thus

where, inspired by Hirshfeld partitioning,^{42} the carving function is defined as

where $na0r$ is the spherical density of neutral atom *a*. The temperature parameter $\beta $ in the definition of the population $pnf$ is chosen to be the same value as that of the sDFT calculation, while the chemical potential $\mu f$ of each fragment is determined by the condition of neutrality of the fragment,

Defining non-orthogonal functions $\psi \u0303nfr\u2009=\u2009cfrpnf\psi nfr$, the fragment density of Eq. (A3) becomes $nfr\u2009=\u20092\u2211n\psi \u0303nfr2$, so the chemical potential is determined from the condition

After determining $\mu f$ and in order to construct the reduced density matrix (RDM), we orthogonalize the functions $\psi \u0303nfr$ by diagonalizing the overlap matrix $Snn\u2032f=\psi \u0303nf\psi \u0303n\u2032f$, obtaining the unitary matrix *U*_{f} of eigenvectors and the eigenvalues $snf\u2009>\u20090$ (so that $UfTSfUf\u2009=\u2009diags1f,s2f\u2026$). The orthogonal wavefunctions are $\varphi mfr\u2009=\u2009\u2211n\psi \u0303nfrUnmf$ and the norm is $\varphi mf\varphi mf\u2009=\u2009smf$. Using the new wave functions, the unsaturated fragment density is given by

and the RDM by

Using the RDM, we express the unsaturated fragment expectation value appearing in Eq. (A2) as

where

By choosing the fragment grid-points to be a subset of the full system grid, each stochastic orbital $\chi i$ ($i\u2009=\u20091,\u2026,I$) of the full system appears as a stochastic orbital on the fragment grid and can be used to perform the stochastic estimate appearing in Eq. (A2) as

where the subscript *f* on the left denotes integration over the fragment grid. The difference $\Delta AfI=A^f\u2212A^fI$ in Eq. (A2) can now be written in a unified form as

where

Hence, by calculating the matrix $\Delta mm\u2032f\u2009I$ all types of expectation value corrections can be obtained from Eq. (A7).

The efficacy of embedded fragments in sDFT force calculations is achieved through a reduction of the STD $\sigma fx$ of a force component. The STD $\sigma fx$ is proportional to $1\u2215I$, where *I* is the number of stochastic orbitals and the proportionality constant, denoted by $\sigma 1fx=I\sigma fx$, is called the *inherent* STD. This quantity depends on the NC characteristics but not on the number of stochastic orbitals. In Fig. 7 we plot the inherent STD on each atom for $Si35H36$ and $Si705H300$ as a function of fragment size. Even the use of the smallest fragments reduces the inherent force STD by a significant factor, 1.6 (for $Si705H300$) to 2.3 (for $Si35H36$). Using larger fragments reduces the STD by an additional factor of $\u22481.5$, with increasing effect for larger systems since the electron density in the larger fragments is similar to that of the full system. It is interesting to see that for the forces there is no noticeable sublinear scaling: the inherent STD for both systems is similar, with the larger system having slightly higher ($\u22485%$) STD.

In summary, the embedded fragment sDFT method serves as a way to expedite the sDFT calculation by a judicious choice of fragment size and composition. As the fragment size grows, the numerical effort invested in sDFT decreases (due to reduction of STD) while in dDFT it increases. For example, consider Fig. 7 where we showed that increasing the fragment size by a factor of 10–20 reduces the STD by a factor of 2 and therefore the sDFT CPU time by a factor of $\u224822\u2009=\u20094$. On the other hand since the fragments are ten-fold larger, the amount of dDFT work on them increases (cubically) by a factor of more than $\u223c103$. Clearly then, the optimal fragment size is system dependent. Embedded fragments have the additional benefit of providing an initial density for the SCF calculation, significantly reducing the number of SCF cycles.

### APPENDIX B: STOCHASTIC ESTIMATES OF THE FORCES AND ENERGY PERTURBATIONS

In the stochastic method, the electronic density is [see Eq. (6)]:

where $P^=\mathit{\theta}\mu $ is the Chebyshev expansion of the projection operator, depending on $\beta $ and $\mu $, on the occupied space of $h^KS$, and the energy is

where $EHXCn$ is the Hartree-exchange-correlation energy functional, depending only on the electronic density $nr$. Under variation in position of nuclei *R*,

which using

can be written as

The average of the second term on the right leads to the Hellmann-Feynman force Eq. (8), while the first term can be shown to vanish when a full sampling is made on $\chi $ and when $\beta \u2192\u221e$, owing to the fact that within these limits $P\delta PP=0$.

## REFERENCES

All calculations in this work use real-space grids of spacing $\Delta x=0.5a0$ and Troullier-Martins norm-conserving pseudopotentials^{43} within the Kleinman-Bylander approximation.^{44} Fast Fourier transforms were used for applying the kinetic energy operator and for determining the Hartree potentials, and the method of Ref. ^{45} was used for treating the long range Coulomb interactions in a finite simulation cell with periodic boundary conditions. DFT calculations were performed under the local density approximation (LDA).