Kohn–Sham density functional theory calculations using conventional diagonalization based methods become increasingly expensive as temperature increases due to the need to compute increasing numbers of partially occupied states. We present a density matrix based method for Kohn–Sham calculations at high temperatures that eliminates the need for diagonalization entirely, thus reducing the cost of such calculations significantly. Specifically, we develop real-space expressions for the electron density, electronic free energy, Hellmann–Feynman forces, and Hellmann–Feynman stress tensor in terms of an orthonormal auxiliary orbital basis and its density kernel transform, the density kernel being the matrix representation of the density operator in the auxiliary basis. Using Chebyshev filtering to generate the auxiliary basis, we next develop an approach akin to Clenshaw–Curtis spectral quadrature to calculate the individual columns of the density kernel based on the Fermi operator expansion in Chebyshev polynomials and employ a similar approach to evaluate band structure and entropic energy components. We implement the proposed formulation in the SPARC electronic structure code, using which we show systematic convergence of the aforementioned quantities to exact diagonalization results, and obtain significant speedups relative to conventional diagonalization based methods. Finally, we employ the new method to compute the self-diffusion coefficient and viscosity of aluminum at 116 045 K from Kohn–Sham quantum molecular dynamics, where we find agreement with previous more approximate orbital-free density functional methods.

## I. INTRODUCTION

Over the past few decades, Kohn–Sham density functional theory (DFT)^{1,2} has established itself as a powerful framework for understanding and predicting a wide range of material properties, from the first principles of quantum mechanics, without any empirical or *ad hoc* parameters. The ubiquitous use of DFT is a consequence of its simplicity, generality, and high accuracy-to-cost ratio compared to other such first principles methods. However, the solution of the underlying nonlinear eigenvalue problem for the Kohn–Sham orbitals remains a challenging task, with the computational cost and memory requirements scaling cubically and quadratically with the system size, respectively.^{3–5} Moreover, the orthogonality constraint on the orbitals translates to significant global communications in parallel computing, limiting the minimum time to solution that can be attained. This can become particularly important for quantum molecular dynamics (QMD) simulations,^{6,7} wherein tens or hundreds of thousands of such Kohn–Sham solutions may be required to complete a single simulation.

To overcome the cubic scaling bottleneck in DFT calculations, significant effort has been directed toward the development of methods that scale linearly with the system size (see, e.g., Refs. 3–5 and references therein), in both computational cost and computer memory, which have culminated in a number of mature codes, e.g., Refs. 8–14. While major advances have been achieved, a number of challenges remain for linear scaling methods and their implementations. These include the need for additional computational parameters, which complicate use in practice, limitations of the underlying basis sets used for discretization, subtleties in choosing the numbers and/or centers of localized orbitals for different physical systems, scalability on parallel computing platforms due to complex communication patterns and challenges in load balancing, and calculation of accurate atomic forces and stresses as employed in structural relaxation and QMD simulations.^{4,5,15} Perhaps most importantly, the study of systems with partially occupied Kohn–Sham orbitals, as encountered in metallic systems or insulating systems at high temperatures, remains particularly challenging.

Kohn–Sham QMD simulations at high temperatures are employed in a variety of application areas, such as warm dense matter and dense plasmas, as occurring in fusion energy research and the inner regions of giant planets and stars.^{16–21} However, such calculations pose unique challenges in addition to those described above for ambient conditions. In particular, the number of orbitals that need to be computed increases with temperature due to the increase in the number of states that become partially occupied in the Fermi–Dirac distribution, which advances the onset of the cubic scaling bottleneck in diagonalization based methods, i.e., methods that calculate the Kohn–Sham orbitals. In addition, these orbitals are more diffuse since higher-energy states become less localized. As a result, local-orbital based linear scaling methods suffer from large prefactors that increase rapidly with temperature. Consequently, Kohn–Sham QMD simulations at high temperatures become impractical using either of these approaches.

There has been recent effort in addressing the aforementioned challenges associated with high temperature calculations, including orbital-free molecular dynamics (OFMD),^{22} in which a functional of the electron density is used to approximate the electronic kinetic energy, extended first principles molecular dynamics (ext-FPMD),^{23,24} in which plane waves are used to approximate the higher-energy orbitals, finite-temperature potential functional theory (PFT),^{25} in which a coupling-constant formalism is used to develop an orbital-free approximation for the free energy functional, stochastic DFT (SDFT),^{26,27} in which the density is computed directly from the Kohn–Sham Hamiltonian, without diagonalization, by averaging over multiple stochastic samples, and a mixed stochastic–deterministic approach (MDFT),^{28} in which the advantageous features of the deterministic and stochastic approaches are leveraged by suitable partitioning of the eigenspectrum. To address scaling with both system size and temperature, while retaining full Kohn–Sham accuracy and systematic convergence for metals and insulators similarly, the Spectral Quadrature (SQ) method for linear scaling Kohn–Sham calculations at high temperatures was recently developed.^{29–31} In particular, the computational cost of the SQ method decreases with increasing temperature, a consequence of the faster decay of the density matrix, i.e., electronic interactions becoming more localized, and the increase in the smoothness of the Fermi–Dirac distribution. In addition, the SQ method has excellent scaling in parallel computations since the communication pattern remains fixed and well localized to nearby processors throughout the computation. However, while the SQ method has proven highly accurate and efficient in applications reaching millions of kelvin,^{32,33} the associated prefactor becomes larger at less extreme temperatures, e.g., $O$(10000)–$O$(250000) K, particularly when large numbers of grid points per atom are required.

In this work, we present a method, which we call SQ3, that is accurate and efficient at temperatures too high for efficient calculations using conventional diagonalization based methods but too low for efficient calculations using the SQ method. The combination of conventional diagonalization based methods at low temperatures, SQ3 at moderately high temperatures [e.g., $O$(10000)–$O$(250000) K], and SQ at higher temperatures then brings accurate and efficient Kohn–Sham calculations to the full range of temperatures from ambient to millions of kelvin. As we detail below, the key idea of the method is to employ spectral quadrature to compute the density kernel—i.e., density operator in a minimal orthonormal basis—rather than required parts of the full density matrix—i.e., density operator on the real-space grid—as in the SQ method. In doing so, the need for diagonalization is eliminated and key operations are reduced to vectors and matrices of dimensions equal to the number of occupied states rather than the number of real-space grid points. We implement the method in the SPARC electronic structure code,^{34–36} where we find systematic convergence to exact diagonalization results and significant speedups relative to conventional diagonalization based methods.

The remainder of this paper is organized as follows. In Secs. II and III, we present real-space expressions for the electron density, electronic free energy, Hellmann–Feynman atomic forces, and Hellmann–Feynman stress tensor in terms of the density operator and density kernel, respectively. In Sec. IV, we describe the formulation and implementation of the proposed SQ3 method, whose accuracy and efficiency are verified in Sec. V. Finally, we provide concluding remarks in Sec. VI.

## II. REAL-SPACE DFT: DENSITY OPERATOR FORMULATION

Consider a unit cell Ω with nuclei positioned at **R** = {**R**_{1}, **R**_{2}, …, **R**_{N}} and a total of *N*_{e} electrons. Neglecting spin and Brillouin zone integration, the single-particle density operator $D$ in Kohn–Sham DFT^{1,2} can be written as

or, in real space,

where *N*_{s} is the number of occupied states and *ψ*_{i} are the Kohn–Sham orbitals with energies *λ*_{i} and occupations *g*_{i} given by the Fermi–Dirac function *g*,

Here, *σ* = *k*_{B}*T* is the smearing value, where *k*_{B} is Boltzmann’s constant and *T* is the electronic temperature, and *μ* is the Fermi level determining the total number of electrons,

where Tr(·) denotes the trace of the operator.

The Kohn–Sham orbitals, energies, and occupations are solutions to the nonlinear eigenvalue problem,

where $H$ denotes the Hamiltonian operator, *V*_{xc} is the exchange–correlation potential, taken in the local density approximation (LDA)^{2} in the present work, *ϕ* is the electrostatic potential,^{38–39} *V*_{nl} is the nonlocal pseudopotential operator, and $\rho D$ is the electron density,

The electrostatic potential is the solution of the Poisson problem,^{35,37,40}

where *b* is the total pseudocharge density. In addition, the nonlocal pseudopotential operator in the Kleinman–Bylander form^{41} is given by

where the summation index *J* extends over all atoms in Ω, *l* and *m* are azimuthal and magnetic quantum numbers, respectively, and $\chi \u0303Jlm=\u2211J\u2032\chi J\u2032lm$ are periodically extended nonlocal projectors, with $\chi J\u2032lm$ being the projectors of the *J*th atom and *J*′ running over the *J*th atom and its periodic images.

The density operator can be written in terms of the Hamiltonian as

where $I$ is the identity operator. Once the electronic ground state has been determined through the self-consistent solution of the above equation, the electronic free energy can be written as^{30,31}

where the first term is referred to as the band structure energy (*E*_{band}), the last term is the electronic entropy energy (*E*_{ent}) associated with partial occupations, *E*_{self} and *E*_{c} are the self and overlap energy corrections, respectively, associated with the pseudocharges,^{40,42} and *E*_{xc} is the exchange–correlation energy,

Here, *ɛ*_{xc} is the sum of the exchange and correlation energy per particle of a uniform electron gas.

The Hellman–Feynman forces on the nuclei can be written as^{30,31}

where $fsc,I=\u2212\u2202(\u2212Eself(R)+Ec(R))\u2202RI$ (see Ref. 43 for an explicit expression), *b*_{I′} is the pseudocharge density of the *I*′th nucleus that generates potential *V*_{I′}, the summation index *I*′ runs over the *I*th atom and its periodic images, and *I* extends over all atoms in Ω. The first two terms together constitute the local component of the force $(fIl)$, and the last term is the nonlocal component of the force $(fInl)$.

The Hellmann–Feynman stress tensor can be written as^{44}

where |Ω| is the volume of the unit cell, and $\sigma \alpha \beta I$, $\sigma \alpha \beta II$, $\sigma \alpha \beta III$, and $\sigma \alpha \beta IV$ are the contributions arising from the electronic kinetic energy, exchange–correlation energy *E*_{xc}, nonlocal pseudopotential energy *E*_{nl}, and the total electrostatic energy, respectively,

In the above equations, $\u2207x\alpha $ is the *α*th component of the gradient vector *∇*_{x}, *δ*_{αβ} is the Kronecker delta function, $\sigma \alpha \beta Ec$ is the stress tensor correction corresponding to overlapping pseudocharges,^{45} the summation index *J* runs over all atoms in Ω, *J*′ runs over the *J*th atom and its periodic images, and *I* extends over all atoms in $R3$. Note that upon substitution of Eq. (2) into the above expressions for the energy, atomic force, and stress tensor, the corresponding expressions in terms of Kohn–Sham orbitals^{35,36,45} are readily obtained.

## III. REAL-SPACE DFT: DENSITY KERNEL FORMULATION

In this section, we formulate the electron density, electronic free energy, Hellmann–Feynman force, and Hellmann–Feynman stress tensor in terms of the density kernel,^{10} which in the current context is the matrix $Ds=(Dijs)$ corresponding to the single-particle density operator $D$ expressed in an orthonormal auxiliary orbital basis ${\phi i(x)}i=1Ns$, i.e., $Dijs=\u3008\phi i|D|\phi j\u3009$. In particular,

which corresponds to a unitary transformation of the Kohn–Sham orbitals,

where **Q** = (*Q*_{ij}) is the orthogonal matrix that diagonalizes **D**_{s},

Hence, the auxiliary orbitals ${\phi i(x)}i=1Ns$ span the same subspace as the Kohn–Sham orbitals ${\psi i(x)}i=1Ns$.

Let the subspace Hamiltonian $Hs=(Hijs)$ be the matrix representation of $H$ in the orthonormal basis ${\phi i(x)}i=1Ns$, i.e., $Hijs=\u3008\phi i|H|\phi j\u3009$. It follows that the density kernel can be expressed in terms of **H**_{s} as

where **I** is the identity matrix, and the Fermi level *μ* is determined from the constraint on the total number of electrons [Eq. (4)], which can be written in terms of the density kernel as

where tr(·) denotes the trace. In arriving at this equation from Eq. (4), we have used the following relation:

To make the expressions for electron density, atomic force, and stress analogous to those based on Kohn–Sham orbitals in the SPARC electronic structure code,^{34–36} into which we implement the proposed scheme, we introduce the density kernel transformed auxiliary orbitals ${\phi \u0303i(x)}i=1Ns$,

Here, the density operator in Eq. (18) takes the form

Thereafter, the electron density in Eq. (6) can be written as

Once the electronic ground state has been determined, the electronic free energy can be written as

where the first and last terms, i.e., the band structure (*E*_{band}) and electronic entropy (*E*_{ent}) energies, have been obtained using orthogonality and trace relations as in Eq. (23).

The nonlocal component of the atomic force in Eq. (12) can be rewritten as

Thereafter, the total Hellmann–Feynman atomic forces take the form

The stress tensor contributions arising from the electronic kinetic energy and nonlocal pseudopotential energy in Eqs. (14) and (16), respectively, can be rewritten as

and

Thereafter, the Hellmann–Feynman stress tensor takes the form

## IV. SQ3 METHOD: FORMULATION AND IMPLEMENTATION

In this section, we describe the calculation of the auxiliary orbitals ${\phi i(x)}i=1Ns$ and density kernel transformed auxiliary orbitals ${\phi \u0303i(x)}i=1Ns$ in each iteration of the self-consistent field (SCF) method,^{46} using which the electron density, electronic free energy, Hellmann–Feynman atomic forces, and Hellmann–Feynman stress tensor can be computed as discussed above. We refer to the resulting method as SQ3, given its inherent similarity in philosophy and formulation to the linear scaling Spectral Quadrature (SQ) method,^{29–31} with notable differences that the SQ3 method does not assume the subspace Hamiltonian to be sparse and does not employ truncation in the calculation of the density kernel, making the computational cost scale cubically with *N*_{s} rather than linearly.

In the SQ3 method, we perform Chebyshev filtering^{47,48} on the auxiliary orbitals from the previous SCF iteration followed by orthonormalization to generate the auxiliary orbitals ${\phi i(x)}i=1Ns$ for the current SCF iteration. To calculate ${\phi \u0303i(x)}i=1Ns$, the density kernel **D**_{s} needs to be determined [Eq. (24)], which we recall can be written in terms of the subspace Hamiltonian **H**_{s}—projection of the Hamiltonian $H$ onto the basis ${\phi i(x)}i=1Ns$—using the Fermi–Dirac function *g* [Eq. (21)],

One possible strategy to determine **D**_{s} is to perform an eigendecomposition of **H**_{s}, an approach that would also make the Kohn–Sham orbitals available. However, this method not only scales cubically with *N*_{s}, but also has limited scalability in parallel computations, which limits the minimum time to solution that can be reached in DFT simulations.^{34} To delay this bottleneck for calculations at high temperatures, we perform a Fermi Operator Expansion (FOE) of the density kernel in terms of Chebyshev polynomials,^{49,50}

where the prime on the summation indicates that the first term is halved, *n*_{pl} is the degree of the expansion, *T*_{j} denotes the Chebyshev polynomial of degree *j*, $\chi =(\lambda Ns+\lambda 1)/2,\xi =(\lambda Ns\u2212\lambda 1)/2$, and $H\u0302s=Hs\u2212\chi I/\xi $ is the scaled and shifted subspace Hamiltonian whose spectrum lies in the interval [−1, 1]. The coefficients *c*_{j} in the Chebyshev expansion can be determined using the relation

Analogous to the Clenshaw–Curtis SQ method,^{30,31} the *n*th column of the density kernel in the SQ3 method is written as

where **e**_{n} is the standard basis vector, and $tnj$ is the *n*th column of $Tj(H\u0302s)$, determined using the three-term recurrence relation for Chebyshev polynomials,

The coefficients *c*_{j} in Eq. (36) are determined using Eq. (35), with the Fermi level *μ* chosen such that the constraint on the number of electrons is satisfied [Eq. (22)],

Note that to avoid performing the iteration in Eq. (37), the most expensive operations of the SQ3 method, for every guess *μ* encountered during the solution of Eq. (38), we store the vectors $tnj$, whereby *μ* can be efficiently determined.

Once the density kernel has been so calculated, ${\phi n(x)}n=1Ns$ are transformed by the density kernel to obtain ${\phi \u0303n(x)}n=1Ns$ [Eq. (24)]. The electron density is then calculated using Eq. (26), and the electronic free energy is evaluated using Eq. (27) while employing the following expressions for the band structure and electronic entropy energies:

where the coefficients are calculated using the relations

In arriving at the above expressions, we have performed Chebyshev polynomial expansions for *E*_{band} and *E*_{ent}, similar to that done for the density kernel [Eq. (34)]. At the electronic ground state, the Hellmann–Feynman atomic forces and stress tensor are computed using Eqs. (29) and (32), respectively.

We have implemented the SQ3 method in the SPARC electronic structure code.^{34–36} In particular, we build upon the implementation of the CheFSI method^{47,48} in SPARC, which shares a number of computational kernels with the SQ3 method. Specifically, the Chebyshev filtering, orthogonalization, and projection kernels are used in the SQ3 implementation. In particular, we start the QMD simulation with a random guess for the auxiliary orbital basis and perform multiple Chebyshev filtering and orthogonalization steps in the first SCF iteration.^{51} For every subsequent QMD step, we use the auxiliary orbital basis generated in the last SCF iteration of the previous QMD step as the initial guess. We calculate the density kernel **D**_{s} using the iteration in Eq. (37) while parallelizing computations over the different columns of the density kernel. Such a scheme restricts the communication to only that required for the calculation of the Fermi level [Eq. (38)], while the computationally intensive step [Eq. (37)] is free from any data transfer between processors. To calculate ${\phi \u0303n(x)}n=1Ns$ from ${\phi n(x)}n=1Ns$ [Eq. (24)], we perform a dense matrix–matrix multiplication using the parallel PDGEMM routine in ScaLAPACK.^{52} In doing so, we first use the PDGEMR2D routine to redistribute the density kernel from columnwise distribution to two-dimensional block-cyclic distribution, as required by PDGEMM.

The degree of the Chebyshev polynomial *n*_{pl} required for a given accuracy is dependent on the smearing *σ*, spectral width of the subspace Hamiltonian **H**_{s} (i.e., 2*ξ*), and the relative location of the Fermi level *μ* within the spectrum of **H**_{s}.^{29} In particular, the value of *n*_{pl} required decreases with increasing temperature and decreasing spectral width, a consequence of the increased smoothness of the Fermi–Dirac function and smaller interval over which it must be evaluated. Since the target application for the SQ3 method is systems at high temperatures, and the spectral width of **H**_{s} is only slightly more than that of the occupied spectrum, low polynomial orders generally suffice so that the SQ3 method can be highly efficient, as we demonstrate below. Furthermore, since computations at each real-space grid point are largely independent, the SQ3 method can attain excellent parallel scaling, as also demonstrated below. It is worth noting that since **H**_{s} is dense, the computational cost of the SQ3 method as described above has $O(Ns3)$ scaling, similar to that for eigendecomposition, but with a smaller prefactor that decreases with increasing temperature. Indeed, it is possible to achieve $O(Ns)$ scaling, as in the SQ method, if the auxiliary orbitals are such that **H**_{s} is sparse and truncation is adopted based on the decay of the density kernel.

## V. RESULTS AND DISCUSSION

In this section, we verify the accuracy and efficiency of the SQ3 method for the calculation of the electronic free energy, Hellmann–Feynman atomic forces, and Hellmann–Feynman stress tensor by comparison to conventional diagonalization based methods. For this purpose, we have implemented the SQ3 method in the SPARC electronic structure code.^{34–36} We consider aluminum at density 2.7 g/cc with temperatures ranging from *T* = 10000 to 250000 K. We demonstrate the practical utility of the method by calculating the self-diffusion coefficient and viscosity of aluminum at temperature *T* = 116 045 K (*k*_{B}*T* = 10 eV). In all simulations, we employ a twelfth-order accurate real-space finite-difference discretization, the local density approximation^{2} for exchange–correlation interactions, a three-electron optimized norm conserving Vanderbilt (ONCV) pseudopotential^{53} suitable for the range of temperatures considered, Γ-point Brillouin sampling, the restarted Periodic Pulay method^{54,55} for acceleration of the SCF iteration, and the alternating Anderson–Richardson (AAR) linear solver^{56,57} for calculation of the electrostatic potential and application of the real-space Kerker preconditioner.^{58}

### A. Accuracy and convergence

We first verify the accuracy and convergence of the SQ3 formulation and implementation by considering a 24-atom aluminum cell at four different temperatures: *T* = 10 000, 50 000, 100 000, and 250 000 K. In each system, all atoms are randomly perturbed by up to 10% of the nearest neighbor distance in an FCC configuration. We choose a mesh size of 0.5 bohr to put associated discretization errors within chemical accuracy. In Fig. 1, we plot the convergence of the electronic ground state free energy, Hellmann–Feynman atomic forces, and Hellmann–Feynman stress tensor with respect to *n*_{pl}, i.e., degree of polynomial used in the Chebyshev polynomial expansions for the density kernel, band structure energy, and electronic entropy energy. The error is defined to be the difference from standard diagonalization results in the SPARC code. The results show exponential convergence of the energy, atomic forces, and stress tensor with respect to *n*_{pl}, with degree $O(10\u221230)$ sufficient at the temperatures considered to attain accuracies typical in practice. The results also show that the polynomial degree required decreases with increasing temperature as expected.

### B. Performance and scaling

We next investigate the efficiency of SQ3 relative to diagonalization for Kohn–Sham calculations at high temperatures. For this purpose, we choose four FCC aluminum systems: (i) 1568-atom cell at *T* = 10000 K, (ii) 500-atom cell at *T* = 50000 K, (iii) 256-atom cell at *T* = 100000 K, and (iv) 64-atom cell at *T* = 250000 K, all with *N*_{s} = 10000 and atoms randomly perturbed by up to 10% of the nearest neighbor distance in the FCC configuration. The system sizes have been chosen to be comparable to those typical in QMD simulations at the associated temperatures. In all cases, we select a mesh size of 0.75 bohr to put associated discretization errors within chemical accuracy, i.e., 0.001 Ha/atom, 0.001 Ha/bohr, and 1% in the energy, force, and stress, respectively. In addition, we select *n*_{pl} = 33, 12, 10, and 8 in SQ3 for the Al_{1568}, Al_{500}, Al_{256}, and Al_{64} systems, respectively, to reduce *n*_{pl} related errors to chemical accuracy also (Fig. 1). All simulations have been performed on the phoenix cluster at the Georgia Institute of Technology.^{59}

In Table I, we compare the strong scaling performance of SQ3 and CheFSI based diagonalization methods.^{47,48} In particular, we consider numbers of processors ranging from 24 to 1000 and report the time taken per QMD step, along with the time taken for subspace diagonalization using ELPA^{60}—a recently developed library that has been found more efficient and scalable than other widely used libraries, such as ScaLAPACK,^{52} in the electronic structure context—and density kernel calculation in the diagonalization and SQ3 methods, respectively. Indeed, these are the main computational kernels that are distinct between the two methods. We observe from the results that the SQ3 method scales up to $O(1,000)$ processors, with further reduction in the wall time possible for the density kernel calculation when more processors are utilized. The density kernel calculation does not scale ideally, a consequence of the reduced effectiveness of the BLAS3 operations as the number of columns of the density kernel associated with each processor becomes smaller. As expected, the speedup provided by the SQ3 method over diagonalization increases with increasing temperature, as the value of *n*_{pl} required becomes smaller. In particular, while the minimum times to solution for diagonalization and SQ3 are similar for Al_{1568} (10000 K), SQ3 achieves an overall speedup of $\u223c1.3$, $\u223c1.5$, and $\u223c2.1$ for Al_{500}, Al_{256}, and Al_{64}, respectively, with corresponding speedups of the density kernel calculation over subspace diagonalization of $\u223c2.5$, $\u223c2.7$, and $\u223c3.5$, respectively.

. | Al_{1568}, T = 10 000 K
. | Al_{500}, T = 50 000 K
. | Al_{256}, T = 100 000 K
. | Al_{64}, T = 250 000 K
. | ||||
---|---|---|---|---|---|---|---|---|

np . | Diag . | SQ3 . | Diag . | SQ3 . | Diag . | SQ3 . | Diag . | SQ3 . |

24 | 1757.2 (84.8) | 1970.1 (297.7) | 580.5 (84.8) | 598.7 (103.1) | 383.5 (84.8) | 382.1 (83.4) | 173.6 (84.8) | 151.4 (62.6) |

48 | 844.0 (49.8) | 964.7 (170.5) | 345.4 (49.8) | 355.6 (60) | 218.0 (49.8) | 220.6 (52.4) | 104.9 (49.8) | 93.4 (38.3) |

96 | 518.6 (31.7) | 568.6 (81.8) | 192.7 (31.7) | 188.0 (27.1) | 124.7 (31.7) | 116.5 (23.5) | 64.1 (31.7) | 51.1 (18.8) |

192 | 307.4 (27.0) | 332.9 (52.6) | 118.1 (27.0) | 110.0 (18.9) | 79.5 (27.0) | 68.1 (15.6) | 48.4 (27.0) | 33.6 (12.3) |

384 | 224.2 (27.0) | 236.0 (38.9) | 81.6 (27.0) | 68.0 (13.5) | 59.7 (27.0) | 44.8 (12.1) | 40.2 (27.0) | 22.5 (9.3) |

500 | 178.2 (27.0) | 187.0 (35.8) | 73.3 (27.0) | 58.7 (12.4) | 54.8 (27.0) | 38.9 (11.1) | 38.9 (27.0) | 20.6 (8.8) |

768 | 170.7 (27.0) | 176.2 (32.6) | 67.1 (27.0) | 51.5 (11.4) | 52.5 (27.0) | 35.8 (10.4) | 38.8 (27.0) | 20.3 (8.6) |

1000 | 173.4 (27.0) | 177.2 (30.8) | 61.8 (27.0) | 45.5 (10.8) | 49.7 (27.0) | 32.6 (9.9) | 37.5 (27.0) | 18.2 (7.8) |

. | Al_{1568}, T = 10 000 K
. | Al_{500}, T = 50 000 K
. | Al_{256}, T = 100 000 K
. | Al_{64}, T = 250 000 K
. | ||||
---|---|---|---|---|---|---|---|---|

np . | Diag . | SQ3 . | Diag . | SQ3 . | Diag . | SQ3 . | Diag . | SQ3 . |

24 | 1757.2 (84.8) | 1970.1 (297.7) | 580.5 (84.8) | 598.7 (103.1) | 383.5 (84.8) | 382.1 (83.4) | 173.6 (84.8) | 151.4 (62.6) |

48 | 844.0 (49.8) | 964.7 (170.5) | 345.4 (49.8) | 355.6 (60) | 218.0 (49.8) | 220.6 (52.4) | 104.9 (49.8) | 93.4 (38.3) |

96 | 518.6 (31.7) | 568.6 (81.8) | 192.7 (31.7) | 188.0 (27.1) | 124.7 (31.7) | 116.5 (23.5) | 64.1 (31.7) | 51.1 (18.8) |

192 | 307.4 (27.0) | 332.9 (52.6) | 118.1 (27.0) | 110.0 (18.9) | 79.5 (27.0) | 68.1 (15.6) | 48.4 (27.0) | 33.6 (12.3) |

384 | 224.2 (27.0) | 236.0 (38.9) | 81.6 (27.0) | 68.0 (13.5) | 59.7 (27.0) | 44.8 (12.1) | 40.2 (27.0) | 22.5 (9.3) |

500 | 178.2 (27.0) | 187.0 (35.8) | 73.3 (27.0) | 58.7 (12.4) | 54.8 (27.0) | 38.9 (11.1) | 38.9 (27.0) | 20.6 (8.8) |

768 | 170.7 (27.0) | 176.2 (32.6) | 67.1 (27.0) | 51.5 (11.4) | 52.5 (27.0) | 35.8 (10.4) | 38.8 (27.0) | 20.3 (8.6) |

1000 | 173.4 (27.0) | 177.2 (30.8) | 61.8 (27.0) | 45.5 (10.8) | 49.7 (27.0) | 32.6 (9.9) | 37.5 (27.0) | 18.2 (7.8) |

Overall, we find that the SQ3 method is able to delay the cubic scaling bottleneck for Kohn–Sham calculations at high temperatures, with increasing advantages as the temperature and/or number of processors is increased. Implementation of the SQ3 method on graphics processing units (GPUs) is likely to bring down the minimum time to solution substantially, which would then merit comparison with efficient eigensolver implementations on GPUs, such as that found in the cuSOLVER library.^{61} We note that we have verified that alternate forms of parallelization, such as block-cyclic decomposition of the $H\u0302s$ and $TjH\u0302s$ matrices—matrix–matrix multiplication routine replacing the iteration in Eq. (37), as in conventional FOE methods—are not as efficient as the approach adopted here ($\u223c1.3$× slower). It is also worth noting that since the SQ3 method does not employ truncation and has cubic scaling with *N*_{s}, the linear scaling SQ method^{29–31} would become the method of choice at temperatures higher than those considered here.

### C. High temperature QMD: Self-diffusion coefficient and viscosity of aluminum at 116 045 K

We now calculate the self-diffusion coefficient and viscosity of aluminum at density 2.7 g/cc and temperature 116045 K (*k*_{B}*T* = 10 eV). Specifically, we consider a 108-atom unit cell in the isokinetic ensemble using a Gaussian thermostat.^{62} In order to efficiently obtain sufficient statistics, we average over QMD simulations corresponding to 20 different initial conditions for the atomic positions and velocities—obtained by performing 20 orbital-free DFT^{40,42} simulations, each run for 40000 steps with a time step of 0.15 fs—where each simulation has been run for more than 24 000 steps with a time step of 0.15 fs. In Fig. 2, we present the variation in the total free energy, i.e., including the electronic entropy, during the QMD simulations. It is clear that the energy is well conserved, consistent with the accuracy and systematic convergence of the SQ3 force formulation/implementation.

Since macroscopic dynamical properties can be written as time integrals of associated microscopic time correlation functions using Green–Kubo (GK) relations,^{63} we calculate the self-diffusion coefficient *D* and viscosity *η* using the expressions

where ⟨·⟩ denotes the ensemble average, *v*_{i} is the velocity, and *s*_{i} are the independent components of the deviatoric (i.e., traceless) stress tensor: *σ*_{12}, *σ*_{23}, *σ*_{31}, (*σ*_{11} − *σ*_{22})/2, and (*σ*_{22} − *σ*_{33})/2.^{64} We present the results so obtained in Fig. 3. It is clear that the both the velocity autocorrelation function (VACF) [i.e., $1N\u2211i=1N\u27e8vi(\tau )\u22c5vi(0)\u27e9$] and stress autocorrelation function (SACF) [i.e., $15\u2211i=15\u27e8si(\tau )si(0)\u27e9$] decay in $\u223c100$ fs, yielding a self-diffusion coefficient and viscosity of (0.984 ± 0.001) ×10^{−2} cm^{2}/s and 1.943 ± 0.015 mPa s, respectively. The present full Kohn–Sham DFT result for the self-diffusion coefficient is consistent with recent orbital-free DFT calculations,^{65} where a value of 0.93 ×10^{−2} cm^{2}/s was reported, while being free of the kinetic and entropic energy approximations inherent in orbital-free DFT.

## VI. CONCLUDING REMARKS

We presented the SQ3 method, a density matrix based method for Kohn–Sham calculations at high temperatures that eliminates the need for diagonalization, thus reducing the cost of such calculations significantly relative to conventional diagonalization based approaches. We developed real-space expressions for the electron density, electronic free energy, Hellmann–Feynman forces, and Hellmann–Feynman stress tensor in terms of an orthonormal auxiliary orbital basis and its density kernel transform. Using Chebyshev filtering to generate the auxiliary basis, we then developed an approach akin to Clenshaw–Curtis spectral quadrature to compute the individual columns of the density kernel based on the Fermi operator expansion in Chebyshev polynomials and employed a similar approach to evaluate band structure and entropic energy components. Upon implementation of the method in the SPARC electronic structure code,^{34–36} we found systematic convergence to exact diagonalization results and significant speedups relative to conventional diagonalization based methods of up to $\u223c2$×, with increasing advantages as the temperature and/or the number of processors is increased. Finally, we employed the new method to compute the self-diffusion coefficient and viscosity of aluminum at 116 045 K from Kohn–Sham quantum molecular dynamics, where we found agreement with previous more approximate orbital-free density functional methods.

The combination of conventional diagonalization based methods at low temperatures, SQ3 at moderately high temperatures [e.g., $O$(10000)–$O$(250000) K], and SQ at higher temperatures enables accurate and efficient Kohn–Sham calculations over the full range of temperatures from ambient to millions of kelvin. The use of a localized orthonormal basis as in the discrete discontinuous basis projection (DDBP) method,^{66} in combination with a GPU implementation, is likely to further increase the efficiency of such calculations, making it a worthy subject for future research.

## ACKNOWLEDGMENTS

This work was performed, in part, under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract No. DE-AC52-07NA27344. Q.X., X.J., B.Z., and P.S. gratefully acknowledge support from the U.S. Department of Energy, Office of Science, under Grant No. DE-SC0019410. J.E.P. gratefully acknowledges support from the ASC/PEM program at LLNL. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Department of Energy, or the U.S. Government.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.