Kohn–Sham density functional theory calculations using conventional diagonalization based methods become increasingly expensive as temperature increases due to the need to compute increasing numbers of partially occupied states. We present a density matrix based method for Kohn–Sham calculations at high temperatures that eliminates the need for diagonalization entirely, thus reducing the cost of such calculations significantly. Specifically, we develop real-space expressions for the electron density, electronic free energy, Hellmann–Feynman forces, and Hellmann–Feynman stress tensor in terms of an orthonormal auxiliary orbital basis and its density kernel transform, the density kernel being the matrix representation of the density operator in the auxiliary basis. Using Chebyshev filtering to generate the auxiliary basis, we next develop an approach akin to Clenshaw–Curtis spectral quadrature to calculate the individual columns of the density kernel based on the Fermi operator expansion in Chebyshev polynomials and employ a similar approach to evaluate band structure and entropic energy components. We implement the proposed formulation in the SPARC electronic structure code, using which we show systematic convergence of the aforementioned quantities to exact diagonalization results, and obtain significant speedups relative to conventional diagonalization based methods. Finally, we employ the new method to compute the self-diffusion coefficient and viscosity of aluminum at 116 045 K from Kohn–Sham quantum molecular dynamics, where we find agreement with previous more approximate orbital-free density functional methods.

Over the past few decades, Kohn–Sham density functional theory (DFT)1,2 has established itself as a powerful framework for understanding and predicting a wide range of material properties, from the first principles of quantum mechanics, without any empirical or ad hoc parameters. The ubiquitous use of DFT is a consequence of its simplicity, generality, and high accuracy-to-cost ratio compared to other such first principles methods. However, the solution of the underlying nonlinear eigenvalue problem for the Kohn–Sham orbitals remains a challenging task, with the computational cost and memory requirements scaling cubically and quadratically with the system size, respectively.3–5 Moreover, the orthogonality constraint on the orbitals translates to significant global communications in parallel computing, limiting the minimum time to solution that can be attained. This can become particularly important for quantum molecular dynamics (QMD) simulations,6,7 wherein tens or hundreds of thousands of such Kohn–Sham solutions may be required to complete a single simulation.

To overcome the cubic scaling bottleneck in DFT calculations, significant effort has been directed toward the development of methods that scale linearly with the system size (see, e.g., Refs. 3–5 and references therein), in both computational cost and computer memory, which have culminated in a number of mature codes, e.g., Refs. 8–14. While major advances have been achieved, a number of challenges remain for linear scaling methods and their implementations. These include the need for additional computational parameters, which complicate use in practice, limitations of the underlying basis sets used for discretization, subtleties in choosing the numbers and/or centers of localized orbitals for different physical systems, scalability on parallel computing platforms due to complex communication patterns and challenges in load balancing, and calculation of accurate atomic forces and stresses as employed in structural relaxation and QMD simulations.4,5,15 Perhaps most importantly, the study of systems with partially occupied Kohn–Sham orbitals, as encountered in metallic systems or insulating systems at high temperatures, remains particularly challenging.

Kohn–Sham QMD simulations at high temperatures are employed in a variety of application areas, such as warm dense matter and dense plasmas, as occurring in fusion energy research and the inner regions of giant planets and stars.16–21 However, such calculations pose unique challenges in addition to those described above for ambient conditions. In particular, the number of orbitals that need to be computed increases with temperature due to the increase in the number of states that become partially occupied in the Fermi–Dirac distribution, which advances the onset of the cubic scaling bottleneck in diagonalization based methods, i.e., methods that calculate the Kohn–Sham orbitals. In addition, these orbitals are more diffuse since higher-energy states become less localized. As a result, local-orbital based linear scaling methods suffer from large prefactors that increase rapidly with temperature. Consequently, Kohn–Sham QMD simulations at high temperatures become impractical using either of these approaches.

There has been recent effort in addressing the aforementioned challenges associated with high temperature calculations, including orbital-free molecular dynamics (OFMD),22 in which a functional of the electron density is used to approximate the electronic kinetic energy, extended first principles molecular dynamics (ext-FPMD),23,24 in which plane waves are used to approximate the higher-energy orbitals, finite-temperature potential functional theory (PFT),25 in which a coupling-constant formalism is used to develop an orbital-free approximation for the free energy functional, stochastic DFT (SDFT),26,27 in which the density is computed directly from the Kohn–Sham Hamiltonian, without diagonalization, by averaging over multiple stochastic samples, and a mixed stochastic–deterministic approach (MDFT),28 in which the advantageous features of the deterministic and stochastic approaches are leveraged by suitable partitioning of the eigenspectrum. To address scaling with both system size and temperature, while retaining full Kohn–Sham accuracy and systematic convergence for metals and insulators similarly, the Spectral Quadrature (SQ) method for linear scaling Kohn–Sham calculations at high temperatures was recently developed.29–31 In particular, the computational cost of the SQ method decreases with increasing temperature, a consequence of the faster decay of the density matrix, i.e., electronic interactions becoming more localized, and the increase in the smoothness of the Fermi–Dirac distribution. In addition, the SQ method has excellent scaling in parallel computations since the communication pattern remains fixed and well localized to nearby processors throughout the computation. However, while the SQ method has proven highly accurate and efficient in applications reaching millions of kelvin,32,33 the associated prefactor becomes larger at less extreme temperatures, e.g., O(10000)–O(250000) K, particularly when large numbers of grid points per atom are required.

In this work, we present a method, which we call SQ3, that is accurate and efficient at temperatures too high for efficient calculations using conventional diagonalization based methods but too low for efficient calculations using the SQ method. The combination of conventional diagonalization based methods at low temperatures, SQ3 at moderately high temperatures [e.g., O(10000)–O(250000) K], and SQ at higher temperatures then brings accurate and efficient Kohn–Sham calculations to the full range of temperatures from ambient to millions of kelvin. As we detail below, the key idea of the method is to employ spectral quadrature to compute the density kernel—i.e., density operator in a minimal orthonormal basis—rather than required parts of the full density matrix—i.e., density operator on the real-space grid—as in the SQ method. In doing so, the need for diagonalization is eliminated and key operations are reduced to vectors and matrices of dimensions equal to the number of occupied states rather than the number of real-space grid points. We implement the method in the SPARC electronic structure code,34–36 where we find systematic convergence to exact diagonalization results and significant speedups relative to conventional diagonalization based methods.

The remainder of this paper is organized as follows. In Secs. II and III, we present real-space expressions for the electron density, electronic free energy, Hellmann–Feynman atomic forces, and Hellmann–Feynman stress tensor in terms of the density operator and density kernel, respectively. In Sec. IV, we describe the formulation and implementation of the proposed SQ3 method, whose accuracy and efficiency are verified in Sec. V. Finally, we provide concluding remarks in Sec. VI.

Consider a unit cell Ω with nuclei positioned at R = {R1, R2, …, RN} and a total of Ne electrons. Neglecting spin and Brillouin zone integration, the single-particle density operator D in Kohn–Sham DFT1,2 can be written as

D=i=1Nsgiψiψi
(1)

or, in real space,

D(x,y)=xDy=i=1Nsgiψi(x)ψi(y),
(2)

where Ns is the number of occupied states and ψi are the Kohn–Sham orbitals with energies λi and occupations gi given by the Fermi–Dirac function g,

gi=g(λi,μ,σ)1+expλiμσ1.
(3)

Here, σ = kBT is the smearing value, where kB is Boltzmann’s constant and T is the electronic temperature, and μ is the Fermi level determining the total number of electrons,

2Tr(D)=Ne,
(4)

where Tr(·) denotes the trace of the operator.

The Kohn–Sham orbitals, energies, and occupations are solutions to the nonlinear eigenvalue problem,

H122+Vxc[ρD]+ϕ[ρD,R]+Vnl[R]ψi(x)=λiψi(x),
(5)

where H denotes the Hamiltonian operator, Vxc is the exchange–correlation potential, taken in the local density approximation (LDA)2 in the present work, ϕ is the electrostatic potential,38–39 Vnl is the nonlocal pseudopotential operator, and ρD is the electron density,

ρD(x)=2D(x,x).
(6)

The electrostatic potential is the solution of the Poisson problem,35,37,40

14π2ϕ(x,R)=ρD(x)+b(x,R),
(7)

where b is the total pseudocharge density. In addition, the nonlocal pseudopotential operator in the Kleinman–Bylander form41 is given by

Vnl=JVnl,J=JlmγJlχ̃Jlmχ̃Jlm,
(8)

where the summation index J extends over all atoms in Ω, l and m are azimuthal and magnetic quantum numbers, respectively, and χ̃Jlm=JχJlm are periodically extended nonlocal projectors, with χJlm being the projectors of the Jth atom and J′ running over the Jth atom and its periodic images.

The density operator can be written in terms of the Hamiltonian as

D=g(H,μ,σ)=I+expHμIσ1,
(9)

where I is the identity operator. Once the electronic ground state has been determined through the self-consistent solution of the above equation, the electronic free energy can be written as30,31

F(R)=2Tr(DH)+Exc[ρD]ΩVxc[ρD(x)]ρD(x)dx+12Ω(b(x,R)ρD(x))ϕ(x,R)dxEself(R)+Ec(R)+2σTrDlogD+(ID)log(ID),
(10)

where the first term is referred to as the band structure energy (Eband), the last term is the electronic entropy energy (Eent) associated with partial occupations, Eself and Ec are the self and overlap energy corrections, respectively, associated with the pseudocharges,40,42 and Exc is the exchange–correlation energy,

ExcρD=ΩεxcρD(x)ρD(x)dx.
(11)

Here, ɛxc is the sum of the exchange and correlation energy per particle of a uniform electron gas.

The Hellman–Feynman forces on the nuclei can be written as30,31

fI=fIl+fInl=IΩbI(x,RI)ϕ(x,R)dx+fsc,I4TrVnl,ID,
(12)

where fsc,I=(Eself(R)+Ec(R))RI (see Ref. 43 for an explicit expression), bI is the pseudocharge density of the I′th nucleus that generates potential VI, the summation index I′ runs over the Ith atom and its periodic images, and I extends over all atoms in Ω. The first two terms together constitute the local component of the force (fIl), and the last term is the nonlocal component of the force (fInl).

The Hellmann–Feynman stress tensor can be written as44 

σαβ=1|Ω|σαβI+σαβII+σαβIII+σαβIV,α,β{1,2,3},
(13)

where |Ω| is the volume of the unit cell, and σαβI, σαβII, σαβIII, and σαβIV are the contributions arising from the electronic kinetic energy, exchange–correlation energy Exc, nonlocal pseudopotential energy Enl, and the total electrostatic energy, respectively,

σαβI=2ΩyαyβD(y,x)y=xdx,
(14)
σαβII=δαβExc(ρD)ΩVxcρD(x)ρD(x)dx,
(15)
σαβIII=2δαβJlmγJlΩΩχ̃Jlm(x,RJ)D(x,y)χ̃Jlm(y,RJ)dxdy4JlmγJlJΩΩχJlm(x,RJ)xRJβxαD(x,y)χ̃Jlm(y,RJ)dxdy,
(16)
σαβIV=14πΩxαϕ(x,R)xβϕ(x,R)dx+IΩxαbI(x,RI)xRIβϕ(x,R)12VI(x,RI)dx12IΩxαVI(x,RI)xRIβbI(x,RI)dx+12δαβΩb(x,R)ρD(x)ϕ(x,R)dxδαβEself(R)+σαβEc.
(17)

In the above equations, xα is the αth component of the gradient vector x, δαβ is the Kronecker delta function, σαβEc is the stress tensor correction corresponding to overlapping pseudocharges,45 the summation index J runs over all atoms in Ω, J′ runs over the Jth atom and its periodic images, and I extends over all atoms in R3. Note that upon substitution of Eq. (2) into the above expressions for the energy, atomic force, and stress tensor, the corresponding expressions in terms of Kohn–Sham orbitals35,36,45 are readily obtained.

In this section, we formulate the electron density, electronic free energy, Hellmann–Feynman force, and Hellmann–Feynman stress tensor in terms of the density kernel,10 which in the current context is the matrix Ds=(Dijs) corresponding to the single-particle density operator D expressed in an orthonormal auxiliary orbital basis {φi(x)}i=1Ns, i.e., Dijs=φi|D|φj. In particular,

D=i=1Nsj=1NsφiDijsφj,D(x,y)=i=1Nsj=1Nsφi(x)Dijsφj(y),
(18)

which corresponds to a unitary transformation of the Kohn–Sham orbitals,

ψi(x)=j=1Nsφj(x)Qji,
(19)

where Q = (Qij) is the orthogonal matrix that diagonalizes Ds,

QTDsQ=diagg1,g2,,gNs.
(20)

Hence, the auxiliary orbitals {φi(x)}i=1Ns span the same subspace as the Kohn–Sham orbitals {ψi(x)}i=1Ns.

Let the subspace Hamiltonian Hs=(Hijs) be the matrix representation of H in the orthonormal basis {φi(x)}i=1Ns, i.e., Hijs=φi|H|φj. It follows that the density kernel can be expressed in terms of Hs as

Ds=g(Hs,μ,σ)=I+expHsμIσ1,
(21)

where I is the identity matrix, and the Fermi level μ is determined from the constraint on the total number of electrons [Eq. (4)], which can be written in terms of the density kernel as

2tr(Ds)=Ne,
(22)

where tr(·) denotes the trace. In arriving at this equation from Eq. (4), we have used the following relation:

Tr(D)=Tri=1Nsj=1NsφiDijsφj=i=1Nsj=1NsTrφiDijsφj=i=1Nsj=1NsDijsφj|φi=tr(Ds).
(23)

To make the expressions for electron density, atomic force, and stress analogous to those based on Kohn–Sham orbitals in the SPARC electronic structure code,34–36 into which we implement the proposed scheme, we introduce the density kernel transformed auxiliary orbitals {φ̃i(x)}i=1Ns,

φ̃i(x)=j=1Nsφj(x)Djis.
(24)

Here, the density operator in Eq. (18) takes the form

D=i=1Nsφ̃iφi,D(x,y)=i=1Nsφ̃i(x)φi(y).
(25)

Thereafter, the electron density in Eq. (6) can be written as

ρD(x)=2i=1Nsφ̃i(x)φi(x).
(26)

Once the electronic ground state has been determined, the electronic free energy can be written as

F(R)=2tr(DsHs)+ExcρDΩVxcρD(x)ρD(x)dx+12Ωb(x,R)ρD(x)ϕ(x,R)dxEself(R)+Ec(R)+2σtrDslogDs+(IDs)log(IDs),
(27)

where the first and last terms, i.e., the band structure (Eband) and electronic entropy (Eent) energies, have been obtained using orthogonality and trace relations as in Eq. (23).

The nonlocal component of the atomic force in Eq. (12) can be rewritten as

fInl=4TrVnl,ID=4TrVnl,Ii=1Nsφ̃iφi=4i=1NsφiVnl,Iφ̃i=4i=1NsφilmγIlχ̃Ilmχ̃Ilm|φ̃i=4i=1NslmγIlΩφi(x)χ̃Ilm(x,RI)dx×Ωφ̃i(x)χ̃Ilm(x,RI)dx.
(28)

Thereafter, the total Hellmann–Feynman atomic forces take the form

fI=IΩbI(x,RI)ϕ(x,R)dx+fsc,I4i=1NslmγIlΩφi(x)χ̃Ilm(x,RI)dx×Ωφ̃i(x)χ̃Ilm(x,RI)dx.
(29)

The stress tensor contributions arising from the electronic kinetic energy and nonlocal pseudopotential energy in Eqs. (14) and (16), respectively, can be rewritten as

σαβI=2ΩyαyβD(y,x)y=xdx=2Ωyαyβi=1Nsφ̃i(y)φi(x)y=xdx=2i=1NsΩφi(x)xαxβφ̃i(x)dx=2i=1NsΩxαφi(x)xβφ̃i(x)dx
(30)

and

σαβIII=2δαβJlmγJlΩΩχ̃Jlm(x,RJ)D(x,y)χ̃Jlm(y,RJ)dxdy4JlmγJlJΩΩχJlm(x,RJ)xRJβxαD(x,y)χ̃Jlm(y,RJ)dxdy=2δαβJlmγJlΩΩχ̃Jlm(x,RJ)i=1Nsφ̃i(x)φi(y)χ̃Jlm(y,RJ)dxdy4JlmγJlJΩΩχJlm(x,RJ)xRJβxαi=1Nsφ̃i(x)φi(y)χ̃Jlm(y,RJ)dxdy=2δαβi=1NsJlmγJlΩχ̃Jlm(x,RJ)φ̃i(x)dxΩχ̃Jlm(y,RJ)φi(y)dy4i=1NsJlmγJlJΩχJlm(x,RJ)xRJβxαφ̃i(x)dxΩχ̃Jlm(y,RJ)φi(y)dy.
(31)

Thereafter, the Hellmann–Feynman stress tensor takes the form

σαβ=1|Ω|2i=1NsΩxαφi(x)xβφ̃i(x)dx+δαβExc[ρD]ΩVxcρD(x)ρD(x)dx2δαβi=1NsJlmγJlΩχ̃Jlm(x,RJ)φ̃i(x)dxΩχ̃Jlm(y,RJ)φi(y)dy4i=1NsJlmγJlJΩχJlm(x,RJ)xRJβxαφ̃i(x)dxΩχ̃Jlm(y,RJ)φi(y)dy+14πΩxαϕ(x,R)xβϕ(x,R)dx+IΩxαbI(x,RI)xRIβϕ(x,R)12VI(x,RI)dx12IΩxαVI(x,RI)xRIβbI(x,RI)dx+12δαβΩb(x,R)ρD(x)ϕ(x,R)dxδαβEself(R)+σαβEc.
(32)

In this section, we describe the calculation of the auxiliary orbitals {φi(x)}i=1Ns and density kernel transformed auxiliary orbitals {φ̃i(x)}i=1Ns in each iteration of the self-consistent field (SCF) method,46 using which the electron density, electronic free energy, Hellmann–Feynman atomic forces, and Hellmann–Feynman stress tensor can be computed as discussed above. We refer to the resulting method as SQ3, given its inherent similarity in philosophy and formulation to the linear scaling Spectral Quadrature (SQ) method,29–31 with notable differences that the SQ3 method does not assume the subspace Hamiltonian to be sparse and does not employ truncation in the calculation of the density kernel, making the computational cost scale cubically with Ns rather than linearly.

In the SQ3 method, we perform Chebyshev filtering47,48 on the auxiliary orbitals from the previous SCF iteration followed by orthonormalization to generate the auxiliary orbitals {φi(x)}i=1Ns for the current SCF iteration. To calculate {φ̃i(x)}i=1Ns, the density kernel Ds needs to be determined [Eq. (24)], which we recall can be written in terms of the subspace Hamiltonian Hs—projection of the Hamiltonian H onto the basis {φi(x)}i=1Ns—using the Fermi–Dirac function g [Eq. (21)],

Ds=g(Hs,μ,σ)=I+expHsμIσ1.
(33)

One possible strategy to determine Ds is to perform an eigendecomposition of Hs, an approach that would also make the Kohn–Sham orbitals available. However, this method not only scales cubically with Ns, but also has limited scalability in parallel computations, which limits the minimum time to solution that can be reached in DFT simulations.34 To delay this bottleneck for calculations at high temperatures, we perform a Fermi Operator Expansion (FOE) of the density kernel in terms of Chebyshev polynomials,49,50

Dsj=0nplcj(μ,σ)TjHsχIξ=j=0nplcj(μ,σ)TjĤs,
(34)

where the prime on the summation indicates that the first term is halved, npl is the degree of the expansion, Tj denotes the Chebyshev polynomial of degree j, χ=(λNs+λ1)/2,ξ=(λNsλ1)/2, and Ĥs=HsχI/ξ is the scaled and shifted subspace Hamiltonian whose spectrum lies in the interval [−1, 1]. The coefficients cj in the Chebyshev expansion can be determined using the relation

cj(μ,σ)=2π11g(ξλ+χ,μ,σ)Tj(λ)1λ2dλ.
(35)

Analogous to the Clenshaw–Curtis SQ method,30,31 the nth column of the density kernel in the SQ3 method is written as

Dsenj=0nplcjtnj,
(36)

where en is the standard basis vector, and tnj is the nth column of Tj(Ĥs), determined using the three-term recurrence relation for Chebyshev polynomials,

tn0=en,tn1=Ĥsen,tnj+1=2Ĥstnjtnj1,j=1,2,,npl1.
(37)

The coefficients cj in Eq. (36) are determined using Eq. (35), with the Fermi level μ chosen such that the constraint on the number of electrons is satisfied [Eq. (22)],

2n=1Nsj=0nplcj(μ,σ)enTtnj=Ne.
(38)

Note that to avoid performing the iteration in Eq. (37), the most expensive operations of the SQ3 method, for every guess μ encountered during the solution of Eq. (38), we store the vectors tnj, whereby μ can be efficiently determined.

Once the density kernel has been so calculated, {φn(x)}n=1Ns are transformed by the density kernel to obtain {φ̃n(x)}n=1Ns [Eq. (24)]. The electron density is then calculated using Eq. (26), and the electronic free energy is evaluated using Eq. (27) while employing the following expressions for the band structure and electronic entropy energies:

Eband2n=1Nsj=0npldjenTtnj,
(39)
Eent2n=1Nsj=0nplejenTtnj,
(40)

where the coefficients are calculated using the relations

dj=2π11(ξλ+χ)g(ξλ+χ,μ,σ)Tj(λ)1λ2dλ,
(41)
ej=2π11(g(ξλ+χ,μ,σ)logg(ξλ+χ,μ,σ)+(1g(ξλ+χ,μ,σ))log(1g(ξλ+χ,μ,σ)))Tj(λ)1λ2dλ.
(42)

In arriving at the above expressions, we have performed Chebyshev polynomial expansions for Eband and Eent, similar to that done for the density kernel [Eq. (34)]. At the electronic ground state, the Hellmann–Feynman atomic forces and stress tensor are computed using Eqs. (29) and (32), respectively.

We have implemented the SQ3 method in the SPARC electronic structure code.34–36 In particular, we build upon the implementation of the CheFSI method47,48 in SPARC, which shares a number of computational kernels with the SQ3 method. Specifically, the Chebyshev filtering, orthogonalization, and projection kernels are used in the SQ3 implementation. In particular, we start the QMD simulation with a random guess for the auxiliary orbital basis and perform multiple Chebyshev filtering and orthogonalization steps in the first SCF iteration.51 For every subsequent QMD step, we use the auxiliary orbital basis generated in the last SCF iteration of the previous QMD step as the initial guess. We calculate the density kernel Ds using the iteration in Eq. (37) while parallelizing computations over the different columns of the density kernel. Such a scheme restricts the communication to only that required for the calculation of the Fermi level [Eq. (38)], while the computationally intensive step [Eq. (37)] is free from any data transfer between processors. To calculate {φ̃n(x)}n=1Ns from {φn(x)}n=1Ns [Eq. (24)], we perform a dense matrix–matrix multiplication using the parallel PDGEMM routine in ScaLAPACK.52 In doing so, we first use the PDGEMR2D routine to redistribute the density kernel from columnwise distribution to two-dimensional block-cyclic distribution, as required by PDGEMM.

The degree of the Chebyshev polynomial npl required for a given accuracy is dependent on the smearing σ, spectral width of the subspace Hamiltonian Hs (i.e., 2ξ), and the relative location of the Fermi level μ within the spectrum of Hs.29 In particular, the value of npl required decreases with increasing temperature and decreasing spectral width, a consequence of the increased smoothness of the Fermi–Dirac function and smaller interval over which it must be evaluated. Since the target application for the SQ3 method is systems at high temperatures, and the spectral width of Hs is only slightly more than that of the occupied spectrum, low polynomial orders generally suffice so that the SQ3 method can be highly efficient, as we demonstrate below. Furthermore, since computations at each real-space grid point are largely independent, the SQ3 method can attain excellent parallel scaling, as also demonstrated below. It is worth noting that since Hs is dense, the computational cost of the SQ3 method as described above has O(Ns3) scaling, similar to that for eigendecomposition, but with a smaller prefactor that decreases with increasing temperature. Indeed, it is possible to achieve O(Ns) scaling, as in the SQ method, if the auxiliary orbitals are such that Hs is sparse and truncation is adopted based on the decay of the density kernel.

In this section, we verify the accuracy and efficiency of the SQ3 method for the calculation of the electronic free energy, Hellmann–Feynman atomic forces, and Hellmann–Feynman stress tensor by comparison to conventional diagonalization based methods. For this purpose, we have implemented the SQ3 method in the SPARC electronic structure code.34–36 We consider aluminum at density 2.7 g/cc with temperatures ranging from T = 10000 to 250000 K. We demonstrate the practical utility of the method by calculating the self-diffusion coefficient and viscosity of aluminum at temperature T = 116 045 K (kBT = 10 eV). In all simulations, we employ a twelfth-order accurate real-space finite-difference discretization, the local density approximation2 for exchange–correlation interactions, a three-electron optimized norm conserving Vanderbilt (ONCV) pseudopotential53 suitable for the range of temperatures considered, Γ-point Brillouin sampling, the restarted Periodic Pulay method54,55 for acceleration of the SCF iteration, and the alternating Anderson–Richardson (AAR) linear solver56,57 for calculation of the electrostatic potential and application of the real-space Kerker preconditioner.58 

We first verify the accuracy and convergence of the SQ3 formulation and implementation by considering a 24-atom aluminum cell at four different temperatures: T = 10 000, 50 000, 100 000, and 250 000 K. In each system, all atoms are randomly perturbed by up to 10% of the nearest neighbor distance in an FCC configuration. We choose a mesh size of 0.5 bohr to put associated discretization errors within chemical accuracy. In Fig. 1, we plot the convergence of the electronic ground state free energy, Hellmann–Feynman atomic forces, and Hellmann–Feynman stress tensor with respect to npl, i.e., degree of polynomial used in the Chebyshev polynomial expansions for the density kernel, band structure energy, and electronic entropy energy. The error is defined to be the difference from standard diagonalization results in the SPARC code. The results show exponential convergence of the energy, atomic forces, and stress tensor with respect to npl, with degree O(1030) sufficient at the temperatures considered to attain accuracies typical in practice. The results also show that the polynomial degree required decreases with increasing temperature as expected.

FIG. 1.

Convergence of the (a) electronic free energy, (b) Hellmann–Feynman atomic forces, and (c) Hellmann–Feynman stress tensor with respect to the polynomial degree npl used in the Chebyshev polynomial expansion. The error is defined to be the difference from the corresponding results obtained by diagonalization in SPARC.

FIG. 1.

Convergence of the (a) electronic free energy, (b) Hellmann–Feynman atomic forces, and (c) Hellmann–Feynman stress tensor with respect to the polynomial degree npl used in the Chebyshev polynomial expansion. The error is defined to be the difference from the corresponding results obtained by diagonalization in SPARC.

Close modal

We next investigate the efficiency of SQ3 relative to diagonalization for Kohn–Sham calculations at high temperatures. For this purpose, we choose four FCC aluminum systems: (i) 1568-atom cell at T = 10000 K, (ii) 500-atom cell at T = 50000 K, (iii) 256-atom cell at T = 100000 K, and (iv) 64-atom cell at T = 250000 K, all with Ns = 10000 and atoms randomly perturbed by up to 10% of the nearest neighbor distance in the FCC configuration. The system sizes have been chosen to be comparable to those typical in QMD simulations at the associated temperatures. In all cases, we select a mesh size of 0.75 bohr to put associated discretization errors within chemical accuracy, i.e., 0.001 Ha/atom, 0.001 Ha/bohr, and 1% in the energy, force, and stress, respectively. In addition, we select npl = 33, 12, 10, and 8 in SQ3 for the Al1568, Al500, Al256, and Al64 systems, respectively, to reduce npl related errors to chemical accuracy also (Fig. 1). All simulations have been performed on the phoenix cluster at the Georgia Institute of Technology.59 

In Table I, we compare the strong scaling performance of SQ3 and CheFSI based diagonalization methods.47,48 In particular, we consider numbers of processors ranging from 24 to 1000 and report the time taken per QMD step, along with the time taken for subspace diagonalization using ELPA60—a recently developed library that has been found more efficient and scalable than other widely used libraries, such as ScaLAPACK,52 in the electronic structure context—and density kernel calculation in the diagonalization and SQ3 methods, respectively. Indeed, these are the main computational kernels that are distinct between the two methods. We observe from the results that the SQ3 method scales up to O(1,000) processors, with further reduction in the wall time possible for the density kernel calculation when more processors are utilized. The density kernel calculation does not scale ideally, a consequence of the reduced effectiveness of the BLAS3 operations as the number of columns of the density kernel associated with each processor becomes smaller. As expected, the speedup provided by the SQ3 method over diagonalization increases with increasing temperature, as the value of npl required becomes smaller. In particular, while the minimum times to solution for diagonalization and SQ3 are similar for Al1568 (10000 K), SQ3 achieves an overall speedup of 1.3, 1.5, and 2.1 for Al500, Al256, and Al64, respectively, with corresponding speedups of the density kernel calculation over subspace diagonalization of 2.5, 2.7, and 3.5, respectively.

TABLE I.

Strong scaling comparison of CheFSI diagonalization and SQ3 methods. The timings are in seconds and correspond to the wall time for a single QMD step, with six SCF iterations required for convergence in all instances. The numbers in parentheses for diagonalization represent the wall times for diagonalization of the subspace Hamiltonian, performed using ELPA.60 The numbers in parentheses for SQ3 represent the wall times for the density kernel calculation. The dimensions of the real-space Hamiltonians for the Al1568, Al500, Al256, and Al64 systems are 413 362, 132 651, 68 921, and 16 400, respectively. The dimension of the subspace Hamiltonian in each case is 10 000.

Al1568, T = 10 000 KAl500, T = 50 000 KAl256, T = 100 000 KAl64, T = 250 000 K
npDiagSQ3DiagSQ3DiagSQ3DiagSQ3
24 1757.2 (84.8) 1970.1 (297.7) 580.5 (84.8) 598.7 (103.1) 383.5 (84.8) 382.1 (83.4) 173.6 (84.8) 151.4 (62.6) 
48 844.0 (49.8) 964.7 (170.5) 345.4 (49.8) 355.6 (60) 218.0 (49.8) 220.6 (52.4) 104.9 (49.8) 93.4 (38.3) 
96 518.6 (31.7) 568.6 (81.8) 192.7 (31.7) 188.0 (27.1) 124.7 (31.7) 116.5 (23.5) 64.1 (31.7) 51.1 (18.8) 
192 307.4 (27.0) 332.9 (52.6) 118.1 (27.0) 110.0 (18.9) 79.5 (27.0) 68.1 (15.6) 48.4 (27.0) 33.6 (12.3) 
384 224.2 (27.0) 236.0 (38.9) 81.6 (27.0) 68.0 (13.5) 59.7 (27.0) 44.8 (12.1) 40.2 (27.0) 22.5 (9.3) 
500 178.2 (27.0) 187.0 (35.8) 73.3 (27.0) 58.7 (12.4) 54.8 (27.0) 38.9 (11.1) 38.9 (27.0) 20.6 (8.8) 
768 170.7 (27.0) 176.2 (32.6) 67.1 (27.0) 51.5 (11.4) 52.5 (27.0) 35.8 (10.4) 38.8 (27.0) 20.3 (8.6) 
1000 173.4 (27.0) 177.2 (30.8) 61.8 (27.0) 45.5 (10.8) 49.7 (27.0) 32.6 (9.9) 37.5 (27.0) 18.2 (7.8) 
Al1568, T = 10 000 KAl500, T = 50 000 KAl256, T = 100 000 KAl64, T = 250 000 K
npDiagSQ3DiagSQ3DiagSQ3DiagSQ3
24 1757.2 (84.8) 1970.1 (297.7) 580.5 (84.8) 598.7 (103.1) 383.5 (84.8) 382.1 (83.4) 173.6 (84.8) 151.4 (62.6) 
48 844.0 (49.8) 964.7 (170.5) 345.4 (49.8) 355.6 (60) 218.0 (49.8) 220.6 (52.4) 104.9 (49.8) 93.4 (38.3) 
96 518.6 (31.7) 568.6 (81.8) 192.7 (31.7) 188.0 (27.1) 124.7 (31.7) 116.5 (23.5) 64.1 (31.7) 51.1 (18.8) 
192 307.4 (27.0) 332.9 (52.6) 118.1 (27.0) 110.0 (18.9) 79.5 (27.0) 68.1 (15.6) 48.4 (27.0) 33.6 (12.3) 
384 224.2 (27.0) 236.0 (38.9) 81.6 (27.0) 68.0 (13.5) 59.7 (27.0) 44.8 (12.1) 40.2 (27.0) 22.5 (9.3) 
500 178.2 (27.0) 187.0 (35.8) 73.3 (27.0) 58.7 (12.4) 54.8 (27.0) 38.9 (11.1) 38.9 (27.0) 20.6 (8.8) 
768 170.7 (27.0) 176.2 (32.6) 67.1 (27.0) 51.5 (11.4) 52.5 (27.0) 35.8 (10.4) 38.8 (27.0) 20.3 (8.6) 
1000 173.4 (27.0) 177.2 (30.8) 61.8 (27.0) 45.5 (10.8) 49.7 (27.0) 32.6 (9.9) 37.5 (27.0) 18.2 (7.8) 

Overall, we find that the SQ3 method is able to delay the cubic scaling bottleneck for Kohn–Sham calculations at high temperatures, with increasing advantages as the temperature and/or number of processors is increased. Implementation of the SQ3 method on graphics processing units (GPUs) is likely to bring down the minimum time to solution substantially, which would then merit comparison with efficient eigensolver implementations on GPUs, such as that found in the cuSOLVER library.61 We note that we have verified that alternate forms of parallelization, such as block-cyclic decomposition of the Ĥs and TjĤs matrices—matrix–matrix multiplication routine replacing the iteration in Eq. (37), as in conventional FOE methods—are not as efficient as the approach adopted here (1.3× slower). It is also worth noting that since the SQ3 method does not employ truncation and has cubic scaling with Ns, the linear scaling SQ method29–31 would become the method of choice at temperatures higher than those considered here.

We now calculate the self-diffusion coefficient and viscosity of aluminum at density 2.7 g/cc and temperature 116045 K (kBT = 10 eV). Specifically, we consider a 108-atom unit cell in the isokinetic ensemble using a Gaussian thermostat.62 In order to efficiently obtain sufficient statistics, we average over QMD simulations corresponding to 20 different initial conditions for the atomic positions and velocities—obtained by performing 20 orbital-free DFT40,42 simulations, each run for 40000 steps with a time step of 0.15 fs—where each simulation has been run for more than 24 000 steps with a time step of 0.15 fs. In Fig. 2, we present the variation in the total free energy, i.e., including the electronic entropy, during the QMD simulations. It is clear that the energy is well conserved, consistent with the accuracy and systematic convergence of the SQ3 force formulation/implementation.

FIG. 2.

Variation in the total free energy during the QMD simulation for aluminum at density 2.7 g/cc and temperature 116 045 K.

FIG. 2.

Variation in the total free energy during the QMD simulation for aluminum at density 2.7 g/cc and temperature 116 045 K.

Close modal

Since macroscopic dynamical properties can be written as time integrals of associated microscopic time correlation functions using Green–Kubo (GK) relations,63 we calculate the self-diffusion coefficient D and viscosity η using the expressions

D(t)=13Ni=1N0tvi(τ)vi(0)dτ,
(43)
η(t)=|Ω|kBT0t15i=15si(τ)si(0)dτ,
(44)

where ⟨·⟩ denotes the ensemble average, vi is the velocity, and si are the independent components of the deviatoric (i.e., traceless) stress tensor: σ12, σ23, σ31, (σ11σ22)/2, and (σ22σ33)/2.64 We present the results so obtained in Fig. 3. It is clear that the both the velocity autocorrelation function (VACF) [i.e., 1Ni=1Nvi(τ)vi(0)] and stress autocorrelation function (SACF) [i.e., 15i=15si(τ)si(0)] decay in 100 fs, yielding a self-diffusion coefficient and viscosity of (0.984 ± 0.001) ×10−2 cm2/s and 1.943 ± 0.015 mPa s, respectively. The present full Kohn–Sham DFT result for the self-diffusion coefficient is consistent with recent orbital-free DFT calculations,65 where a value of 0.93 ×10−2 cm2/s was reported, while being free of the kinetic and entropic energy approximations inherent in orbital-free DFT.

FIG. 3.

(a) Velocity autocorrelation function (VACF) and self-diffusion coefficient and (b) stress autocorrelation function (SACF) and viscosity for aluminum at density 2.7 g/cc and temperature 116 045 K.

FIG. 3.

(a) Velocity autocorrelation function (VACF) and self-diffusion coefficient and (b) stress autocorrelation function (SACF) and viscosity for aluminum at density 2.7 g/cc and temperature 116 045 K.

Close modal

We presented the SQ3 method, a density matrix based method for Kohn–Sham calculations at high temperatures that eliminates the need for diagonalization, thus reducing the cost of such calculations significantly relative to conventional diagonalization based approaches. We developed real-space expressions for the electron density, electronic free energy, Hellmann–Feynman forces, and Hellmann–Feynman stress tensor in terms of an orthonormal auxiliary orbital basis and its density kernel transform. Using Chebyshev filtering to generate the auxiliary basis, we then developed an approach akin to Clenshaw–Curtis spectral quadrature to compute the individual columns of the density kernel based on the Fermi operator expansion in Chebyshev polynomials and employed a similar approach to evaluate band structure and entropic energy components. Upon implementation of the method in the SPARC electronic structure code,34–36 we found systematic convergence to exact diagonalization results and significant speedups relative to conventional diagonalization based methods of up to 2×, with increasing advantages as the temperature and/or the number of processors is increased. Finally, we employed the new method to compute the self-diffusion coefficient and viscosity of aluminum at 116 045 K from Kohn–Sham quantum molecular dynamics, where we found agreement with previous more approximate orbital-free density functional methods.

The combination of conventional diagonalization based methods at low temperatures, SQ3 at moderately high temperatures [e.g., O(10000)–O(250000) K], and SQ at higher temperatures enables accurate and efficient Kohn–Sham calculations over the full range of temperatures from ambient to millions of kelvin. The use of a localized orthonormal basis as in the discrete discontinuous basis projection (DDBP) method,66 in combination with a GPU implementation, is likely to further increase the efficiency of such calculations, making it a worthy subject for future research.

This work was performed, in part, under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract No. DE-AC52-07NA27344. Q.X., X.J., B.Z., and P.S. gratefully acknowledge support from the U.S. Department of Energy, Office of Science, under Grant No. DE-SC0019410. J.E.P. gratefully acknowledges support from the ASC/PEM program at LLNL. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Department of Energy, or the U.S. Government.

The authors have no conflicts to disclose.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

1.
P.
Hohenberg
and
W.
Kohn
,
Phys. Rev.
136
,
B864
(
1964
).
2.
W.
Kohn
and
L. J.
Sham
,
Phys. Rev.
140
,
A1133
(
1965
).
3.
S.
Goedecker
,
Rev. Mod. Phys.
71
,
1085
(
1999
).
4.
D. R.
Bowler
and
T.
Miyazaki
,
Rep. Prog. Phys.
75
,
036503
(
2012
).
5.
J.
Aarons
,
M.
Sarwar
,
D.
Thompsett
, and
C.-K.
Skylaris
,
J. Chem. Phys.
145
,
220901
(
2016
).
6.
D.
Marx
and
J.
Hutter
,
Ab Initio Molecular Dynamics: Basic Theory and Advanced Methods
(
Cambridge University Press
,
2009
).
7.
G.
Kresse
and
J.
Hafner
,
Phys. Rev. B
47
,
558
(
1993
).
8.
J. M.
Soler
,
E.
Artacho
,
J. D.
Gale
,
A.
García
,
J.
Junquera
,
P.
Ordejón
, and
D.
Sánchez-Portal
,
J. Phys.: Condes. Matter
14
,
2745
(
2002
).
9.
M. J.
Gillan
,
D. R.
Bowler
,
A. S.
Torralba
, and
T.
Miyazaki
,
Comput. Phys. Commun.
177
,
14
(
2007
).
10.
C. K.
Skylaris
,
P. D.
Haynes
,
A. A.
Mostofi
, and
M. C.
Payne
,
J. Chem. Phys.
122
,
084119
(
2005
).
11.
E.
Tsuchida
,
J. Phys. Soc. Jpn.
76
,
034708
(
2007
).
12.
D.
Osei-Kuffuor
and
J.-L.
Fattebert
,
Phys. Rev. Lett.
112
,
046401
(
2014
).
13.
S.
Mohr
,
L. E.
Ratcliff
,
P.
Boulanger
,
L.
Genovese
,
D.
Caliste
,
T.
Deutsch
, and
S.
Goedecker
,
J. Chem. Phys.
140
,
204110
(
2014
).
14.
N.
Bock
,
M.
Challacombe
,
C. K.
Gan
,
G.
Henkelman
,
K.
Nemeth
,
A. M. N.
Niklasson
,
A.
Odell
,
E.
Schwegler
,
C. J.
Tymczak
, and
V.
Weber
, freeON, Los Alamos National Laboratory (LA-CC 01-2; LA-CC-04-086),
Copyright University of California
,
2014
.
15.
A.
Ruiz-Serrano
,
N. D. M.
Hine
, and
C.-K.
Skylaris
,
J. Chem. Phys.
136
,
234101
(
2012
).
16.
Frontiers and Challenges in Warm Dense Matter
, Lecture Notes in Computational Science and Engineering, edited by
F.
Graziani
,
M. P.
Desjarlais
,
R.
Redmer
, and
S. B.
Tricky
(
Springer
,
2014
).
17.
F. R.
Graziani
,
V. S.
Batista
,
L. X.
Benedict
,
J. I.
Castor
,
H.
Chen
,
S. N.
Chen
,
C. A.
Fichtl
,
J. N.
Glosli
,
P. E.
Grabowski
,
A. T.
Graf
 et al,
High Energy Density Phys.
8
,
105
(
2012
).
18.
P.
Renaudin
,
C.
Blancard
,
J.
Clérouin
,
G.
Faussurier
,
P.
Noiret
, and
V.
Recoules
,
Phys. Rev. Lett.
91
,
075002
(
2003
).
19.
M.
Dharma-Wardana
,
Phys. Rev. E
73
,
036401
(
2006
).
20.
R.
Ernstorfer
,
M.
Harb
,
C. T.
Hebeisen
,
G.
Sciaini
,
T.
Dartigalongue
, and
R. J. D.
Miller
,
Science
323
,
1033
(
2009
).
21.
T. G.
White
,
S.
Richardson
,
B. J. B.
Crowley
,
L. K.
Pattison
,
J. W. O.
Harris
, and
G.
Gregori
,
Phys. Rev. Lett.
111
,
175002
(
2013
).
22.
F.
Lambert
,
J.
Clerouin
, and
G.
Zerah
,
Phys. Rev. E
73
,
016403
(
2006
).
23.
S.
Zhang
,
H.
Wang
,
W.
Kang
,
P.
Zhang
, and
X. T.
He
,
Phys. Plasmas
23
,
042707
(
2016
).
24.
A.
Blanchet
,
J.
Clérouin
,
M.
Torrent
, and
F.
Soubiran
,
Comput. Phys. Commun.
271
,
108215
(
2021
).
25.
A.
Cangi
and
A.
Pribram-Jones
,
Phys. Rev. B
92
,
161113
(
2015
).
26.
Y.
Cytter
,
E.
Rabani
,
D.
Neuhauser
, and
R.
Baer
,
Phys. Rev. B
97
,
115207
(
2018
).
27.
R.
Baer
,
D.
Neuhauser
, and
E.
Rabani
,
Phys. Rev. Lett.
111
,
106402
(
2013
).
28.
A. J.
White
and
L. A.
Collins
,
Phys. Rev. Lett.
125
,
055002
(
2020
).
29.
P.
Suryanarayana
,
Chem. Phys. Lett.
584
,
182
(
2013
).
30.
P. P.
Pratapa
,
P.
Suryanarayana
, and
J. E.
Pask
,
Comput. Phys. Commun.
200
,
96
(
2016
).
31.
P.
Suryanarayana
,
P. P.
Pratapa
,
A.
Sharma
, and
J. E.
Pask
,
Comput. Phys. Commun.
224
,
288
(
2018
).
32.
C. J.
Wu
,
P. C.
Myint
,
J. E.
Pask
,
C. J.
Prisbrey
,
A. A.
Correa
,
P.
Suryanarayana
, and
J. B.
Varley
,
J. Phys. Chem. A
125
,
1610
(
2021
).
33.
S.
Zhang
,
A.
Lazicki
,
B.
Militzer
,
L. H.
Yang
,
K.
Caspersen
,
J. A.
Gaffney
,
M. W.
Däne
,
J. E.
Pask
,
W. R.
Johnson
,
A.
Sharma
 et al,
Phys. Rev. B
99
,
165103
(
2019
).
34.
Q.
Xu
,
A.
Sharma
,
B.
Comer
,
H.
Huang
,
E.
Chow
,
A. J.
Medford
,
J. E.
Pask
, and
P.
Suryanarayana
,
SoftwareX
15
,
100709
(
2021
).
35.
S.
Ghosh
and
P.
Suryanarayana
,
Comput. Phys. Commun.
216
,
109
(
2017
).
36.
S.
Ghosh
and
P.
Suryanarayana
,
Comput. Phys. Commun.
212
,
189
(
2017
).
37.
J. E.
Pask
and
P. A.
Sterne
,
Phys. Rev. B
71
,
113101
(
2005
).
38.
P.
Suryanarayana
,
V.
Gavini
,
T.
Blesgen
,
K.
Bhattacharya
, and
M.
Ortiz
,
J. Mech. Phys. Solids
58
,
256
(
2010
).
39.
P.
Suryanarayana
,
K.
Bhattacharya
, and
M.
Ortiz
,
J. Comput. Phys.
230
,
5226
(
2011
).
40.
S.
Ghosh
and
P.
Suryanarayana
,
J. Comput. Phys.
307
,
634
(
2016
).
41.
L.
Kleinman
and
D. M.
Bylander
,
Phys. Rev. Lett.
48
,
1425
(
1982
).
42.
P.
Suryanarayana
and
D.
Phanish
,
J. Comput. Phys.
275
,
524
(
2014
).
43.
Q.
Xu
, M.S. thesis,
Georgia Institute of Technology
,
2020
.
44.
A.
Sharma
,
S.
Hamel
,
M.
Bethkenhagen
,
J. E.
Pask
, and
P.
Suryanarayana
,
J. Chem. Phys.
153
,
034112
(
2020
).
45.
A.
Sharma
and
P.
Suryanarayana
,
J. Chem. Phys.
149
,
194104
(
2018
).
46.
R.
Martin
,
Electronic Structure: Basic Theory and Practical Methods
(
Cambridge University Press
,
2004
).
47.
Y.
Zhou
,
Y.
Saad
,
M. L.
Tiago
, and
J. R.
Chelikowsky
,
J. Comput. Phys.
219
,
172
(
2006
).
48.
Y.
Zhou
,
Y.
Saad
,
M. L.
Tiago
, and
J. R.
Chelikowsky
,
Phys. Rev. E
74
,
066704
(
2006
).
49.
S.
Goedecker
and
L.
Colombo
,
Phys. Rev. Lett.
73
,
122
(
1994
).
50.
S.
Goedecker
and
M.
Teter
,
Phys. Rev. B
51
,
9455
(
1995
).
51.
Y.
Zhou
,
J. R.
Chelikowsky
, and
Y.
Saad
,
J. Comput. Phys.
274
,
770
(
2014
).
52.
L. S.
Blackford
,
J.
Choi
,
A.
Cleary
,
E.
D’Azevedo
,
J.
Demmel
,
I.
Dhillon
,
J.
Dongarra
,
S.
Hammarling
,
G.
Henry
,
A.
Petitet
 et al,
ScaLAPACK Users’ Guide
(
SIAM
,
1997
).
53.
D. R.
Hamann
,
Phys. Rev. B
88
,
085117
(
2013
).
54.
A. S.
Banerjee
,
P.
Suryanarayana
, and
J. E.
Pask
,
Chem. Phys. Lett.
647
,
31
(
2016
).
55.
P. P.
Pratapa
and
P.
Suryanarayana
,
Chem. Phys. Lett.
635
,
69
(
2015
).
56.
P. P.
Pratapa
,
P.
Suryanarayana
, and
J. E.
Pask
,
J. Comput. Phys.
306
,
43
(
2016
).
57.
P.
Suryanarayana
,
P. P.
Pratapa
, and
J. E.
Pask
,
Comput. Phys. Commun.
234
,
278
(
2019
).
58.
S.
Kumar
,
Q.
Xu
, and
P.
Suryanarayana
,
Chem. Phys. Lett.
739
,
136983
(
2020
).
59.
See https://docs.pace.gatech.edu/phoenix_cluster/gettingstarted_phnx/ for Phoenix cluster at Georgia Technology; accessed 2021-12-14.
60.
A.
Marek
,
V.
Blum
,
R.
Johanni
,
V.
Havu
,
B.
Lang
,
T.
Auckenthaler
,
A.
Heinecke
,
H.-J.
Bungartz
, and
H.
Lederer
,
J. Phys.: Condens. Matter
26
,
213201
(
2014
).
61.
See https://docs.nvidia.com/cuda/cusolver/index.html for NVIDIA corporation, cuSOLVER; accessed 2022-01-25.
62.
P.
Minary
,
G. J.
Martyna
, and
M. E.
Tuckerman
,
J. Chem. Phys.
118
,
2510
(
2003
).
63.
J.
Hansen
and
I.
McDonald
,
Theory of Simple Liquids: With Applications to Soft Matter
(
Elsevier Science
,
2013
).
64.
D.
Alfè
and
M. J.
Gillan
,
Phys. Rev. Lett.
81
,
5161
(
1998
).
65.
T.
Sjostrom
and
J.
Daligault
,
Phys. Rev. E
92
,
063304
(
2015
).
66.
Q.
Xu
,
P.
Suryanarayana
, and
J. E.
Pask
,
J. Chem. Phys.
149
,
094104
(
2018
).