Graph-based linear scaling electronic structure theory for quantum-mechanical molecular dynamics simulations [A. M. N. Niklasson et al., J. Chem. Phys. 144, 234101 (2016)] is adapted to the most recent shadow potential formulations of extended Lagrangian Born–Oppenheimer molecular dynamics, including fractional molecular-orbital occupation numbers [A. M. N. Niklasson, J. Chem. Phys. 152, 104103 (2020) and A. M. N. Niklasson, Eur. Phys. J. B 94, 164 (2021)], which enables stable simulations of sensitive complex chemical systems with unsteady charge solutions. The proposed formulation includes a preconditioned Krylov subspace approximation for the integration of the extended electronic degrees of freedom, which requires quantum response calculations for electronic states with fractional occupation numbers. For the response calculations, we introduce a graph-based canonical quantum perturbation theory that can be performed with the same natural parallelism and linear scaling complexity as the graph-based electronic structure calculations for the unperturbed ground state. The proposed techniques are particularly well-suited for semi-empirical electronic structure theory, and the methods are demonstrated using self-consistent charge density-functional tight-binding theory both for the acceleration of self-consistent field calculations and for quantum-mechanical molecular dynamics simulations. Graph-based techniques combined with the semi-empirical theory enable stable simulations of large, complex chemical systems, including tens of thousands of atoms.

The immense promise of linear scaling electronic structure theory1–3 to study large-scale atomistic systems directly from the first principles of quantum mechanics is exceptionally difficult to realize in practice. There are a number of obstacles: (i) a high computational pre-factor, where the linear scaling benefit appears only for very large systems that, in practice, are often beyond acceptable time limits or available computer resources, a challenge accentuated by highly competitive cubically scaling electronic structure solvers with impressive performance, especially on hybrid architectures;4–11 (ii) a reduced accuracy that is often difficult, if not impossible, to control, which may lead to instabilities and poor convergence of the self-consistent ground state solutions; and (iii) computationally expensive overheads that can limit parallel efficiency even for highly distributed calculations. These challenges are, in particular, limiting factors for quantum-mechanical Born–Oppenheimer molecular dynamics (QMD) simulations,12–21 where all these problems coalesce, limiting the system sizes or simulation times that can be studied. Graph-based linear scaling electronic structure theory22 was introduced to overcome some of these obstacles. Graph-based linear scaling QMD is a framework designed for non-metallic systems, and it is particularly efficient for quasi-low-dimensional structures, for example, soft matter systems, such as solvated biomolecules, where the electronic overlap between the atoms can be represented by a sparse (low-dimensional) graph.22

Graph-based linear scaling electronic structure theory,22 with its subsequent theoretical analysis, applications, and implementations,23–26 is a very promising approach to study large atomistic systems. It combines the natural parallelism of divide-and-conquer-like methods27–36 with the well-controlled and tunable accuracy of a numerically thresholded sparse matrix algebra.37–57 It has recently been considered how graph-based linear scaling electronic structure theory could be used to treat even hundreds of millions of atoms.58,59 Nevertheless, some shortcomings remain. In particular, special care is required for QMD simulations of charge-sensitive or reactive chemical systems, e.g., systems where the electronic energy gap between the Lowest Unoccupied Molecular Orbital (LUMO) and the Highest Occupied Molecular Orbital (HOMO) is opening and closing along the molecular trajectories. Electronic degeneracies, i.e., molecular states with equal energy at the chemical potential, and high susceptibilities associated with a low HOMO–LUMO gap can cause costly convergence and charge stability problems, as well as difficulties in calculating accurate conservative forces, which may lead to nonphysical molecular trajectories.

In this article, we will combine graph-based linear scaling electronic structure theory with the most recent shadow potential formulations of extended Lagrangian Born–Oppenheimer molecular dynamics (XL-BOMD), including fractional molecular-orbital occupation numbers.60–62 This shadow Born–Oppenheimer molecular dynamics framework allows for QMD simulations of reactive or low-gap chemical systems that may have unsteady charge solutions,61 which can be very challenging for regular direct Born–Oppenheimer molecular dynamics simulations. To achieve the necessary accuracy and stability, this shadow Born–Oppenheimer molecular dynamics formalism requires canonical quantum response calculations for electronic states with fractional occupation numbers at finite electronic temperatures. To perform these canonical response calculations, we will introduce a graph-based quantum perturbation theory for systems with fractional occupation numbers. With this improved graph-based framework for QMD simulations, we can then study challenging complex chemical systems using distributed computational platforms.

The prospect of using graph-based QMD to simulate chemically reactive soft matter systems is of particular interest when chemically relevant transitions appear as rare events beyond the time scales that are normally accessible to QMD simulations. Established accelerated molecular dynamics simulation methods63–66 are often difficult, if not impossible, to apply to soft matter systems, which often have multiple shallow local minima where rare events are hard to detect or even to define. Nevertheless, by extending the system size, it is possible to systematically enhance the probability of a chemically relevant event and, thus, indirectly accelerate the rare-event dynamics. The natural parallelism, improved stability, and linear-scaling computational cost of our graph-based QMD framework make this possible.

In this paper, we will introduce graph-based quantum response calculations for systems with fractional occupation numbers. Our main motivation is to be able to perform graph-based QMD simulations using the most recent shadow potential formulations of XL-BOMD.61,62 However, canonical quantum perturbation calculations performed on a partitioned graph are a very useful technique in their own right. The graph-based quantum perturbation theory presented in this article could, for example, also be used for distributed, large-scale calculations of time-independent response properties, such as magnetic susceptibilities, phonon modes, electric polarizabilities, Raman spectra, or Born effective charges.67–71

The theory and the techniques will be presented for Kohn–Sham density functional theory, but our formulation is general and can also be applied to other related effective single-particle methods, such as Hartree–Fock-based theory. However, the relevant system sizes for graph-based electronic structure calculations are often quite large, and each separate subgraph partition may include hundreds of atoms. The graph-based method for QMD simulations is therefore particularly well-suited for semi-empirical electronic structure methods, which typically are one to two orders of magnitude faster than higher-level first-principles methods. The new methodology will be demonstrated using self-consistent-charge density functional tight-binding theory (SCC-DFTB),72–74 but it is also applicable to other semi-empirical methods.75–82

Our main objectives in this article are the underlying theoretical concepts and techniques for the graph-based electronic structure and response calculations that enable large-scale QMD simulations of challenging complex (non-metallic) chemical systems. The specific performance of our implementation on different platforms or the particular chemical behavior and mechanisms of the simulated examples are not our priorities.

If not stated explicitly, we will assume that all operators are represented in some local atomic-orbital-like basis set. In general, we will assume that these matrix representations have been orthogonalized by a congruence transformation,50,83 and we will use atomic units throughout the text. For the extended electronic degrees of freedom in XL-BOMD, we use the electron density represented either as a continuous function or in a discretized form as a charge vector, which is particularly well-suited for SCC-DFTB theory. Generalizations to other representations of the extended electronic degrees of freedom, for example, using atomic multipoles, the density matrix, electronic wavefunctions, or the effective single-particle Hamiltonian will not be discussed here. The underlying theoretical approaches remain the same, but some significant details in the techniques may need to be adapted to the different representations of the extended electronic degrees of freedom, e.g., see Ref. 84.

This article is outlined as follows. First, we discuss the basic ideas behind general graph-based matrix function expansions and how they apply to graph-based linear scaling electronic structure theory. Then, we present XL-BOMD in its most recent form that uses an approximate shadow Born–Oppenheimer potential energy surface. In this formulation, a kernel that determines the metric tensor of the generalized harmonic oscillator extension appears in the equations of motion for the electronic degrees of freedom, which makes the electronic degrees of freedom oscillate around a closer estimate of the exact ground state density. The action of this kernel in the electronic equations of motion can be updated on-the-fly using a preconditioned low-rank Krylov subspace approximation that is generated from quantum response calculations. To perform these response calculations, we then introduce a time-independent graph-based quantum perturbation theory for systems with fractional occupation numbers. The graph-based canonical quantum response calculations can be partitioned into response calculations over separate subsystems, whose results can be collected for the response of the composite system. This graph-based canonical quantum perturbation theory allows for large-scale response calculations at finite electronic temperatures with a natural parallelism. We then combine the different electronic structure methods into a framework for graph-based shadow Born–Oppenheimer molecular dynamics simulations. We also present how the methodology can be used as an alternative low-complexity approach to accelerate self-consistent field iterations. The theory is then demonstrated for a few testbed examples using SCC-DFTB theory. Finally, we give a brief summary and our conclusions.

Graph-based linear scaling electronic structure theory is a powerful and flexible approach to achieve linear scaling complexity. The procedure combines a well-controlled and tunable accuracy of a thresholded sparse matrix algebra with the natural parallelism and flexibility of a divide-and-conquer-like scheme.22 

In a matrix that is numerically thresholded, all matrix elements with an absolute value below some tunable numerical tolerance, τ, are set to zero. Matrix operations with numerically thresholded matrices can often be performed with great efficiency because only the non-zero elements above the numerical threshold have to be included in calculations. This can often lead to a significant reduction in the computational cost, where sometimes the total number of arithmetic operations only scales linearly (or stays constant) with the matrix size.1,2,85 The numerical thresholding enables a tunable adaptive accuracy that is easy to control—the numerical cut-off tolerance, τ, is simply increased or decreased.49 In spirit, the numerical thresholding is similar to changing the numerical precision of the floating point representation. However, sparse matrix operations can often be problematic when they are performed in parallel on a distributed computer architecture. The necessary data transfer between nodes and a non-local data storage can lead to a large overhead. In contrast, dense matrix operations, such as matrix–matrix multiplications, can often be performed with high parallel efficiency.
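In Python/NumPy, the numerical thresholding itself is a simple pruning operation (a minimal sketch for illustration only; the function name and the matrix values are our own, not from the article):

```python
import numpy as np

def threshold(A, tau):
    """Set all matrix elements with an absolute value below tau to zero."""
    B = A.copy()
    B[np.abs(B) < tau] = 0.0
    return B

# A small symmetric matrix with one element pair below the tolerance tau = 0.01.
A = np.array([[1.0,   0.3,   0.001],
              [0.3,   2.0,   0.2],
              [0.001, 0.2,   1.5]])
A_tau = threshold(A, 0.01)  # the 0.001 entries are pruned to zero
```

Increasing or decreasing tau then directly tunes the sparsity, and thereby the accuracy, of all subsequent matrix operations.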

A sparse matrix has an associated graph, where all the non-zero matrix elements are represented by edges between vertices of a graph. If we calculate a polynomial function of a matrix, X, in a step-by-step process (generating each higher-order polynomial term by multiplying by X either from the left-hand side or from the right-hand side), we can represent the step-by-step evolution of the matrix elements with absolute values above the threshold, τ, as edges on a graph. This process is schematically illustrated in Fig. 1, where the new XY term is generated by multiplying Y from the left by X before thresholding. Numerical thresholding is typically performed locally after each matrix–matrix operation. Here, however, we perform a global numerical threshold, where we keep all elements whose absolute values at any point in the polynomial expansion appear above the numerical threshold, τ. This is illustrated with the X15 = X51 matrix elements in Fig. 1, which are kept after the X × Y multiplication even if they were below τ after the last matrix multiplication, because they were above τ in a previous step. The graph associated with the globally thresholded matrix polynomial then represents the data-dependency or data-connectivity graph, Gτ, which determines how data are allowed to flow across the vertices during the polynomial matrix expansion; only the non-zero matrix entries corresponding to Gτ are used during the expansion.

FIG. 1.

The step-by-step construction of an undirected symmetric connectivity (or data-dependency) graph, Gτ, using a symmetric matrix, X. The graph represents the data dependencies in a globally numerically thresholded sparse matrix function expansion, where all matrix elements are included that at any point have an absolute value above some global numerical threshold, τ. White empty matrix entries correspond to zero values. After the second (X × Y) matrix–matrix multiplication, |X15| = |X51| < τ, but this edge is kept in the graph because it was previously above the threshold. The data-dependency graph, Gτ, depicts the allowed flow of information during the polynomial expansion.


A globally thresholded matrix polynomial is thus given by the step-by-step polynomial matrix expansion, where we only include matrix elements corresponding to the vertices and edges of the data-dependency graph, Gτ, after each matrix–matrix operation. We refer to this procedure, in which a fixed sparsity pattern determined by the graph Gτ is used, as a matrix polynomial on a graph, denoted, for example, by fG(X) or f(X)|G.

For the matrix operations on a graph, it is important to remove any fill-in elements that do not belong to the graph after each individual matrix–matrix multiplication in the step-by-step procedure.22 For example, we do not perform the threshold only after Y2 = X3 but, instead, already for the intermediate operation Y1 = X × X and then again for Y2 = X × Y1 (here with a left-hand side multiplication), as illustrated in Fig. 1. Only with this incremental step-by-step procedure can we cover all possible data dependencies that may appear in the polynomial matrix expansion. The example in Fig. 1 is for a symmetric matrix polynomial and a symmetric data-connectivity graph. Generalizations to matrix operations on non-symmetric directed data-dependency graphs are straightforward (see the supplementary material).
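To make the procedure concrete, the global-threshold construction of Gτ and a subsequent matrix polynomial evaluated on that fixed graph can be sketched as follows (a minimal dense NumPy illustration of the step-by-step fill-in removal; in practice sparse data structures are used, and all function names here are our own):

```python
import numpy as np

def dependency_graph(X, order, tau):
    """Accumulate the global data-dependency graph G_tau: an edge (i, j) is
    kept if element (i, j) exceeds tau in absolute value at ANY step of the
    step-by-step expansion X, X^2, ..., X^order."""
    G = np.abs(X) > tau
    Y = X.copy()
    for _ in range(order - 1):
        Y = X @ Y              # raise the polynomial order by one step
        G |= np.abs(Y) > tau   # keep edges that were ever above tau
        Y[~G] = 0.0            # remove fill-in outside the accumulated graph
    return G

def polynomial_on_graph(X, coeffs, G):
    """Evaluate f(X) = sum_k c_k X^k, removing all fill-in elements outside
    the fixed data-dependency graph G after every matrix multiplication."""
    n = X.shape[0]
    Y = np.eye(n)
    F = coeffs[0] * np.eye(n)
    for c in coeffs[1:]:
        Y = X @ Y
        Y[~G] = 0.0            # incremental, step-by-step fill-in removal
        F = F + c * Y
    return F

# Example: a banded matrix; the order-2 dependency graph gains second
# neighbors, and a polynomial on the full (unrestricted) graph matches
# the unconstrained dense result.
X = np.diag([1.0] * 4) + np.diag([0.5] * 3, 1) + np.diag([0.5] * 3, -1)
G_tau = dependency_graph(X, order=2, tau=0.01)
F_full = polynomial_on_graph(X, [1.0, 2.0, 3.0], np.ones((4, 4), dtype=bool))
```

Choosing a smaller tau enlarges G_tau and systematically improves the accuracy of the expansion on the graph.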

FIG. 2.

The connectivity graph Gτ (here undirected and symmetric) represents the data dependencies in the globally numerically thresholded sparse matrix function expansion, $f_{G_\tau}(X) = \sum_k c_k T_k(X)\big|_{G_\tau}$, with a threshold τ (see Fig. 1). The partitioned subgraphs of the different core vertices (e.g., partial subsets 1, 6, and 7) define the principal submatrices [e.g., x(1), x(6), and x(7)] that are extracted from X (here symmetric). White empty matrix entries correspond to zero matrix values. Separate dense matrix function expansions are then performed over these submatrices [e.g., the partial subsets f(x(1)), f(x(6)), and f(x(7))]. The results of the separate matrix function expansions are then to be collected column by column (or row by row) from their corresponding core-column parts {fc(x(i))}, which gives the equivalent result to the globally thresholded matrix function expansion, i.e., $f_{G_\tau}(X) = \{f_c(x^{(i)})\}_{\mathrm{collect}}$, where f(X) is expanded on Gτ.22 The core-column parts are, for example, column 1 for f(x(1)), column 3 for f(x(6)), and column 1 for f(x(7)). The partially collected matrix is illustrated in the lower right, indicating in gray the remaining elements to be obtained from other subgraphs. Generalizations to sets of non-symmetric and non-commuting matrices using directed data-dependency graphs, Gτ, are straightforward (see the pseudo-code in the supplementary material).


Instead of identifying the data-dependency graph from a global numerical threshold of a matrix polynomial, we can, alternatively, simply choose any data-connectivity graph, G, that then limits the allowed data flow during a polynomial matrix expansion on this graph. A matrix polynomial calculated on this graph is given by the same incremental step-by-step expansion, where only matrix elements corresponding to the graph G are kept after each increase in the polynomial order. Examples are given in the supplementary material. The advantage of using a graph determined by a global numerical threshold is that we can then easily control and tune the accuracy of the matrix function by increasing or decreasing the threshold, τ. This is not possible when we choose the graph in some arbitrary way.

A graph G can be partitioned into subgraphs consisting of non-overlapping core (c) parts and overlapping halos (h) (see Fig. 2). For example, we can choose one subgraph for each separate vertex of G, including all its edges and its directly connected neighboring vertices. This partitioning forms a set of overlapping subgraphs. The non-overlapping vertex of each subgraph is its core part, and the overlapping shared edges and vertices belong to the halo.

There is an important and useful one-to-one relation between a sparse matrix function expansion on a graph G, which may have been determined from a global numerical threshold or just chosen arbitrarily, and a collection of separate matrix function expansions using a dense matrix algebra over a set of smaller principal submatrices, where the submatrices are determined by the overlapping subgraphs from the partitioning of the data-dependency graph, G. This observation forms the basis of graph-based linear scaling electronic structure theory.22 The equivalence enables a natural parallelism in electronic structure calculations, similar to a divide-and-conquer approach, but with the tunable accuracy of a numerically thresholded sparse matrix algebra. After the initial formulation of graph-based linear scaling electronic structure theory, a more rigorous mathematical analysis was subsequently given in Refs. 23 and 25.

In graph-based linear scaling electronic structure theory, a matrix function calculated on a graph,
f_G(X) = \sum_k a_k\, T_k(X)\big|_G,    (1)
i.e., where the flow of data is limited by some graph, G, is equivalent to the concatenated results of the core-centered columns of the same matrix polynomial performed over a set of separate dense principal submatrices, {x(i)}, such that
f_G(X) = \left\{ f_c\!\left(x^{(i)}\right) \right\}_{\mathrm{collect}}.    (2)
Here, the curly brackets, {·}collect, denote the operation of collecting the columns (or rows) of the submatrix function corresponding to matrix elements connecting to the core parts of the subgraphs, fc(x(i)), into the composite matrix function, fG(X), calculated on G (see Fig. 2). Each core column includes matrix elements of the core and its overlap with the halo. Tk(X) in Eq. (1) are assumed to be basis polynomials of various orders with coefficients ak.

Note that the data-connectivity graph G does not have to be determined by the global numerical threshold of f(X). Any sparse graph can be chosen. The relation in Eq. (2) still holds (see the supplementary material). This freedom provides a lot of flexibility in how we can control the accuracy and improve the computational efficiency. In particular, we can always add edges to the graph without losing accuracy.

The relation between the matrix function on the graph and the collected submatrix polynomials is illustrated in Fig. 2. The partitioning of the data-dependency graph, Gτ, into subgraphs determines the principal submatrices, x(i). Each subgraph and the corresponding submatrix have a core and a halo (h). For example, in subgraph 1 in Fig. 2, vertex 1 (yellow) is in the core and vertices 2 and 3 (orange) are in the halo. Here, each core is chosen to be a single vertex (corresponding to a diagonal matrix element) and only the subgraphs for vertices 1, 6, and 7 are shown out of a total of nine subgraphs. The principal submatrices, x(i), may have matrix elements that are not present in the connectivity graph corresponding to matrix elements, Xij, that are initially 0 (empty white matrix entries), but which become finite at some point during the matrix function expansion. An example is the edge between nodes 2 and 3 in subgraph 1 in Fig. 2. The separate results of the matrix elements of the core columns of the dense submatrix function expansions are then extracted and put together column by column (or row by row) to assemble the matrix function expansion of the full sparse matrix that otherwise would be calculated on the data connectivity graph, Gτ, or with the corresponding global numerical threshold.22,24,26 This is the “collect” operation, as indicated by the subscript on the very right-hand side of Eq. (2) and in the lower right corner of Fig. 2.
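The partition-expand-collect procedure of Eq. (2) and Fig. 2 can be sketched as follows, here with single-vertex cores and halos given by the graph neighbors (a minimal NumPy illustration under our own naming; the dense function f over each submatrix can be evaluated in any manner):

```python
import numpy as np

def collect_from_subgraphs(X, G, f_dense):
    """Sketch of Eq. (2): assemble f_G(X) column by column from dense matrix
    functions over the principal submatrices. Each core is a single vertex i,
    and its halo is the set of vertices connected to i in G."""
    n = X.shape[0]
    F = np.zeros_like(X)
    for i in range(n):
        idx = np.flatnonzero(G[i])           # core vertex i plus its halo
        x_i = X[np.ix_(idx, idx)]            # dense principal submatrix
        f_i = f_dense(x_i)                   # dense expansion, no graph/threshold
        c = int(np.where(idx == i)[0][0])    # position of the core column
        F[idx, i] = f_i[:, c]                # collect only the core column
    return F

# Example: a symmetric banded matrix and the graph of its non-zero pattern.
X = np.diag([2.0] * 5) + np.diag([0.5] * 4, 1) + np.diag([0.5] * 4, -1)
G = np.abs(X) > 0.0
F = collect_from_subgraphs(X, G, lambda x: x @ x)
```

For the quadratic test function used here, the collected result coincides with the step-by-step X @ X restricted to the graph of the non-zeros of X, which illustrates the one-to-one relation between the two formulations.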

The one-to-one relation between a matrix function on a graph and its collected submatrix functions also holds for a polynomial matrix expansion using a variety of non-commuting and non-symmetric matrices, including directed (non-symmetric) data-dependency graphs, G. In these more general cases, some special care is required to account for the allowed data flow22 (see the supplementary material). For a matrix expansion of non-commuting or non-symmetric matrices on a general directed data-dependency graph, the principal submatrices are extracted either by columns or by rows. The higher-order polynomial terms on the graph are then built by multiplying matrices either from the left-hand side (for the column case) or from the right-hand side (for the row case) in the incremental step-by-step polynomial expansion. Only matrix elements on the graph are kept after each matrix operation. This matrix polynomial on the graph is then equivalent to the collected set of the submatrix polynomials that are collected either by columns or by rows from their core parts. Examples that illustrate the separate column- and row-wise versions for the non-commuting or non-symmetric polynomial expansions on a graph are provided in the supplementary material.

In graph-based linear scaling electronic structure calculations, the matrix function, fG(X), can be a Fermi-operator expansion, which generates the effective single-particle density matrix that determines the charge density [see Eq. (25)], or the inverse factorization of the overlap matrix,22,24,26,86 which is used to transform the generalized eigenvalue problem into a regular orthonormalized form. These operations can all be expressed as polynomial functions of the Kohn–Sham Hamiltonian or the overlap matrix.50,86 For example, when we calculate the effective single-particle density matrix, D, from the Kohn–Sham Hamiltonian matrix, H, we can do so with a Fermi operator expansion on a graph. In this case,
D_{G_\tau} = \left[ e^{\beta(H - \mu I)} + I \right]^{-1} \Big|_{G_\tau}    (3)
D_{G_\tau} = \sum_k a_k\, T_k(H)\big|_{G_\tau},    (4)
where I is the identity matrix; {ak} are some set of coefficients; {Tk(H)} is a set of polynomials, e.g., Chebyshev polynomials of various orders; β = 1/(kBT) is the inverse electronic temperature; and μ is the chemical potential. We have here assumed that H is in an orthogonalized matrix representation,2,50 but generalizations to non-orthogonal expansions are straightforward.9,22,87–89
In a polynomial Fermi-operator expansion on the data-dependency graph, Gτ, all matrix elements outside of Gτ are removed in each incremental step-by-step calculation of the polynomials,22 in the same way as for the global numerical thresholding illustrated in Fig. 1. However, the dense matrix function expansions of the principal submatrices, fc(x(i)) in Eq. (2), defined by the subgraph partitioning of Gτ, are agnostic to how the functions are calculated because no graph or numerical threshold is involved. The Fermi-operator expansion of the principal submatrices, h(i), extracted from H as determined by the graph partitioning of Gτ, can therefore be performed in any manner.22 In particular, in the limit of vanishing electronic temperature (β → ∞), we can use a fast recursive Fermi-operator expansion method, where
d^{(i)} = \theta\!\left( \mu I - h^{(i)} \right),    (5)
d^{(i)} = \lim_{n \to \infty} g_n\!\left( g_{n-1}\!\left( \cdots g_1\!\left( h^{(i)} \right) \cdots \right) \right),    (6)
D_{G_\tau} = \left\{ d_c^{(i)} \right\}_{\mathrm{collect}}.    (7)
Here, Θ(·) is the Heaviside step function and {gn(x(i))} is some set of purification or spectral projection polynomials2,43–45,47,50,56,90–97 that rapidly reaches convergence for the idempotent subgraph density matrices, {d(i)}. Alternatively, we may formulate the recursive sequence in Eq. (6) as a convolutional deep neural network.5,6 The globally thresholded density matrix, DGτ, is then assembled by the core parts (c) of the subgraph density matrices, {dc(i)}collect. This relation, as in Eq. (7), is of critical importance to any operations we perform in graph-based electronic structure theory as well as to the graph-based quantum perturbation theory presented later in this article. The separate recursive expansions (or purifications) over the smaller separate dense submatrices can be performed with close to peak performance on specialized hardware, such as many-core processors,7,98 graphics processing units (GPUs),4,22,58 and artificial-intelligence (AI) accelerating mixed-precision tensor cores or tensor processing units (TPUs).5,6,58,99,100
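As one concrete instance of such a recursive expansion over a dense subgraph Hamiltonian, a second-order spectral-projection (SP2-type) purification can be sketched as follows (a minimal NumPy sketch; the spectral bounds eps1 and epsN are assumed to be supplied, e.g., from Gershgorin estimates, and the function name is our own):

```python
import numpy as np

def sp2_density(h, nocc, eps1, epsN, niter=40):
    """SP2-type recursive Fermi-operator expansion (zero electronic
    temperature) for a dense, orthogonalized subgraph Hamiltonian h.
    eps1 and epsN bound the spectrum of h from below and above."""
    n = h.shape[0]
    # Map the spectrum of h into [0, 1], with occupied states close to 1.
    x = (epsN * np.eye(n) - h) / (epsN - eps1)
    for _ in range(niter):
        x2 = x @ x
        xp = 2.0 * x - x2
        # Pick the projection polynomial that steers Tr(x) toward nocc.
        if abs(np.trace(x2) - nocc) < abs(np.trace(xp) - nocc):
            x = x2
        else:
            x = xp
    return x  # the (idempotent) subgraph density matrix d

# Example: a gapped subgraph Hamiltonian with two occupied states.
h = np.diag([-2.0, -1.0, 1.0, 2.0])
d = sp2_density(h, nocc=2, eps1=-2.0, epsN=2.0)
```

Each iteration uses only one dense matrix–matrix multiplication, which is the operation that maps well onto GPUs and tensor-core hardware.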
Several alternatives to the recursive expansion approach in Eq. (6) can be used, including expansions in the diagonal molecular-orbital (or eigenbasis) representation, where the molecular orbitals are given from solutions of the corresponding quantum mechanical eigenvalue problem. In the general non-orthogonal case,22,87 the Fermi-operator expansion on the graph then corresponds to the solution of a set of small generalized Kohn–Sham eigenvalue problems, one for each subgraph, i, and corresponding principal submatrix h(i) of the full composite Kohn–Sham Hamiltonian, H. The composite sparse density matrix, DGτ, is then collected from the core parts of each subgraph density matrix, d(i), given in a molecular-orbital representation. As in Eq. (7), the density matrix, DGτ, of the composite system is then given by the collected core-column parts of the subgraph density matrices, i.e.,
D_{G_\tau} = \left\{ d_c^{(i)} \right\}_{\mathrm{collect}},    (8)
d^{(i)} = \sum_k f\!\left( \varepsilon_k^{(i)} \right) v_k^{(i)} v_k^{(i)T}.    (9)
Here, {vk(i)} are the molecular-orbital eigenvectors of the subgraph Hamiltonian principal submatrices, h(i), in their orthogonalized form, where
h^{(i)} v_k^{(i)} = \varepsilon_k^{(i)} v_k^{(i)},    (10)
h^{(i)} = z^{(i)T} h_{\mathrm{ao}}^{(i)} z^{(i)}.    (11)
s(i) are the principal submatrices of the overlap matrix for the full composite system, S, and z(i) is some inverse factorization of s(i). The subsystem Hamiltonian in the non-orthogonal atomic-orbital (ao) representation is given by hao(i). These have been extracted as principal submatrices from the effective single-particle Hamiltonian, Hao, using some estimated data-dependency graph, Gτ = Gτ^ao, for the atomic-orbital representation. Note that only the core-column parts (c) of the eigenvector outer product in Eq. (9) are included in the column-by-column (or row-by-row) assembly of DGτ. The density matrix can also be assembled in the non-orthogonal atomic-orbital representation, where
D_{G_\tau}^{\mathrm{ao}} = \left\{ \left( z^{(i)} d^{(i)} z^{(i)T} \right)_c \right\}_{\mathrm{collect}}.    (12)
Apart from the congruence transformations with z(i), the non-orthogonal problem is thus equivalent to the orthogonal case.
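For a single subgraph, the molecular-orbital route at finite electronic temperature, including the congruence transformation and fractional Fermi occupation numbers, could look as follows (a minimal NumPy sketch with our own function name; a Löwdin inverse square root is used here as one possible inverse factorization z):

```python
import numpy as np

def subgraph_density(h_ao, s, mu, beta):
    """Canonical density matrix for one subgraph: inverse overlap
    factorization, molecular-orbital eigenvalue problem, fractional
    (Fermi) occupations, and back-transformation to the ao representation."""
    sval, svec = np.linalg.eigh(s)
    z = svec @ np.diag(sval ** -0.5) @ svec.T    # z = s^(-1/2), so z^T s z = I
    h = z.T @ h_ao @ z                           # orthogonalized Hamiltonian
    eps, v = np.linalg.eigh(h)                   # molecular-orbital eigenpairs
    f = 1.0 / (np.exp(beta * (eps - mu)) + 1.0)  # fractional occupation numbers
    d = (v * f) @ v.T                            # d = sum_k f_k v_k v_k^T
    d_ao = z @ d @ z.T                           # non-orthogonal representation
    return d, d_ao

# Example: an orthonormal basis (s = I) and a gapped two-level Hamiltonian.
h_ao = np.diag([-1.0, 1.0])
s = np.eye(2)
d, d_ao = subgraph_density(h_ao, s, mu=0.0, beta=20.0)
```

In the composite scheme, only the core columns of each d (or d_ao) would be collected into the full density matrix, as in Eqs. (8) and (12).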

The data connectivity graph, Gτ, given from a global numerical threshold, τ, of some matrix function, is, in general, not known a priori. In fact, a global numerical threshold of a matrix function requires that we first calculate the full matrix function before it can be partitioned into separate submatrix calculations. This may seem like a highly impractical approach. However, in graph-based QMD simulations, or in iterative self-consistent field optimizations, the data connectivity graph, Gτ, can be estimated from previous steps, including redundant estimates from new connections formed as the atoms move or the electronic structure changes.22 Our favorite choice is to estimate the data-dependency graph from the non-zero structure of the numerically thresholded density matrix (either in the orthogonalized or the atomic-orbital representation) from a previous self-consistent-field (SCF) iteration or molecular dynamics time step, including any new matrix elements that may have formed in the thresholded Hamiltonian. Additional connectivity is then added by including all paths of length two on this graph. After performing the Fermi-operator expansions on the partitioned subgraphs, the collected density matrix is, once again, numerically thresholded, and the process is repeated. This technique allows for a rapid adaptation to new data-dependency flows and sparsity patterns in combination with an easily tunable numerical accuracy.22 This method will also be used in the examples of our graph-based QMD scheme.
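The graph-update strategy described above can be sketched as follows (a minimal dense NumPy illustration under our own naming; in production the boolean products are carried out with sparse data structures):

```python
import numpy as np

def estimate_graph(D_prev, H_new, tau):
    """Estimate the new data-dependency graph from the thresholded density
    matrix of the previous SCF iteration or MD step, augmented by new
    elements of the thresholded Hamiltonian and by paths of length two."""
    G = (np.abs(D_prev) > tau) | (np.abs(H_new) > tau)
    G2 = (G.astype(np.int64) @ G.astype(np.int64)) > 0  # paths of length two
    return G | G2

# Example: a tridiagonal (chain-like) density-matrix pattern; the update
# adds second-neighbor connectivity but nothing beyond it.
D_prev = np.diag([1.0] * 4) + np.diag([0.2] * 3, 1) + np.diag([0.2] * 3, -1)
H_new = np.zeros((4, 4))
G = estimate_graph(D_prev, H_new, tau=0.05)
```

The resulting boolean pattern is then partitioned into subgraphs, the Fermi-operator expansions are performed, and the collected density matrix is thresholded again for the next update.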

Alternative approaches include estimates from the thresholded envelope functions of the occupied eigenvectors, i.e., we take the absolute values of the elements of each eigenvector of the subsystems, {|vk(i)|}. These can then be used to construct an envelope (E) density matrix, DE, for the full composite system. In this way, we avoid the decoherence or cancellations in the superposition of the fluctuating tails of the eigenstates that may lead to a localization of the regular density matrix. New data-dependency connections may then be included from the new thresholded Hamiltonian, where the new data connectivity graph, G, is estimated from
(13)
Other more ad hoc alternatives include, for example, using some power, m, of the thresholded overlap matrix, Sm, or the Hamiltonian, Hm, to estimate the data connectivity graph. For many systems with a significant HOMO–LUMO gap, the matrix square, i.e., using m = 2, is often sufficient. However, this ad hoc approach does not account for changes in the electronic overlap present in the density matrix and can therefore lead to uncontrolled errors.

An important feature of the different estimates of the data connectivity graph is that, even if we choose to keep a fixed core part, the halo fluctuates in size after each MD time step or SCF iteration.22 This can place some limitations on how the calculations are organized.

The flexibility in determining the connectivity graph Gτ extends to the partitioning of the subgraphs into the core and the halo, which in turn determine the principal submatrices. In particular, the core is not limited to the case of a single matrix element illustrated in Fig. 2. The computational efficiency can be increased by including multiple matrix elements in the core while keeping the combined number of halo elements per core element small.22 Highly efficient core partitionings can be constructed using off-the-shelf graph methods22,26,101,102 that identify communities of closely connected vertices. Such graph partitioning, or other alternative partitioning methods,58 has been shown to decrease the average proportion of halo vertices in the subgraphs, leading to a significant reduction in the computational overhead.22,25,26,58
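Given a partitioning of the vertices into cores (e.g., from METIS or any other community-detection method), the corresponding halos follow directly from the adjacency structure. The following is a minimal sketch of this core + halo construction (the function `core_halo_partition` and the toy chain graph are our own illustrations, not the implementation of Ref. 22):

```python
import numpy as np

def core_halo_partition(A, cores):
    """For a boolean adjacency matrix A and a list of core vertex sets,
    return (core, halo) index pairs, where the halo is the set of
    vertices outside the core that connect to it. (Sketch only.)"""
    parts = []
    for core in cores:
        core = np.asarray(sorted(core))
        connected = np.where(A[core].any(axis=0))[0]  # all vertices touching the core
        halo = np.setdiff1d(connected, core)
        parts.append((core, halo))
    return parts

# Toy chain of 8 vertices with nearest-neighbor connectivity, two cores.
n = 8
A = np.eye(n, dtype=bool) | np.eye(n, k=1, dtype=bool) | np.eye(n, k=-1, dtype=bool)
parts = core_halo_partition(A, [range(0, 4), range(4, 8)])
(core0, halo0), (core1, halo1) = parts
assert list(halo0) == [4] and list(halo1) == [3]
```

Grouping several strongly connected vertices into one core, as above, keeps the total number of halo vertices per core vertex small, which is the source of the reduced overhead discussed in the text.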

It is important to note that the data connectivity graph guided by the range of the thresholded density matrix is governed mainly by the electronic overlap of the molecular orbitals and not by the interatomic distances, e.g., based on some radial cutoff. This can be seen in the example of a solvated Trp-cage protein structure shown in Fig. 3, where the halo atoms surrounding a core region can be quite non-uniform. The data connectivity graph therefore gives a picture of the electronic overlap or the chemical bonding in the same way as the numerically thresholded density matrix, DGτ. This also means that graph-based QMD will be inefficient for metallic systems, where we have itinerant electronic states that, in practice, overlap with all atoms. The graph-based calculations will still work correctly, but the lack of sparsity will lead to a single subgraph that encloses the whole system. The natural parallelism and reduced linear scaling complexity are then lost.

FIG. 3.

Chemical representation for two different subgraphs of the Trp-cage protein system solvated in water in a periodic box.103 The protein and water solvent are shown using a licorice and stick representation, respectively. Carbon, oxygen, nitrogen, and hydrogen are depicted with magenta, red, blue, and white colors, respectively. The METIS graph-partitioning algorithm was used to partition the graph.101,102 (a) A small tyrosine residue is here automatically selected as the core part for one of the subgraphs (shown in VDW representation) with the halo atoms shown in translucent blue. (b) A small tightly connected water cluster is automatically selected as the core (shown in VDW representation) together with the halo atoms in translucent blue. Halo atoms that appear distant from each other are actually nearby due to the periodic boundary conditions.


Partitioning the graph by the density matrix naturally divides soft matter systems into chemically relevant units. As an example, Fig. 3 shows the partitioning applied to a system composed of a Trp-cage miniprotein (a 20-residue peptide) solvated in water, an archetypal model of a soft matter system. By partitioning the connectivity graph for the merged graphs of the density matrix and the corresponding Hamiltonian matrix, using an off-the-shelf community detection algorithm,101 key structural parts are revealed. The left panel of Fig. 3 shows a tyrosine residue that has automatically been captured as a core part. The surrounding halo atoms are added in translucent blue to show the extent of the connectivity. The right panel of Fig. 3 shows how a closely connected water cluster is captured within a core partition. The ability of graph partitioning of the data-connectivity graph, determined by the electronic overlap, to identify chemically intuitive fragments could be of great interest as an automatic analysis tool for large complex simulations, in particular, if fragments appear in core partitions that are consistent over time.

Another example of an automatic graph clustering is shown in Fig. 4, where a METIS community detection algorithm101,102 was used to identify the core parts of a graph corresponding to a phenyl dendrimer system. The left panel shows the molecular structure, and the right panel shows the clustered graph with the more closely interconnected communities. This automatic core partitioning can then be used in the graph-based electronic structure calculations to reduce the computational overhead.

FIG. 4.

(a) Chemical representation of a phenyl dendrimer system. Carbon and hydrogen atoms are depicted in cyan and white spheres, respectively.25,104 (b) Original graph partitioning extracted from the effective single-particle density matrix of the phenyl dendrimer molecular structure using the METIS graph-partitioning algorithm.101,102 Note the fractal-like structure of the graph and the formation of natural communities.


To study chemical systems where the electronic HOMO–LUMO gap is small or vanishing, it is necessary to avoid instabilities. This is of particular importance in QMD simulations, where discontinuous crossings of nearly degenerate states may appear almost instantaneously. The problems can be avoided if we introduce fractional occupation numbers, i.e., we use a temperature-dependent free energy formulation for the electronic states.105–107 Using fractional occupation numbers leads to some additional challenges in our graph-based approach to electronic structure calculations. For example, it is no longer sufficient to place the shared chemical potential, μ, in Eq. (3) or (5) somewhere in the HOMO–LUMO gap shared by all the subgraph systems.22,97 Instead, we need the exact shared value of the chemical potential, μ. This can be achieved using an iterative Newton–Raphson scheme, as presented in Ref. 108, which rapidly determines the chemical potential that gives the correct total occupation for a system with fractional occupation numbers.

In the Newton–Raphson scheme, we assume that we have calculated the μ-dependent effective single-particle density matrices, dc(i)(μ), of each subsystem (core + halo) from the Fermi function with some initial guess of μ, using the diagonalization as in Eqs. (8)–(11). The correct chemical potential, μ, which is shared over all the subsystems, is determined from the occupation condition of the composite system, i.e.,
(14)
With some sufficiently good initial guess of μ, this nonlinear equation can be solved iteratively using the Newton–Raphson method,108 where
(15)
Here, we have used the shorthand notation DGτ(n) ≡ DGτ(μn). The different terms in each Newton–Raphson iteration are given from the separate subsystems based on the relations in Eqs. (8) and (9), where
(16)
and
(17)
with the Fermi occupation factors
(18)
From each separate subsystem, i, we need only the sums of the squares of the core-part elements, j, of the eigenvectors, {vk(i)}, i.e., {(vk,jc(i))2}, for each eigenstate k, together with the eigenvalues {ϵk(i)}. This makes the Newton–Raphson scheme straightforward to parallelize efficiently.

The Newton–Raphson method is quadratically convergent. In QMD simulations or in self-consistent field iterations, the chemical potential from the previous step can be used as an initial guess. It can also be propagated or extrapolated from a sequence of previous time steps. Often, only a few iterations are then necessary to reach a tightly converged value of the shared chemical potential.
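The Newton–Raphson iteration for the shared chemical potential can be summarized in a short sketch. Each subsystem contributes only its eigenvalues and the summed squares of the core parts of its eigenvectors, as described above. This is an illustrative toy implementation (the function names and the toy spectrum are our own, not the implementation of Ref. 108):

```python
import numpy as np

def fermi(eps, mu, beta):
    # Fermi occupation factors, f_k = 1 / (1 + exp(beta*(eps_k - mu))).
    return 1.0 / (1.0 + np.exp(beta * (eps - mu)))

def shared_mu(subsystems, Ne, beta, mu0=0.0, tol=1e-12, maxit=50):
    """Newton-Raphson solve for the shared chemical potential. Each
    subsystem is a pair (eps, w): its eigenvalues and core weights
    (summed squares of the core parts of the eigenvectors). Sketch of
    the scheme described in the text, not a reference implementation."""
    mu = mu0
    for _ in range(maxit):
        occ, docc = 0.0, 0.0
        for eps, w in subsystems:
            f = fermi(eps, mu, beta)
            occ += np.sum(w * f)                       # total occupation
            docc += np.sum(w * beta * f * (1.0 - f))   # d(occupation)/d(mu)
        if abs(occ - Ne) < tol:
            break
        mu -= (occ - Ne) / docc                        # Newton-Raphson update
    return mu

# Toy composite system: two identical subsystems, half filling.
eps = np.array([-1.0, -0.5, 0.5, 1.0])
w = np.ones(4)
mu = shared_mu([(eps, w), (eps, w)], Ne=4.0, beta=10.0)
assert abs(mu) < 1e-8  # symmetric spectrum: mu = 0 at half filling
```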

The Newton–Raphson scheme in Eq. (15) illustrates the general power of the graph-based approach to electronic structure calculations with its equivalence between the matrix function on a graph and its collected matrix functions of the principal submatrices, which are described by the graph partitioning construction in Eq. (7). Most algorithms for a full composite sparse system can be partitioned into operations over independent subsystems, which allows for a natural parallelism.

The Newton–Raphson scheme in Eq. (15) is efficient for systems with fractional occupation numbers and a small HOMO–LUMO gap. For systems with integer occupation of the electronic states and a finite HOMO–LUMO gap, there are more efficient alternative techniques that can be used even in combination with recursive Fermi-operator expansion methods.22,97

In graph-based QMD simulations, the overhead from the self-consistent electronic ground state optimization that is required prior to each force evaluation can be reduced by using modern Car–Parrinello–like methods.22,58 In these methods, the electronic degrees of freedom are either propagated dynamically using extended Lagrangian Born–Oppenheimer molecular dynamics (XL-BOMD)60,62,109,110 or are extrapolated from previous time steps.58,111–115 In this section, we briefly present the most recent formulations of XL-BOMD,50,61 where we use a shadow Born–Oppenheimer potential energy surface and a preconditioned low-rank Krylov subspace approximation for the kernel in the integration of the electronic equations of motion. We will use this updated shadow potential version of XL-BOMD to improve our original graph-based QMD simulation framework.22 Previously, the graph-based XL-BOMD simulations could sometimes require a few SCF iterations in each time step for small-gap systems with unsteady ground-state charge solutions. With the more recent formulation of XL-BOMD, it is possible to efficiently simulate more challenging and charge-sensitive systems, e.g., systems where the electronic HOMO–LUMO gap is opening and closing along the molecular trajectories, without any SCF iterations.61 Ordinarily, electronic degeneracies appearing at the chemical potential when the gap is closing can cause costly convergence problems and difficulties in achieving accurate conservative forces, which may lead to unphysical molecular-dynamics trajectories. The most recent shadow potential formulation of XL-BOMD, including finite electronic temperatures and fractional occupation numbers, avoids these problems.

The shadow potential formulation of XL-BOMD is based on a general principle that is quite simple, yet very powerful. Instead of calculating expensive approximate forces for the underlying exact interatomic potential, U(R), we design an approximate shadow potential, U(R,n), that closely follows the exact solution, but for which we can calculate exact forces cost-effectively. This approach is usually referred to as a backward error analysis or a shadow Hamiltonian approach116–123 and is often associated with the design of symplectic or geometric integration schemes.124–128 

In XL-BOMD, the shadow potential, U(R,n), depends not only on the nuclear coordinates, R, but also on the electron density, n(r). Thus, in contrast to regular QMD and in the spirit of Car–Parrinello dynamics,129 XL-BOMD includes the electronic degrees of freedom, n(r), and its time derivative, ṅ(r), as additional extended dynamical field parameters, apart from the nuclear coordinates, R, and their velocities, Ṙ. The equations of motion in XL-BOMD are derived in an adiabatic limit, where the extended electronic degrees of freedom are assumed to be fast compared to the nuclear motion. This corresponds to the same basic assumption (but for a classical field) as in the Born–Oppenheimer approximation. A similar adiabatic condition can also be enforced in Car–Parrinello molecular dynamics using Lagrange multipliers, which forms the basis of the mass-zero constrained dynamics scheme by Bonella et al.130–132 

In regular Born–Oppenheimer molecular dynamics based on density functional theory (DFT),107,133–135 the potential energy surface and ground state density are given by
(19)
(20)
Here, E(R, ρ) is an energy functional that we assume also includes the ion–ion repulsive terms; in the finite temperature generalization, including fractional occupation numbers, it is a free energy functional that also includes the electronic entropy contribution.62,105–107,136,137 By the constrained minimization of E(R, ρ), we always mean the lowest stationary solution over all physically relevant electron densities that integrate to the desired number of electrons, Ne. The stationary ground state solution, ρmin(r), determines the Born–Oppenheimer potential, U(R), which generates the dynamics. The motion is driven by the interatomic forces, −∇RIU(R), as given by Newton’s equation of motion,
(21)
where {MI} are the atomic masses.

Note that, when we include the effects of finite electronic temperatures, we allow for thermally excited states, as given by the fractional occupation numbers in Kohn–Sham DFT.105–107 In principle, this goes beyond the regular ground-state Born–Oppenheimer approximation. However, here we consider an instantaneously thermally equilibrated ground state solution to be part of a generalized Born–Oppenheimer approximation.

In general, E(R, ρ) is a nonlinear energy functional. The constrained minimization, Eq. (19), over all physically relevant electron densities, which in Kohn–Sham density functional theory are determined by single-particle states, leads to a system of non-linear eigenvalue equations. This non-linear problem needs to be solved iteratively with an SCF optimization procedure. Alternatively, the ground state can be found with an iterative direct energy minimization scheme, which in many ways resembles the iterative SCF optimization procedure. The ground-state optimization can be quite expensive, and unless the solution is well converged, the calculated forces from U(R) may not be sufficiently conservative, which could invalidate the results of a QMD simulation. This shortcoming can be avoided with a backward error analysis or a shadow Hamiltonian approach.

In our backward error analysis or shadow Hamiltonian approach, we construct the approximate shadow potential, U(R,n), by first approximating the energy functional, E(R, ρ), with an approximate n-dependent “shadow” energy functional,
(22)
This can be achieved by a functional expansion (linearization) of E(R, ρ) around some test density, n(r), which we assume is an electron density that is close to the exact regular ground-state density, ρmin(r), in Eq. (19). The n-dependent fully relaxed ground state density, ρ0[n](r), which is given from the constrained minimization of the approximate energy functional, i.e.,
(23)
defines our n-dependent shadow Born–Oppenheimer potential,
(24)
The minimization of the shadow energy functional in Eq. (23), which is performed with respect to the lowest stationary solution, thus defines the n-dependent ground state density, ρ0[n](r), and the corresponding shadow Born–Oppenheimer potential, U(R,n). The shadow energy functional, E(R,ρ,n), is given by a linearization of E(R, ρ) in ρ(r) around n(r). The shadow energy functional is, therefore, linear in ρ(r). The constrained minimization can then be performed directly in a single step without any SCF optimization procedure. This not only drastically reduces the cost of the optimization but also provides forces (without any convergence problems) that are fully consistent with the shadow potential U(R,n). This enables a conservative dynamics that diminishes any systematic long-term drift in the total energy in the same way as for shadow Hamiltonian integration schemes in classical molecular dynamics simulations.118–122 
In a shadow density functional formulation for orbital-dependent Kohn–Sham theory, the minimizing electron density, ρ0[n](r), in Eq. (23), can be constructed indirectly through the single-particle density matrix, Dβ,ao[n]={Dijβ,ao[n]}, where
(25)
Here, we assume that {ϕi(r)} represent some real-valued atomic-orbital (indicated using the “ao” superscript) basis functions for the expansion of the single-particle molecular orbitals that are used to represent the electronic density. The density matrix has the inverse temperature superscript, β = 1/(kBTe), to denote a generalization to finite electronic temperatures (Te ≥ 0) with fractional occupation numbers of the molecular orbitals. The (thermally equilibrated) ground state density matrix that is given from the minimization of the linearized energy functional in Eq. (23) is determined by the effective single-particle Kohn–Sham Hamiltonian, which in the basis-set representation has matrix elements,
(26)
In our generalized finite temperature formulation of Kohn–Sham density functional theory62,105–107 (or in finite temperature Hartree–Fock theory105,138,139), the n-dependent, thermally relaxed ground state density matrix of the linearized energy functional in Eq. (23) is given by
(27)
We here assume that H[n] and Dβ[n] are in an orthogonalized matrix representation, where H[n] = ZTHao[n]Z and Dβ,ao[n] = ZDβ[n]ZT, for some inverse factorization of the overlap matrix, Sij = ∫ϕi(r)ϕj(r)dr, such that ZTSZ = I. The eigenvalues of Dβ[n] correspond to the fractional occupation numbers of the electronic states. Because H[n] only depends on n(r) and is independent of ρ0[n](r), we do not need to solve for Dβ[n] and ρ0[n](r) in a self-consistent manner as for the constrained minimization of E(R, ρ). The ground state electron density, ρ0[n](r), can, therefore, be constructed directly without relying on an iterative self-consistent field optimization procedure. This is of particular importance in QMD simulations using linear-scaling electronic structure theory, where a numerical thresholding or other approximations can cause convergence problems.16,18,21,22,140 The same benefit has been demonstrated for QMD simulations with AI-accelerating tensor cores, where a low floating-point precision is used for the electronic structure calculations.5,6 The graph-based QMD simulations will benefit in the same way.
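The key computational point of Eq. (27) is that the thermal density matrix follows from H[n] in a single matrix-function evaluation, with no SCF loop. A minimal dense-matrix sketch (the function name and the toy Hamiltonian are our own illustrations; a linear scaling code would instead use a Fermi-operator expansion on the partitioned graph):

```python
import numpy as np

def fermi_density_matrix(H, mu, beta):
    """Thermal density matrix D = (exp(beta*(H - mu*I)) + I)^{-1} in an
    orthogonal representation, via eigendecomposition. Its eigenvalues
    are the fractional occupation numbers. (Illustrative sketch only.)"""
    e, V = np.linalg.eigh(H)
    f = 1.0 / (1.0 + np.exp(beta * (e - mu)))  # fractional occupations
    return (V * f) @ V.T                        # D = V diag(f) V^T

# Toy Hamiltonian: the density matrix is obtained in one shot (no SCF).
H = np.array([[0.0, 0.2],
              [0.2, 1.0]])
D = fermi_density_matrix(H, mu=0.5, beta=20.0)
occ = np.linalg.eigvalsh(D)
assert np.all(occ > 0) and np.all(occ < 1)   # fractional occupation numbers
```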
The optimized n-dependent ground state density, ρ0[n](r), calculated from the density matrix in Eq. (27), defines the shadow Born–Oppenheimer potential, U(R,n), in Eq. (24). The error in the shadow potential can be estimated from the size of a residual function, f[n](r) = ρ0[n](r) − n(r), where
(28)
The deviation from the exact regular Born–Oppenheimer potential thus depends on how close n(r) is to the ground state density, ρ0[n](r) or ρmin(r).141 To keep n(r) close to the ground state as the positions of the atoms evolve during a molecular dynamics simulation, we propagate n(r) as a dynamical field variable that is driven by an extended harmonic oscillator centered on the ground state density. This additional dynamics can be formulated with an extended Lagrangian using our shadow Born–Oppenheimer potential, where n(r) and its time derivative are included as extended electronic degrees of freedom in addition to the atomic positions, R = {RI}, and their velocities, Ṙ={ṘI}.60,62,109 The equations of motion are given from the Euler–Lagrange equations, derived in an adiabatic limit, where we assume that the extended electronic degrees of freedom are fast compared to the fastest nuclear motion. This is analogous to, and consistent with, the regular Born–Oppenheimer approximation, which was our starting point in the definition of the Born–Oppenheimer potential. In this adiabatic limit, the equations of motion of our shadow Born–Oppenheimer molecular dynamics are given by
(29)
(30)
where {MI} are the atomic masses. Equation (29) represents the regular Newton’s equation of motion for the nuclear degrees of freedom with the forces calculated from the gradient of the shadow potential, while the density, n(r), is kept constant. Equation (30) represents a harmonic oscillator equation for the electron density, n(r), that evolves in a harmonic well centered on ρ0[n](r). In the harmonic oscillator equation, K(r, r′) is a kernel defined by
(31)
This kernel acts like a preconditioner, making n(r) oscillate around a close approximation to the fully self-consistent electron density of the regular Born–Oppenheimer potential, ρmin(r), and, therefore, also around ρ0[n](r). In this way, n(r) follows the ground state and the error in the shadow potential, Eq. (28), stays small. Typically, the kernel is approximated by a scaled delta function, i.e., K(r, r′) = −cδ(r − r′), with c ∈ [0, 1]. However, for sensitive small-gap systems with unsteady charge solutions, this is often not sufficient, and a more accurate, preconditioned, low-rank Krylov subspace approximation is needed.60 This will be discussed in more detail below, and it is the key motivation behind our introduction of a graph-based quantum perturbation theory.
The integration of the equations of motion, in Eqs. (29) and (30), can be performed with a modified Verlet integration scheme.142–144 To initialize the electronic degrees of freedom, we set n(r) = ρmin(r); this requires a full regular SCF optimization of the ground state density as in Eq. (19), but only for the first time step. The accuracy in the sampling of the potential energy can then be shown to depend on the integration time step, δt, to fourth order,60 i.e.,
(32)
The sampling error of the potential surface is thus small compared to the local truncation error in the total energy for the Verlet integration scheme, where the accuracy in the total energy scales as O(δt2). In general, we can therefore use the same size of the integration time step as in regular direct Born–Oppenheimer molecular dynamics simulations.62 
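The Verlet propagation of the extended electronic degrees of freedom can be illustrated with a minimal sketch, here using the scaled-delta-function kernel K = −cI with c = 1 and a fixed toy "ground state" density (our own construction). The weak dissipation terms of the modified Verlet scheme of Refs. 142–144 are omitted:

```python
import numpy as np

def verlet_xlbomd_step(n, n_prev, rho0, dt, omega2, c=1.0):
    """One Verlet step for the extended electronic degrees of freedom,
    n_next = 2n - n_prev + dt^2 * n_ddot, with the harmonic driving
    n_ddot = c*omega^2*(rho0[n] - n), i.e., the kernel K = -c*I.
    (Dissipation terms of the modified Verlet scheme omitted.)"""
    n_ddot = c * omega2 * (rho0(n) - n)
    return 2.0 * n - n_prev + dt * dt * n_ddot

# Toy test: a fixed "ground state" density; n oscillates around it.
rho_gs = np.array([0.6, 0.4])
rho0 = lambda n: rho_gs
dt = 0.1
omega2 = 2.0 / dt**2              # dimensionless dt^2*omega^2 = 2
n_prev = n = np.array([0.7, 0.3])
for _ in range(100):
    n, n_prev = verlet_xlbomd_step(n, n_prev, rho0, dt, omega2), n
amp = np.abs(n - rho_gs).max()
assert amp < 0.1 + 1e-6           # bounded oscillation, no divergence
```

The deviation from the ground state stays bounded over the trajectory instead of drifting, which mirrors the long-term stability of the extended dynamics discussed in the text.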
The kernel K(r,r′) in Eq. (30) is defined by Eq. (31) as the inverse Jacobian of the residual function,
(33)
The calculation of the kernel can be a demanding task. However, we only need to approximate how the kernel acts on the residual function.61 This allows for a significant simplification.

To facilitate the presentation of how the kernel, K(r,r′), can be approximated, we will use a matrix-vector notation.

In a matrix-vector notation, the kernel is given by
(34)
where K, J ∈ RN×N and f, n, ρ0[n] ∈ RN. We can then rewrite the electronic equations of motion in Eq. (30) in the equivalent form,
(35)
We have here introduced an approximate inverse of the Jacobian, K0 ≈ J−1. This means that (K0J)−1 ≈ I, so that its action on the preconditioned residual vector, K0f, should be accurately representable by a low-rank approximation. In the exact case, we can represent (K0J)−1 by
(36)
where
(37)
for some complete set of vectors, {vj}. The tensor elements, {Mij}, are given by
(38)
The expression for the preconditioned kernel in Eq. (36) is based on a generalization of the Jacobian, J, which is defined by an arbitrary set of directional derivatives61 instead of the partial derivatives as in Eq. (34).

The directional derivatives in Eq. (37) along vj can be calculated using time-independent quantum perturbation theory. The charge vector, vj, generates a linear perturbation, H1, in the potential of the unperturbed Kohn–Sham Hamiltonian, H0, i.e., we get a linear first-order perturbation in the Hamiltonian, where H(λ) = H0 + λH1. The first-order response in the charge density is then given from the first-order perturbation in the wavefunctions or the density matrix. How we can perform such response calculations on a partitioned graph, including fractional occupation numbers, will be presented in Sec. IV and is one of the main objectives of this paper.

A low-rank approximation of (K0J)−1 in Eq. (36), acting on K0f(n) in Eq. (35), can be constructed with a reduced set of well-chosen vectors, {vj}j=1m, with m < N. We can choose these from an orthogonalized preconditioned Krylov subspace,
(39)
(40)
With this preconditioned Krylov subspace approximation, which is described in more detail in Sec. V B, we can rewrite the electronic equations of motion with a low-rank expression as
(41)
The main cost of this preconditioned kernel approximation, which was introduced in Ref. 61 for the integration of the extended electronic degrees of freedom in XL-BOMD, comes from the calculation of the response vectors, fvj, i.e., the directional derivatives of the residual function along each vj. These can be calculated using canonical quantum perturbation theory that accounts for fractional occupation numbers.145,146 In our graph-based approach, we need to modify this canonical perturbation theory to make it applicable to calculations on a graph. This is particularly important for sensitive small-gap systems that may have unsteady charge solutions. However, for QMD simulations of non-reactive systems, we can often avoid the low-rank response calculations altogether. Instead, a simple fixed preconditioner, K0, alone (with m = 0) is sufficiently accurate. Typically, for molecular systems with a significant HOMO–LUMO gap, even a scaled delta function, where K ≈ K0 = −cI with c ∈ [0, 1], provides an accurate approximation. Alternatively, a preconditioner can be constructed from the regularized solution of some approximate system, for example, the molecular system at the initial time step. Even if the calculation of the preconditioner is expensive, if it is reused over many (e.g., thousands of) time steps, its total overhead may amount to only a small fraction of the total computational cost, leading to a net increase in efficiency.
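The preconditioned Krylov construction can be sketched as a least-squares solve of Jx = f in the span of the Krylov vectors. This is an illustrative toy implementation (our own function names; the directional derivatives are obtained here by finite differences on a linear toy residual, whereas the text obtains them from graph-based quantum perturbation theory):

```python
import numpy as np

def krylov_kernel_apply(f0, fdir, n, K0, m, eps=1e-4):
    """Low-rank preconditioned Krylov approximation of K f = J^{-1} f.
    f0(n) is the residual, fdir(n, v, eps) its directional derivative
    along v. Sketch: least-squares solve of J x = f in the subspace
    spanned by m orthogonalized, preconditioned Krylov vectors."""
    f = f0(n)
    V, FV = [], []
    w = K0 @ f                            # first preconditioned direction
    for _ in range(m):
        for u in V:                       # Gram-Schmidt orthogonalization
            w = w - (u @ w) * u
        v = w / np.linalg.norm(w)
        V.append(v)
        FV.append(fdir(n, v, eps))        # response vector along v
        w = K0 @ FV[-1]                   # next preconditioned Krylov vector
    V, FV = np.array(V).T, np.array(FV).T
    M = FV.T @ FV                         # M_ij = <f_vi, f_vj>
    c = np.linalg.solve(M, FV.T @ f)
    return V @ c                          # approximates J^{-1} f

# Toy linear residual f(n) = g - A n, so J = -A and K f = -A^{-1} f.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
g = np.array([1.0, -1.0])
f0 = lambda n: g - A @ n
fdir = lambda n, v, eps: (f0(n + eps * v) - f0(n - eps * v)) / (2 * eps)
K0 = -np.linalg.inv(A + 0.1 * np.eye(2))  # approximate inverse Jacobian
n = np.zeros(2)
x = krylov_kernel_apply(f0, fdir, n, K0, m=2)
assert np.allclose(x, -np.linalg.solve(A, f0(n)), atol=1e-8)
```

With m equal to the full dimension the subspace solve is exact, as the assertion checks; in practice only a few Krylov vectors are needed, which is the point of the low-rank approximation.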

The time-independent canonical quantum response calculations required for the directional derivatives in Eq. (37), which are used in the preconditioned kernel approximation in Eq. (41), can be formulated in terms of density matrix perturbation theory.61,68,145,147–149 Density matrix perturbation theory for fractional occupation numbers can be constructed from matrix function expansions.145 We can therefore design a graph-based canonical quantum perturbation theory if we use the one-to-one mapping between the matrix function expansion on a graph and the collection of the core parts from the dense matrix functions of the principal submatrices that are determined by the partitioned subgraphs, i.e., as in Fig. 2 or in Eq. (7). First, we show how the time-independent linear response in the density matrix can be calculated on a partitioned graph, and then, we look at the response in observables and, in particular, the linear response in the electron density.

The unperturbed effective single-particle density matrix for a system at finite electronic temperature is given, as in Eq. (27), by
(42)
Here, H0 is the unperturbed Hamiltonian, which we, once again, assume is in an orthogonal representation. The chemical potential, μ, is determined by the condition that the trace of the density matrix has a given occupation number, i.e.,
(43)
If we now introduce a perturbation to the Hamiltonian,
(44)
which is linear in the perturbation parameter, λ, then the corresponding linear response in the density matrix is
(45)
(46)
The separate response terms of D1β can be calculated using first-order canonical density matrix perturbation theory.61,145,146 The first partial derivative term in Eq. (45) corresponds to a grand-canonical response with a fixed chemical potential and can be calculated, for example, as the X(1) term in Algorithm 2 in Ref. 84. The second term in Eq. (46) is the response of the density matrix with respect to the chemical potential, μ, which is given by
(47)
To calculate the last factor, μ1 ≡ ∂μ/∂λ, which is the response in the chemical potential, we can use the condition that the perturbation does not change the number of states, i.e.,
(48)
This condition gives us the response term,
(49)
For convenience, we have dropped the notation that the response terms are calculated in the limit of λ = 0.
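The canonical response of Eqs. (45)–(49) can be illustrated with a dense eigenbasis sketch (our own construction, assuming non-degenerate eigenvalues; the text instead uses recursive density matrix perturbation theory, e.g., Algorithm 2 in Ref. 84):

```python
import numpy as np

def canonical_response(H0, H1, mu, beta):
    """First-order canonical response D1 of the thermal density matrix
    D = (exp(beta*(H0 - mu*I)) + I)^{-1} to H0 -> H0 + lambda*H1,
    including the chemical-potential term that conserves Tr[D].
    Eigenbasis sketch; assumes non-degenerate eigenvalues."""
    e, V = np.linalg.eigh(H0)
    f = 1.0 / (1.0 + np.exp(beta * (e - mu)))
    h1 = V.T @ H1 @ V
    # Grand-canonical response (fixed mu): divided differences of f,
    # with the derivative -beta*f*(1-f) on the (degenerate) diagonal.
    de = e[:, None] - e[None, :]
    df = f[:, None] - f[None, :]
    F = np.where(np.abs(de) > 1e-12,
                 df / np.where(de == 0, 1.0, de),
                 -beta * f[:, None] * (1.0 - f[:, None]))
    D1_lam = V @ (F * h1) @ V.T
    # Response w.r.t. mu, and the shift mu1 that conserves occupation.
    D1_mu = V @ np.diag(beta * f * (1.0 - f)) @ V.T
    mu1 = -np.trace(D1_lam) / np.trace(D1_mu)
    return D1_lam + mu1 * D1_mu

H0 = np.diag([-1.0, 0.0, 1.0])
H1 = np.full((3, 3), 0.1)
D1 = canonical_response(H0, H1, mu=0.0, beta=5.0)
assert abs(np.trace(D1)) < 1e-12   # the total occupation is conserved
```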
To perform the corresponding canonical quantum response calculations on a graph, we can break up the data-dependency graph, Gτ, into subgraphs from which we can extract the principal submatrices, {h0(i)} and {h1(i)} from H0 and its perturbation H1, respectively. The first-order response in the density matrix, D1βGτ, calculated on the data connectivity graph, Gτ, is then given by the collected linear response terms over the core parts of the separate subsystems, i.e.,
(50)
Here,
(51)
and
(52)
These are the density matrix response terms for the local subsystems. The response in the chemical potential that keeps the total number of electrons unchanged is then given by
(53)
where the traces are taken over the core part of each subsystem. The density matrix response calculations of {d1,λβ,(i)} and {d1,μβ,(i)} on the separate subgraphs can be performed in the molecular-orbital representation, e.g., as in Algorithm 2 in Ref. 84. This is particularly efficient when we need multiple response evaluations for the same structure and when we need to consider fractional occupation numbers.
With the linear response in the density matrix, we can calculate the response of any time-independent observable, a, corresponding to some operator Â. In a bra-ket representation, where Aij=ϕi|Â|ϕj, the response is given by
(54)
In this way, we can calculate, for example, the polarizability from the response of an electric field if  is the dipole operator.145,150
From Eq. (25), we find that the linear response in the ground state electron density, ρ0[n](r), is given by
(55)
where
(56)
Here, D1β,ao[n] is the non-orthogonal atomic-orbital representation of the linear response in the density matrix, i.e., D1β,ao[n]=ZD1β[n]ZT, where Z is the inverse factorization of the overlap matrix, S, i.e., where ZTSZ = I.

In general, we never need to construct the full collected density-matrix response, D1βGτ, from the core parts of the separate subgraph density matrix responses, {d1,λβ,(i)} and {d1,μβ,(i)}. Instead, response properties, such as the response in the electron density or the response of the partial charges, can be calculated first for the separate subsystems and then collected. In this way, the amount of data transfer can be reduced when the response calculations are performed on a distributed parallel platform.

The data-dependency graph, Gτ, that is needed for the graph-based quantum response calculations can be determined not only from an estimate of the globally thresholded expansions of the ground state density matrix, D0β, and the Kohn–Sham Hamiltonian, H0, but also from the response density matrix, D1β. The matrix D1β is often less sparse than D0β;150 however, in practice, we have found that it is sufficient to estimate Gτ from the same graph structures of a thresholded D0β and H0, as discussed in Sec. II E.

Our graph-based canonical quantum perturbation theory was motivated by the need to calculate the response in the electron density, as in Eq. (37), which is needed to approximate the kernel in our shadow potential formulation of XL-BOMD.61 However, graph-based quantum perturbation theory is a very useful and general tool in its own right. Beyond the present application, it should also be possible to use the theory to reduce the complexity of calculations of a broad range of time-independent response properties of large extended systems, including properties such as the magnetic susceptibility, phonon modes, electric polarizabilities, Born effective charges, and Raman spectra.45,67–69

The equations of motion that determine our shadow Born–Oppenheimer molecular dynamics, in Eqs. (29) and (30), can be rewritten in an equivalent matrix-vector notation as
(57)
(58)
where we have included the preconditioner, K0J−1, in the electronic equations of motion.

Later, in our demonstration of the graph-based shadow Born–Oppenheimer molecular dynamics simulations, we will use semi-empirical SCC-DFTB theory, which is naturally described using the vector notation. The theory is general, however, and can be applied to electronic structure methods with different representations of the electronic structure, perhaps with some significant modifications.

This section mirrors and partially overlaps with the response theory discussed in Sec. IV. However, the analysis in this section complements the previous discussion. Instead of a general presentation of a graph-based canonical density matrix perturbation theory, we now focus on the specific techniques required for the graph-based shadow Born–Oppenheimer molecular dynamics simulations. First, we look at the construction of a preconditioner and then how the low-rank preconditioned Krylov subspace approximation can be performed on a graph for the integration of the electronic equation of motion.

The effectiveness of the Krylov subspace approximation of K0J−1 acting on the preconditioned residual function, K0(ρ0[n] − n), in Eq. (58) depends on how close the preconditioner, K0, is to the exact solution, J−1. It is possible to construct K0 by calculating a regularized kernel for an approximate test system, for example, the full composite system at the initial time step of a QMD simulation or for some similar equilibrated structure. The principal submatrices of K0 can then be extracted on-the-fly during an MD simulation as preconditioners for the partitioned subgraphs. Such a preconditioner could be reused, possibly over thousands of time steps. Without a parallel implementation, the computational cost currently limits this approach to systems with ≲10⁴ atoms. Fortunately, our graph-based methods enable a modified parallel approach in which the preconditioner for the whole system is decomposed into a set of smaller separate preconditioners, one for each subgraph. This approach must account for the possibility that the halos surrounding the core regions of the subgraphs can change in each time step, which increases the complexity. It is also necessary to ensure that the application of the preconditioner conserves the total charge. We now describe such an approach.

The kernel K is given from the inverse Jacobian, J−1, of the residual vector function,
(59)
The Jacobian matrix elements are given by
(60)
(61)
where the response of the chemical potential, μ, has been included explicitly.
The n-dependent ground-state charge vector, ρ0[n], which is given from the constrained minimization in Eq. (23), is determined indirectly from the density matrix as in Eq. (25). The ground state electron density can thus be seen as a function of the density matrix, where
(62)
With the graph-based scheme, the ground state electron density therefore can be determined by collecting the charges of the core parts of the subgraphs, {ρ0(k)[n]|c}. These depend on the submatrices of the density matrix corresponding to the partitioned subgraphs, {d0β,(k)[n]}. The composite charges of the full system therefore can be assembled as
(63)
Equation (63) allows us to determine the partial derivatives necessary to calculate the Jacobian as in Eqs. (60) and (61) using partial derivatives with respect to individual point charges. The first derivative in Eq. (61) of the ground state charge is
(64)
where
(65)
Here, {h1(k)} are the principal submatrices (in an orthogonal representation) of the potential matrix, H1[δj], from a unit delta charge, δj, placed at nj, with all other charges set to zero, i.e., nk = 0, ∀k ≠ j. This approach requires a full (non-local) Coulomb summation to construct H1[δj], one for each charge component, nj, of the system. This step could be expensive, but it can be parallelized easily, and the Coulomb summations can be performed using a fast Fourier transform,151 the Ewald summation method, or the particle mesh Ewald algorithm.152,153 The second term in Eq. (61) contains the derivative of the ground state charge for each component i with respect to the chemical potential, μ, which can be collected from the submatrices,
(66)
Here, {d1,μβ,(k)} are given as in Eq. (52). The last remaining response part of Eq. (61) is the derivative of the chemical potential with respect to the charge, μ1,j = ∂μ/∂nj, which is equal for all subsystems, i.e., μ1,j(k) = μ1,j. This shared response in the chemical potential can be determined from the condition that there are no net changes in the total occupation of the collected response density matrices for the full system, i.e., in the same way as in Eq. (53), which gives us
(67)
from which each μ1,j can be determined.
With the separate derivatives in Eq. (61), we can now construct the Jacobian and through its inverse the kernel, K = J−1, of the full composite system. Because calculating this preconditioner could be quite expensive and the parallelization is not ideally suited for our graph-based methods, we adopt an alternative approach using a set of small fixed preconditioners, one for each subgraph partition. Such an approach lowers the cost of calculating the preconditioning and reduces the necessary data transfer and/or the memory requirements. The approach follows from noting that the Jacobian, Jij, of f(n) in Eq. (60) can be represented by a collection of separately calculated submatrix Jacobians,
(68)
where
(69)
(70)
The core part of each of these Jacobians, jc(k) ∈ ℝm×m, can be inverted, yielding trial kernels that could be used as preconditioners. To guarantee charge neutrality when the preconditioner acts on the residual function, however, we first need to adjust the response for each core part of the Jacobians such that each column of jc(k) sums to −1. The adjustment is achieved by shifting the response in the chemical potential, μ1,j(k), such that Σi∈c jij(k) = −1 for each column, with the sum taken only over the core part. After this charge-response adjustment, the approximate kernels for the core parts of each subgraph are given by
(71)
where we have included a constant, α ≥ 0, that introduces a regularization of the solution. The collection of these separate approximate inverse Jacobians corresponds to a block-diagonal preconditioner for the composite full system,
(72)
There are, of course, other alternatives for how a preconditioner can be constructed, but we have found that this approach is particularly simple and efficient. The set of separate preconditioners, {k0(k)}, can be distributed and used for each core part of our graph partitioning, and each k0(k) does not depend on other subgraphs or on the fluctuating size of the halos.
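A minimal serial sketch of this construction (our own names; we assume, for illustration, that the regularization simply adds αI before inversion and that the chemical-potential adjustment can be modeled as a uniform shift of each column of the core Jacobian):

```python
import numpy as np

def core_preconditioner(j_core, alpha=0.0):
    """Approximate inverse Jacobian (trial kernel) for one subgraph core.

    Each column is shifted uniformly, modeling the chemical-potential
    adjustment, so that its core entries sum to -1 (charge conservation
    when the preconditioner acts on the residual). The regularized
    inverse with alpha*I is an assumed form, for illustration.
    """
    j = j_core.copy()
    m = j.shape[0]
    shift = (-1.0 - j.sum(axis=0)) / m    # make every column sum to -1
    j = j + shift[None, :]
    return np.linalg.inv(j + alpha * np.eye(m))

def block_diagonal_preconditioner(j_cores, alpha=0.0):
    """Collect per-subgraph kernels into a block-diagonal K0 (cf. Eq. (72))."""
    blocks = [core_preconditioner(j, alpha) for j in j_cores]
    n = sum(b.shape[0] for b in blocks)
    K0 = np.zeros((n, n))
    i = 0
    for b in blocks:
        m = b.shape[0]
        K0[i:i + m, i:i + m] = b
        i += m
    return K0
```

With α = 0, the column sums of each inverted block are exactly −1, so a residual with zero net charge is mapped to an update with zero net charge.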

The block-diagonal preconditioner, K0 in Eq. (72), can be used to accelerate the Krylov subspace approximation of the kernel acting on the residual function in the equations of motion for the electronic degrees of freedom in Eq. (58), which then can be approximated as in Eq. (41). To achieve this, we need to construct the orthogonalized preconditioned Krylov subspace expansion of vectors {vi} and {fvi}. With the graph-based approach to quantum response calculations, it is possible to perform the most demanding tasks of this Krylov subspace expansion in parallel on the separate subsystems. To explain how this is achieved, we will show how the first vectors, v1, fv1, and v2, are constructed. We will then present the general algorithm.

The first vector, v1, of the Krylov subspace is chosen as the normalized preconditioned residual function, i.e.,
(73)
(74)
(75)
(76)
Because the preconditioner, K0, is a block-diagonal matrix, the initial preconditioning can easily be performed on the core parts of the separate subsystems. In addition, the square of the vector norm, ‖v1‖², can be collected from the squares of the vector norms of the core parts of the separate subsystems. Once v1 is chosen, we need to calculate the directional derivative, fv1, from the preconditioner acting on the directional derivative of the residual function,
(77)
This directional derivative is given by
(78)
(79)
The directional derivative can be calculated with the graph-based canonical quantum perturbation theory in Sec. IV, where the normalized preconditioned residual charge vector, v1, induces a perturbation in the effective single-particle potential. This induced potential then appears as the first-order perturbation in the effective single-particle Hamiltonian, H1. This Hamiltonian perturbation can then be partitioned into subsystems using the data-dependency graph. The process can be described schematically as
(80)
We can then determine the density matrix response from the separate subsystems,
(81)
with the canonical graph-based quantum perturbation theory, as described in Sec. IV. Here, μ1 is the shared response in the chemical potential, which is chosen to keep the perturbed full composite system charge neutral. The density matrix response, D1βGτ, then determines the electronic charge density response or the response in the charge vector due to v1. However, this can also just be performed over the separate subsystems, where
(82)
This allows for a natural parallelism. Here, ρ0(k)[d1,v1β,(k)]|c is the core part of the charge response determined by the derivative of the subsystem density matrix, d1,v1β,(k), with respect to λ in Eq. (51), with h1 = v1. The next term, ρ0(k)[d1,μβ,(k)]|c, is the corresponding charge response determined by the density matrix derivative with respect to the chemical potential, as in Eq. (52), and μ1 is determined as in Eq. (53). From this directional response in the charge, induced by the charge vector v1, we get
(83)
The next vector v2 of the Krylov subspace is constructed from the components of fv1 orthonormalized to v1, i.e.,
(84)
(85)
This process of generating orthogonalized Krylov subspace vectors can be repeated iteratively until a sufficiently accurate approximation is achieved.61 The general scheme can be described by the following algorithm:
  1. Choose v1 from the residual as in Eqs. (73)–(76).

  2. vi → H1[vi] = {h1(k)}|c → collect.

  3. fvi = {k0(k)(∂ρ0(k)[n + λvi]/∂λ − vi(k))}|c → collect.

  4. vi+1 = fvi − Σj=1..i (fviTvj)vj;  vi+1 ← vi+1/‖vi+1‖;  vi+1 = {vi+1(k)} → collect.

  5. Repeat steps 2–4 until sufficient convergence is achieved in Eq. (41).

Once a sufficiently accurate preconditioned subspace approximation is achieved,61 the overlap matrix and inverse overlap matrix,
(86)
(87)
are formed. The inner products, fviTfvj, can be collected in parallel from the separate inner products, fvi(k)Tfvj(k), computed from each core part of the subgraphs.
The integration of the electronic equations of motion can then be performed using the preconditioned Krylov subspace approximation as in Eq. (41). This approximation can be performed in parallel on the separate subsystems, where
(88)
thanks to our graph-based approach. In this way, the core parts of the charge acceleration, n̈(k), can be determined and integrated in parallel for the separate subsystems. Equation (88), and how it can be used to integrate the electronic equations of motion in our graph-based shadow potential formulation of the extended Lagrangian Born–Oppenheimer molecular dynamics scheme, is one of the key results of this paper.
The techniques used to approximate the preconditioned kernel in Eq. (88) can also be used to accelerate the convergence of the iterative solution of the electronic ground state.61,154 We can do this with the Newton scheme,
(89)
(90)
where the kernel, K, or the preconditioned kernel, K0J−1, can be approximated using the preconditioned Krylov subspace approximation as in Eq. (88). This allows for the acceleration of achieving self-consistency over the separate subsystems, where
(91)
With an initial guess that is sufficiently close to the ground state, in combination with a good preconditioner (kept constant during the SCF iterations) and a sufficient number of low-rank updates, we can expect close to quadratic convergence. The method is then numerically equivalent to a direct Newton optimization scheme. In general, the computational efficiency may not be as good as that of related Broyden or Pulay mixing schemes,155–157 but the Newton method in combination with the preconditioned Krylov subspace approximation offers an alternative that may help to accelerate particularly difficult cases. With the graph-based techniques presented in this paper, this Newton-based SCF-acceleration method is now also well-suited for large-scale simulations of complex chemical systems.
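The structure of the iteration can be illustrated on a toy linear model, where ρ0[n] = c + Bn and the exact kernel is available in closed form (entirely illustrative; a real SCC-DFTB residual is nonlinear and the kernel is only approximated):

```python
import numpy as np

# Toy linear SCF model: rho0[n] = c + B n, so the residual f(n) = rho0[n] - n
# has the constant Jacobian J = B - I and the exact kernel K = J^{-1}.
rng = np.random.default_rng(1)
M = rng.standard_normal((8, 8))
B = 0.1 * (M + M.T) / 2.0            # weak, symmetric model response
c = rng.standard_normal(8)

rho0 = lambda n: c + B @ n           # model ground-state map
f = lambda n: rho0(n) - n            # residual function

K = np.linalg.inv(B - np.eye(8))     # kernel, K = J^{-1}

n = np.zeros(8)
for _ in range(3):
    n = n - K @ f(n)                 # Newton update in the spirit of Eq. (90)
```

For this linear model, the first update already lands on the self-consistent solution; with an approximate (Krylov) kernel, the same iteration becomes the quasi-Newton scheme of Eq. (91).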

In this section, we will show some simulation examples of graph-based quantum response theory and extended Lagrangian shadow Born–Oppenheimer molecular dynamics using the preconditioned Krylov subspace approximation for the integration of the extended electronic degrees of freedom. First, we briefly present SCC-DFTB theory, which forms the basis for our implementation. We then demonstrate the graph-based quantum response theory as it is applied to the preconditioned Krylov subspace approximation in the quasi-Newton SCF acceleration scheme in Eq. (91). Thereafter, we demonstrate the methodology for graph-based shadow Born–Oppenheimer molecular dynamics simulations and how we can achieve stability and control of the residual charge errors without performing any SCF optimization prior to the force evaluation as in regular Born–Oppenheimer molecular dynamics. We then demonstrate how the graph-based methodology can be applied to molecular dynamics simulations of large complex molecular systems with tens-of-thousands of atoms. In all systems, we used periodic boundary conditions. At the end, we make a preliminary assessment of the scalability and the parallel efficiency.

The methods in this article were implemented and tested using semi-empirical SCC-DFTB theory48,72–74,158–161 based on the open-source electronic structure software package LATTE89,162 together with the PROGRESS and Basic Matrix Library (BML) libraries.9,88 No new parameterizations or optimizations of the SCC-DFTB energy functional were performed for this study. SCC-DFTB is an approximation of first-principles Kohn–Sham density functional theory, which is derived from a second or third-order expansion of the Kohn–Sham energy functional in the electronic charge fluctuations around a set of overlapping atomic electron densities. A minimal numerical basis set is used, and bond and overlap integrals are tabulated and parameterized using a Slater–Koster approximation. The electrostatic energy is approximated by screened Coulomb interactions between atomic net Mulliken partial charges. The Coulomb interaction between atoms decays as |RIRJ|−1 at large distances and is screened at short distances as the interaction between two penetrating Slater-like charge densities with the on-site term chosen as the chemical hardness or a Hubbard U parameter. In this formalism, the charge density, n, and the corresponding ground state density, ρ0[n], are reduced to vectors with components corresponding to the net Mulliken partial electron occupation for each atom.
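The limiting behavior of this screened interaction can be illustrated with a Klopman–Ohno-type interpolation (an assumed stand-in chosen only to reproduce the two limits described above; SCC-DFTB itself uses an exponentially screened form derived from overlapping Slater densities):

```python
import numpy as np

def gamma_klopman_ohno(R, U_I, U_J):
    """Screened Coulomb interaction between atomic partial charges.

    Klopman-Ohno-type interpolation used purely to illustrate the limits
    described in the text: gamma -> (U_I + U_J)/2 as R -> 0 (chemical
    hardness / Hubbard-U on-site term) and gamma -> 1/R at large R.
    This simpler functional form is an assumption for illustration, not
    the actual SCC-DFTB screening function.
    """
    U = 0.5 * (U_I + U_J)
    return 1.0 / np.sqrt(R**2 + (1.0 / U) ** 2)
```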

The preconditioned Krylov subspace expansion for the kernel can be used to accelerate the convergence in the iterative optimization of the self-consistent ground state solution,61,154 as described in Eq. (91). This ground state SCF optimization is required in our graph-based shadow Born–Oppenheimer molecular dynamics simulations, but only in the first initial time step, where n(t0) is set to the exact regular Born–Oppenheimer ground state, ρmin(r). The SCF acceleration in Eq. (91) can be used to demonstrate our graph-based quantum response theory for fractional occupation numbers, which is required in the construction of the preconditioner and for the low-rank Krylov subspace approximation of the kernel, K. A significant reduction in the number of iterations required to reach a high level of convergence would demonstrate the expected performance of the theory. The Krylov subspace expansion we will use is truncated, and therefore, the kernel is not exact. The preconditioned Krylov subspace expansion is therefore only approximate, and in this case, the iterative updates using Eq. (90) correspond to a quasi-Newton scheme.

Figure 5 shows an example of the convergence of the root mean square error (RMSE), given from the root mean square of the residual charge function, as a function of iterations using either the state-of-the-art Pulay direct inversion of the iterative subspace (DIIS) method157,163 or a quasi-Newton scheme (Kernel), as in Eq. (90). The test systems for this comparison are a solvated Trp-cage protein structure (2644 atoms) and a system with ammonium hydroxide in water (4071 atoms). A high electronic temperature was chosen, with β−1 = 0.5 eV, which provides a notable deviation in the occupation numbers from a pure ensemble and thus demonstrates how the method also works for systems with non-integer occupation numbers. Both systems are fairly demanding to converge. The quasi-Newton scheme provides a significant acceleration of the SCF optimization and thus demonstrates how the graph-based canonical quantum-response theory works, in practice. However, each iteration involves low-rank updates that require repeated quantum response calculations on the graph (a total of six were used for these runs), each of which costs about half as much as a full self-consistent field iteration using DIIS. The wall-clock time required to reach convergence is, therefore, in practice, about the same, but only if we ignore the additional cost of calculating the preconditioner. The quasi-Newton scheme represents an alternative acceleration method, for example, to the SCF mixing schemes of Broyden, Anderson, and Pulay155–157 and should be particularly competitive when the cost of the preconditioner can be ignored, which is the case for our graph-based QMD simulations, where the preconditioner can be reused over hundreds or even thousands of time steps.

FIG. 5.

Root mean square error (RMSE) for the partial charges during the iterative self-consistent charge optimization in SCC-DFTB calculations of a solvated Trp-cage protein, as illustrated in Fig. 3, and for ammonia in water, as illustrated in Fig. 6. We are using either Pulay’s DIIS mixing scheme157,163 or the preconditioned Krylov subspace approximation for the quasi-Newton scheme (Kernel) in Eq. (90). The quasi-Newton scheme was initiated after five or six DIIS iterations.


The improved SCF acceleration in Fig. 5 demonstrates that our graph-based canonical quantum response theory and preconditioned Krylov subspace approximation work as expected. Beyond applications to single-point SCF optimization and molecular dynamics simulations, demonstrated below, the same graph-based quantum response theory and preconditioned Krylov subspace approach should also be applicable to the repeated SCF optimization required in geometry optimization, which could also reuse the same preconditioner over multiple atomic configurations.

To assess the performance of the graph-based shadow Born–Oppenheimer molecular dynamics scheme for more challenging simulations, we prepared a test system in which reactions are poised to occur. The test system consists of concentrated ammonium hydroxide in water, which in a microcanonical ensemble (NVE) simulation will evolve into ammonia by the reaction NH4+ + OH− ⇄ NH3 + H2O. This system is highly reactive mainly because of two factors: (1) the concentration of the reactants is artificially high (29.5M, or 36 times higher than its equilibrium concentration under ambient conditions) and (2) the initial coordinates are out of equilibrium, which leads to a rapid increase in temperature within the first few hundred MD time steps of the NVE simulation.

A schematic representation of the test system is shown in Fig. 6(a). It contains a total of 4071 atoms, with an initial set of 216 NH4+ + OH− pairs in addition to the water. The data-dependency graph was estimated from paths of length two of the combined graph of the full Hamiltonian and a thresholded density matrix (threshold 0.002). We used six rank-1 updates for the Krylov approximation of the preconditioned kernel. The system was partitioned into 128 subgraphs. In Fig. 6(b), we show one of the subsystems with its core and surrounding halo region, which was automatically selected using a graph-partitioning algorithm as implemented in the METIS software package.101,102 The QMD simulation was performed using 64 MPI ranks distributed across 16 Intel(R) Xeon(R) E5-2695 v4 @ 2.10 GHz central processing units (CPUs), each containing 36 cores.

FIG. 6.

(a) Ammonium hydroxide and water in a periodic box containing 4071 atoms. The system was partitioned using 128 subgraphs. (b) A core partition together with its surrounding halo region (in translucent blue large spheres). The halo atoms on the top are near those on the bottom due to the periodic boundary conditions.


Figure 7 shows three quantities monitored over the QMD simulation. The temperature (in red, upper panel) increases rapidly up to about 1450 K. The total energy (in black, middle panel) remains stable without any systematic drift. The root mean square error (RMSE), given from the root mean square of the residual charge function (in blue, lower panel), shows a well-controlled, stable behavior. The rapid initial changes in the electronic structure and temperature, during which a few molecules move fast, create fairly large oscillations in the total energy. The amplitude of the total energy oscillations decays as the system reaches thermal equilibrium. Some of the rapid initial oscillations also lead to a small increase in the residual charge.

FIG. 7.

Graph-based QMD simulation of ammonium hydroxide in water containing a total of 4071 atoms using extended Lagrangian shadow Born–Oppenheimer molecular dynamics. The initial configuration includes 216 NH4+ + OH− pairs. The upper panel (red) shows the rapid increase in the statistical temperature. The middle panel (black) shows the oscillations in the total energy (kinetic + potential), and the lower panel (blue) shows the root mean square error (RMSE) given from the root mean square of the charge residual function, f(n) = ρ0[n] − n.


The QMD trajectory was inspected to identify reaction events. In Fig. 8, we show a local reaction event captured in a series of configuration snapshots. An initial NH4–OH complex is formed at 28 fs, followed by the formation of NH3 and water at 34 fs. The rest of the simulation shows the NH3 and water molecules diffusing apart (see the snapshot at 126 fs).

FIG. 8.

Local molecular snapshots demonstrating the following reaction (from the left to the right panel): NH4+ + OH− → NH3 + H2O. This was determined from the graph-based QMD simulation in Fig. 7.


Despite the out-of-equilibrium initial configuration, followed by a rapid exothermic process with chemical changes as in Fig. 8, the total energy in Fig. 7 remains remarkably stable. No instabilities in the total energy or in the residual charge are observed. This example illustrates that the graph-based shadow Born–Oppenheimer molecular-dynamics scheme is capable of simulating a challenging reactive test system without the iterative SCF optimization steps that normally would be required prior to the force evaluations in a regular Born–Oppenheimer molecular dynamics simulation.

Our next test system was chosen to be a water box containing several thousand atoms. The system was constructed using the GROMACS solvation tool.164 This type of system is non-reactive under the initial conditions and simulation time scales considered (i.e., no water dissociation is expected). It therefore allows us to use a relatively coarse graph, leading to more efficient calculations and the possibility of reaching longer simulation times, even for larger systems. Figure 9(a) shows the water system containing 6495 atoms, and Fig. 9(b) shows a subgraph partition with its core and halo regions.

FIG. 9.

(a) Water system in a periodic box containing 6495 atoms. The system was partitioned using 256 subgraphs. (b) A core partition (in large VDW sphere representation) together with its halo region (in translucent blue large spheres). The halo atoms near the back-right face are close to those near the front-left face due to the periodic boundary conditions.


Figure 10 shows the result of an NVE simulation starting from an initial out-of-equilibrium structure. The calculations were performed by decomposing the system into 256 subgraphs. The QMD simulation was performed using 64 MPI ranks distributed across 16 Intel(R) Xeon(R) E5-2695 v4 @ 2.10 GHz CPUs, each containing 36 cores. The MD time step and electronic temperature were set to 0.2 fs and β−1 = 0.5 eV, respectively. This simulation also demonstrates stability and error control in the behavior of the total energy (in black, middle panel) and the charge residual error (in blue, lower panel) as the statistical temperature increases to around 400 K (in red, upper panel).

FIG. 10.

Graph-based QMD simulation of the water system in Fig. 9. The upper panel shows the statistical temperature. The middle panel shows the fluctuations in the total energy per atom, and the lower panel shows the root mean square error (RMSE) calculated from the root mean square of the charge residual function, f(n) = ρ0[n] − n.


Our final test case is a system composed of eight Trp-cage synthetic polypeptides in an ammonium bicarbonate solution, shown in Fig. 11. A solvated simulation box with a single polypeptide and a charge-neutralizing combination of five HCO3− and four NH4+ ions was constructed using the multicomponent input generator of CHARMM-GUI,165 using input files for the polypeptide, HCO3−, and NH4+. The polypeptide coordinates were obtained from the first model in RCSB PDB entry 1L2Y.166 The system was then expanded to a 2 × 2 × 2 system using PDBProp in AmberTools.167 The final system was a 43.923 Å3 cube simulation box with 64 112 atoms. It is highly challenging, if not practically unfeasible, to perform quantum-mechanical Born–Oppenheimer simulations of systems of this complexity without supercomputing access.

FIG. 11.

Trp-cage protein molecules with a solution of ammonium bicarbonate and water in a periodic box with a total of 64 112 atoms.

FIG. 12.

QMD simulation with the graph-based extended Lagrangian shadow Born–Oppenheimer molecular dynamics method of the 64 112-atom solvated Trp-cage system illustrated in Fig. 11. The upper panel shows the statistical temperature, the middle panel shows the fluctuations in the total energy per atom, and the lower panel shows the root mean square error (RMSE) from the root mean square of the charge residual function, f(n) = ρ0[n] − n.


Figure 12 shows the results of a QMD simulation with the graph-based extended Lagrangian shadow Born–Oppenheimer molecular dynamics method. The simulation was performed on 32 nodes of the Chicoma Institutional Computing cluster at Los Alamos National Laboratory. Each node is equipped with two 64-core AMD Rome EPYC 7H12 processors and 512 GB of memory, with MPI communication via a 100 Gb/s HPE/Cray Slingshot10 interconnect (no GPUs). The job was distributed over 1024 MPI tasks: 32 tasks per node, each using four OpenMP threads. The computation was distributed using 2048 subgraphs: two per MPI rank. The simulation was performed in an NVE ensemble using a time step of δt = 0.5 fs, an electronic temperature with β−1 = 0.1 eV, and three rank-1 updates in the approximation of the preconditioned kernel. The initial velocity distribution was randomly sampled from a Maxwell distribution at a temperature of 150 K. The size of the core partitions was, on average, about 30 atoms, and the total size of the subgraphs, including the halo, varied between 400 and 500 atoms. Even for this system, our largest, the behavior is robust and stable, as seen in the total energy fluctuations (in black, middle panel) and in the size of the charge residual (in blue, lower panel) in Fig. 12. This example demonstrates that the graph-based shadow Born–Oppenheimer molecular dynamics scheme is scalable to systems containing tens of thousands of atoms using a CPU-based compute cluster.

FIG. 13.

Strong scaling for a single MD step. The time in milliseconds as a function of the number of compute nodes is shown for the full MD step, the kernel update, and the density matrix (DM) construction in black, red, and blue, respectively.

To assess the preliminary parallel efficiency of the graph-based approach to electronic structure calculations, we performed both strong and weak scaling studies. The strong scaling study was performed using a 16 704-atom water box, generated by expanding a 2088-atom box twofold along each side. The system was subdivided into 512 subgraph partitions. The calculation was distributed over 1, 2, 4, or 8 nodes of the Chicoma cluster (architecture described in Test System C) using 32 MPI tasks per node, and timings were obtained for the total MD step (including steps that have not yet been parallelized), the kernel update, and the density matrix (DM) construction (Fig. 13). The number of rank updates for the preconditioned Krylov subspace kernel approximation was set to three, and the timings were measured after the initial time step, which includes a full SCF optimization of the electronic ground state and the construction of the preconditioner. Compared to a single node, the total time required for an MD step decreased 1.7-fold, 2.4-fold, and 3.4-fold when increasing the number of nodes to 2, 4, and 8, respectively, corresponding to parallel efficiencies of 85%, 60%, and 43%. Similar strong scaling efficiencies are seen for the kernel update and DM construction steps.
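The quoted efficiencies follow from the standard definition of strong-scaling efficiency, E(N) = S(N)/N, with speedup S(N) = T(1)/T(N). A minimal sketch using the fold-decreases reported above (the third value, 42.5%, is consistent with the 43% quoted from the less-rounded measured timings):

```python
def strong_scaling_efficiency(speedup, nodes):
    """Strong-scaling efficiency E(N) = S(N) / N, where S(N) = T(1) / T(N)."""
    return speedup / nodes

# Measured fold-decreases in the total MD-step time from the text
for nodes, speedup in [(2, 1.7), (4, 2.4), (8, 3.4)]:
    print(f"{nodes} nodes: {100 * strong_scaling_efficiency(speedup, nodes):.1f}%")
```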

For the weak scaling study, water boxes of increasing size were generated, as before, by expanding an initial 2088-atom water box, and were simulated using proportionally increased compute resources. The calculations were performed on a set of homogeneous nodes of the Darwin research testbed cluster at Los Alamos National Laboratory. Each node is equipped with dual-socket Intel Xeon E5-2695 v4 (2.10 GHz) processors and 125 GB of memory, and the nodes are interconnected via 100 Gb/s Mellanox EDR InfiniBand. The initial 2088-atom system, decomposed into 64 partitions and run on a single compute node, served as the reference. The system was then scaled up by factors of 2, 4, 8, and 16, while increasing the number of nodes and the number of partitions in proportion to the system size. This proportional increase in the partitioning may not be the ideal choice for performance, but it simplifies our test. The simulations were performed using four ranks per node.

Figure 14 shows the results of the weak scaling study. Compared to the reference system on a single node, the total time for an MD step increases about 2.5-fold for a system 16 times larger on 16 nodes. The density matrix (DM) construction step shows a similar scaling behavior. The kernel update is comparatively more costly for larger systems: it is 3.5 times slower when scaling up by a factor of 16. This reflects the increased level of communication required for the kernel calculation compared to the other steps.
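For weak scaling, where the problem size grows in proportion to the node count, ideal behavior keeps the time per MD step constant, so the efficiency is simply the inverse of the relative time. A small sketch using the factors reported above:

```python
def weak_scaling_efficiency(relative_time):
    """Weak-scaling efficiency E = T(reference) / T(N) = 1 / relative time."""
    return 1.0 / relative_time

# Relative timings at 16x system size on 16 nodes, from the text
print(f"full MD step:  {weak_scaling_efficiency(2.5):.0%}")  # 40%
print(f"kernel update: {weak_scaling_efficiency(3.5):.0%}")  # 29%
```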

FIG. 14.

Weak scaling of a single MD step. The relative timings are shown for the kernel update, the density matrix (DM) construction, and the full molecular dynamics (MD) step (in black).

In this article, we have presented a graph-based canonical quantum perturbation theory for applications in graph-based QMD simulations22 based on the most recent shadow potential formulations of XL-BOMD.61 This set of techniques enables stable, linear-scaling, shadow Born–Oppenheimer molecular dynamics simulations of charge-sensitive or reactive chemical systems without any iterative SCF optimization prior to the force evaluations. The formulation includes a preconditioned Krylov subspace approximation for the integration of the extended electronic degrees of freedom, which requires quantum response calculations for electronic states with fractional occupation numbers. The proposed graph-based canonical response calculations can be performed in parallel in the same way as those for the electronic ground state, guided by the partitioning of a data-dependency graph that can be estimated from previous integration time steps or SCF iterations.

Graph-based electronic structure calculations are intended for studies of large systems with around a thousand atoms or more and require the separate calculation of subsystems that often contain several hundred atoms. Graph-based QMD simulations are therefore particularly well-suited for semi-empirical electronic structure theory. The methods in this article were implemented and tested using SCC-DFTB theory.

The proposed graph-based shadow Born–Oppenheimer molecular dynamics scheme was demonstrated in simulations of challenging, chemically reactive systems, such as ammonium hydroxide in water. The simulations were stable both in the fluctuations of the total energy and in the residual charge error. This agrees with recent shadow potential formulations of XL-BOMD applied to reactive systems with sensitive, unsteady charge solutions, but without the graph-partitioning scheme.61,84 We also demonstrated how graph-based canonical quantum response theory, in combination with the preconditioned Krylov subspace approximation, can be used in a quasi-Newton scheme to accelerate SCF convergence.

The graph-based approach to electronic structure calculations can take advantage of emerging exascale computing resources.168 This should also be possible for the graph-based quantum response calculations. Our current implementation, used in the examples above, is still preliminary but demonstrates good parallel scaling, allowing QMD simulations of tens of thousands of atoms. More development is needed, in particular, to take advantage of hybrid architectures using graphics processing units or AI accelerators. The main focus of this paper is on the underlying theoretical concepts and techniques for graph-based electronic structure and response calculations, including fractional occupation numbers, which enable scalable extended Lagrangian shadow Born–Oppenheimer molecular dynamics simulations that are also applicable to more challenging, charge-sensitive, or reactive chemical systems.

Recently, it has been shown how statistical machine learning techniques can significantly enhance the accuracy of semi-empirical electronic structure methods.169–176 Using such "AI-boosted" semi-empirical methods in combination with graph-based QMD simulation techniques represents a promising path toward accurate simulations of large, complex chemical systems.

In the supplementary material, we present a Matlab script with a numerical example of the graph-based expansion, which demonstrates the one-to-one mapping between the step-by-step matrix function expansion on a graph (with multiplications from the left- or right-hand side) and the collected expansions over the separate subgraphs. We also present the general workflow for the most basic form of the graph-based electronic structure method for calculating the density matrix.
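The essence of that mapping can also be illustrated in a few lines of NumPy. The sketch below is our own minimal example (not the supplementary script): it applies a degree-2 matrix polynomial to a tridiagonal Hamiltonian and shows that the core block computed from an extracted subgraph, whose halo covers the data dependencies of the polynomial, agrees with the full expansion:

```python
import numpy as np

# Minimal illustration of the graph-based (submatrix) expansion: for a sparse H,
# the core block of a matrix polynomial p(H) can be computed from a small
# submatrix containing the core plus a halo that covers the polynomial's
# data dependencies.
n = 12
rng = np.random.default_rng(0)
H = np.zeros((n, n))
for i in range(n - 1):
    H[i, i + 1] = H[i + 1, i] = rng.standard_normal()  # tridiagonal couplings
np.fill_diagonal(H, rng.standard_normal(n))

p_full = H @ H  # degree-2 matrix polynomial evaluated on the full graph

core = [4, 5, 6]           # core partition
halo = [2, 3, 7, 8]        # halo: neighbors within 2 bonds (degree x bandwidth)
sub = sorted(core + halo)
Hs = H[np.ix_(sub, sub)]   # extracted subgraph Hamiltonian
p_sub = Hs @ Hs            # same polynomial evaluated on the subgraph alone

# The core block of the subgraph result matches the full expansion
idx = [sub.index(i) for i in core]
err = np.max(np.abs(p_sub[np.ix_(idx, idx)] - p_full[np.ix_(core, core)]))
print(err)  # zero to machine precision
```

The halo width here (two neighbor shells for a degree-2 polynomial on a tridiagonal matrix) is chosen so that every nonzero term in the core rows of H @ H is captured by the submatrix; the same bookkeeping underlies the subgraph expansions discussed in the text.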

This work was supported by the U.S. Department of Energy Office of Basic Energy Sciences (FWP LANLE8AN “Next Generation Quantum-Based Molecular Dynamics”) and by the U.S. Department of Energy through the Los Alamos National Laboratory (LANL), including the LANL Institutional Computing program. LANL is operated by Triad National Security, LLC, for the National Nuclear Security Administration of the U.S. Department of Energy under Contract No. 892333218NCA000001.

The authors have no conflicts to disclose.

Christian F. A. Negre: Conceptualization (supporting); Data curation (lead); Formal analysis (equal); Funding acquisition (supporting); Investigation (supporting); Methodology (supporting); Project administration (supporting); Resources (lead); Software (lead); Validation (equal); Visualization (equal); Writing – original draft (supporting); Writing – review & editing (equal). Michael E. Wall: Data curation (supporting); Formal analysis (equal); Resources (supporting); Software (supporting); Validation (equal); Visualization (equal); Writing – review & editing (equal). Anders M. N. Niklasson: Conceptualization (lead); Data curation (supporting); Formal analysis (equal); Funding acquisition (lead); Investigation (lead); Methodology (lead); Project administration (lead); Resources (supporting); Software (supporting); Validation (equal); Visualization (equal); Writing – original draft (lead); Writing – review & editing (lead).

The data that support the findings of this study are available from the corresponding authors upon reasonable request.

1.
2. D. R. Bowler and T. Miyazaki, Rep. Prog. Phys. 75, 036503 (2012).
3. A. E. Clark, H. Adams, R. Hernandez, A. I. Krylov, A. M. N. Niklasson, S. Sarupria, Y. Wang, S. M. Wild, and Q. Yang, ACS Cent. Sci. 7, 1271 (2021).
4. M. J. Cawkwell, E. J. Sanville, S. M. Mniszewski, and A. M. N. Niklasson, J. Chem. Theory Comput. 8, 4094 (2012).
5. J. Finkelstein, J. S. Smith, S. M. Mniszewski, K. Barros, C. F. A. Negre, E. H. Rubensson, and A. M. N. Niklasson, J. Chem. Theory Comput. 17, 2256 (2021).
6. J. Finkelstein, J. S. Smith, S. M. Mniszewski, K. Barros, C. F. A. Negre, E. H. Rubensson, and A. M. N. Niklasson, J. Chem. Theory Comput. 17, 6180 (2021).
7. H. Shang, X. Duan, F. Li, L. Zhang, Z. Xu, K. Liu, H. Luo, Y. Ji, W. Zhao, W. Xue, L. Chen, and Y. Zhang, Comput. Phys. Commun. 267, 108045 (2021).
8. NVIDIA Corporation, cuSOLVER, 2021; accessed 15/4/2021.
9. N. Bock, C. F. A. Negre, S. M. Mniszewski, J. Mohd-Yusof, B. Aradi, J.-L. Fattebert, D. Osei-Kuffuor, T. C. Germann, and A. M. N. Niklasson, J. Supercomput. 74, 6201–6219 (2018).
10. NVIDIA Corporation, cuBLAS, https://developer.nvidia.com/cuBLAS (2021); accessed 15/4/2021.
11. S. M. Mniszewski, J. Belak, J.-L. Fattebert, C. F. A. Negre, S. R. Slattery, A. A. Adedoyin, R. F. Bird, C. S. Chang, G. Chen, S. Ethier, S. Fogerty, S. Habib, C. Junghans, D. Lebrun-Grandi, J. Mohd-Yusof, S. G. Moore, D. Osei-Kuffuor, S. J. Plimpton, A. Pope, S. T. Reeve, L. Ricketson, A. Scheinberg, A. Y. Sharma, and M. E. Wall, Special Journal Issue: ECP Co-design and Computational Motifs (unpublished) (2020).
12. S. Goedecker and L. Colombo, Phys. Rev. Lett. 73, 122 (1994).
13. G. Galli, Curr. Opin. Solid State Mater. Sci. 1, 864 (1996).
14. E. Tsuchida, J. Phys.: Condens. Matter 20, 294212 (2008).
15. F. Shimojo, R. K. Kalia, A. Nakano, and P. Vashista, Phys. Rev. B 77, 085103 (2008).
16. M. J. Cawkwell and A. M. N. Niklasson, J. Chem. Phys. 137, 134105 (2012).
17. J. VandeVondele, U. Borštnik, and J. Hutter, J. Chem. Theory Comput. 8, 3565 (2012).
18. M. Arita, D. R. Bowler, and T. Miyazaki, J. Chem. Theory Comput. 10, 5419 (2014).
19. F. Shimojo, S. Hattori, R. K. Kalia, M. Kusaneth, W. W. Mou, A. Nakano, K. Nomura, S. Ohmura, P. Rajak, K. Shimamura, and P. Vashista, J. Chem. Phys. 140, 18A529 (2014).
20. D. Osei-Kuffuor, J. L. Fattebert, and F. Gygi, Phys. Rev. Lett. 112, 046401 (2014).
21. T. Otsuka, M. Taiji, D. R. Bowler, and T. Miyazaki, Jpn. J. Appl. Phys. 55, 1102B1 (2016).
22. A. M. N. Niklasson, S. M. Mniszewski, C. F. A. Negre, M. J. Cawkwell, P. J. Swart, J. Mohd-Yusof, T. C. Germann, M. E. Wall, N. Bock, E. H. Rubensson, and H. Djidjev, J. Chem. Phys. 144, 234101 (2016).
23. H. N. Djidjev, G. Hahn, S. M. Mniszewski, C. F. Negre, A. M. Niklasson, and V. B. Sardeshmukh, "Graph partitioning methods for fast parallel quantum molecular dynamics," in 2016 Proceedings of the Seventh SIAM Workshop on Combinatorial Scientific Computing (Society for Industrial and Applied Mathematics, 2016), pp. 42–51.
24. M. Lass, S. Mohr, H. Wiebeler, T. Kühne, and C. Plessl, in Proceedings of the Platform for Advanced Scientific Computing (PASC) Conference (ACM, 2018).
25. H. N. Djidjev, G. Hahn, S. M. Mniszewski, C. F. A. Negre, and A. M. N. Niklasson, Algorithms 12, 187 (2019).
26. M. Lass, R. Schade, T. Kühne, and C. Plessl, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC) (IEEE Computer Society, 2020), p. 1127.
28. P. D. Walker and P. G. Mezey, J. Am. Chem. Soc. 115, 12423 (1993).
29. W. Yang and T. S. Lee, J. Chem. Phys. 103, 5674 (1995).
30. I. A. Abrikosov, A. M. N. Niklasson, S. I. Simak, B. Johansson, A. V. Ruban, and H. L. Skriver, Phys. Rev. Lett. 76, 4203 (1996).
31. K. Kitaura, E. Ikeo, T. Asada, T. Nakano, and M. Uebayasi, Chem. Phys. Lett. 313, 701 (1999).
32. S. Li, W. Li, and T. Fang, J. Am. Chem. Soc. 127, 7215 (2005).
34. Y. Nishimoto, D. G. Fedorov, and S. Irle, J. Chem. Theory Comput. 10, 4801 (2014).
35. V. Q. Vuong, Y. Nishimoto, D. G. Fedorov, B. G. Sumpter, T. A. Niehaus, and S. Irle, J. Chem. Theory Comput. 15, 3008 (2019).
36. Y. Nishimoto and S. Irle, "The FMO-DFTB method," in Recent Advances of the Fragment Molecular Orbital Method: Enhanced Performance and Applicability, edited by Y. Mochizuki, S. Tanaka, and K. Fukuzawa (Springer Singapore, Singapore, 2021), pp. 459–485.
37. G. W. Stewart, in Sparse Matrix Computations, edited by J. R. Banch and D. J. Rose (Academic Press, New York, 1976), pp. 113–130.
38. S. Pissanetzky, Sparse Matrix Technology (Academic Press, London, 1984).
39. G. Brussino and V. Sonnad, Int. J. Numer. Methods Eng. 28, 801 (1989).
40. Y. Saad, Iterative Methods for Sparse Linear Systems (PWS Publishing, Boston, 1996).
41. E. Schwegler and M. Challacombe, J. Chem. Phys. 105, 2726 (1996).
42. M. Challacombe and E. Schwegler, J. Chem. Phys. 106, 5526 (1997).
43. A. D. Daniels and G. E. Scuseria, J. Chem. Phys. 110, 1321 (1999).
44. A. M. N. Niklasson, Phys. Rev. B 66, 155115 (2002).
45. A. M. N. Niklasson, C. J. Tymczak, and M. Challacombe, J. Chem. Phys. 118, 8611 (2003).
46. E. H. Rubensson and P. Sałek, J. Comput. Chem. 26, 1628 (2005).
47. D. K. Jordan and D. A. Mazziotti, J. Chem. Phys. 122, 084114 (2005).
48. B. Aradi, B. Hourahine, and T. Frauenheim, J. Phys. Chem. A 111, 5678 (2007).
49. E. H. Rubensson, E. Rudberg, and P. Sałek, J. Chem. Phys. 128, 074106 (2008).
50. A. M. N. Niklasson, "Density matrix methods in linear scaling electronic structure theory," in Linear-Scaling Techniques in Computational Chemistry and Physics: Methods and Applications, edited by R. Zalesny, M. G. Papadopoulos, P. G. Mezey, and J. Leszczynski (Springer Netherlands, Dordrecht, 2011), pp. 439–473.
51. A. Buluc and J. R. Gilbert, SIAM J. Sci. Comput. 34, C170 (2012).
52. N. Bock and M. Challacombe, SIAM J. Sci. Comput. 35, C72 (2013).
53. U. Borstnik, J. VandeVondele, V. Weber, and J. Hutter, Parallel Comput. 40, 47 (2014).
54. V. Weber, T. Laino, A. Pozdneev, I. Fedulova, and A. Curioni, J. Chem. Theory Comput. 11, 3145 (2015).
55. P. Pinski, C. Riplinger, E. F. Valeev, and F. Neese, J. Chem. Phys. 143, 034108 (2015).
56. L. A. Truflandier, R. M. Dianzinga, and D. R. Bowler, J. Chem. Phys. 144, 091102 (2016).
57. A. Kruchinina, E. Rudberg, and E. H. Rubensson, J. Chem. Theory Comput. 12, 5788 (2016).
58. R. Schade, T. Kenter, H. Elgabarty, M. Lass, O. Schütt, A. Lazzaro, H. Pabst, S. Mohr, J. Hutter, T. D. Kühne, and C. Plessl, Parallel Comput. 111, 102920 (2022).
59. Graph-based linear scaling electronic structure theory has recently been used by Schade and co-workers, who call it the submatrix method or the non-orthogonal local submatrix method (NOLSM),58 but without referring to the original and equivalent method in Ref. 22.
60. A. M. N. Niklasson, J. Chem. Phys. 147, 054103 (2017).
61. A. M. N. Niklasson, J. Chem. Phys. 152, 104103 (2020).
62. A. M. N. Niklasson, Eur. Phys. J. B 94, 164 (2021).
63.
64. A. F. Voter and F. Montalenti, Annu. Rev. Mater. Res. 32, 321 (2002).
65. D. Perez, B. P. Uberuaga, Y. Shim, J. G. Amar, and A. F. Voter, Annual Reports in Computational Chemistry (Elsevier, 2009), Vol. 5, pp. 79–98; available at https://www.sciencedirect.com/science/article/pii/S1574140009005040.
66. D. Perez, E. D. Cubuk, A. Waterland, E. Kaxiras, and A. F. Voter, J. Chem. Theory Comput. 12, 18 (2016).
67. S. Baroni, S. de Gironcoli, A. Dal Corso, and P. Giannozzi, Rev. Mod. Phys. 73, 515 (2001).
68. V. Weber, A. M. N. Niklasson, and M. Challacombe, Phys. Rev. Lett. 92, 193002 (2004).
69. C. Ochsenfeld, J. Kussmann, and F. Koziol, Angew. Chem. 43, 4485 (2004).
70. J. Kussmann and C. Ochsenfeld, J. Chem. Phys. 127, 204103 (2007).
71. J. Kussmann, A. Luenser, M. Beer, and C. Ochsenfeld, J. Chem. Phys. 142, 094101 (2015).
72. M. Elstner, D. Porezag, G. Jungnickel, J. Elsner, M. Haugk, T. Frauenheim, S. Suhai, and G. Seifert, Phys. Rev. B 58, 7260 (1998).
73. M. W. Finnis, A. T. Paxton, M. Methfessel, and M. van Schilfgaarde, Phys. Rev. Lett. 81, 5149 (1998).
74. B. Hourahine et al., J. Chem. Phys. 152, 124101 (2020).
75. M. J. S. Dewar and W. Thiel, Theor. Chim. Acta 46, 89 (1977).
76. M. J. S. Dewar, E. G. Zoebisch, E. F. Healy, and J. J. P. Stewart, J. Am. Chem. Soc. 107, 3902 (1985).
78. C. Bannwarth, S. Ehlert, and S. Grimme, J. Chem. Theory Comput. 15, 1652 (2018).
79. P. O. Dral, X. Wu, and W. Thiel, J. Chem. Theory Comput. 15, 1743 (2019).
80. W. Malone, B. Nebgen, A. White, Y. Zhang, H. Song, J. A. Bjorgaard, A. E. Sifain, B. Rodriguez-Hernandez, V. M. Freixas, S. Fernandez-Alberti, A. E. Roitberg, T. R. Nelson, and S. Tretiak, J. Chem. Theory Comput. 16, 5771 (2020).
81. G. Zhou, B. Nebgen, N. Lubbers, W. Malone, A. M. N. Niklasson, and S. Tretiak, J. Chem. Theory Comput. 16, 4951 (2020).
82. C. Bannwarth, E. Caldeweyher, S. Ehlert, A. Hansen, P. Pracht, J. Seibert, S. Spicher, and S. Grimme, Wiley Interdiscip. Rev.: Comput. Mol. Sci. 11, e1493 (2020).
83. G. Golub and C. F. van Loan, Matrix Computations (Johns Hopkins University Press, Baltimore, 1996).
84. A. M. N. Niklasson, J. Chem. Theory Comput. 16, 3628 (2020).
85. A. M. N. Niklasson, "Density matrix methods in linear scaling electronic structure theory," in Linear-Scaling Techniques in Computational Chemistry and Physics: Methods and Applications, edited by R. Zalesny, M. G. Papadopoulos, P. G. Mezey, and J. Leszczynski (Springer, Dordrecht, Netherlands, 2011), pp. 439–473.
86. C. F. A. Negre, S. M. Mniszewski, M. J. Cawkwell, N. Bock, M. E. Wall, and A. M. N. Niklasson, J. Chem. Theory Comput. 12, 3063 (2016).
87. A. M. N. Niklasson, V. Weber, and M. Challacombe, J. Chem. Phys. 123, 044107 (2005).
88. A. M. N. Niklasson, S. M. Mniszewski, C. F. A. Negre, M. E. Wall, M. J. Cawkwell, and N. Bock, PROGRESS version 1.0, 2016, https://github.com/lanl/qmd-progress.
89. M. J. Cawkwell et al., LATTE, 2010, Los Alamos National Laboratory (LA-CC-10004), http://www.github.com/lanl/latte.
90. R. McWeeny, Proc. R. Soc. London, Ser. A 235, 496 (1956).
91. A. H. R. Palser and D. E. Manolopoulos, Phys. Rev. B 58, 12704 (1998).
92. K. Németh and G. E. Scuseria, J. Chem. Phys. 113, 6035 (2000).
93. A. M. N. Niklasson, Phys. Rev. B 68, 233104 (2003).
94. E. Rudberg and E. H. Rubensson, J. Phys.: Condens. Matter 23, 075502 (2011).
95. E. H. Rubensson, J. Chem. Theory Comput. 7, 1233 (2011).
96. P. Suryanarayana, Chem. Phys. Lett. 555, 291 (2013).
97. E. H. Rubensson and A. M. N. Niklasson, SIAM J. Sci. Comput. 36, 148 (2014).
98. H. Shang, W. Liang, Y. Zhang, and J. Yang, Comput. Phys. Commun. 258, 107613 (2021).
99. J. Finkelstein, E. H. Rubensson, S. M. Mniszewski, C. F. A. Negre, and A. M. N. Niklasson, J. Chem. Theory Comput. 18, 4255 (2022).
100. R. Pederson, J. Kozlowski, R. Song, J. Beall, M. Ganahl, M. Hauru, A. G. M. Lewis, Y. Yao, S. B. Mallick, V. Blum, and G. Vidal, "Large scale quantum chemistry with tensor processing units," J. Chem. Theory Comput. 19, 25–32 (2023).
101. G. Karypis and V. Kumar, SIAM J. Sci. Comput. 20, 359 (1999).
102.
103. J. W. Neidigh, R. M. Fesinmeyer, and N. H. Andersen, "Designing a 20-residue protein," Nat. Struct. Biol. 9, 425–430 (2002).
104. P. Ghale, M. P. Kroonblawd, S. Mniszewski, C. F. A. Negre, R. Pavel, S. Pino, V. Sardeshmukh, G. Shi, and G. Hahn, SIAM J. Sci. Comput. 39, C466 (2017).
106.
107. R. G. Parr and W. Yang, Density-Functional Theory of Atoms and Molecules (Oxford University Press, Oxford, 1989).
108. A. M. N. Niklasson, J. Chem. Phys. 129, 244107 (2008).
109. A. M. N. Niklasson, Phys. Rev. Lett. 100, 123004 (2008).
110. P. Souvatzis and A. M. N. Niklasson, J. Chem. Phys. 140, 044117 (2014).
111. J. Kolafa, J. Comput. Chem. 25, 335 (2003).
112. P. Pulay and G. Fogarasi, Chem. Phys. Lett. 386, 272 (2004).
113. J. M. Herbert and M. Head-Gordon, Phys. Chem. Chem. Phys. 7, 3269 (2005).
114. A. M. N. Niklasson, C. J. Tymczak, and M. Challacombe, Phys. Rev. Lett. 97, 123001 (2006).
115. T. D. Kühne, M. Krack, F. R. Mohamed, and M. Parrinello, Phys. Rev. Lett. 98, 066401 (2007).
117. C. Grebogi, S. M. Hammel, J. A. Yorke, and T. Sauer, Phys. Rev. Lett. 65, 1527 (1990).
118.
119. J. Gans and D. Shalloway, Phys. Rev. E 61, 4587 (2000).
120. R. E. Engle, R. D. Skeel, and M. Drees, J. Comput. Phys. 206, 432–452 (2005).
121. S. D. Bond and B. J. Leimkuhler, Molecular Dynamics and the Accuracy of Numerically Computed Averages (Cambridge University Press, Cambridge, UK, 2007).
122. S. Toxvaerd, O. J. Heilmann, and J. C. Dyre, J. Chem. Phys. 136, 224106 (2012).
123. K. D. Hammonds and D. M. Heyes, J. Chem. Phys. 152, 024114 (2020).
124. E. Forest and R. D. Ruth, Physica D 43, 105 (1990).
125. P. J. Channell and C. Scovel, Nonlinearity 3, 231 (1990).
126. R. I. McLachlan and P. Atela, Nonlinearity 5, 541 (1992).
127. B. J. Leimkuhler and R. D. Skeel, J. Comput. Phys. 112, 117 (1994).
128. J. Finkelstein, C. Cheng, G. Fiorin, B. Seibold, and N. Grønbech-Jensen, J. Chem. Phys. 153, 134101 (2020).
129. R. Car and M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985).
130. S. Bonella, A. Coretti, R. Vuilleumier, and G. Ciccotti, Phys. Chem. Chem. Phys. 22, 10775 (2020).
131. A. Coretti, L. Scalfi, C. Bacon, B. Rotenberg, R. Vuilleumier, G. Ciccotti, M. Salanne, and S. Bonella, J. Chem. Phys. 152, 194701 (2020).
132. A. Coretti, T. Baird, R. Vuilleumier, and S. Bonella, J. Chem. Phys. 157, 214110 (2022).
133. P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964).
134. W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965).
135. R. Dreizler and K. Gross, Density-Functional Theory (Springer-Verlag, Berlin, Heidelberg, 1990).
136. M. Weinert and J. W. Davenport, Phys. Rev. B 45, 13709 (1992).
137. R. M. Wentzcovitch, J. L. Martins, and P. B. Allen, Phys. Rev. B 45, R11372 (1992).
138. C. C. J. Roothaan, Rev. Mod. Phys. 23, 69 (1951).
140. T. Hirakawa, T. Suzuki, D. R. Bowler, and T. Miyazaki, J. Phys.: Condens. Matter 29, 405901 (2017).
141. When n = ρ0[n], we have an exact self-consistent solution, and then ρ0[n] = ρmin.
142. A. M. N. Niklasson, P. Steneteg, A. Odell, N. Bock, M. Challacombe, C. J. Tymczak, E. Holmström, G. Zheng, and V. Weber, J. Chem. Phys. 130, 214109 (2009).
143. P. Steneteg, I. A. Abrikosov, V. Weber, and A. M. N. Niklasson, Phys. Rev. B 82, 075110 (2010).
144. G. Zheng, A. M. N. Niklasson, and M. Karplus, J. Chem. Phys. 135, 044122 (2011).
145. A. M. N. Niklasson, M. J. Cawkwell, E. H. Rubensson, and E. Rudberg, Phys. Rev. E 92, 063301 (2015).
146. Y. Nishimoto, J. Chem. Phys. 146, 084101 (2017).
148. A. M. N. Niklasson and M. Challacombe, Phys. Rev. Lett. 92, 193001 (2004).
149. L. A. Truflandier, R. M. Dianzinga, and D. R. Bowler, J. Chem. Phys. 153, 164105 (2020).
150. V. Weber, A. M. N. Niklasson, and M. Challacombe, J. Chem. Phys. 123, 044106 (2005).
151. J. W. Cooley and J. W. Tukey, Math. Comput. 19, 297 (1965).
152. T. Darden, D. York, and L. Pedersen, J. Chem. Phys. 98, 10089 (1993).
153. Note that this approach needs to be adapted to work with other representations of the electronic degrees of freedom besides some coarse-grained charge density, e.g., of partial atomic charges.
154. S. Das and V. Gavini, "Accelerating self-consistent field iterations in Kohn-Sham density functional theory using a low rank approximation of the dielectric matrix," arXiv:2211.07894 (2022).
156. D. G. Anderson, J. Assoc. Comput. Mach. 12, 547 (1965).
158. D. Porezag, T. Frauenheim, T. Köhler, G. Seifert, and R. Kaschner, Phys. Rev. B 51, 12947 (1995).
159. G. Seifert, D. Porezag, and T. Frauenheim, Int. J. Quantum Chem. 58, 185 (1996).
160. T. Frauenheim, G. Seifert, M. Elstner, Z. Hajnal, G. Jungnickel, D. Porezag, S. Suhai, and R. Scholz, Phys. Status Solidi B 217, 41 (2000).
161. M. Gaus, Q. Cui, and M. Elstner, J. Chem. Theory Comput. 7, 931 (2011).
162. A. Krishnapryian, P. Yang, A. M. N. Niklasson, and M. J. Cawkwell, J. Chem. Theory Comput. 13, 6191 (2017).
163. P. Pulay, G. Fogarasi, F. Pang, and J. E. Boggs, J. Am. Chem. Soc. 101, 2550 (1979).
164. H. J. C. Berendsen, D. van der Spoel, and R. van Drunen, Comput. Phys. Commun. 91, 43 (1995).
165. CHARMM-GUI, https://www.charmm-gui.org/; accessed 27/11/2022.
166. RCSB Protein Data Bank, 1L2Y, https://www.rcsb.org/structure/1L2Y; accessed 27/11/2022.
167. AmberTools22, https://ambermd.org/AmberTools.php; accessed 27/11/2022.
168. R. Schade, T. Kenter, H. Elgabarty, M. Lass, T. D. Kühne, and C. Plessl, "Breaking the exascale barrier for the electronic structure problem in ab initio molecular dynamics," arXiv:2205.12182 (2022).
169. P. O. Dral, O. A. von Lilienfeld, and W. Thiel, J. Chem. Theory Comput. 11, 2120 (2015).
170. R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld, J. Chem. Theory Comput. 11, 2087 (2015).
171. H. Li, C. Collins, M. Tanha, G. J. Gordon, and D. J. Yaron, J. Chem. Theory Comput. 14, 5764 (2018).
172. J. J. Kranz, M. Kubillus, R. Ramakrishnan, O. A. von Lilienfeld, and M. Elstner, J. Chem. Theory Comput. 14, 2341 (2018).
173. N. Goldman, B. Aradi, R. K. Lindsey, and L. E. Fried, J. Chem. Theory Comput. 14, 2652 (2018).
174. P. Zheng, R. Zubatyuk, W. Wu, O. Isayev, and P. O. Dral, Nat. Commun. 12, 7022 (2021).
175. G. Zhou, N. Lubbers, K. Barros, S. Tretiak, and B. Nebgen, Proc. Natl. Acad. Sci. U. S. A. 119, e2120333119 (2022).
176. F. Hu, F. He, and D. J. Yaron, "Semiempirical Hamiltonians learned from data can have accuracy comparable to density functional theory," arXiv:2210.11682 (2022).

Supplementary Material