Graph-based linear scaling electronic structure theory for quantum-mechanical molecular dynamics simulations [A. M. N. Niklasson et al., J. Chem. Phys. 144, 234101 (2016)] is adapted to the most recent shadow potential formulations of extended Lagrangian Born–Oppenheimer molecular dynamics, including fractional molecular-orbital occupation numbers [A. M. N. Niklasson, J. Chem. Phys. 152, 104103 (2020) and A. M. N. Niklasson, Eur. Phys. J. B 94, 164 (2021)], which enables stable simulations of sensitive complex chemical systems with unsteady charge solutions. The proposed formulation includes a preconditioned Krylov subspace approximation for the integration of the extended electronic degrees of freedom, which requires quantum response calculations for electronic states with fractional occupation numbers. For the response calculations, we introduce a graph-based canonical quantum perturbation theory that can be performed with the same natural parallelism and linear scaling complexity as the graph-based electronic structure calculations for the unperturbed ground state. The proposed techniques are particularly well-suited for semi-empirical electronic structure theory, and the methods are demonstrated using self-consistent charge density-functional tight-binding theory both for the acceleration of self-consistent field calculations and for quantum-mechanical molecular dynamics simulations. Graph-based techniques combined with the semi-empirical theory enable stable simulations of large, complex chemical systems, including tens-of-thousands of atoms.
I. INTRODUCTION
The immense promise of linear scaling electronic structure theory1–3 to study large-scale atomistic systems directly from the first principles of quantum mechanics is exceptionally difficult to realize, in practice. There are a number of obstacles and problems. These challenges include the following: (i) a high computational pre-factor where the linear scaling benefit appears only for very large systems that, in practice, are often beyond acceptable time limits or available computer resources—a challenge accentuated by highly competitive cubically scaling electronic structure solvers that have an impressive performance, especially on hybrid architectures;4–11 (ii) a reduced accuracy that is often difficult, if not impossible, to control, which may lead to instabilities and poor convergence of the self-consistent ground state solutions; and (iii) computationally expensive overheads that can limit parallel efficiency even for highly distributed calculations. These challenges are, in particular, limiting factors for quantum-mechanical Born–Oppenheimer molecular dynamics (QMD) simulations,12–21 where all these problems coalesce, limiting the system sizes or simulation times that can be studied. Graph-based linear scaling electronic structure theory22 was introduced to overcome some of these obstacles. Graph-based linear scaling QMD is a framework designed for non-metallic systems, and it is particularly efficient for quasi-low-dimensional structures as, for example, soft matter systems, such as solvated biomolecules, where the electronic overlap between the atoms can be represented by a sparse (low-dimensional) graph.22
Graph-based linear scaling electronic structure theory22 with its subsequent theoretical analysis, applications, and implementations23–26 is a very promising approach to study large atomistic systems and combines the natural parallelism of divide-and-conquer-like methods27–36 with the well-controlled and tunable accuracy of a numerically thresholded sparse matrix algebra.37–57 It has been recently considered how graph-based linear scaling electronic structure theory could be used to treat even hundreds of millions of atoms.58,59 Nevertheless, some shortcomings still remain. In particular, special care is required for QMD simulations of charge sensitive or reactive chemical systems, e.g., systems where the electronic energy gap between the Lowest Unoccupied Molecular Orbital (LUMO) and the Highest Occupied Molecular Orbital (HOMO) is opening and closing along the molecular trajectories. Electronic degeneracies, i.e., molecular states with equal energy at the chemical potential, and high susceptibilities associated with a low HOMO–LUMO gap can cause costly convergence and charge stability problems and difficulties in calculating accurate conservative forces, which may lead to nonphysical molecular trajectories.
In this article, we will combine graph-based linear scaling electronic structure theory with the most recent shadow potential formulations of extended Lagrangian Born–Oppenheimer molecular dynamics (XL-BOMD), including fractional molecular-orbital occupation numbers.60–62 This shadow Born–Oppenheimer molecular dynamics framework allows for QMD simulations of reactive or low-gap chemical systems that may have unsteady charge solutions,61 which can be very challenging for regular direct Born–Oppenheimer molecular dynamics simulations. To achieve the necessary accuracy and stability in this shadow, Born–Oppenheimer molecular dynamics formalism requires canonical quantum response calculations for electronic states with fractional occupation numbers at finite electronic temperatures. To perform these canonical response calculations, we will introduce a graph-based quantum perturbation theory for systems with fractional occupation numbers. With this improved graph-based framework for QMD simulations, we can then study challenging complex chemical systems using distributed computational platforms.
The prospect to use graph-based QMD to simulate chemically reactive soft matter systems is of particular interest when chemically relevant transitions appear as rare events beyond the times scales that are normally accessible to QMD simulations. Established accelerated molecular dynamics simulation methods63–66 are often difficult, if not impossible, to apply to soft matter systems that often have multiple shallow local minima where rare events are hard to detect or even to define. Nevertheless, by extending the system size, it is possible to systematically enhance the probability of a chemical relevant event and, thus, indirectly, accelerate the rare event dynamics. The natural parallelism, improved stability, and linear-scaling computational cost of our graph-based QMD framework make this possible.
In this paper, we will introduce graph-based quantum response calculations for systems with fractional occupation numbers. Our main motivation is to be able to perform graph-based QMD simulations using the most recent shadow potential formulations of XL-BOMD.61,62 However, canonical quantum perturbation calculations performed on a partitioned graph are a very useful technique by itself. Our graph-based quantum perturbation theory presented in this article, for example, could also be used for distributed, large-scale calculations of time-independent response properties, such as the magnetic susceptibility, phonon modes, electric polarizabilities, the Raman spectra, or the Born-effective charge.67–71
The theory and the techniques will be presented for Kohn–Sham density functional theory, but our formulation is general and can also be applied to other related effective single-particle methods, such as Hartree–Fock-based theory. However, the relevant system sizes for graph-based electronic structure calculations are often quite large, and each separate subgraph partition may include hundreds of atoms. The graph-based method for QMD simulations is therefore particularly well-suited for semi-empirical electronic structure methods that typically are one to two orders of magnitude faster compared to higher-level first principles methods. The new methodology will be demonstrated using self-consistent-charge density functional tight-binding theory (SCC-DFTB),72–74 but it is also applicable to other semi-empirical methods.75–82
Our main objectives in this article are the underlying theoretical concepts and techniques for the graph-based electronic structure and response calculations that enable large-scale QMD simulations of challenging complex (non-metallic) chemical systems. The specific performance of our implementation on different platforms or the particular chemical behavior and mechanisms of the simulated examples are not our priorities.
If not stated explicitly, we will assume that all operators are represented in some local atomic-orbital-like basis set. In general, we will assume that these matrix representations have been orthogonalized by a congruence transformation,50,83 and we will use atomic units throughout the text. For the extended electronic degrees of freedom in XL-BOMD, we use the electron density represented either as a continuous function or in a discretized form as a charge vector, which is particularly well-suited for SCC-DFTB theory. Generalizations to other representations of the extended electronic degrees of freedom, for example, using atomic multipoles, the density matrix, electronic wavefunctions, or the effective single-particle Hamiltonian will not be discussed here. The underlying theoretical approaches remain the same, but some significant details in the techniques may need to be adapted to the different representations of the extended electronic degrees of freedom, e.g., see Ref. 84.
This article is outlined as follows. First, we discuss the basic ideas behind general graph-based matrix function expansions and how it applies to graph-based linear scaling electronic structure theory. Then, we present XL-BOMD in its most recent form that uses an approximate shadow Born–Oppenheimer potential energy surface. In this formulation, a kernel that determines the metric tensor of the generalized harmonic oscillator extension appears in the equations of motion for the electronic degrees of freedom, which makes the electronic degrees of freedom oscillate around a closer estimate of the exact ground state density. The action of this kernel in the electronic equations of motion can be updated on-the-fly using a preconditioned low-rank Krylov subspace approximation that is generated from quantum response calculations. To perform these response calculations, we then introduce a time-independent graph-based quantum perturbation theory for systems with fractional occupation numbers. The graph-based canonical quantum response calculations can be partitioned into response calculations over separate subsystems, whose result can be collected for the response of the composite system. This graph-based canonical quantum perturbation theory allows for large-scale response calculations at finite electronic temperatures with a natural parallelism. We then combine the different electronic structure methods into a framework for graph-based shadow Born–Oppenheimer molecular dynamics simulations. We also present how the methodology can be used as an alternative low-complexity approach to accelerate self-consistent field iterations. The theory is then demonstrated for a few testbed examples using SCC-DFTB theory. Finally, we give a brief summary and our conclusions.
II. GRAPH-BASED LINEAR SCALING ELECTRONIC STRUCTURE THEORY
Graph-based linear scaling electronic structure theory is a powerful and flexible approach to achieve linear scaling complexity. The procedure combines a well-controlled and tunable accuracy of a thresholded sparse matrix algebra with the natural parallelism and flexibility of a divide-and-conquer-like scheme.22
A. Numerically thresholded sparse matrix algebra
In a matrix that is numerically thresholded, all matrix elements with an absolute value below some tunable numerical tolerance, τ, are set to zero. Matrix operations with numerically thresholded matrices can often be performed with great efficiency because only the non-zero elements above the numerical threshold have to be included in calculations. This can often lead to a significant reduction in the computational cost, where sometimes the total number of arithmetic operations only scales linearly (or stays constant) with the matrix size.1,2,85 The numerical thresholding enables a tunable adaptive accuracy that is easy to control—the numerical cut-off tolerance, τ, is simply increased or decreased.49 In spirit, the numerical thresholding is similar to changing the numerical precision of the floating point representation. However, sparse matrix operations can often be problematic when they are performed in parallel on a distributed computer architecture. The necessary data transfer between nodes and a non-local data storage can lead to a large overhead. In contrast, dense matrix operations, such as matrix–matrix multiplications, can often be performed with high parallel efficiency.
B. Matrix polynomials on a graph
A sparse matrix has an associated graph, where all the non-zero matrix elements are represented by edges between vertices of a graph. If we calculate a polynomial function of a matrix, X, in a step-by-step process (generating each higher-order polynomial term by multiplying by X either from the left-hand side or from the right-hand side), we can represent the step-by-step evolution of the matrix elements with absolute values above the threshold, τ, as edges on a graph. This process is schematically illustrated in Fig. 1, where the new XY term is generated by multiplying Y from the left by X before thresholding. Numerical thresholdings are typically performed locally after each matrix–matrix operation. However, here, we perform a global numerical threshold, where we keep all elements, whose absolute values at any point in the polynomial expansion appear above the numerical threshold, τ. This is illustrated with the X15 = X51 matrix elements in Fig. 1, which are kept after the X × Y multiplications even if they were below τ after the last matrix multiplication because they were above τ in a previous step. The graph associated with the globally thresholded matrix polynomial then represents the data-dependency or the data-connectivity graph, Gτ, which determines how data are allowed to flow across the vertices during the polynomial matrix expansion and only the non-zero matrix entries corresponding to Gτ are used during a polynomial expansion.
A globally thresholded matrix polynomial is thus given by the step-by-step polynomial matrix expansion, where we only include matrix elements corresponding to the vertices and edges of the data-dependency graph, Gτ, after each matrix–matrix operation. This procedure, where we use a fixed sparsity pattern determined by the graph Gτ, we refer to as matrix polynomial on a graph—denoted, for example, by fG(X) or f(X)|G.
For the matrix operations on a graph, it is important to remove any fill-in elements that do not belong to the graph after each individual matrix–matrix multiplication in a step-by-step procedure after each matrix multiplication.22 For example, we do not perform the threshold after Y2 = X3, but, instead, already for the intermediate operation Y1 = X × X and then again for Y2 = X × Y1 (here with a left-hand side multiplication), as illustrated in Fig. 1. Only with this incremental step-by-step procedure can we cover all possible data dependencies that may appear in the polynomial matrix expansion. The example in Fig. 1 is for a symmetric matrix polynomial and a symmetric data-connectivity graph. Generalizations to matrix operations on non-symmetric directed data–dependency graphs are straightforward (see the supplementary material).
Instead of identifying the data-dependency graph from a global numerical threshold of a matrix polynomial, we can, alternatively, simply choose any data–connectivity graph, G, that then limits the allowed data flow during a polynomial matrix expansion on this graph. A matrix polynomial calculated on this graph is given by the same incremental step-by-step expansion, where only matrix elements corresponding to the graph G are kept after each increase in the polynomial order. Examples are given in the supplementary material. The advantage of using a graph determined by a global numerical threshold is that we can then easily control and tune the accuracy of the matrix function by increasing or decreasing the threshold, τ. This is not possible when we chose the graph in some arbitrary way.
A graph G can be partitioned into subgraphs consisting of non-overlapping core (c) parts and overlapping halos (h) (see Fig. 2). For example, we can choose one subgraph for each separate vertex of G, including all its edges and its directly connected neighboring vertices. This partitioning forms a set of overlapping subgraphs. The non-overlapping vertex of a graph is its core part, and the overlapping shared edges and vertices belong to the halo.
There is an important and useful one-to-one relation between a sparse matrix function expansion on a graph G, which may have been determined from a global numerical threshold or just chosen arbitrarily, and a collection of separate matrix function expansions using a dense matrix algebra over a set of smaller principal submatrices, where the submatrices are determined by the overlapping subgraphs from the partitioning of the data-dependency graph, G. This observation forms the basis of graph-based linear scaling electronic structure theory.22 The equivalence enables a natural parallelism in electronic structure calculations, similar to a divide-and-conquer approach, but with the tunable accuracy of a numerically thresholded sparse matrix algebra. After the initial formulation of graph-based linear scaling electronic structure theory, a more rigorous mathematical analysis was subsequently given in Refs. 23 and 25.
C. One-to-one relation between a sparse matrix function and collected small dense matrix functions
Note that the data-connectivity graph G does not have to be determined by the global numerical threshold of f(X). Any sparse graph can be chosen. The relation in Eq. (2) still holds (see the supplementary material). This freedom provides a lot of flexibility in how we can control the accuracy and improve the computational efficiency. In particular, we can always add edges to the graph without losing accuracy.
The relation between the matrix function on the graph and the collected submatrix polynomials is illustrated in Fig. 2. The partitioning of the data-dependency graph, Gτ, into subgraphs determines the principal submatrices, x(i). Each subgraph and the corresponding submatrix have a core and a halo (h). For example, in subgraph 1 in Fig. 2, vertex 1 (yellow) is in the core and vertices 2 and 3 (orange) are in the halo. Here, each core is chosen to be a single vertex (corresponding to a diagonal matrix element) and only the subgraphs for vertices 1, 6, and 7 are shown out of a total of nine subgraphs. The principal submatrices, x(i), may have matrix elements that are not present in the connectivity graph corresponding to matrix elements, Xij, that are initially 0 (empty white matrix entries), but which become finite at some point during the matrix function expansion. An example is the edge between nodes 2 and 3 in subgraph 1 in Fig. 2. The separate results of the matrix elements of the core columns of the dense submatrix function expansions are then extracted and put together column by column (or row by row) to assemble the matrix function expansion of the full sparse matrix that otherwise would be calculated on the data connectivity graph, Gτ, or with the corresponding global numerical threshold.22,24,26 This is the “collect” operation, as indicated by the subscript on the very right-hand side of Eq. (2) and in the lower right corner of Fig. 2.
The one-to-one relation between a matrix function on a graph and its collected submatrix functions also holds for a polynomial matrix expansion using a variety of non-commuting and non-symmetric matrices, including directed (non-symmetric) data-dependency graphs, G. In these more general cases, some special care is required to account for the allowed data flow22 (see the supplementary material). For a matrix expansion of non-commuting or non-symmetric matrices on a general directed data-dependency graph, the principal submatrices are extracted either by columns or by rows. The higher-order polynomial terms on the graph are then built by multiplying matrices either from the left-hand side (for the column case) or from the right-hand side (for the row case) in the incremental step-by-step polynomial expansion. Only matrix elements on the graph are kept after each matrix operation. This matrix polynomial on the graph is then equivalent to the collected set of the submatrix polynomials that are collected either by columns or by rows from their core parts. Examples that illustrate the separate column- and row-wise versions for the non-commuting or non-symmetric polynomial expansions on a graph are provided in the supplementary material.
D. Fermi-operator expansion on a graph
E. Estimating the connectivity graph
The data connectivity graph, Gτ, given from a global numerical threshold, τ, of some matrix function, is, in general, not known a priori. In fact, a global numerical threshold of a matrix function requires that we first calculate the full matrix function before it can be partitioned into separate submatrix calculations. This may seem like a highly unpractical approach. However, in graph-based QMD simulations, or in iterative self-consistent field optimizations, the data connectivity graph, Gτ, can be estimated from previous steps, including redundant estimates from new connections formed as the atoms move or the electronic structure changes.22 Our favorite choice is to estimate the data-dependency graph from the non-zero structure of the numerically thresholded density matrix (either in the orthogonalized or the atomic-orbital representation) from a previous self-consistent-field (SCF) iteration or molecular dynamics time step, including any new matrix elements that may have formed in the thresholded Hamiltonian. Additional connectivity is then added by performing paths of length two of this graph. After performing the Fermi-operator expansions on the partitioned subgraphs, the collected density matrix is then, once again, numerically thresholded and the process is repeated. This technique allows for a rapid adaptation to new data-dependency flows and sparsity patterns in combination with an easily numerically tunable accuracy.22 This method will also be used in the examples of our graph-based QMD scheme.
An important feature of the different estimates of the data connectivity graph is that, even if we chose to have a fixed core part, the halo is fluctuating in size after each MD time step or SCF iteration.22 This can lead to some limitations in designing calculations.
F. Choosing the graph partitioning
The flexibility in determining the connectivity graph Gτ extends to the partitioning of the subgraphs into the core and the halo, which in turn determine the principal submatrices. In particular, the core is not limited to the case of a single matrix element illustrated in Fig. 2. The computational efficiency can be increased by including multiple matrix elements in the core while keeping the combined number of halo elements per core element small.22 Highly efficient core partitionings can be constructed using off-the-shelf graph methods22,26,101,102 that identify communities of closely connected vertices. Such graph partitioning, or other alternative partitioning methods,58 has been shown to decrease the average proportion of halo vertices in the subgraphs, leading to a significant reduction in the computational overhead.22,25,26,58
It is important to note that the data connectivity graph guided by the range of the thresholded density matrix is governed mainly by the electronic overlap of the molecular orbitals and not by the interatomic distances, e.g., based on some radial cutoff. This can be seen in the example of a solvated Trp-cage protein structure shown in Fig. 3, where the halo atoms surrounding a core region can be quite non-uniform. The data connectivity graph therefore gives a picture of the electronic overlap or the chemical bonding in the same way as the numerically thresholded density matrix, . This also means that graph-based QMD will be inefficient for metallic systems, where we have itinerant electronic states that, in practice, overlap with all atoms. The graph-based calculations will still work correctly, but the lack of sparsity will lead to a single subgraph that encloses the whole system. The natural parallelism and reduced linear scaling complexity are then lost.
Partitioning the graph by the density matrix will naturally divide soft matter systems into chemically relevant units that will often correspond to a chemically sound division. As an example of the latter, in Fig. 3, we show the partitioning result applied to a system composed of a Trp-cage miniprotein (a 20 residues peptide) solvated in water, which is an archetypal model for a soft matter system. By partitioning the connectivity graph for the merged graphs of the density matrix and the corresponding Hamiltonian matrix, using an off-the-shelf community detection algorithm,101 we can see how key parts can be revealed. The left panel of Fig. 3 shows a tyrosine residue that automatically has been captured as a core part. We have added the surrounding halo atoms in translucent blue to show the extent of the connectivity. The right panel of Fig. 3 shows how a closely connected water cluster is captured within a core partitioning. The ability of graph-partitioning the data-connectivity graph determined by the electronic overlap on identifying chemically intuitive fragments could be of great interest as an automatic analysis tool for large complex simulations, in particular, if fragments appear in core partitions that are consistent over time.
Another example of an automatic graph clustering is shown in Fig. 4, where a METIS community detection algorithm101,102 was used to identify the core parts of a graph corresponding to a phenyl dendrimer system. The left panel shows the molecular structure, and the right panel shows the clustered graph with the more closely interconnected communities. This automatic core partitioning can then be used in the graph-based electronic structure calculations to reduce the computational overhead.
G. Newton–Raphson scheme to find the chemical potential
To study chemical systems where the electronic HOMO–LUMO gap is small or vanishing, it is necessary to avoid instabilities. This is of particular importance in QMD simulations, where discontinuous crossings of nearly degenerate states may appear almost instantaneously. The problems can be avoided if we introduce fractional occupation numbers, i.e., we use a temperature-dependent free energy formulation for the electronic states.105–107 Using fractional occupation numbers leads to some additional challenges in our graph-based approach to electronic structure calculations. For example, it is no longer sufficient to determine the shared chemical potential μ in Eq. (3) or (5) over all the subgraph systems somewhere in the shared HOMO–LUMO gap.22,97 Instead, we need the exact shared value of the chemical potential, μ. This can be achieved using an iterative Newton–Raphson scheme, as presented in Ref. 108, which rapidly determines the chemical potential that gives the correct total occupation for a system with fractional occupation numbers.
The Newton–Raphson method is quadratically convergent. In QMD simulations or in self-consistent field iterations, the chemical potential from the previous step can be used as an initial guess. It can also be propagated or extrapolated from a sequence of previous time steps. Often, only a few iterations are then necessary to reach a tightly converged value of the shared chemical potential.
The Newton–Raphson scheme in Eq. (15) illustrates the general power of the graph-based approach to electronic structure calculations with its equivalence between the matrix function on a graph and its collected matrix functions of the principal submatrices, which are described by the graph partitioning construction in Eq. (7). Most algorithms for a full composite sparse system can be partitioned into operations over independent subsystems, which allows for a natural parallelism.
The Newton–Raphson scheme in Eq. (15) is efficient for systems with fractional occupation numbers and a small HOMO–LUMO gap. For systems with integer occupation of the electronic states and a finite HOMO–LUMO gap, there are more efficient alternative techniques that can be used even in combination with recursive Fermi-operator expansion methods.22,97
III. XL-BOMD
In graph-based QMD simulations, the overhead from the required self-consistent electronic ground state optimization, which is required prior to each force evaluation, can be reduced by using modern Car–Parrinello–like methods.22,58 In these methods, the electronic degrees of freedom are either propagated dynamically using extended Lagrangian Born–Oppenheimer molecular dynamics (XL-BOMD)60,62,109,110 or are extrapolated from previous time steps.58,111–115 In this section, we will briefly present the most recent formulations of XL-BOMD,50,61 where we use a shadow Born–Oppenheimer potential energy surface and a preconditioned low-rank Krylov subspace approximation for the kernel in the integration of the electronic equations of motion. We will use this updated shadow potential version of XL-BOMD to improve our original graph-based QMD simulation framework.22 Previously, the graph-based XL-BOMD simulations sometimes could require a few SCF iterations in each time step for small-gap systems with unsteady ground-state charge solutions. With the more recent formulation of XL-BOMD, it is possible to efficiently simulate more challenging and charge sensitive systems, e.g., systems where the electronic HOMO–LUMO gap is opening and closing along the molecular trajectories, without any SCF iterations.61 Ordinarily, electronic degeneracies appearing at the chemical potential when the gap is closing can cause costly convergence problems and difficulties in achieving accurate conservative forces, which may lead to unphysical molecular-dynamics trajectories. The most recent shadow potential formulation of XL-BOMD, including finite electronic temperatures and fractional occupation numbers, can avoid these problems.
A. Shadow potential energy surface
The shadow potential formulation of XL-BOMD is based on a general principle that is quite simple, yet very powerful. Instead of calculating expensive approximate forces for the underlying exact interatomic potential, U(R), we design an approximate shadow potential, , that closely follows the exact solution, but for which we can calculate exact forces cost-effectively. This approach is usually referred to as a backward error analysis or a shadow Hamiltonian approach116–123 and is often associated with the design of symplectic or geometric integration schemes.124–128
In XL-BOMD, the shadow potential, , depends not only on the nuclear coordinates, R, but also on the electron density, n(r). Thus, in contrast to regular QMD and in the spirit of Car–Parrinello dynamics,129 XL-BOMD includes the electronic degrees of freedom, n(r), and its time derivative, , as additional extended dynamical field parameters, apart from the nuclear coordinates, R, and their velocities, . The equations of motion in XL-BOMD are derived in an adiabatic limit, where the extended electronic degrees of freedom are assumed to be fast compared to the nuclear motion. This corresponds to the same basic assumption (but for a classical field) as in the Born–Oppenheimer approximation. A similar adiabatic condition can also be enforced in Car–Parrinello molecular dynamics using Lagrange multipliers, which forms the basis of the mass-zero constrained dynamics scheme by Bonella et al.130–132
Note that when we include the effects of finite electronic temperatures, we allow for thermally excited states as given by the fractional occupation numbers in Kohn–Sham DFT.105–107 In principle, this goes beyond the regular ground-state Born–Oppenheimer approximation. However, here, we also consider an instantaneously thermally equilibrated ground state solution a part of a generalized Born–Oppenheimer approximation.
In general, E(R, ρ) is a nonlinear energy functional. The constrained minimization, Eq. (19), over all physically relevant electron densities, which in Kohn–Sham density functional theory are determined by single-particle states, leads to a system of non-linear eigenvalue equations. This non-linear problem needs to be solved iteratively with an SCF optimization procedure. Alternatively, the ground state can be found with an iterative direct energy minimization scheme, which in many ways resembles the iterative SCF optimization procedure. The ground-state optimization can be quite expensive, and unless the solution is well converged, the calculated forces from U(R) may not be sufficiently conservative, which could invalidate the results of a QMD simulation. This shortcoming can be avoided with a backward error analysis or a shadow Hamiltonian approach.
B. The ground state density matrix for the shadow Born–Oppenheimer potential
C. Equations of motion
D. Approximating the kernel
To facilitate the presentation of how the kernel, K(r,r′), can be approximated, we will use a matrix-vector notation.
The directional derivatives in Eq. (37) along vj can be calculated using time-independent quantum perturbation theory. The charge vector, vj, generates a linear perturbation, H1, in the potential of the unperturbed Kohn–Sham Hamiltonian, H0, i.e., we get a linear first-order perturbation in the Hamiltonian, where H(λ) = H0 + λH1. The first-order response in the charge density is then given from the first-order perturbation in the wavefunctions or the density matrix. How we can perform such response calculations on a partitioned graph, including fractional occupation numbers, will be presented in Sec. IV and is one of the main objectives of this paper.
IV. CANONICAL QUANTUM PERTURBATION THEORY ON A GRAPH
The time-independent canonical quantum response calculations required for the directional derivatives in Eq. (37), which are used in the preconditioned kernel approximation in Eq. (41), can be formulated in terms of density matrix perturbation theory.61,68,145,147–149 Density matrix perturbation theory for fractional occupation numbers can be constructed from matrix function expansions.145 We can therefore design a graph-based canonical quantum perturbation theory if we use the one-to-one mapping between the matrix function expansion on a graph and the collection of the core parts from the dense matrix functions of the principal submatrices that are determined by the partitioned subgraphs, i.e., as in Fig. 2 or in Eq. (7). First, we show how the time-independent linear response in the density matrix can be calculated on a partitioned graph, and then, we look at the response in observables and, in particular, the linear response in the electron density.
A. Linear response in the density matrix
B. Response in observables
In general, we never need to construct the full collected density matrix response matrix from the core parts of the separate subgraph density matrix responses, and . Instead, response properties, such as the response in the electron density or the response of the partial charges, can be calculated first for the separate subsystems and then collected. In this way, the amount of data transfer can be reduced when the response calculations are performed on a distributed parallel platform.
The data-dependency graph, Gτ, that is needed for the graph-based quantum response calculations can be determined not only from an estimate of the globally thresholded expansions of the ground state density matrix, , and the Kohn–Sham Hamiltonian, H0, but also from the response density matrix, . The matrix is often less sparse than ;150 however, in practice, we have found that it is sufficient to estimate Gτ from the same graph structures of a thresholded and H0, as discussed in Sec. II E.
Our graph-based canonical quantum perturbation theory was motivated by the need to calculate the response in the electron density, as in Eq. (37), which is needed to approximate the kernel in our shadow potential formulation of XL-BOMD.61 Graph-based quantum perturbation theory is a very useful and general tool in its own right, however. In addition, beyond the present application, it should also be possible to use the theory to reduce the complexity of calculations of a broad range of time-independent response properties of large extended systems, including properties such as the magnetic susceptibility, phonon modes, electric polarizabilities, the Born-effective charge, and Raman spectra.45,67–69
V. GRAPH-BASED SHADOW BORN–OPPENHEIMER MOLECULAR DYNAMICS
Later, in our demonstration of the graph-based shadow Born–Oppenheimer molecular dynamics simulations, we will use semi-empirical SCC-DFTB theory, which is naturally described using the vector notation. The theory is general, however, and can be applied to electronic structure methods with different representations of the electronic structure, perhaps with some significant modifications.
This section mirrors and partially overlaps with the response theory discussed in Sec. IV. However, the analysis in this section complements the previous discussion. Instead of a general presentation of a graph-based canonical density matrix perturbation theory, we now focus on the specific techniques required for the graph-based shadow Born–Oppenheimer molecular dynamics simulations. First, we look at the construction of a preconditioner and then how the low-rank preconditioned Krylov subspace approximation can be performed on a graph for the integration of the electronic equation of motion.
A. Preconditioner for a partitioned graph
The effectiveness of the Krylov subspace approximation of acting on the preconditioned residual function, , in Eq. (58) depends on how close the preconditioner, K0, is to the exact solution, J−1. It is possible to construct K0 by calculating a regularized kernel for an approximate test system, for example, the full composite system at the initial time step of a QMD simulation or for some similar equilibrated structure. The principal submatrices of K0 can then be extracted on-the-fly during an MD simulation as preconditioners for the partitioned subgraphs. Such a preconditioner could be reused, possibly over thousands of time steps. Without a parallel implementation, the computational cost currently limits this approach to systems with ≲O(104) atoms. Fortunately, our graph-based methods enable a modified parallel approach where the preconditioner for the whole system is decomposed into a set of smaller separate preconditioners, one for each subgraph. This approach must account for the possibility that the halos surrounding the core regions of the subgraphs can change in each time step, which increases the complexity. It is also necessary to take care that application of the preconditioner conserves the total charge. We now describe such an approach.
B. Preconditioned Krylov subspace approximation for the electronic equations of motion
The block-diagonal preconditioner, K0 in Eq. (72), can be used to accelerate the Krylov subspace approximation of the kernel acting on the residual function in the equations of motion for the electronic degrees of freedom in Eq. (58), which then can be approximated as in Eq. (41). To achieve this, we need to construct the orthogonalized preconditioned Krylov subspace expansion of vectors {vi} and . With the graph-based approach to quantum response calculations, it is possible to perform the most demanding tasks of this Krylov subspace expansion in parallel on the separate subsystems. To explain how this is achieved, we will show how the first vectors, v1, , and v2, are constructed. We will then present the general algorithm.
C. Self-consistency acceleration
VI. EXAMPLES AND ANALYSIS
In this section, we will show some simulation examples of graph-based quantum response theory and extended Lagrangian shadow Born–Oppenheimer molecular dynamics using the preconditioned Krylov subspace approximation for the integration of the extended electronic degrees of freedom. First, we briefly present SCC-DFTB theory, which forms the basis for our implementation. We then demonstrate the graph-based quantum response theory as it is applied to the preconditioned Krylov subspace approximation in the quasi-Newton SCF acceleration scheme in Eq. (91). Thereafter, we demonstrate the methodology for graph-based shadow Born–Oppenheimer molecular dynamics simulations and how we can achieve stability and control of the residual charge errors without performing any SCF optimization prior to the force evaluation as in regular Born–Oppenheimer molecular dynamics. We then demonstrate how the graph-based methodology can be applied to molecular dynamics simulations of large complex molecular systems with tens-of-thousands of atoms. In all systems, we used periodic boundary conditions. At the end, we make a preliminary assessment of the scalability and the parallel efficiency.
A. Implementation with SCC-DFTB theory
The methods in this article were implemented and tested using semi-empirical SCC-DFTB theory48,72–74,158–161 based on the open-source electronic structure software package LATTE89,162 together with the PROGRESS and Basic Matrix Library (BML) libraries.9,88 No new parameterizations or optimizations of the SCC-DFTB energy functional were performed for this study. SCC-DFTB is an approximation of first-principles Kohn–Sham density functional theory, which is derived from a second or third-order expansion of the Kohn–Sham energy functional in the electronic charge fluctuations around a set of overlapping atomic electron densities. A minimal numerical basis set is used, and bond and overlap integrals are tabulated and parameterized using a Slater–Koster approximation. The electrostatic energy is approximated by screened Coulomb interactions between atomic net Mulliken partial charges. The Coulomb interaction between atoms decays as |RI − RJ|−1 at large distances and is screened at short distances as the interaction between two penetrating Slater-like charge densities with the on-site term chosen as the chemical hardness or a Hubbard U parameter. In this formalism, the charge density, n, and the corresponding ground state density, ρ0[n], are reduced to vectors with components corresponding to the net Mulliken partial electron occupation for each atom.
B. Graph-based accelerated SCF optimization
The preconditioned Krylov subspace expansion for the kernel can be used to accelerate the convergence in the iterative optimization of the self-consistent ground state solution,61,154 as described in Eq. (91). This ground state SCF optimization is required in our graph-based shadow Born–Oppenheimer molecular dynamics simulations, but only in the first initial time step, where n(t0) is set to the exact regular Born–Oppenheimer ground state, ρmin(r). The SCF acceleration in Eq. (91) can be used to demonstrate our graph-based quantum response theory for fractional occupation numbers, which is required in the construction of the preconditioner and for the low-rank Krylov subspace approximation of the kernel, K. A significant reduction in the number of iterations required to reach a high level of convergence would demonstrate the expected performance of the theory. The Krylov subspace expansion we will use is truncated, and therefore, the kernel is not exact. The preconditioned Krylov subspace expansion is therefore only approximate, and in this case, the iterative updates using Eq. (90) correspond to a quasi-Newton scheme.
Figure 5 shows an example of the convergence of the root mean square error (RMSE) given from the root mean square of the residual charge function as a function of iterations using either the state-of-the-art Pulay direct inversion of the iterative subspace (DIIS) method157,163 or a quasi-Newton scheme (Kernel), as in Eq. (90). The test systems for this comparison are a solvated Trp-cage protein structure (2644 atoms) and a system with ammonium hydroxide in water (4071 atoms). A high electronic temperature was chosen with β−1 = 0.5 eV, which provides a notable deviation in the occupation numbers from a pure ensemble and thus demonstrates how the method also works for systems with non-integer occupation numbers. Both systems are fairly demanding to converge. The quasi-Newton scheme provides a significant acceleration of the SCF optimization and thus demonstrates how the graph-based canonical quantum-response theory works, in practice. However, each iteration involves low-rank updates that require repeated quantum response calculations on the graph (a total of six was used for these runs), each which cost about half as much as a full self-consistent field iteration using DIIS. The wall-clock time required to reach convergence is, therefore, in practice, about the same, but only if we ignore the additional cost of calculating the preconditioner. The quasi-Newton scheme represents an alternative acceleration method, for example, to the SCF mixing schemes by Broyden, Anderson, and Pulay155–157 and should be particularly competitive if the cost of the preconditioner can be ignored, which is the case for our graph-based QMD simulations where the preconditioner can be reused over hundreds or even thousands of time steps.
The improved SCF acceleration in Fig. 5 demonstrates that our graph-based canonical quantum response theory and preconditioned Krylov subspace approximation work as expected. Beyond applications to single-point SCF optimization and molecular dynamics simulations, demonstrated below, the same graph-based quantum response theory and preconditioned Krylov subspace approach should also be applicable to the repeated SCF optimization required in geometry optimization, which could also reuse the same preconditioner over multiple atomic configurations.
C. QMD: Test system A
To assess the performance of the graph-based shadow Born–Oppenheimer molecular dynamics scheme for more challenging simulations, we prepared a test system in which reactions are poised to occur. The test system consists of concentrated ammonium hydroxide in water, which in a microcanonical ensemble (NVE) simulation will evolve into ammonia by the reaction + OH− ⇄ NH3 + H2O. This system is highly reactive mainly because of two factors: (1) The concentration of the reactants is artificially high (29.5M or 36 times higher than its equilibrium concentration under ambient conditions) and (2) the initial coordinates are out of equilibrium, which leads to a rapid increase in temperature within the first few hundred MD time steps of NVE simulation.
A schematic representation of the test system is shown in Fig. 6(a). It contains a total of 4071 atoms with an initial set of 216 + OH− pairs apart from the water. The data-dependency graph was estimated from paths of length two of the combined graph of the full Hamiltonian and a thresholded density matrix (threshold 0.002). We used six rank-1 updates for the Krylov approximation of the preconditioned kernel. The system was partitioned into 128 subgraphs. In Fig. 6(b), we show one of the subsystems with its core and surrounding halo region, which was automatically selected using a graph-partitioning algorithm as implemented in the METIS software package.101,102 The QMD simulation was performed using 64 MPI ranks distributed across 16 Intel(R) Xeon(R) E5-2695 v4 @ 2.10 GHz central processing units (CPUs) each of them containing 36 cores.
Figure 7 shows three quantities monitored over QMD simulation. The temperature (in red, upper panel) increases rapidly up to about 1450 K. The total energy (in black, mid panel) demonstrates a stable total energy without any systematic drift. The root means square error (RMSE) given from the root mean square of the residual charge function (in blue, lower panel) demonstrates a well-controlled stable behavior. The rapid initial changes in the electronic structure and temperature create fairly large oscillations in the total energy, where a few molecules are moving fast. The amplitude of the total energy oscillations decays as the system is reaching a thermal equilibrium. Some of the rapid initial oscillations also lead to a small increase in the residual charge.
The QMD trajectory was inspected to identify reaction events. In Fig. 8, we show a local reaction event captured in a series of configuration snapshots. An initial NH4–OH complex is formed at 28 fs followed by the formation of NH3 and water at 34 fs. The rest of the simulation shows NH3 and water molecules defusing apart (see the snapshot at 126 fs).
Despite the out-of-equilibrium initial configuration, followed by a rapid exothermic process with chemical changes as in Fig. 8, the total energy in Fig. 7 remains remarkably stable. No instabilities in the total energy or in the residual charge are observed. This example illustrates that the graph-based shadow Born–Oppenheimer molecular-dynamics scheme is capable of simulating a challenging reactive test system without the iterative SCF optimization steps that normally would be required prior to the force evaluations in a regular Born–Oppenheimer molecular dynamics simulation.
D. QMD: Test system B
Our next test system was chosen to be a water box containing several thousands of atoms. The system was constructed using the GROMACS solvation tool.164 This type of system is non-reactive under the initial conditions and the simulation time-scales (i.e., no water dissociation is expected). It, therefore, allows us to use a relatively coarse graph, leading to more efficient calculations and the possibility of reaching longer-duration simulation times, even for larger systems. Figure 9(a) shows the water system containing 6495 atoms, and Fig. 9(b) shows a subgraph partition with its core and halo regions.
Figure 10 shows the result of NVE simulation starting with an initial out-of-equilibrium structure. The calculations were performed by decomposing the system into 256 subgraphs. The QMD simulation was performed using 64 MPI ranks distributed across 16 Intel(R) Xeon(R) E5-2695 v4 @ 2.10 GHz CPUs each of them containing 36 cores. The MD time step and electronic temperature were set to 0.2 fs and β−1 = 0.5 eV. In addition, this simulation demonstrates stability and error control in the behavior of the total energy (in black, middle panel) and charge residual error (in blue, lower panel) as the statistical temperature increases to around 400 K (in red, upper panel).
E. QMD: Test system C
Our final test case is a system composed of eight Trp-cage synthetic polypeptides in an ammonium bicarbonate solution, shown in Fig. 11. A solvated simulation box with a single polypeptide and a charge-neutralizing combination of five H molecules and four molecules was constructed using the multicomponent input generator of CHARMM-GUI,165 using input files for the polypeptide, H and . The polypeptide coordinates were obtained from the first model in RCSB PDB entry 1L2Y.166 It was expanded to a 2 × 2 × 2 system using PDBProp in AmberTools.167 The final system was a 43.923 Å3 cube simulation box with 64 112 atoms. It is highly challenging not to say practically unfeasible to perform quantum-mechanical Born–Oppenheimer simulations of systems of this complexity without super-computing access.
Figure 12 shows the results of a QMD simulation with the graph-based extended Lagrangian shadow Born–Oppenheimer molecular dynamics method. The simulation was performed on 32 nodes of the Chicoma Institutional Computing cluster at Los Alamos National Laboratory. Each node is equipped with two 64-core AMD Rome EPYC 7H12 processors and 512 GB memory, with MPI communication via a 100 Gb/s HPE/Cray Slingshot10 interconnect (no GPUs). The job was distributed over 1024 MPI tasks: 32 tasks per node, each using four OpenMP threads. The computation was distributed using 2048 subgraphs: two per MPI rank. The simulation was performed in a NVE ensemble using a time step of δt = 0.5 fs, an electronic temperature with β−1 = 0.1 eV, and three rank-1 updates in the approximation of the preconditioned kernel. The initial velocity distribution was randomly sampled from a Maxwell distribution with a temperature of 150 K. The size of the core partitions was on average about 30 atoms, and the total size of the subgraphs, including the halo, varied between 400 and 500 atoms. Even for this system—our largest—the behavior is robust and stable, as seen in the total energy fluctuations (in black, middle panel) and in the size of the charge residual (in blue, lower panel) in Fig. 12. This example demonstrates how the graph-based shadow Born–Oppenheimer molecular dynamics schemes is scalable to system sizes, including tens-of-thousands of atoms using a CPU-based compute cluster.
F. Scalability
To assess the preliminary parallelized efficiency of the graph-based approach to electronic structure calculations, we performed both strong and weak scaling studies. The strong scaling study was performed using a 16 704 atom water box, generated by expanding a 2088 atom box twofold along each side. The system was subdivided into 512 subgraph partitions. The calculation was distributed on either 1, 2, 4, or 8 nodes of the Chicoma cluster (architecture described in Test System C) using 32 MPI tasks per node, and timings were obtained for the total MD step (including steps that have not yet been parallelized), the kernel update, and the density matrix (DM) construction (Fig. 13). The number of rank updates for the preconditioned Krylov subspace kernel approximation was set to three, and the timings were measured after the initial time step, which includes a full SCF optimization of the electronic ground state and the construction of the preconditioner. Compared to using a single node, the total time required for an MD step decreased 1.7-fold, 2.4-fold, and 3.4-fold when increasing the number of nodes to 2, 4, and 8, respectively, corresponding to efficiencies of 85%, 60%, and 43%. Similar strong scaling efficiencies are seen for the kernel update and DM construction steps.
For the weak scaling study, water boxes were generated as before expanding an initial 2088 water box system. Increasing sizes were prepared and simulated using proportional compute resources. The calculations were performed on a set of homogeneous nodes in the Darwin research testbed cluster at Los Alamos National Laboratory. Each node is equipped with dual socket Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10 GHz processors and 125 GB memory. The nodes used are interconnected using 100 Gb/s Mellanox EDR InfiniBand. The initial 2088 atoms system was used as a reference; this system was simulated using a single compute node. The system was scaled up by factors of 2, 4, 8, and 16, while proportionally increasing the number of nodes. The reference system was decomposed into 64 partitions and the number of partitions was increased in proportion to the system size. This increase in the partitioning may not be the ideal choice for performance, but it simplifies our test. The simulations were performed using four ranks per node.
Figure 14 shows the results of the weak scaling study. Compared to the reference system on a single node, the relative time in the total cost of an MD step increases about 2.5 times for a system 16 times larger using 16 nodes. The density matrix (DM) construction step shows a similar scaling behavior. The Kernel update is comparatively more costly for larger systems: 3.5 times slower when scaling up by a factor of 16. This reflects an increased level of communication required for the Kernel calculation compared to the other calculations.
VII. SUMMARY AND CONCLUSIONS
In this article, we have presented a graph-based canonical quantum perturbation theory for applications in graph-based QMD simulations22 based on the most recent shadow potential formulations of XL-BOMD.61 This set of techniques enables stable, linear-scaling, shadow Born–Oppenheimer molecular dynamics simulations of charge sensitive or reactive chemical systems without involving any iterative SCF optimization prior to force evaluations. The formulation includes a preconditioned Krylov subspace approximation for the integration of the extended electronic degrees of freedom, which requires quantum response calculations for the electronic states, including fractional occupation numbers. The proposed graph-based canonical response calculations can be performed in parallel in the same way as for the electronic ground state, which is guided by the partitioning of a data-dependency graph that can be estimated from previous integration time steps or SCF iterations.
Graph-based electronic structure calculations are intended for studies of large systems with around a thousand atoms or more and require the separate calculations of subsystems that often contain several hundred atoms. Graph-based QMD simulations are therefore particularly well-suited for semi-empirical electronic structure theory. The methods in this article were implemented and tested using SCC-DFTB theory.
The proposed graph-based shadow Born–Oppenheimer molecular dynamics scheme was demonstrated in simulations of challenging chemically reactive systems, such as ammonium hydroxide in water. Simulations were stable both in the fluctuations of the total energy and in the residual charge error. This is in agreement with recent shadow potential formulations of XL-BOMD simulations performed in reactive systems with sensitive unsteady charge solutions, but without using the graph-partitioning scheme.61,84 We also demonstrated how graph-based canonical quantum response theory in combination with the preconditioned Krylov subspace approximation can be used in a quasi-Newton scheme to accelerate the SCF convergence.
The graph-based approach to electronic structure calculations can take advantage of emerging exascale computing resources.168 This should also be possible with the graph-based quantum response calculations. Our current implementation used in the examples is still preliminary, but demonstrates good parallel scaling, allowing for QMD simulations of tens-of-thousands of atoms. More development is needed, in particular, to take advantage of hybrid architectures using graphics processing units or AI-accelerators. The main goal of this paper is the underlying theoretical concepts and techniques for the graph-based electronic structure and response calculations, including fractional occupation numbers, which enable scalable extended Lagrangian shadow Born–Oppenheimer molecular dynamics simulations that also are applicable to simulations of more challenging, charge sensitive, or reactive chemical systems.
Recently, it has been shown how statistical machine learning techniques can be used to significantly enhance the accuracy of semi-empirical electronic structure methods.169–176 To use such “AI-boosted” semi-empirical methods in combination with graph-based QMD simulation techniques represents a promising path toward accurate simulations of large complex chemical systems.
SUPPLEMENTARY MATERIAL
In the supplementary material, we present a Matlab script with a numerical example of the graph-based expansion, which demonstrates the one-to-one mapping between the step-by-step matrix function expansion on a graph (with multiplications from the left or right hand side) and the collected expansions over the separate subgraphs and we also present the general workflow for the most basic form of the graph-based electronic structure method for calculating the density matrix.
ACKNOWLEDGMENTS
This work was supported by the U.S. Department of Energy Office of Basic Energy Sciences (FWP LANLE8AN “Next Generation Quantum-Based Molecular Dynamics”) and by the U.S. Department of Energy through the Los Alamos National Laboratory (LANL), including the LANL Institutional Computing program. LANL is operated by Triad National Security, LLC, for the National Nuclear Security Administration of the U.S. Department of Energy under Contract No. 892333218NCA000001.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Christian F. A. Negre: Conceptualization (supporting); Data curation (lead); Formal analysis (equal); Funding acquisition (supporting); Investigation (supporting); Methodology (supporting); Project administration (supporting); Resources (lead); Software (lead); Validation (equal); Visualization (equal); Writing – original draft (supporting); Writing – review & editing (equal). Michael E. Wall: Data curation (supporting); Formal analysis (equal); Resources (supporting); Software (supporting); Validation (equal); Visualization (equal); Writing – review & editing (equal). Anders M. N. Niklasson: Conceptualization (lead); Data curation (supporting); Formal analysis (equal); Funding acquisition (lead); Investigation (lead); Methodology (lead); Project administration (lead); Resources (supporting); Software (supporting); Validation (equal); Visualization (equal); Writing – original draft (lead); Writing – review & editing (lead).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding authors upon reasonable request.
REFERENCES
When n = ρ0[n], we have an exact self-consistent solution, and then, ρ0[n] = ρmin.
Note that this approach needs to be adapted to work with other representations or the electronic degrees of freedom besides some coarse grained charge density, e.g., of partial atomic charges.