Due to their improved accuracy, double-hybrid density functionals emerged as an important method for molecular electronic-structure calculations. The high computational costs of double-hybrid calculations in the condensed phase and the lack of efficient gradient implementations thereof inhibit a wide applicability for periodic systems. We present an implementation of forces and stress tensors for double-hybrid density functionals within the Gaussian and plane-waves electronic structure framework. The auxiliary density matrix method is used to reduce the overhead of the Hartree–Fock kernel providing an efficient and accurate methodology to tackle condensed phase systems. First applications to water systems of different densities and molecular crystals show the efficiency of the implementation and pave the way for advanced studies. Finally, we present large benchmark systems to discuss the performance of our implementation on modern large-scale computers.
I. INTRODUCTION
Most applications of electronic structure methods are currently based on Density Functional Theory (DFT).1,2 DFT methods are widely available in standard quantum chemical software and are computationally efficient, and a range of low scaling approaches were developed.3–19 Different density functionals are often classified within a hierarchical scheme (Jacob’s ladder analogy) with respect to their explicit dependence on information of the electronic density.20 On the first rung, we find functionals based on the Local Density Approximation (LDA) employing the local density only. On the second rung, functionals additionally take the density gradient into account [Generalized Gradient Approximation (GGA)] while meta-GGA functionals augment them with an additional dependence on the kinetic energy density and/or the Laplacian of the density.21–27 Up to the third rung, density functionals usually suffer from self-interaction errors that are, at least partially, removed on the fourth rung by the (partial) inclusion of exact [Hartree–Fock (HF)] exchange in hybrid functionals.28–34 Dispersion interactions, crucial for the description of biomolecules, molecular crystals, liquids, or interactions on surfaces, are still absent on the hybrid functional level. Its inclusion is achieved with either empirical correction schemes or explicitly non-local functionals.35–41
Specifically, for weakly interacting systems, common functionals have severe shortcomings, and a systematic approach that allows an easy construction of DFT functionals is missing. In contrast, correlated methods derived from Wave Function Theory (WFT) provide systematic approaches with increasing accuracy, e.g., methods based on HF theory, Møller–Plesset perturbation theory, coupled-cluster theory, or the Random-Phase Approximation (RPA).42–47 The advanced WFT methods, having formally at least a quartic scaling behavior, are usually more accurate but also more expensive than hybrid density functionals. Yet, recent low-scaling variants of these methods allow the treatment of larger systems.48–59 As a drawback, correlated methods require much larger basis sets than DFT methods. This basis set dependence can be tackled with range-separated approaches or explicitly correlated methods.60,61 Dispersion interactions are, unlike in local density functionals, mostly well described; however, some systems require higher levels of theory for improved accuracy.
Double-Hybrid (DH) functionals try to combine the best of both worlds: DFT is augmented with the description of long-ranged interactions from a correlated method like MP2 or RPA.62–64 DH functionals can be routinely applied to molecular systems, but large computational resources are necessary for condensed phase calculations. Therefore, the number of applications to condensed phase systems is rather limited. Nevertheless, in these applications, the increased accuracy of DH functionals over (meta)-GGA functionals and hybrid functionals for the description of condensed phase systems has been established.65–68 A variety of DH functionals, relying on various underlying hybrid density functionals and the correlated methods, are described in the literature.63,69–74 For practical applications, the availability of analytical gradients for these methods is important, enabling geometry optimizations and molecular dynamics simulations.
Even for smaller condensed phase systems, the calculation of HF exchange still requires large computational resources. The total number of electron repulsion integrals (ERIs) of atom centered basis functions increases at most quartically with respect to the system size. This scaling can be reduced by different techniques up to linear scaling.75–82 Despite of linear scaling approaches, HF calculations may suffer from large prefactors due to the high operation count of the integral calculation step, load-balancing problems in parallel applications, or their large memory footprint. Modern approaches make use of specialized libraries with optimized hardware-specific kernels, the Resolution-of-the-Identity (RI) approach or the Auxiliary Density Matrix Method (ADMM), and a few target modern Graphics Processing Unit (GPU) hardware.83–90 HF implementations for periodic systems need special care due to the singularity at the gamma point and the slow decay of the Coulomb operator. These problems are addressed by the use of short-ranged operators or by direct summation approaches.91–97 Specifically, HF and hybrid functional calculations with augmented basis sets for weakly interacting systems are computationally demanding because of the large extent of the additional diffuse basis functions and the resulting larger number of significant integrals.98–102
In Secs. II–VII, we will present an implementation of DH functionals based on MP2 with nuclear gradients and stress tensors suitable for periodic calculations at the Γ-point. We will further show the extension to the ADMM approximation for HF exchange. Extensive tests for accuracy and efficiency are included.
II. THEORETICAL BACKGROUND
In the following discussion, a, b, c represent an unoccupied (virtual) molecular orbital (MO), i, j, k occupied MOs, p, q denote any MO, Greek letters μ, ν, … denote primary basis set (PBS) indices, and capital Latin letters P, Q, R, … denote RI basis function indices. A hat indicates quantities and indices of the Auxiliary Density Matrix Method (ADMM). We will consider Γ-point calculations only such that all quantities can be chosen real. All basis functions and orbitals are periodically repeated, integrals are over the computational cell, and Coulomb integrals are calculated using Ewald techniques.
The total energy Etot of a density functional can be written as
with the one-particle Hamiltonian hμν, the one-particle density matrix for closed-shell systems Pμν = 2∑iciμciν, coefficients of the canonical MOs cpμ, the one-particle density , and the eXchange–Correlation (XC) functional EXC[ρ]. The MO coefficients are eigenvectors of the generalized eigenvalue problem (Kohn–Sham equation),2
with the matrix elements of the Hartree potential and the XC potential , the elements of the overlap matrix , and the MO energy ɛi of MO i. For DH functionals, the XC functional has the form
with the DFT exchange functional EDFT,X, the Hartree–Fock exchange functional EHF,X, the DFT correlation functional EDFT,C, the WFT correlation functional EWFT,C, and the corresponding scaling factors αDFT,X, αHF,X, αDFT,C, and αWFT,C, respectively. DFT XC functionals are usually local functionals written as integrals of a given function of the density and its gradient depending on the complexity of the DFT part. The HF exchange functional is given by
with the electron repulsion integral (ERI) in Mulliken notation
with the interaction potential v(r). Usually, the Coulomb potential is employed, but for range-separated functionals or certain implementations of HF for periodic systems, it might be of a different kind.33,34,69,91,96,97,103,104
Because the potentials in the Kohn–Sham equation depend on the unknown density, the Kohn–Sham equation has to be solved iteratively starting from an appropriate initial guess while recalculating densities and potentials subsequently until self-consistency (SCF approach). Under these conditions, the total energy functional is variational with respect to the density. Due to the high computational costs of WFT correlation methods, most DH functionals consider the wave-function correlation (WFC) functional to be only a post-SCF correction such that the total energy functional is not variational.
The WFC functional is the energy expression of MP2, RPA, or any other correlated method. Here, we will focus on the MP2 method given by
with the ERI in MO basis (ia|jb) = ∑μνκλciμcaνcjκcbλ(μν|κλ) and the MP2 transition amplitude .
The Auxiliary Density Matrix Method (ADMM) for the acceleration of HF calculations relies on the relationship
with the auxiliary density , the auxiliary density matrix , the MO coefficients in the auxiliary basis set (ABS) , and the auxiliary basis functions . The ABS contains a smaller number of primitive basis functions and more localized basis functions to reduce the computational costs. The error in the density is accounted for by the ADMM exchange correction functional EADMM. Common choices are the Perdew–Burke–Ernzerhof (PBE), the OPTX, or the B88 exchange functionals.24,105,106
There are several constraints for the auxiliary density matrix and the auxiliary MO coefficients possible. For an overview about different flavors of ADMM, we refer to the literature.86,107 In the following, we will consider the ADMM2 method only. The ADMM2 model requires the similarity of the MOs represented in both basis sets in a least-square sense such that the auxiliary MO coefficients are given by the expression
Let C and be the matrix of MO coefficients in the PBS and the ABS, respectively. Both are related by with the projector matrix , the overlap matrix in the auxiliary basis , and the overlap matrix between the ABS and the PBS Q. The analogous relation for the density matrices is given by .
A very successful approach to accelerate the calculation of ERIs is the Resolution-of-the-Identity (RI) approximation.108,109 It is based on the idea of fitting densities given by a product basis with a suitable ABS. In the following, we will focus on approximating ERIs with the Coulomb potential using a Coulomb metric, i.e., by fitting the Coulomb potential to obtain
with (P|Q)−1 being the matrix element of the inverse of the matrix with elements (P|Q). We will apply the RI approximation to the MP2 method, resulting in a significant speed-up and reduction of memory requirements.43,110 The fourfold transformation of the ERIs in the atomic orbital basis to the MO basis is simplified because of
with the intermediate
Because the total energy functional of DH functionals is not variational with respect to the density, we have to explicitly include the convergence constraints into the gradient calculation
With the method of Lagrangian multipliers, we derive similarly to the RI-MP2 method the following expression for the gradients:
with the MP2-relevant intermediates
the density matrix
the energy-weighted density matrix
the derivative of the Fock matrix
and the derivative of the overlap matrix . Note that the second line of the derivative of the Fock matrix vanishes without the ADMM. The matrix Zia is the solution of the Z-vector equation,
with the second order energy kernel
The Z-vector equations form a system of linear equations allowing a direct solution; however, these equations are commonly solved iteratively because the computational efforts of the exact solution scale as with N being a measure of the system size.
We briefly describe the implementation in CP2K, where the integral calculation is performed separately before the contraction steps. First, the integrals are calculated batch-wise in the atomic orbital basis in small subgroups, each responsible for all integrals with a given subset of auxiliary functions. Contraction of two-center and three-center integrals with the MO coefficients provides quantity BPia distributed among all processes with the RI index distributed among the subgroups and the virtual index distributed among the processes of a subgroup. In the following contraction step, all unique pairs of occupied orbitals are assigned to one of the aforementioned subgroups, each determining subsequently all four-center ERIs with the different orbital pairs according to Eq. (11). From the ERIs, the MP2 amplitudes and the energy contributions, and for a gradient calculation, additionally the intermediate ΓPia and the diagonal elements of the density matrices, can be calculated by exchanging the required data between the subgroups. In that fashion, only a subset of the MP2 amplitudes has to be available at the same time lowering the memory requirements. Communication costs are reduced by replicating parts of the integrals BPia within subsets of the subgroups. The non-diagonal elements of the density matrix are determined within a canonical reformulation. For further details, we refer the reader to the respective publications.110–112
Without the ADMM, the projectors become the unity matrix, the ABS is equal to the PBS, and the exchange correction functional is zero. In this case, we obtain the already known equations for DH gradients.113 Without a DFT functional, we are left with HF MOs and the equations reduce to their MP2 equivalents.114,115 Thus, the implemented methodology is a generalization of the common DH gradient theory. Due to the consideration of the ADMM approximation, the equations are related to those by Kumar et al. for linear response theory.116
The open-source CP2K software package is based on the quickstep algorithm.99,117 Its Gaussian and Plane Waves (GPW) approach employs a dual representation of orbitals and densities in a mixed Gaussian and plane-wave basis. The local nature of the Gaussian basis allows a linear-scaling implementation of the integral evaluation, whereas the plane-wave basis enables a linear scaling evaluation of the Hartree potential with respect to the system size. The plane-wave basis requires the use of pseudo potentials for which CP2K employs the Goedecker–Teter–Hutter form.118
In contrast to other implementations of correlated methods with localized basis sets, such as the periodic CRYSCOR project, which extends CRYSTAL to correlated methods, CP2K does not exploit any symmetry within the system, does not localize the orbitals, and the gradient implementation does not allow for all-electron calculations, making it more suitable for dynamics simulations that inhibit symmetry at all.119,120
III. COMPUTATIONAL DETAILS
The DFT and ADMM kernels were implemented into CP2K and are available with version 9.1.99 We employed Libint 2.6 for the calculation of ERIs.83 Calls to the routines of the BLAS/LAPACK/PBLAS/ScaLAPACK libraries were accelerated with cray-libsci_acc, version 20.10.1. Calls to PDGEMM exploited the COSMA library, version 2.5.1.121 All calculations were run on the Piz Daint supercomputer with 12 MPI processes per node. Each node is equipped with a NVIDIA Tesla P100 GPU.
All periodic HF calculations were carried out with a truncated Coulomb operator. The cutoff radius is chosen half of the least distance between two equivalent atoms of neighboring cells (minimum image convention).103
For this study, we will consider the DSD-PBEP86 functional, which is a GGA-based DH functional, with separately scaled same-spin and opposite-spin MP2 contributions and directly optimized dispersion correction.122 In the GMTKN55 study, it was one of the best performing functionals with an excellent performance for all different use cases.123 Because its HF part requires only the Coulomb potential and the required PBE exchange and P86 correlation functional are widely available, it is easily implemented in the most electronic structure software.
We employed the following test systems: three molecular crystals NH3, HCN, and Ar, three systems with 64 water molecules of different density, the benzene crystal, and the anatase crystal. The molecular crystals are the same as the ones used in previous studies.67,71 The three water-containing systems were taken from a training set of 1593 structures for a machine learning (ML) set of water.124 We propose the benzene and the anatase crystals as benchmark systems for double hybrid and MP2 gradients and potentially also for other wave function based methods to be run on large scale parallel computers with thousands of compute nodes.125
The PBSs are correlation-consistent valence-only basis sets of double-zeta (DZ) and triple-zeta (TZ) qualities.67,110 For Ar, we added the diffuse basis functions of the Dunning basis sets.126 These are not required for the other systems as discussed in a preceding study.67 The basis sets of Ti were taken from a previous study.127 For the ADMM calculations, we used ABSs of DZ quality for all elements. Suitable RI basis sets of the elements H, C, N, and O for the MP2 calculations were taken from Del Ben et al., those for Ar were optimized according to Weigend et al., and the non-augmented basis sets published in Stein et al.43,67,110 The pseudopotentials were optimized for the PBE0 method. All DH calculations used PBE orbitals as initial guess. The Schwarz screening parameter was set to values of 10−7 without the ADMM and of 10−10 with the ADMM, and derivative integrals use a corresponding 10−7 screening parameter for all calculations.
IV. ACCURACY
In the left panel of Fig. 1, we present the errors introduced by the ADMM approximation for specific energy differences of the test systems. For the three water systems, we use the differences in total energies. For the molecular crystals, we determined the cohesive energies corrected by the basis set superposition error (BSSE).128 With a TZ quality PBS, we find an error of at most 1.5 mEh ≈ 1 kcal/mol (gray lines in the diagram). Thus, the general accuracy of the double-hybrid functional is preserved with the ADMM.
ADMM errors for energy differences. For the water systems, we employ the difference in total energies of the medium and low density structure [H2O(m − l)] and the high and low density structure [H2O(h − l)] per water molecule. For the molecular crystals, we report the cohesive energies corrected by the basis set superposition error (BSSE). Left panel: signed errors. Right panel: absolute relative errors.
ADMM errors for energy differences. For the water systems, we employ the difference in total energies of the medium and low density structure [H2O(m − l)] and the high and low density structure [H2O(h − l)] per water molecule. For the molecular crystals, we report the cohesive energies corrected by the basis set superposition error (BSSE). Left panel: signed errors. Right panel: absolute relative errors.
On the right panel of Fig. 1, the corresponding absolute relative errors are shown. We find that the energy differences of the water systems have errors of 20%–100%, whereas the cohesive energies of the molecular crystals have a much lower relative error. The rather small energy differences between the medium- and low-density structures cause high relative errors, but the absolute error is still small. The cohesive energies of the molecular crystals are much larger with Ar having the smallest cohesive energy of 3.8 mEh with DSD-PBEP86-D3 (experimental value: 2.9 mEh).129
In Fig. 2, we report the mean absolute ADMM errors of the forces and stress tensor components. For the forces, we find an error of at most 0.1 mEh/a0. The average force component of the water system at medium density is 29.3 mEh/a0 providing an average relative error of 0.3%. For the stress tensor, we found an error of usually not more than 0.1 GPa corresponding to a relative error of at most 10% for the diagonal elements. Due to symmetry reasons, the forces on the argon atoms are zero in all cases.
Mean absolute ADMM errors for forces and stress tensor components. “H2O(l),” “H2O(m),” and “H2O(h)” refer to the water-containing systems of low, medium, and high density, respectively.
Mean absolute ADMM errors for forces and stress tensor components. “H2O(l),” “H2O(m),” and “H2O(h)” refer to the water-containing systems of low, medium, and high density, respectively.
V. PERFORMANCE
Figure 3 reports the computational timing of all methods and test systems. Except for Ar, we find for the given setting no reduction in the computational costs with the PBSs of DZ quality, whereas the ADMM is able to reduce the timings by 20%–70% for the TZ PBS. For Ar, we observe an improvement of about 50% and 80% with the PBSs of DZ and TZ qualities, respectively. These findings suggest for all systems except from Ar with a DZ basis set that the similar quality of the ABSs and the PBSs and the tighter threshold of the Schwarz criterion of the ADMM result in an increase in the computational timings because of an increased number of significant integrals. In contrast, we employed the augmented PBSs for Ar, which result in a significantly larger number of electron repulsion integrals than for the non-augmented ABS.
Computational time in node hours for different systems, PBSs, and methods. For a better comparison, we report also relative timings of ADMM calculations. (Augmented) basis sets of double-zeta (DZ) and triple-zeta (TZ) qualities were employed. “+ADMM” refers to the additional application of the ADMM approximation.
Computational time in node hours for different systems, PBSs, and methods. For a better comparison, we report also relative timings of ADMM calculations. (Augmented) basis sets of double-zeta (DZ) and triple-zeta (TZ) qualities were employed. “+ADMM” refers to the additional application of the ADMM approximation.
In the left panel of Fig. 4, we present a breakdown of the computational timings of the different methods. We find that the computational costs of MP2 are independent of the presence of the ADMM approximation. This has to be expected because the actual MP2 energy calculation and the assembly of the quantities of the MP2 force calculation are not directly affected by the ADMM approximation. Thus, the reduction or increase in the computational costs is caused by the solution of the Kohn–Sham equation and the Z-vector equations. For all systems apart from Ar, the costs of the integral calculation and SCF steps behave as follows: For the DZ PBS, the relative costs are 20%–44% without the ADMM and 31%–63% with the ADMM. However, for the TZ PBS, the relative costs are getting smaller when using the ADMM, namely, dropping from 19%–75% to 4%–12%. The computational costs of the MP2 energy calculation are at most as high as the computational costs of the integral calculation. We conclude that the quintical scaling limit of the MP2 method was not yet achieved in the examples, and further optimizations of the integral calculation step should be targeted.
Subdivision of the total computation time (left panel) and of the SCF+forces part only (right panel). “MP2 energy” refers to the quintically scaling computation steps (“mp2_ri_gpw_compute_en” of the CP2K timing report). “MP2 integrals” refers to the computation of the integrals and their derivatives (sum of “mp2_ri_gpw_compute_in” and “calc_ri_mp2_nonsep” of the CP2K timing report). “SCF+forces” refers to the difference of the total CP2K run time and the aforementioned parts. “4c” and “4c forces” refer to the timings of the routines “integrate_four_center” and “derivatives_four_center,” respectively, of the CP2K timing report. “rest” is the difference of “SCF+forces” and the compute time of the HF integrals. Each group of bars represents in the given order the timing information of DZ without the ADMM, DZ with the ADMM, TZ without the ADMM, and TZ with the ADMM.
Subdivision of the total computation time (left panel) and of the SCF+forces part only (right panel). “MP2 energy” refers to the quintically scaling computation steps (“mp2_ri_gpw_compute_en” of the CP2K timing report). “MP2 integrals” refers to the computation of the integrals and their derivatives (sum of “mp2_ri_gpw_compute_in” and “calc_ri_mp2_nonsep” of the CP2K timing report). “SCF+forces” refers to the difference of the total CP2K run time and the aforementioned parts. “4c” and “4c forces” refer to the timings of the routines “integrate_four_center” and “derivatives_four_center,” respectively, of the CP2K timing report. “rest” is the difference of “SCF+forces” and the compute time of the HF integrals. Each group of bars represents in the given order the timing information of DZ without the ADMM, DZ with the ADMM, TZ without the ADMM, and TZ with the ADMM.
If we consider the different contributions to the computational costs not directly related to MP2, as presented in the right panel of Fig. 4, we observe that the non-MP2 part is dominated by the calculation of the four center integrals and their derivatives. Thus, the costs of that part are reduced due to the ADMM as anticipated. The overhead of the ADMM projector matrix and the calculation of the exchange correction is negligible in all calculations.
In Fig. 5, we report the total number of calculated integrals. Apart from the DZ basis sets, the ADMM reduces the number of significant integrals by at least one order of magnitude, and in the case of Ar and the TZ basis set, by two orders of magnitude.
Number of four-center integrals (left panel) and four-center derivative integrals (right panel) calculated for different systems, PBSs, and methods. The number of integrals calculated with the ADMM is independent of the PBS.
Number of four-center integrals (left panel) and four-center derivative integrals (right panel) calculated for different systems, PBSs, and methods. The number of integrals calculated with the ADMM is independent of the PBS.
In Fig. 6, the GPU usage according to the Slurm reference is shown. Slurm is the job scheduling system employed by the CSCS supercomputing centre hosting the supercomputer on which all calculations were run (see its manual for further information).130 According to the Slurm manual, it provides the percentage of computational time that is spent on GPU. We observe an increase in GPU usage in accordance with a decrease in the computational time. For the ADMM calculations with a DZ basis set, the GPU usage drops from 40%–50% to 25%–45%. For the TZ PBS, the GPU usage increase from 20%–60% to 60%–75% when employing the ADMM. Finally, for Ar with a TZ PBS, we even find an increase from 14% to 86%. These numbers are in agreement with the fact that the exchange Fock matrix calculation is not GPU accelerated.
GPU usage averaged over all GPUs and estimates according to the Slurm reports for different systems and basis sets. Estimates based on the GPU usage without the ADMM and the total computational timings with and without the ADMM.
GPU usage averaged over all GPUs and estimates according to the Slurm reports for different systems and basis sets. Estimates based on the GPU usage without the ADMM and the total computational timings with and without the ADMM.
Finally, we show in Fig. 7 the strong scaling behavior of the ADMM-accelerated DH calculations. We used 64 water molecules at the ambient density and 12 ranks per node. The smallest possible number of nodes is determined by the memory requirements of MP2 being 115 GB. Due to the parallel memory overhead of the quickstep implementation, we observe a peak memory per process of 3878 MB, resulting in total memory costs of about 727 GB, which corresponds to at least 12 nodes of the Piz Daint supercomputer. With 256 nodes, we observe a parallel efficiency of 55%, which provides a reasonable scaling of approximately one order of magnitude with respect to the amount of computational resources.
Strong scaling plot for the ADMM-DSD-PBEP86-D3 functional on Piz Daint with 12 MPI processes per node.
Strong scaling plot for the ADMM-DSD-PBEP86-D3 functional on Piz Daint with 12 MPI processes per node.
VI. LARGE BENCHMARK SYSTEMS
The exponential increase in computational power of modern computers makes calculations with DH functionals for large systems more and more feasible.132 We will demonstrate the current state with calculations carried out for large benchmark systems performed efficiently on compute nodes with 1024 GPUs. We present two different benchmark systems for which we calculated energies and forces: the benzene crystal and the anatase crystal. The geometrical parameters, the number of basis functions, and the number of floating point operations of the quintically scaling steps for each test system are summarized in Table I. The computational timings and some performance related parameters are summarized in Table II.
Geometrical parameters of the benchmark systems. See the available input files in the Materialscloud for further information.131
System . | Super cell size . | a; b; c (Å) . | α; β; γ (deg) . | Reference . |
---|---|---|---|---|
Benzene | 2 × 2 × 2 | 7.398; 9.435; 6.778 | 90; 90; 90 | 125 |
Anatase | 1 × 1 × 1 | 9.674; 9.826; 15.125 | 74.9; 75.9; 78.3 |
System . | Super cell size . | a; b; c (Å) . | α; β; γ (deg) . | Reference . |
---|---|---|---|---|
Benzene | 2 × 2 × 2 | 7.398; 9.435; 6.778 | 90; 90; 90 | 125 |
Anatase | 1 × 1 × 1 | 9.674; 9.826; 15.125 | 74.9; 75.9; 78.3 |
Information about the performance of the benchmark systems on the Piz Daint supercomputer. N, o, n, v, nADMM, and nRI refer to the number of atoms, the number of occupied orbitals, unoccupied orbitals, basis functions, ADMM auxiliary basis functions, and RI basis functions, respectively. mem provides the minimal required memory for the MP2 gradient calculation according to Ref. 111. FLOPs refers to the number of floating point operations of the quintical scaling steps. t provides the computational time in node hours.
System . | N . | o . | n . | v . | nADMM . | nRI . | mem (GB) . | FLOPs . | t . |
---|---|---|---|---|---|---|---|---|---|
Benzene | 384 | 480 | 8256 | 7776 | 3456 | 20 352 | 1703 | 1.7 · 1018 | 515 |
Anatase | 144 | 576 | 5424 | 4848 | 2928 | 15 360 | 961 | 7.2 · 1017 | 354 |
System . | N . | o . | n . | v . | nADMM . | nRI . | mem (GB) . | FLOPs . | t . |
---|---|---|---|---|---|---|---|---|---|
Benzene | 384 | 480 | 8256 | 7776 | 3456 | 20 352 | 1703 | 1.7 · 1018 | 515 |
Anatase | 144 | 576 | 5424 | 4848 | 2928 | 15 360 | 961 | 7.2 · 1017 | 354 |
Anatase is a highly investigated material in the context of photocatalysis.133–137 Due to its large bandgap, it can be tackled with perturbation theory. From a computational point of view, the inclusion of semicore states and high-angular quantum number polarization functions and the large RI basis set (in our case, up to l = 6) provide a computationally demanding system.
Finally, we discuss the importance of the different calculation parts with respect to the total computational time. In Table III, we report the relative computational costs of the two calculations. The quintical scaling steps dominate the computational cost with a relative contribution of roughly 60%. The HF calculation still contributes to 10% of the costs, which is mostly related to the limited scaling of the underlying communication algorithm.103
Relative timings and GPU usage of the benchmark systems on 1024 nodes of the Piz Daint supercomputer (see Fig. 4 for definitions).
System . | HF (%) . | MP2 integrals (%) . | MP2 energy (%) . | GPU usage (%) . |
---|---|---|---|---|
Benzene | 9.3 | 19 | 59 | 38 |
Anatase | 10 | 17 | 62 | 29 |
System . | HF (%) . | MP2 integrals (%) . | MP2 energy (%) . | GPU usage (%) . |
---|---|---|---|---|
Benzene | 9.3 | 19 | 59 | 38 |
Anatase | 10 | 17 | 62 | 29 |
VII. DISCUSSION
Because the calculation of the HF kernel still consumes a significant amount of computation time of the DH gradient calculation, a reduction of its costs is of general interest. The ADMM approximation was developed to accelerate HF calculations while conserving the accuracy. We find that the ADMM provides a significant reduction of the total computational time of DH gradient calculations with the TZ PBSs. In addition, the size of the ADMM basis, the most important parameter determining the performance of HF calculations, is the Schwarz screening threshold. We increased this threshold for standard calculations as much as possible without significant loss of accuracy, whereas we kept a tighter value for the ADMM. This means that the performance benefit of the ADMM is actually the lowest possible value because of the unoptimized Schwarz threshold. For water systems, we found that a threshold of order 10−7 can be used for the ADMM. In practice, however, an optimized Schwarz screening threshold could be used to push ADMM accuracy without increasing the overall costs of the DH calculations.
The accuracy of the ADMM determined here is sufficient for Machine Learning (ML) approaches. The ML fitting procedure removes potential numerical noise from the training data, making it more robust. The applicability to other kinds of applications depends on the magnitude of the expected energy differences. For a higher accuracy, we recommend a slightly larger ABS, the actual size has to be adjusted to the available resources and the required accuracy. For molecular crystals with sufficiently large cohesive energies, the accuracy is considered sufficient because the employed functional introduces a larger uncertainty. Thus, for molecular dynamic simulations with the same kinds of systems, the presented methodology will provide suitable accuracy.
The ADMM could be combined with RI-HF exchange or other methods, such as the Chain-of-Spheres (COS) approach, to accelerate HF exchange calculations further.85,138,139 This combination of approaches has the potential to further reduce the time for HF and response calculations. However, as the HF part of the DH gradient calculations has already been reduced significantly by the ADMM, no major further relative improvements can be expected. The main advantage of such a combination might be in two other areas: RI-HFX shifts the computational most demanding part from the calculation of ERIs to matrix multiplications and tensor contractions. This allows efficient algorithms with a much smaller memory footprint and makes GPU acceleration accessible. We will investigate this approach and will report results in the future.
VIII. CONCLUSIONS
We presented three achievements: an efficient implementation of forces and stress tensors of DH functionals in the CP2K software package, application of ADMM HF and response calculations to reduce the computational costs of HF exchange in the context of DH functionals energy and gradients, and establishment of benchmark systems for MP2-based functionals in the condensed phase. Our approach exploits modern GPUs efficiently and reduces the prefactors of the lower scaling steps significantly. The methods allow to efficiently use large and diffuse basis functions in the condensed phase.
With the provided methodology, we open the way to perform geometry optimizations and molecular dynamics simulations and to calculate training data for machine learning approaches on the double hybrid functional level of systems with a finite bandgap. The main advantage of the ADMM is its universal applicability and its scaling behavior for condensed systems. Furthermore, it allows an efficient treatment of diffuse basis functions, which would otherwise increase the computational costs of HF significantly. For the calculation of DH gradients, the ADMM allows us to reduce the computational costs of the HF kernel to at most 10% while conserving chemical accuracy.
The main drawback of the current implementation is the still quintical scaling of the underlying MP2 method. To overcome this issue, we will have to employ either low-scaling variants of MP2 or a computationally lower scaling method, such as RPA or OS-MP2.
ACKNOWLEDGMENTS
We gratefully acknowledge the support of the MARVEL National Centre for Competency in Research funded by the Swiss National Science Foundation (Grant Agreement No. 51NF40-182892). This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under Project Nos. d110 and c18.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
DATA AVAILABILITY
The data that support the findings of this study are openly available in the Materialscloud at https://doi.org/10.24435/materialscloud:3v-pw. The changes to the CP2K source code are available in release version 9.1 of CP2K under https://github.com/cp2k/cp2k.