We review the GPAW open-source Python package for electronic structure calculations. GPAW is based on the projector-augmented wave method and can solve the self-consistent density functional theory (DFT) equations using three different wave-function representations, namely real-space grids, plane waves, and numerical atomic orbitals. The three representations are complementary and mutually independent and can be connected by transformations via the real-space grid. This multi-basis feature renders GPAW highly versatile and unique among similar codes. By virtue of its modular structure, the GPAW code constitutes an ideal platform for the implementation of new features and methodologies. Moreover, it is well integrated with the Atomic Simulation Environment (ASE), providing a flexible and dynamic user interface. In addition to ground-state DFT calculations, GPAW supports many-body GW band structures, optical excitations from the Bethe–Salpeter Equation, variational calculations of excited states in molecules and solids via direct optimization, and real-time propagation of the Kohn–Sham equations within time-dependent DFT. A range of more advanced methods to describe magnetic excitations and non-collinear magnetism in solids are also now available. In addition, GPAW can calculate non-linear optical tensors of solids, charged crystal point defects, and much more. Recently, support for graphics processing unit (GPU) acceleration has been achieved with minor modifications to the GPAW code thanks to the CuPy library. We end the review with an outlook, describing some future plans for GPAW.
I. INTRODUCTION
The electronic-structure (ES) problem, i.e., the solution to the time-independent Schrödinger equation for a collection of electrons and atomic nuclei, forms the starting point for the quantum-mechanical treatment of matter. Indeed, all chemical and physical properties of any substance (solid, molecule, surface, etc.) can, in principle, be obtained from the energies and wave functions that constitute the solution. A pioneering step towards solving the many-body ES problem was the formulation and formal proof of density functional theory (DFT) by Hohenberg and Kohn in 19641 and a practical scheme for its solution by Kohn and Sham in 1965.2 At present, most codes solving the ES problem from first principles are based on DFT. Such codes are extremely powerful and allow one to determine the atomic structure of solids and molecules containing hundreds of atoms with a relative error below 1%.3–5 Once the atomic structure of the compound has been solved, its properties (electronic, magnetic, optical, topological, etc.) can, in principle, be determined. The evaluation of properties often involves theories beyond the formal DFT framework to account for effects such as temperature and lattice vibrations,6,7 many-body interactions in excited states,8,9 or time dependence.10,11 As such, first-principles atomistic calculations often involve two successive phases: the solution of the ground-state ES problem (including ion dynamics) and the subsequent evaluation of physical properties. This review is structured accordingly, as Secs. III and IV deal with the first phase while Secs. V–IX are devoted to the second.
In recent years, the scientific significance of ES codes has shifted from a useful tool to describe and understand matter at the atomic scale to an independent driver of the discovery and development of new materials.12–16 This change in scope has been fueled by the exponential increase in computer power accompanied by improved numerical algorithms17,18 as well as the use of workflow management software for high-throughput computations19–22 and the adoption of machine-learning techniques to leverage the rapidly growing data generated by ES codes.23–25 In parallel with these capacity-extending developments, continuous progress in the fundamental description of exchange–correlation effects has advanced the predictive power of ES calculations to a level where they rival experiments in terms of accuracy for many important properties.26–31
The grid-based projector-augmented wave (GPAW) code was originally intended as a Python-based multigrid solver of the basic DFT equations within the projector-augmented wave (PAW) formalism.32 The name GPAW, accordingly, was an abbreviation for “grid-based projector-augmented wave.” At present, other choices than regular grids for representations of the wave functions exist in GPAW, but the name has stuck. During the years 2005–2010, GPAW evolved to a full-blown DFT package33 supporting most of the functionality expected from a modern ES code, in addition to a few more specialized features, including real-time propagation of wave functions34 and an alternative basis of numerical atomic orbitals [referred to as the linear combination of atomic orbitals (LCAOs) basis]35 to supplement the real-space grid. In 2011, a plane-wave (PW) basis set was also implemented. At present, the possibility to use three different types of basis sets and even combine them within a single run remains a unique feature of GPAW, rendering the code very versatile.
The implementation of the PW basis set laid the groundwork for GPAW’s linear-response module, which at present supports the calculation of linear response functions,36 total energies from the adiabatic connection fluctuation–dissipation theorem,29,37 the GW self-energy method for quasiparticle band structures,38 the Bethe–Salpeter Equation (BSE) for optical excitations,39 and more. The code also supports a wide range of features related to the calculation of magnetism and spin–orbit effects. Examples include spin-spiral calculations using the generalized Bloch theorem,40 external magnetic fields, orbital magnetization, magnetic anisotropy,41 adiabatic magnon dispersions from the magnetic force theorem,42 and dynamic magnetic response from time-dependent density functional theory (TDDFT).43 For solids, the k-space Berry phases can be computed directly from the Bloch orbitals and may be used to obtain spontaneous polarization,44 Born effective charges, piezoelectric response tensors,45 and various indices characterizing the band topology.46
In addition, GPAW can compute the localization matrices forming the basis for the construction of Wannier functions with, e.g., the Atomic Simulation Environment (ASE)47 or Wannier90.48 Electrostatic corrections to the formation energies of charged point defects in insulators are implemented, as are calculations of the hyperfine coupling and zero-field splitting for localized electron spins. GPAW also offers the possibility to perform time-independent, variational calculations of localized electronic excitations in, e.g., molecules or crystal point defects, using direct orbital optimization strategies implemented for all three types of basis sets.49–51 This provides an efficient and robust alternative to traditional “ΔSCF” approaches. GPAW can also be used to describe ultrafast electron dynamics within time-dependent density functional theory (TDDFT), with wave functions represented either on a real-space grid34 or in the LCAO basis.52 The latter can provide a significant speed-up due to the relatively small size of the basis.53–55 The LCAO representation also forms the basis for the calculation of electron–phonon couplings as well as non-linear optical spectra such as Raman scattering56 (which can alternatively be obtained in the PW mode as a finite difference of the dielectric tensor), second-harmonics generation,57 and shift currents58 using higher-order perturbation theory.
II. WHY GPAW?
A. User’s perspective
There are dozens of electronic-structure codes available for interested users. The codes differ in their license (in particular, whether open or proprietary), the underlying programming language (e.g., Fortran, C, and Python), their treatment of core electrons (all-electron vs pseudopotentials), the employed representations of the wave functions (plane waves, atom-centered orbitals, and real-space grids), and the beyond-DFT features they support. Why should one choose GPAW?
In this section, we describe some of the features that make GPAW interesting from the point of view of a common user who wants to perform electronic-structure calculations. Section II B focuses on its possibilities for more advanced users, who may want to modify the code or implement completely new functionalities.
A first point to note is that GPAW is written almost exclusively in Python and is directly integrated with the Atomic Simulation Environment. This integration with ASE makes the setup, control, and analysis of calculations easy and flexible. The programming language is of course a key issue for developers, but the common user also benefits from Python and the ASE/GPAW integration. A typical stand-alone program only offers a fixed (though, of course, possibly large) set of tasks that it can perform, while Python scripting allows for a more flexible use of the code. This could, for example, mean combining several different GPAW calculations in new ways. Another advantage is that “inner parts” of the code like the density or the Kohn–Sham eigenvalues are directly accessible in a structured format within Python for further analysis. It is even possible to “open up the main loop” of GPAW and have access to, inspect, and also modify key quantities during program execution (see Fig. 1).
The variable calc is the ground-state DFT calculator object, and its icalculate method yields a context object at every self-consistent field (SCF) step. As seen, one can use this in a for-loop to implement special logic for the termination of the SCF iterations or for diagnostics. In this example, the memory usage is written to the log-file for the first 15 SCF iterations.
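The pattern described in the caption looks roughly like this. Note that this is a sketch based on the caption, not the actual Fig. 1 code: the context-object attribute names (niter, log) and the maxrss memory helper are assumptions.

```python
from ase.build import molecule
from gpaw import GPAW
from gpaw.utilities.memory import maxrss  # assumed helper for memory usage

atoms = molecule('H2O', vacuum=4.0)
atoms.calc = calc = GPAW(mode='fd', txt='h2o.txt')

# Drive the SCF loop ourselves instead of calling calc.get_potential_energy():
# each iteration of icalculate yields a context object for one SCF step.
for ctx in calc.icalculate(atoms):
    if ctx.niter <= 15:
        # Write the memory usage to the log-file for the first 15 SCF steps.
        ctx.log(f'SCF step {ctx.niter}: memory usage {maxrss() / 1024**2:.1f} MiB')
```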
As already mentioned in the introduction, GPAW distinguishes itself from other available ES codes by supporting three different ways of representing the wave functions. The most commonly used basis set is plane waves (PW), which is appropriate for small or medium-sized systems where high precision is required. Convergence is easily and systematically controlled by tuning the cutoff energy. A large number of advanced features and “beyond-DFT” methods are available in the PW mode. These include the calculation of hybrid functionals, RPA total energies, linear-response TDDFT, and many-body perturbation theory techniques like GW and the Bethe–Salpeter equations. The new graphics processing unit (GPU) implementation also uses the PW mode.
The wave functions can alternatively be represented on real-space grids, which was the original approach in GPAW. The implementation of this so-called finite-difference (FD) mode relies on multi-grid solutions of the Poisson and Kohn–Sham equations. The FD mode allows for more flexible boundary conditions than the PW mode, which is restricted to periodic supercells. The boundary conditions may, for example, be taken to reflect the charge distribution in the unit cell. Calculations in the FD mode can be systematically converged through lowering the grid spacing, but the approach to full convergence is slower than in the PW mode. The FD mode is particularly well suited for large systems because the wave-function representation allows for large-scale parallelization through real-space decomposition. Furthermore, it is possible to perform time-propagation TDDFT, including Ehrenfest dynamics, in this mode.
The third representation of the wave functions is a basis of numerical atom-centered orbitals in the linear combination of atomic orbitals (LCAOs) mode. The size of the basis set can be varied through the inclusion of more angular momentum channels, additional orbitals within a channel, or polarization functions. GPAW comes with a standard set of orbitals, but a basis-set generator is included with the code so that users may construct different basis sets depending on their needs and requirements. The LCAO mode is generally less accurate than the PW and FD modes, but it allows for the treatment of considerably larger systems—more than ten thousand atoms. It is also possible to study electron dynamics through the fast implementation of time-propagation DFT, and Ehrenfest dynamics is under development.
As explained, the different modes have different virtues and limitations, and it can, therefore, be an advantage to apply several modes to a project. For larger systems, it is, for example, possible to divide structure optimization into two steps. First, an optimization is performed with the fast LCAO basis, leading to an approximately correct structure. This is then followed by an optimization in either the PW or FD mode, which now requires much fewer steps because of the good initial configuration. Due to the ASE/Python interface, this combined calculation can easily be performed within a single script.
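A two-step structure optimization of this kind could be scripted as follows. This is a minimal sketch: the molecule, cutoff, and force thresholds are illustrative choices, not recommendations.

```python
from ase.build import molecule
from ase.optimize import BFGS
from gpaw import GPAW, PW

atoms = molecule('C6H6', vacuum=5.0)

# Step 1: quick pre-relaxation with the small, fast LCAO basis.
atoms.calc = GPAW(mode='lcao', basis='dzp', txt='relax-lcao.txt')
BFGS(atoms, logfile='lcao.log').run(fmax=0.05)

# Step 2: refine in PW mode; thanks to the good starting geometry,
# only a few additional optimization steps are needed.
atoms.calc = GPAW(mode=PW(500), txt='relax-pw.txt')
BFGS(atoms, logfile='pw.log').run(fmax=0.01)
```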
Since GPAW was originally created with the FD mode only, and the LCAO mode was added next, some features have been implemented for only those modes. Examples are real-time TDDFT (see Sec. VII) and electron–phonon coupling (see Sec. VI G). Conversely, some new features only work for the PW mode, which was added after the real-space modes. Examples are RPA total energies (see Sec. VI C) and the calculation of the stress tensor. To summarize, given the limitations just mentioned, users should in most cases use the PW or LCAO mode, with the choice depending on the required accuracy and the available resources. The GPAW webpage has a table showing up-to-date information about which features work with which modes.59
B. Developer’s perspective
The GPAW source code is written in the Python and C languages and is hosted on GitLab,60 licensed under the GNU General Public License v3.0. This ensures the transparency of all features and allows developers to fully customize their experience and contribute new features to the community.
An advantage of having Python code is that the Python script you write to carry out your calculations will have access to everything inside a GPAW calculation. An example showing the power and flexibility this affords is the possibility of having user-code inserted inside the self-consistent field (SCF) loop, as demonstrated in Fig. 1.
At the time of this writing (July 2023), GPAW has two versions of the ground-state DFT code in the main branch of the code. There is the older version that has grown organically since the birth of GPAW: it has many features but also a lot of technical debt that makes it harder to maintain and less ideal to build new features on top of. The newer ground-state code addresses these issues by having a better overall design.
The new design greatly improves the ease of implementing new features. The goal is to have the new code be feature-complete so that it can pass the complete test suite and then delete the old code once that is achieved. At the moment, we recommend that all production calculations be performed with the old code and that work on new features be performed on top of the new code, even though certain features are not yet production-ready. Three new features, not present in the old code base, have already been implemented based on the new code: GPU implementation of PW mode calculations (see Sec. III B 9), reuse of the wave functions after unit-cell changes during cell optimization, and spin-spiral calculations (see Sec. V E).
GPAW uses pytest61 for its test suite, which currently consists of ∼1600 unit and integration tests (see Table I). A subset of those tests runs as part of GitLab’s continuous integration (CI), thereby checking the correctness of every code change. Unfortunately, the full test suite is too time-consuming to run as part of CI, so we instead run it nightly, both in serial and in parallel with MPI.
Number of files and number of lines of code in the git repository of GPAW. The Python source-code files are split into three parts: the actual code, the test suite, and code examples in the documentation.
| Type of file | Files | Lines |
|---|---|---|
| Python (the code) | 513 | 146 604 |
| C | 80 | 19 719 |
| Python (test suite) | 681 | 47 147 |
| Python (documentation) | 744 | 32 014 |
Many of the code examples in GPAW’s documentation, exercises, and tutorials62 require resources [time and number of central processing units (CPUs)] beyond what would make sense to run as part of the pytest test suite. For that, we use MyQueue21 to submit those scripts as jobs to a local supercomputer every weekend. At the moment, this amounts to ∼5200 core-hours of calculations.
As can be seen from Table I, the majority of the code is written in Python, which is an interpreted language that is easy to read, write, and debug.
Interpreter-executed code will not run as efficiently as code that is compiled into native machine code. It is, therefore, important to make sure that the places in the code where most of the time is spent (hot spots) are in native machine code and not in the interpreter. GPAW achieves this by implementing the hot spots in C-code with Python wrappers that can be called from the Python code. Examples of such computationally intensive tasks are applying a finite-difference stencil to a uniform grid, interpolating from one uniform grid to another, or calculating overlaps between projector functions and wave functions. In addition, we have Python interfaces to the numerical libraries FFTW,63 ScaLAPACK,64 ELPA,17 BLAS, Libxc,65,66 libvdwxc,67 and MPI. Finally, GPAW makes heavy use of the NumPy68 and SciPy69 Python packages. NumPy provides us with the numpy.ndarray data type, which is an N-dimensional array that we use for storing wave functions, electron densities, potentials, matrices like the overlap matrix or the LCAO wave function coefficients, and much more. The use of NumPy arrays allows us to use the many sub-modules of SciPy to manipulate data. This also gives us an efficient memory layout, allowing us to simply pass a pointer to the memory whenever we need to call the C-code from the Python code. With this strategy, we can get away with having most of the code written in a relatively slow interpreted language and still have most of the time spent in highly efficient C-code or optimized numerical libraries.
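To illustrate the kind of hot spot that lives in C, here is a three-point finite-difference Laplacian on a uniform one-dimensional grid, written with NumPy slicing standing in for the compiled kernel. This is a generic sketch of the technique, not GPAW's actual stencil code.

```python
import numpy as np

def laplace_1d(f, h):
    """Second derivative of f on a uniform grid with spacing h, using the
    central three-point stencil; only interior points are computed."""
    d2f = np.empty_like(f)
    d2f[1:-1] = (f[:-2] - 2.0 * f[1:-1] + f[2:]) / h**2
    d2f[0] = d2f[-1] = 0.0  # boundaries left untreated in this sketch
    return d2f

h = 0.1
x = np.arange(0.0, 1.0 + h / 2, h)
d2 = laplace_1d(x**2, h)  # the second derivative of x^2 is exactly 2
```

Like the real C kernels, the slicing operations above act directly on the array's contiguous memory buffer, with no Python-level loop over grid points.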
The advantage of the original FD mode, where there are no Fourier transforms of the wave functions, is that the algorithms should parallelize well for large systems. In practice, it has turned out that the FD mode has a number of disadvantages: (1) Because integrals over the unit cell are performed as sums over grid points, there will be a small periodic energy variation as atoms are translated, with a period equal to the grid spacing used (the so-called egg-box error); (2) The system sizes that are typically most interesting for applications of DFT are too small for the parallel scalability to be the decisive advantage; (3) The memory used to store the wave functions on uniform grids in real space is significant. In contrast, the PW mode has practically no egg-box error, is very efficient for the most typical system sizes, and often uses a factor of 10 less memory than an FD-mode calculation of similar accuracy. The main advantages of the LCAO mode are low memory usage and high efficiency for large systems; for small unit cells with many k-points, the PW mode is the most efficient. One disadvantage of the LCAO mode is egg-box errors: it uses the same uniform grids as the FD mode for the integration of matrix elements and, therefore, shows a similar egg-box energy variation. A second disadvantage of the LCAO mode is that, as for any localized basis set, reaching the complete basis-set limit is more involved than in the PW and FD modes. This can have severe consequences, even for ground-state calculations of difficult systems such as Cr2, for example.70 In the PW or FD modes, the complete basis-set limit is easy to reach by simply increasing the number of plane waves or grid points, respectively, which leads to smooth convergence.32 At the moment, we only provide double-ζ polarized (DZP) basis sets, and going beyond DZP is left for users to do themselves.
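The egg-box error of point (1) can be demonstrated with a toy model: integrating a narrow Gaussian as a sum over grid points while translating it continuously. The "energy" oscillates with the translation, with period equal to the grid spacing. This is a self-contained illustration, unrelated to GPAW's actual integration code.

```python
import numpy as np

h = 0.2                        # grid spacing
x = np.arange(-5.0, 5.0, h)    # uniform real-space grid

def grid_integral(shift, width=0.2):
    """Integral of a Gaussian centered at `shift`, approximated as a sum
    over grid points; this discretization introduces the egg-box error."""
    f = np.exp(-((x - shift) / width) ** 2)
    return f.sum() * h

shifts = np.linspace(0.0, h, 41)
values = np.array([grid_integral(s) for s in shifts])
# `values` oscillates with period h: translating by a full grid spacing
# maps the grid onto itself, but fractional shifts change the sum.
```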
III. GROUND-STATE DFT
A. Projector augmented-wave method
The diverging Coulomb potential causes rapid oscillations in electronic wave functions near the nuclei, and special care is required to be able to work with smooth wave functions. The projector augmented-wave (PAW) method by Blöchl71 is a widely adopted generalization of pseudopotential methods, utilizing their strength of smooth pseudo-wave functions while retaining a mapping from all-electron wave functions (|ψn⟩) to pseudo-wave functions (|ψ̃n⟩).
The atom-centered integrals appearing in the PAW corrections are numerically evaluated on the Cartesian product of a Lebedev angular grid and a non-uniform radial grid (with a denser mesh closer to the nucleus) for each atom.
B. Numerical implementation
1. Wave-function representations
2. All-electron quantities
The beauty of the PAW method is that you never need to transform the pseudo-wave functions to all-electron wave functions, but you can do it if you want to. GPAW has tools for interpolating the pseudo-wave functions to a fine real-space grid and adding the PAW corrections. A fine real-space grid is needed to properly represent the cusp and all the oscillations necessary for the wave function to be orthogonal to all the frozen core states.
GPAW also has tools for calculating the all-electron electrostatic potential. This is useful for transmission electron microscopy (TEM) simulations.72 Most TEM simulations have relied on the so-called independent atom model (IAM), where the specimen potential is described as a superposition of isolated atomic potentials. While this is often sufficient, there is increasing interest in understanding the influence of valence bonding.73 This can be investigated by a TEM simulation code such as abTEM,74 which can directly use ab initio scattering potentials from GPAW.
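Based on our reading of the GPAW documentation, the interpolation tools live in the PS2AE utility class; a sketch of its use follows, where the keyword and method names (grid_spacing, get_wave_function, get_electrostatic_potential) are recalled from the documentation and should be checked against the installed version.

```python
from gpaw import GPAW
from gpaw.utilities.ps2ae import PS2AE

calc = GPAW('groundstate.gpw')  # a previously saved ground-state calculation

# Interpolate pseudo quantities to a fine real-space grid and add the
# PAW corrections to recover all-electron quantities.
converter = PS2AE(calc, grid_spacing=0.05)
ae_wf = converter.get_wave_function(n=0)        # all-electron wave function
v_ae = converter.get_electrostatic_potential()  # e.g., for abTEM simulations
```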
3. Solving the Kohn–Sham equation
The default method for solving the Kohn–Sham equation for PW and FD modes is to do iterative diagonalization combined with density mixing; for LCAO mode, we do a full diagonalization of the Hamiltonian. Alternatively, one can do direct minimization as described in Sec. III B 5.
For PW and FD modes, we need an initial guess of the wave functions. For this, we calculate the effective potential from a superposition of atomic densities and diagonalize an LCAO Hamiltonian in a small basis set consisting of all the pseudo-partial waves corresponding to bound atomic states.
Each step in the self-consistent field (SCF) loop consists of the following operations: (1) diagonalization of the Hamiltonian in the subspace of the current wave functions (skipped for LCAO); (2) one or more steps of the iterative eigensolver (except for LCAO, where a full diagonalization is performed); (3) update of eigenvalues and occupation numbers; and (4) density mixing and symmetrization. See previous work32,33 and Ref. 75 for details.
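The structure of such a loop can be caricatured with a toy two-level model whose Hamiltonian depends on the density. Everything here is illustrative (model Hamiltonian, mixing parameter, occupation of a single orbital), not GPAW code, but the build-diagonalize-occupy-mix cycle mirrors steps (1)-(4).

```python
import numpy as np

def toy_scf(U=1.0, beta=0.3, tol=1e-10, maxiter=500):
    """Toy SCF loop: build H from the density, diagonalize, update the
    occupations/density, and apply linear density mixing."""
    H0 = np.array([[0.0, -1.0], [-1.0, 0.5]])  # fixed part of the Hamiltonian
    n = np.array([1.0, 0.0])                   # initial density guess
    for _ in range(maxiter):
        H = H0 + U * np.diag(n)                # density-dependent Hamiltonian
        eps, C = np.linalg.eigh(H)             # "diagonalization" step
        n_out = C[:, 0] ** 2                   # occupy the lowest orbital
        if np.abs(n_out - n).max() < tol:      # self-consistency reached?
            return n_out, eps[0]
        n = n + beta * (n_out - n)             # linear density mixing
    raise RuntimeError('SCF not converged')

density, energy = toy_scf()
```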
GPAW has two kinds of Poisson equation solvers: direct solvers based on Fourier transforms or Fourier-sine transforms, and iterative multi-grid solvers (Jacobi or Gauss–Seidel). The default is to use a direct solver, whereas iterative solvers may be chosen for larger systems where they can be more efficient.
For 0-, 1-, and 2-dimensional systems, the default boundary conditions are to have the potential go to zero on the cell boundaries. This becomes a problem for systems involving large dipole moments. The potential due to the dipole is long-ranged and, therefore, the converged potential requires large vacuum sizes. For molecules (0D systems), the boundary conditions can be improved by adding multipole moment corrections to the density so that the corresponding multipoles of the density vanish. The potential of these corrections is added to the obtained potential. The same trick is used to handle charged systems. For slabs (2D systems), a dipole layer can be added to account for differences in the work functions on the two sides of the slab.
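For the slab case, the dipole layer is requested through the Poisson-solver configuration; a minimal sketch is shown below, where the 'dipolelayer' keyword follows the GPAW documentation as we recall it and the slab itself is an arbitrary example.

```python
from ase.build import fcc111
from gpaw import GPAW

# Asymmetric slab with periodicity in the xy plane; the dipole layer
# compensates the work-function difference between the two surfaces.
slab = fcc111('Al', size=(2, 2, 4), vacuum=8.0)
slab.calc = GPAW(mode='fd',
                 poissonsolver={'dipolelayer': 'xy'},
                 txt='slab.txt')
```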
4. Updating wave functions in dynamics
Simulations commonly move the atoms without changing other parameters. If an atom moves only slightly, we would expect most of the charge in its immediate vicinity to move along with it. We use this to compute an improved guess for the wave functions in the next self-consistency loop with FD or PW mode, where the eigensolver is iterative.
The method could be further improved by using the LCAO basis set and the overlap matrix to prevent double-counting.
5. Direct minimization
The GPAW formulation of direct minimization (DM) is applicable to all representations of the orbitals available in GPAW, as well as to Kohn–Sham (unitary-invariant) and orbital-density-dependent (non-unitary-invariant) energy functionals, and can be used for both finite and extended systems. The orbitals are parameterized by a unitary transformation of a set of reference orbitals, with the unitary matrix written as the matrix exponential of an anti-Hermitian matrix A, and the energy is minimized with respect to the elements of A. In LCAO calculations, the reference orbitals are expressed as a linear combination of the atomic basis functions Φ, Ψ0 = ΦC0, where the matrix of coefficients C0 is fixed. Therefore, only a minimization with respect to the elements of the matrix A is required.82 For plane-wave and real-space grid representations, a minimization in the space tangent to the reference orbitals is sufficient if the functional is unitary invariant. Otherwise, if the functional is non-unitary invariant, such as when a self-interaction correction is used (see Sec. III C 5), an inner-loop minimization in the occupied–occupied block of the matrix A is performed to make the energy stationary with respect to unitary transformations of the occupied orbitals.50
The DM method avoids diagonalization of the Hamiltonian at each step, and as a result, it usually involves a smaller computational effort. The DM method has also been shown to be more robust than conventional eigensolvers and density mixing in calculations of molecules and extended systems.82 However, the current implementation does not support a finite-temperature distribution of occupation numbers and thus can only be used for systems with a finite bandgap.
6. Convergence criteria
The modular architecture of GPAW allows the user to have precise control over how the SCF loop decides that the electronic structure has converged to sufficient precision. GPAW contains simple keywords for common convergence criteria, such as “energy,” “forces,” (electron) “density,” and “work function,” which are sufficient for the most common use cases.
Internally, all convergence criteria are Python objects that are instances of convergence classes. At each step through the SCF loop, every convergence object is called and passed a context object that contains the current state of the calculation, such as the wave functions and the Hamiltonian. The criterion can thus pull relevant data from the calculation to decide whether it is converged. Because the convergence criterion itself is an object, it can store information such as previous values of the energy for comparison to the new value. When all convergence criteria report that they are converged, the calculation as a whole is considered to be converged and terminates.
This modular nature gives the user full control over how each convergence criterion operates. For example, the user can easily ask the energy criterion to check the differences in the last four values of the energy rather than the last three. If a convergence criterion itself is expensive to calculate, it can make sense to not check it until the rest of the convergence criteria are met. This can be accomplished by activating an internal “calculate_last” flag within the convergence criterion.
Users can easily add their own custom convergence criteria to the SCF loop. If a user would like to use a criterion not included by default with GPAW, it is straightforward to write a new criterion as a Python class and pass this to the convergence dictionary of GPAW. For example, if one wanted to be sure the bandgap of a semiconductor was converged, the criterion could check the bandgap at each iteration, compare it to stored values from previous iterations, and report that the calculation is converged when the peak-to-peak variance among the last n iterations is below a given threshold.
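A bandgap criterion along these lines could look as follows. The exact base class and context interface are GPAW-internal details; this sketch only assumes that the criterion is called once per SCF step with a context object from which the gap can be obtained (the bandgap() helper is a stand-in for pulling eigenvalues and occupations from the context).

```python
import numpy as np

class BandgapConverged:
    """Report convergence when the peak-to-peak spread of the bandgap over
    the last `n` SCF iterations is below `tol`. Sketch only: the real GPAW
    criterion interface may differ in details."""

    def __init__(self, tol=1e-4, n=3):
        self.tol = tol
        self.n = n
        self.history = []  # bandgap values from previous SCF iterations

    def __call__(self, context):
        # `context.bandgap()` is an assumed helper on the context object.
        self.history.append(context.bandgap())
        if len(self.history) < self.n:
            return False
        return bool(np.ptp(self.history[-self.n:]) < self.tol)
```

An instance of such a class would then be passed to GPAW's convergence dictionary alongside the built-in criteria.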
7. PAW datasets and pseudopotentials
GPAW can read PAW datasets from (possibly compressed) XML files following the PAW-XML specification.83 Dataset files for most of the periodic table can be downloaded from the GPAW webpage or installed with the gpaw install-data command-line tool. The datasets are available for the local-density approximation (LDA), Perdew–Burke–Ernzerhof (PBE),3 revPBE,346 RPBE,345 and GLLBSC97 functionals. The electronic structure code Abinit84 also reads the PAW-XML format, allowing GPAW and Abinit to share PAW dataset collections such as the Jollet–Torrent–Holzwarth collection.85
Specialized datasets can be generated with the gpaw dataset command-line tool. This allows one to tweak the properties of a dataset. Some examples could be: (1) add more/less semi-core states; (2) increase/decrease the augmentation sphere radius to make the pseudo-wave functions more/less smooth; (3) add/remove projector functions and corresponding pseudo and all-electron partial waves; or (4) base the PAW dataset on a different XC-functional. These changes will affect the accuracy and cost of the calculations.
GPAW is also able to use norm-conserving pseudopotentials (NCPPs) such as Hartwigsen–Goedecker–Hutter (HGH)86 and pseudopotentials in the Unified Pseudopotential Format (UPF), such as SG15.87 Non-local NCPPs can be considered an approximation to PAW: in the PAW description, the non-local part of the Hamiltonian [Eq. (6)] will adapt to the environment, whereas for an NCPP it will be diagonal and have a fixed value taken from a reference atom. Because of norm conservation, the PAW overlap corrections vanish for an NCPP, so the overlap operator reduces to the identity.
8. Parallelization
GPAW can parallelize over various degrees of freedom, depending on the type of calculation, and it implements multiple algorithms for achieving good parallel performance and scalability. In calculations involving k-points, parallelization over them is typically the most efficient, as little communication is needed during the summation of wave functions to calculate the density and any derived quantities. As the number of k-points is often limited, especially in large systems, parallelization is also possible over real-space grids in the FD and LCAO modes, as well as over plane waves in the PW mode. All modes also support parallelization over electronic bands, which is particularly efficient for real-time TDDFT, where the time propagation of each band can be carried out independently. Additional parallelization possibilities exist depending on the calculation, such as over electron–hole pairs in linear-response TDDFT calculations.
Parallelization is performed mainly with MPI. In the FD and LCAO modes, it is possible to use OpenMP within shared-memory nodes, which can improve performance when the number of CPU cores per node is large. Dense linear algebra, such as matrix diagonalization and Cholesky decomposition, can be carried out with the parallel ScaLAPACK or ELPA libraries. This applies to both the direct diagonalization in the LCAO mode as well as to subspace diagonalizations in the iterative Davidson and Residual Minimization Method with Direct Inversion in the Iterative Subspace (RMM-DIIS) methods in the FD and PW modes.
For ground-state calculations, GPAW will divide the cores into three MPI communicators: k-points, bands, and domain. When parallelizing over k-points and/or bands, all the cores of the k-point and/or band communicators will have a copy of the density (possibly distributed over the domain communicator). GPAW has the option to redistribute the density from the domain communicator to all cores so that operations such as evaluating the XC energy and solving the Poisson equation can be performed more efficiently.
For each k-point, all the cores in the band and domain communicators will cooperate on calculating the matrix elements of the Hamiltonian and the overlap operators. Dense-matrix linear-algebra operations on those matrices can be performed on a single core (most efficient for small systems) or with ScaLAPACK, where the matrices are distributed over either all or some of the cores from the pool of cores in the band and domain communicators.
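The three-level split into k-point, band, and domain communicators can be illustrated with a small pure-Python sketch. This is an illustration only, not GPAW's actual code; the rank layout (domain index varying fastest) is an assumption for the example.

```python
# Sketch: map a world of P = kpt_size * band_size * domain_size MPI ranks
# onto (k-point, band, domain) group indices. Illustrative layout only.

def decompose_rank(rank, kpt_size, band_size, domain_size):
    """Return (kpt, band, domain) group indices for a world rank.

    Ranks are laid out so that the domain index varies fastest.
    """
    assert 0 <= rank < kpt_size * band_size * domain_size
    domain = rank % domain_size
    band = (rank // domain_size) % band_size
    kpt = rank // (domain_size * band_size)
    return kpt, band, domain

# Example: 12 ranks split as 3 k-point groups x 2 band groups x 2 domains.
layout = [decompose_rank(r, 3, 2, 2) for r in range(12)]
```

All ranks sharing the same k-point and band indices then hold the same subset of states, distributed over their domain communicator.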
One drawback of Python is that in large parallel calculations, its import mechanism may incur a heavy load on the filesystem because all the parallel processes are trying to read the same Python files. GPAW tries to alleviate this by using a special “broadcast imports” mechanism: during initial module imports, only a single process loads the modules; afterward, MPI is used to broadcast the data to all the processes.
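The broadcast-import idea can be sketched as follows. This is a conceptual illustration, not GPAW's actual mechanism; the `broadcast` callable stands in for an MPI broadcast, and the module source is made up.

```python
# Conceptual sketch of "broadcast imports": only rank 0 reads module source
# from disk; the bytes are then broadcast so the other ranks never touch the
# filesystem. The MPI broadcast is simulated by a plain Python function.
import types

def broadcast_import(name, source_on_rank0, rank, broadcast):
    source = source_on_rank0 if rank == 0 else None
    source = broadcast(source)            # MPI_Bcast stand-in
    module = types.ModuleType(name)
    exec(source, module.__dict__)         # every rank builds the same module
    return module

# Simulated 2-rank run: the fake broadcast hands rank 0's data to everyone.
shared = {}
def fake_bcast(data):
    if data is not None:
        shared['src'] = data
    return shared['src']

src = "ANSWER = 42\n"
mod0 = broadcast_import('demo', src, rank=0, broadcast=fake_bcast)
mod1 = broadcast_import('demo', None, rank=1, broadcast=fake_bcast)
```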
Parallel scalability depends strongly on the calculation mode and the system. The FD mode offers the best scalability for high core counts, as only nearest-neighbor communication is needed across domains over a domain decomposition. In PW mode, the limiting factor is all-to-all communication in parallelization over plane waves. In LCAO mode, communications arise from multi-center integrals of basis functions across domains. At best, GPAW scales to tens or hundreds of nodes in supercomputers.
9. GPU implementation
The GPU implementation of GPAW works both on NVIDIA and Advanced Micro Devices (AMD) GPUs, targeting either CUDA or Heterogeneous-Compute Interface for Portability (HIP) backends, respectively. GPAW uses a combination of manually written GPU kernels, external GPU libraries (such as cuBLAS/hipBLAS), and CuPy.88 CuPy offers an easy-to-use Python interface for GPUs centered around a NumPy-like GPU array and makes many hardware details completely transparent to the end-user.
In the manually written GPU kernels, both GPU backends (CUDA and HIP) are targeted using a header-only porting approach89 in which generic GPU identifiers are translated to vendor-specific identifiers at compile time. For example, to allocate GPU memory, the identifier gpuMalloc is used in the code, which is then translated either to cudaMalloc or hipMalloc depending on which GPU backend is targeted. This allows us to avoid unnecessary code duplication and still target multiple hardware platforms natively.
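The translation step can be mimicked in a few lines of Python. This is a toy stand-in for the C preprocessor macros used in the real header-only port; the identifier table is illustrative.

```python
# Toy illustration of the header-only porting idea: generic GPU identifiers
# are mapped to vendor-specific ones at compile time. Here a string
# substitution plays the role of the C preprocessor.
TRANSLATIONS = {
    'cuda': {'gpuMalloc': 'cudaMalloc', 'gpuFree': 'cudaFree'},
    'hip': {'gpuMalloc': 'hipMalloc', 'gpuFree': 'hipFree'},
}

def port(source, backend):
    """Translate generic identifiers in a kernel source for one backend."""
    for generic, specific in TRANSLATIONS[backend].items():
        source = source.replace(generic, specific)
    return source

kernel = "gpuMalloc(&ptr, n); /* ... */ gpuFree(ptr);"
```

The same generic source can thus be compiled natively for either vendor without duplicating the kernel.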
An earlier GPU implementation of GPAW90,91 served as the starting point for the recent work on a new GPU code based on the rewritten ground-state code. The objects that store quantities such as the pseudo-wave-functions and densities use NumPy arrays in the CPU code and CuPy arrays when running on a GPU. At the moment, GPUs can be used for total-energy calculations with LDA/generalized gradient approximation (GGA) functionals in the PW mode.
Parallelization to multiple GPUs is performed using MPI. Each MPI rank is assigned a single GPU, and communication between the GPUs is handled by MPI. Support for GPU-aware MPI makes it possible to do direct GPU-to-GPU communication without unnecessary memory copies between the GPU device and the host CPU.
C. XC-functionals
1. Libxc and libvdwxc
The libxc library66 provides implementations of several (semi-)local variants of the XC functional from the LDA, GGA, and meta-GGA (MGGA)91 families. These are available in GPAW via a combination of their libxc names, e.g., "GGA_X_PBE+GGA_C_PBE" for PBE.3 At the moment, using MGGA functionals from libxc in GPAW requires libxc to be compiled with --disable-fhc. Most MGGA functionals in libxc do not depend on the Laplacian of the density; those that do are not supported.
Additionally, GPAW provides its own implementation of several (semi-)local functionals called by their short names, e.g., TPSS,93 PBE,3 and LDA, the latter with the correlation of Perdew and Wang.94 Several hybrids (see below) are implemented in GPAW with the support of the libxc library for their local parts.
For fully non-local van der Waals functionals, like the van der Waals (vdW)-DF functional,31 GPAW uses the efficient fast Fourier-transform convolution algorithm by Román-Pérez and Soler95 as implemented in the libvdwxc library.67
All LDA, GGA, and common MGGA functionals, such as Strongly Constrained and Appropriately Normed semilocal density functional (SCAN)28 and TPSS, from libxc can be used in GPAW.
2. GLLB-sc
3. Hubbard U
DFT+U calculations using a Hubbard-like correction can be performed in GPAW to improve the treatment of the Coulomb interaction between localized electrons. The correction is most commonly applied to the valence orbitals of transition metals to help reproduce the experimental bandgaps of oxides,101 which may otherwise be underestimated.102,103 In addition, formation energies and magnetic states are often improved due to a more correct description of the electronic structure. The correction may also be applied to main-group elements such as N, O, P, or S, although this is less common.104
GPAW supports the normalization of the occupation matrix, accounting for the truncated portion of the wave function outside the augmentation sphere. To maintain consistency with other codes that do not support normalization, this normalization can be disabled, but large disagreements are expected when applied to p orbitals. GPAW is one of the few codes that currently supports multiple simultaneous corrections to orbitals on the same atom; this is useful when two types of orbitals, such as p and d orbitals, are nearly degenerate but both are partially occupied.
There is no strictly correct value of Ueff, but methods such as the RPA107 or linear response108,109 allow a value to be calculated from first principles. More commonly, Ueff is chosen semi-empirically to fit experimental properties such as the formation energy,110 the bandgap,111 or, more recently, machine-learning predictions.112
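A semi-empirical fit of Ueff can be sketched as a simple bisection against a target bandgap. The gap(U) model below is a made-up linear relation; in practice each evaluation would be a full DFT+U calculation.

```python
# Hedged sketch of semi-empirical Ueff selection: bisect on a monotonic
# model gap(U) until the computed gap matches a target (e.g. experimental)
# value. The linear gap(U) used here is purely hypothetical.

def fit_ueff(gap_of_u, target_gap, u_lo=0.0, u_hi=10.0, tol=1e-6):
    """Find Ueff such that gap_of_u(Ueff) == target_gap, by bisection."""
    while u_hi - u_lo > tol:
        u_mid = 0.5 * (u_lo + u_hi)
        if gap_of_u(u_mid) < target_gap:
            u_lo = u_mid
        else:
            u_hi = u_mid
    return 0.5 * (u_lo + u_hi)

# Hypothetical response: each eV of Ueff opens the gap by 0.3 eV from 1.0 eV.
ueff = fit_ueff(lambda u: 1.0 + 0.3 * u, target_gap=2.2)
```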
4. Hybrids
Hybrid functionals, especially range-separated functionals (RSFs, see below), correct several problems present in calculations using (semi-)local functionals: the wrong asymptotic behavior of the effective potential, which leads to an improper description of Rydberg excitations;113 the incorrect dependence of the total energy on fractional charges, which leads to the charge-delocalization error;114 and the wrong description of (long-range) charge-transfer excitations due to the locality of the exchange hole.115–117
Details on the FD-mode implementation of long-range corrected RSFs can be found in Refs. 124 and 125. In general, the FD-mode implementation of hybrids is limited to molecules, and forces have not been implemented. The PW-mode implementation of hybrids handles k-points, exploits symmetries, and can calculate forces.
5. SIC
PZ-SIC has been shown to give accurate results in cases where commonly used KS functionals fail. This includes, for example, the Mn dimer, where the PBE functional gives qualitatively incorrect results while the corrected functional gives close agreement with high-level quantum chemistry results as well as experimental measurements.133 Another example is the defect state of a substitutional Al atom in α-quartz.134 In addition, PZ-SIC has been shown to improve the values of the excitation energy of molecules obtained in variational calculations of excited states50,135 (see Sec. VIII B).
6. BEEF
A great strength of Kohn–Sham DFT and its extensions is that reasonably high accuracy for physical, material, and chemical properties can be obtained at a relatively moderate computational cost. DFT is thus often used to simulate materials, reactions, and properties for which there is, de facto, no "better" alternative. Even though more accurate electronic-structure methods may in principle exist for a given application, their poor scaling often makes them computationally infeasible at the system sizes studied with DFT. One is therefore often in the situation where the accuracy of a DFT calculation of some material or chemical property cannot be verified against, e.g., a more accurate solution of the Schrödinger equation, even on the biggest available supercomputers.
On the other hand, the wealth of available XC functionals naturally allows one to look at how sensitive a DFT result is to the choice of functional, and accuracy is therefore often judged primarily by applying a small set of different XC functionals, especially if no accurate benchmark theoretical simulation or experimental measurement is available. A challenge, however, is that the different available functionals are often known to be particularly good at simulating certain properties and poor at others. It is thus not at all clear how much one should trust a given functional for a given simulated property. The Bayesian Error Estimation (BEE) class of functionals136 attempts to develop a practical framework for establishing an error estimate from a selected set of functional "ingredients."
Assume that the XC model M(a) is a function of a set of parameters, a, that can be varied freely. If a benchmark dataset, D, of highly accurate properties established from experiments or higher-accuracy electronic structure simulations is available, we can attempt to identify the ensemble of models given by some distribution function, P(M(a)), such that the most likely model in the ensemble, M(a0), makes accurate predictions for the benchmark dataset, while the spread of the ensemble reproduces the spread between the predictions of the most likely model and the benchmark data.
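Under the simplifying assumption that the property of interest responds linearly to the functional parameters (so that each ensemble member's prediction can be obtained non-self-consistently), the error estimate amounts to a standard deviation over the ensemble. All numbers below (gradient, covariance, best-estimate value) are made up for illustration, not BEEF-vdW data.

```python
# BEEF-style ensemble error estimate, as a minimal NumPy sketch. Assumption
# (illustration only): the property E is linear in the parameters a, so
# E(a) = E(a0) + g . (a - a0) with a fixed sensitivity vector g.
import numpy as np

rng = np.random.default_rng(0)

n_par = 4
a0 = np.zeros(n_par)                     # most likely model M(a0)
cov = 0.01 * np.eye(n_par)               # assumed ensemble covariance
ensemble = rng.multivariate_normal(a0, cov, size=2000)

g = np.array([0.5, -1.0, 0.2, 0.8])      # hypothetical sensitivity dE/da
E0 = -3.2                                # hypothetical best-estimate value (eV)
predictions = E0 + ensemble @ g          # one prediction per ensemble member

best = predictions.mean()
error_bar = predictions.std()            # reported as +/- 1 std. deviation
```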
Figure 2 shows an ensemble of exchange enhancement factors Fx(s) from the BEEF-vdW functional, where s = |∇n|/(2kFn) is the reduced density gradient. The approach has given rise to several functionals, BEEF-vdW,137 meta-BEEF (mBEEF),138 and mBEEF-vdW,139 which all include error estimation and are readily available in GPAW. Through the ASE interface, one can, for example, utilize the error ensembles to establish error estimates on Python-implemented models that use DFT simulations for their parametrization. An example is the application of error estimates to adsorption scaling relations, microkinetics, and material selection.140
Bayesian ensemble of XC functionals around BEEF–vdW. The orange solid line is the BEEF–vdW exchange enhancement factor, while the blue lines depict Fx(s) for 50 samples of the randomly generated ensemble. Dotted black lines mark the exchange model perturbations that yield DFT results ±1 standard deviation away from BEEF–vdW results.
A clear risk of establishing error estimates from a small selected "ensemble" of XC functionals is that if the simulated property in question is poorly described by all functionals in the ensemble, the error estimate may also become poor. This could, for example, be the case if one tried to establish an error estimate for bandgaps in oxides or for van der Waals bonding of adsorbates on surfaces based on an ensemble of GGA XC functionals, since no GGA functional may be accurate for either property.
IV. ION DYNAMICS
GPAW can be employed as a "black-box" calculator, supplying energies and forces to other programs, e.g., ASE, which then perform optimization of ground-state geometries and reaction paths or carry out molecular dynamics. In fact, this is a key design principle behind GPAW: methodological developments and general implementations that do not depend directly on fast access to detailed electronic-structure information should preferably be implemented externally to GPAW. This keeps the GPAW code itself maximally simple while allowing the external code to be used and tested with other electronic-structure codes and atomistic potentials. A key to the high efficiency of GPAW simulations involving ionic displacements is the versatile implementation of constraints in ASE. Many types of constraints are readily accessible, from the simple removal of degrees of freedom to more exotic constraints allowing for rigid-molecule dynamics,141 harmonic restoring forces,142 spacegroup preservation, and combined ionic-unit-cell dynamics.143 Many algorithms are available for various structure-optimization and molecular-dynamics tasks.
A. Structure relaxation
Local structure optimization in GPAW is typically achieved through the use of an optimizer from ASE. A large range of standard optimizers is available, such as quasi-Newton algorithms, including BFGS144–147 and limited-memory BFGS,148 and Newtonian dynamics-based algorithms such as MDMin and FIRE.149 The fact that the optimizers are implemented externally to GPAW provides benefits in terms of simple procedures for restarting long simulations and monitoring their convergence. Some optimizers from SciPy69 are also available through the open-source ASE package, which provides a simple way to interface any optimizer in the SciPy format to GPAW. Preconditioning is implemented in an accessible way,150 which often leads to significant performance improvements.
Of the classical optimization methods, the quasi-Newton algorithms are often highly competitive. Here, one builds up information on the Hessian or inverse Hessian matrix from calculated forces, ultimately leading to an accurate harmonic model of the potential energy surface in the vicinity of the local minimum. Such algorithms can, however, have problems both dealing with anharmonicity in the potential energy surface and with any noise in the electronic structure simulations. It often makes sense instead to fit a Gaussian process to the calculated energies and forces and minimize the potential within this model. This is implemented as the so-called GPmin method in ASE and often converges on the order of three times faster than the best quasi-Newton optimizers.151
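As a minimal illustration of the quasi-Newton idea (a sketch, not ASE's implementation), the following builds up an inverse-Hessian estimate from successive gradient evaluations on a toy quadratic surface; in a real relaxation the gradient calls would be DFT force evaluations.

```python
# Minimal BFGS sketch: the inverse Hessian H is updated from gradient
# differences, giving a harmonic model of the potential-energy surface
# near the minimum. Demonstrated on an illustrative 2D quadratic.
import numpy as np

def bfgs(grad, x0, maxiter=100, tol=1e-8):
    x = np.asarray(x0, float)
    H = np.eye(len(x))                       # inverse-Hessian estimate
    g = grad(x)
    for _ in range(maxiter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                           # quasi-Newton step
        x_new = x + p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        rho = 1.0 / (y @ s)
        I = np.eye(len(x))
        # Standard BFGS update of the inverse Hessian:
        H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
            + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

A = np.array([[1.2, 0.1], [0.1, 0.9]])       # toy Hessian
xmin = bfgs(lambda x: A @ x, [1.0, -2.0])    # gradient of 0.5 x^T A x
```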
B. Reaction paths and barriers
Reliable calculations of energy barriers are of key importance for determining the rate of atomistic processes. In many quantum-chemical codes utilizing accurate atom-centered basis functions, this is achieved using analytical second derivatives in the direct search for first-order saddle points. This approach is less useful in the plane-wave based or grid-based modes of GPAW. The Dimer method152 is implemented in ASE and can be used with GPAW, but often one would like to have an overview of the entire reaction path of an atomic-scale process to verify that the correct energy barrier for a process has been determined and to obtain a full overview of the atomistic mechanism. For this purpose, the Nudged Elastic Band (NEB) method is typically employed through the ASE interface to GPAW. Both the original method153 and a range of later improvements are available.154–159 Special care has to be taken in selecting the optimizer for a NEB algorithm, as this choice can have a drastic influence on the convergence rate. For optimization of the reaction path, drastic performance improvements can be obtained by carrying out the optimization in a surrogate machine-learning model fitted to the potential energy surface.160–162 GPAW has been used to drive NEB calculations on both surface systems163–166 and molecules.166,167
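The NEB force projection can be sketched on a toy 2D double-well potential. The potential, spring constant, and step size below are illustrative, not a GPAW workflow; the known barrier of the model (1.0) lets the sketch be checked.

```python
# Toy NEB sketch: interior images feel the true force perpendicular to the
# local path tangent plus a spring force along it. Model potential:
# V(x, y) = (1 - x^2)^2 + 5 y^2, with minima at (+-1, 0), saddle at (0, 0).
import numpy as np

def V(p):
    x, y = p
    return (1 - x**2)**2 + 5 * y**2

def force(p):                                 # -grad V
    x, y = p
    return np.array([4 * x * (1 - x**2), -10 * y])

def neb(images, k=1.0, step=0.02, n_steps=2000):
    images = [np.array(p, float) for p in images]
    for _ in range(n_steps):
        for i in range(1, len(images) - 1):
            tau = images[i + 1] - images[i - 1]
            tau = tau / np.linalg.norm(tau)   # path tangent estimate
            f_true = force(images[i])
            f_perp = f_true - (f_true @ tau) * tau
            f_spring = k * (np.linalg.norm(images[i + 1] - images[i])
                            - np.linalg.norm(images[i] - images[i - 1])) * tau
            images[i] = images[i] + step * (f_perp + f_spring)
    return images

# Nine images between the two minima, initially offset from the y = 0 path.
path = [np.array([x, 0.3]) for x in np.linspace(-1.0, 1.0, 9)]
path[0][1] = path[-1][1] = 0.0                # endpoints fixed at the minima
images = neb(path)
barrier = max(V(p) for p in images) - V(images[0])
```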
C. Global structure optimization
GPAW is integrated with various tools for global optimization of structures, compositions, and material morphologies. Generally applicable global-optimization tools include basin hopping,168 minima hopping,169 and genetic algorithms.170 Some of the most powerful global-optimization strategies used with GPAW rely on machine-learning acceleration. These strategies have, for example, been applied to surfaces, clusters,171 and crystal structures in general.172 In other machine-learning accelerated global-optimization routines, GPAW was used to generate initial databases of surface and bulk systems and for later model validation in a strategy that uses Gaussian processes to generate surrogate potential-energy surfaces. These were then explored with Bayesian-optimization techniques, achieving speed-ups of several orders of magnitude over more conventional methods in finding the optimal structures of the systems under investigation.173–175 The method has been augmented by introducing extra (hyper)dimensions by interpolating between chemical elements, which improves the efficiency of the global search.176 GPAW has also been integrated with a Covariance Matrix Adaptation Evolution Strategy (CMA-ES) framework, providing energies and forces, generating training data, and evaluating CMA-ES candidate structures.177
D. Molecular dynamics and QM/MM
Ab initio molecular dynamics (MD) can be performed and analyzed through the ASE interface to GPAW. This includes standard NVE, NVT, and NPT simulations, with access to, e.g., Langevin, Andersen, Nosé–Hoover, and Berendsen dynamics. Because the dynamics are externalized, the development of novel algorithms and analysis tools is straightforward. Examples of the use of GPAW include ab initio MD studies of the liquid structure of water178 and of the water/Au(111) electrochemical interface.179
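The velocity-Verlet step at the heart of such MD runs can be written compactly; here it is exercised on a 1D harmonic oscillator instead of DFT forces (a sketch, not ASE's VelocityVerlet class).

```python
# Minimal velocity-Verlet integrator, demonstrated on a 1D harmonic
# oscillator. Being symplectic, it conserves energy to high accuracy over
# long trajectories, which is why it is the standard choice for NVE MD.
import numpy as np

def velocity_verlet(x, v, force, mass, dt, n_steps):
    f = force(x)
    traj = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * f / mass * dt**2   # position update
        f_new = force(x)
        v = v + 0.5 * (f + f_new) / mass * dt     # velocity update
        f = f_new
        traj.append((x, v))
    return traj

k, m = 1.0, 1.0                                   # toy spring and mass
traj = velocity_verlet(1.0, 0.0, lambda x: -k * x, m, dt=0.05, n_steps=1000)
energies = [0.5 * k * x**2 + 0.5 * m * v**2 for x, v in traj]
drift = max(energies) - min(energies)             # stays small (symplectic)
```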
Furthermore, GPAW is capable of working with external electrostatic potential terms from user-supplied point charge values and positions, enabling Quantum Mechanical (QM)/Molecular Mechanical (MM) simulations. Since GPAW is from the outset designed around highly efficient grid operations, the computational overhead of evaluating this potential as well as the resulting forces on the MM point charges from the QM density is kept low.180 GPAW has been central in a range of studies of ion dynamics in solution, both in and out of equilibrium. By using the molecular-dynamics functionality of ASE, researchers have performed QM/MM MD simulations of excited-state bond formation in photocatalyst model systems181–184 and electron transfer, as well as coupled solute–solvent conformational dynamics in photosensitizer systems.167,185 Work on polarizable embedding QM/MM within GPAW is ongoing.186,187
V. MAGNETISM AND SPIN
Many important technological applications utilize magnetic order or the manipulation of spin in materials. GPAW has a wide range of functionalities that facilitate the analysis of magnetic properties. This includes calculations with non-collinear spin, the inclusion of external magnetic fields, spin–orbit coupling, and spin spiral calculations within the generalized Bloch theorem. The implementation of these features is described below, while additional methods for calculating magnetic excitations are described in Sec. VI D.
A. Spin–orbit coupling
Band structure of the WS2 monolayer obtained from PBE with non-selfconsistent spin–orbit coupling. The colors indicate the expectation value of Sz for each state. The gray lines show the band structure without spin–orbit coupling.
B. Self-consistent non-collinear magnetism
Spin–orbit coupling may be included by adding Eq. (28) to the Kohn–Sham Hamiltonian, and this constitutes a fully self-consistent framework for spin–orbit calculations within LDA.
C. Orbital magnetization
We note that this expression does not entail cutoff radii for the atomic contributions to the orbital magnetic moments; instead, the entire all-electron partial wave is included, although the all-electron atomic expansion is only formally exact inside the PAW spheres. Additionally, this means that the unbounded, i.e., non-normalizable, all-electron partial waves must be excluded in Eq. (35).
A prerequisite for nonzero orbital magnetization is broken time-reversal symmetry, as represented by a complex Hamiltonian that is not unitarily equivalent to its real counterpart. In practice, this means that a finite orbital magnetization requires magnetic order and either the inclusion of spin–orbit coupling or non-coplanar spin textures. In GPAW, the spin–orbit interaction can be included either self-consistently in a non-collinear calculation or as a non-self-consistent post-processing step following a collinear calculation. The orbital magnetization has been calculated using Eq. (35) for the simple ferromagnets bcc-Fe, fcc-Ni, fcc-Co, and hcp-Co, with the spin–orbit interaction included non-self-consistently. The PAW-ACA results are displayed in Table II along with MT-ACA results, modern-theory results, and experimental measurements, demonstrating, first, that the PAW-ACA can be an improvement over the MT-ACA and, second, that there is decent agreement between the PAW-ACA and the modern theory for these systems.
Calculated and measured values for the orbital magnetization in units of μB per atom.
Crystal (easy axis) | bcc-Fe [001] | fcc-Ni [111] | fcc-Co [111] | hcp-Co [001]
---|---|---|---|---
PAW-ACA | 0.0611 | 0.0546 | 0.0845 | 0.0886
MT-ACA197 | 0.0433 | 0.0511 | 0.0634 | 0.0868
Modern theory197 | 0.0658 | 0.0519 | 0.0756 | 0.0957
Experiment198 | 0.081 | 0.053 | 0.120 | 0.133
D. Constant B-field
Spin-flop transition in Cr2O3. The alignment of the spins with respect to the magnetic field is sketched for small and large magnitudes of the field. The canting (the small ferromagnetic component after the spin flop) is exaggerated for visualization. The actual canting is roughly 1°.
E. Spin spirals
Using the GPAW functionality to compute the spin-spiral energy as a function of q, one can then search for the ground-state wave vector Q that minimizes the energy. In Fig. 5, we show that the monolayer NiBr2 has an incommensurate spin-spiral ground state and that the local magnetic moment on the Ni atom depends only weakly on the wave vector q.
LSDA spin-spiral spectrum of the NiBr2 monolayer (structure taken from the C2DB15). The ground state has an incommensurate wave vector Q ≃ [0.1, 0.1, 0]. The magnetic moment displays only weak longitudinal fluctuations, and the bandgap remains finite for all wave vectors q, indicating that a Heisenberg model description of the material would be appropriate. The inset shows the spin–orbit correction to the spin-spiral energies (in meV) as a function of the normal vector n of the planar spin spiral. n is depicted in terms of its stereographic projection in the upper hemisphere above the monolayer plane. The spiral plane is found to be orthogonal to Q and tilted slightly with respect to the out-of-plane direction.
Furthermore, the orientation of the ground-state spin spiral can be obtained by including spin–orbit coupling non-self-consistently in the projected spin–orbit approximation202 and searching for the orientation that minimizes the energy. The ordering vector Q and the normal vector n of the spiral plane thus constitute a complete specification of the magnetic ground state within the class of single-q states. The normal vector is of particular interest since spin spirals may lead to spontaneous breaking of the crystal symmetry, and the normal vector largely determines the direction of magnetically induced spontaneous polarization.40 In Fig. 5, we show that n is perpendicular to the wave vector Q in the monolayer NiBr2 ground state, corresponding to a cycloidal spin spiral.
VI. RESPONSE FUNCTIONS AND EXCITATIONS
Linear response functions are the bread and butter of condensed matter physics. Their applications include the description of dielectric screening, optical and electron energy loss spectra, many-body excitations, and ground state correlation energies. In this section, we describe the methods available in GPAW for calculating electronic response functions as well as GW quasiparticle band structures and optical excitations from the Bethe–Salpeter Equation (BSE). In addition to the electronic response function, GPAW can also calculate the transverse magnetic susceptibility, which holds information about the magnetic excitations, e.g., magnons, and can be used to derive parameters for classical Heisenberg spin models and to estimate magnetic transition temperatures. Finally, we present methods to calculate Raman spectra of solids and quadratic optical response tensors for describing second-harmonic generation and the Pockels electro-optic effect.
A. Linear-response TDDFT
1. Implementation for periodic systems
2. Spectral representation
B. Dielectric function
The evaluation of χ0 is computationally demanding since it involves an integration over the Brillouin zone (BZ) as well as a summation over occupied and unoccupied states. The k-point convergence can be accelerated substantially with the tetrahedron method.207 Contrary to simple point integration, where the δ-function in Eq. (52) is replaced by a smeared-out Lorentzian, the tetrahedron method utilizes linear interpolation of the eigenvalues and matrix elements.
In Fig. 6, we compare the dielectric function computed using the two different integration methods for two prototypical cases: (semiconducting) Si and (metallic) Ag. Since Si is semiconducting, there are no low-energy excitations, and consequently, the imaginary part of the dielectric function is zero, while the real part is flat for low frequencies. The point integration and tetrahedron integration yield the same value for ω → 0, but the tetrahedron integration avoids the unphysical oscillations of ɛ at higher frequencies exhibited by the point integration due to the finite k-point sampling. For metals, the k-point convergence is even slower, and the difference between point integration and tetrahedron integration is thus even more pronounced for Ag. By increasing the k-point sampling, the point integration results will eventually approach the results obtained with the tetrahedron method, but at a much higher computational cost.
Real and imaginary parts of the dielectric function for Ag (solid) and Si (dashed) using the tetrahedron and point integration methods.
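The effect of finite k-point sampling combined with Lorentzian smearing can be illustrated with a 1D stand-in (this is not GPAW's 3D tetrahedron code; the tight-binding band and broadening are illustrative).

```python
# Sketch of the k-point broadening issue: the delta function in a spectral
# sum is replaced by a Lorentzian of width eta. A coarse k-point mesh then
# produces spurious oscillations that a much finer mesh (or linear
# interpolation, as in the tetrahedron method) removes.
import numpy as np

def dos(n_k, eta, energies):
    """Lorentzian-smeared density of states of a 1D band eps(k) = -2 cos k."""
    k = np.linspace(-np.pi, np.pi, n_k, endpoint=False)
    eps = -2.0 * np.cos(k)                    # tight-binding band, t = 1
    # Average of the smeared delta(E - eps_k) over the sampled BZ:
    return np.mean(eta / np.pi / ((energies[:, None] - eps)**2 + eta**2),
                   axis=1)

E = np.linspace(-1.5, 1.5, 301)
coarse = dos(20, 0.05, E)                     # 20 k-points: spiky artifacts
fine = dos(2000, 0.05, E)                     # 2000 k-points: smooth
oscillation = np.abs(coarse - fine).max()     # size of the sampling artifact
```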
1. Screening in low-dimensional systems
C. Adiabatic-connection fluctuation–dissipation theorem
The random phase approximation (RPA) can be derived from ACFDT if the XC-kernel fxc is neglected. RPA has the strength of capturing non-local correlation effects and provides high accuracy across different bonding types, including van der Waals interactions.29,211–213
Simple exchange–correlation kernels can also be incorporated into the response function, such as the adiabatic LDA (ALDA) kernel. However, the locality of adiabatic kernels leads to divergent characteristics of the pair-correlation function.214 This issue can be overcome by a renormalization (r) scheme,37 which is implemented in GPAW as rALDA. This class of renormalized kernels provides a significantly better description of the short-range correlations and, hence, also yields highly accurate total correlation energies.215–218
D. Magnetic response
1. Transverse magnetic susceptibility
In GPAW, the transverse magnetic susceptibility is calculated using a plane-wave basis as described in Sec. VI A 1, with the notable exception that no special care is needed to treat the optical limit, since the Hartree kernel plays no role in the Dyson Eq. (64). As for the temporal representation, at the time of writing it is only possible to perform a literal evaluation of Eq. (65) at the frequencies of interest. For metals, this means that η has to be left as a finite broadening parameter, which must be chosen carefully depending on the k-point sampling (see Ref. 43).
2. The spectrum of transverse magnetic excitations
For collinear magnetic systems, the spectrum S+− is composed of two types of excitations: collective spin-wave excitations (referred to as magnons) and excitations in the Stoner-pair continuum (electron–hole pairs of opposite spin). Since GPAW employs a plane-wave representation of the spectrum, one can directly compare the calculational output to the inelastic neutron scattering cross-section measured in experiments.220 In particular, one can extract the magnon dispersion directly by identifying the positions of the peaks in the spectrum diagonal (see Fig. 7). In this way, GPAW allows the user to study various magnon phenomena in magnetic systems of arbitrary collinear order, such as nonanalytic dispersion effects in itinerant ferromagnets,43 correlation-driven magnetic phase transitions,221 and the emergence of distinct collective modes inside the Stoner continuum of an antiferromagnet.222
Spectrum of transverse magnetic excitations for ferromagnetic hcp-Co evaluated at q = 5qM/6. The spectrum was calculated using eight empty-shell bands per atom, a plane-wave cutoff of 800 eV, a (60, 60, 36) k-point mesh, and a spectral broadening of η = 50 meV. In the upper panel, the spectrum diagonal is depicted for the first and second Brillouin zones out of the hexagonal plane, from which the acoustic and optical magnon frequencies can be respectively extracted. In the lower panel, the spectrum of majority excitations is shown. The full spectral weight A(ω) is calculated as the sum of all positive eigenvalues of S+−, the acoustic and optical mode lineshapes a0(ω) and a1(ω) are obtained via the two largest eigenvalues (which are significantly larger than the rest), and the Stoner spectrum is extracted as the difference AS(ω) = A(ω) − a0(ω) − a1(ω).
Additionally, one can analyze the spectrum in more detail by extracting the majority and minority eigenmodes from S+− as the positive and negative eigenvalues, respectively. In contrast to the analysis of the plane-wave diagonal, this makes it possible to completely separate the analysis of each individual magnon lineshape as well as the many-body Stoner continuum (see Fig. 7).
3. Linear response MFT
Upon calculation of the exchange parameters, it is straightforward to compute the magnon dispersion within the classical Heisenberg model using linear spin-wave theory and to estimate thermal quantities such as the Curie temperature (see, e.g., Ref. 42). In Fig. 8, the MFT magnon dispersion of hcp-Co is compared to the majority magnon spectrum calculated within LR-TDDFT. For Co, the two are in excellent agreement except for the dispersion of the optical magnon mode along the K − M − Γ high-symmetry path, where MFT underestimates the magnon frequency and neglects the fine structure of the spectrum. The fine structure of the ALDA spectrum appears due to the overlap between the magnon mode and the Stoner continuum. This gives rise to so-called Kohn anomalies (nonanalytic points in the magnon dispersion), which are a trademark of itinerant electron magnetism. Since itinerancy is largely ignored in a localized spin model such as (68), one cannot generally expect to capture such effects.
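The linear spin-wave step can be illustrated with the simplest textbook case, a one-dimensional nearest-neighbor Heisenberg ferromagnet H = −J Σ S_i·S_{i+1}, for which the magnon dispersion is ħω(q) = 2JS(1 − cos qa). The parameters below are illustrative, not the hcp-Co exchange constants:

```python
import numpy as np

# Linear spin-wave dispersion of a 1D nearest-neighbor Heisenberg
# ferromagnet, H = -J * sum_i S_i . S_{i+1} (each bond counted once).
# Hypothetical parameters, not actual MFT exchange parameters:
J = 0.02   # exchange coupling (eV)
S = 1.0    # spin magnitude
a = 1.0    # lattice constant

q = np.linspace(0.0, np.pi / a, 101)
omega = 2 * J * S * (1 - np.cos(q * a))   # magnon energy hbar*omega(q) in eV

# The acoustic mode is gapless (Goldstone mode) and quadratic near q = 0;
# the zone-boundary magnon energy is 4*J*S.
print(f"omega(0) = {omega[0]:.3f} eV, zone boundary = {omega[-1]:.3f} eV")
```

Thermal quantities such as the Curie temperature then follow from the same Heisenberg model, e.g., via mean-field or classical Monte Carlo estimates.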
Magnon spectrum of ferromagnetic hcp-Co calculated using ALDA LR-TDDFT (shown as a heat map) compared to the spin-wave dispersion of Liechtenstein MFT. The acoustic magnon mode a0(ω) is shown to the left of the A-point, while the optical magnon mode a1(ω) is shown to the right. Note that the two modes are degenerate at both the A and K high-symmetry points. The calculations were carried out using eight empty-shell bands per atom, a plane-wave cutoff of 800 eV, and a (36, 36, 24) k-point mesh. A finite value of η = 100 meV was used to broaden the ALDA spectrum. For the MFT calculations, the dispersion was calculated using linear spin-wave theory based on a Heisenberg model of closed-packed spherical sites centered at each of the Co atoms.
E. GW approximation
The GW self-energy is calculated on a plane-wave basis using full frequency integration along the real axis to evaluate the convolution between G and W.38 Compared to alternative schemes employing contour deformation techniques or analytical continuation,223–225 this approach is time-consuming but numerically accurate and can provide the full spectral function. A highly efficient and accurate evaluation of the self-energy based on a multipole expansion of the screened interaction, W,226 is currently being implemented, and a GPU version of the full GW code is under development.
For low-dimensional structures, in particular atomically thin 2D semiconductors, GPAW can use a truncated Coulomb interaction to avoid interactions between periodic images when evaluating W.208 It has been shown that the use of a truncated Coulomb kernel leads to slower k-point convergence.209 To mitigate this drawback, a special 2D treatment of W(q) at q = 0 can be applied to significantly improve the k-point convergence.227 A detailed account of the GW implementation in GPAW can be found in Ref. 38.
Figure 9 shows two matrix elements of the dynamical GW self-energy for the valence and conduction band states at the gamma point of bulk silicon. As can be seen, the agreement with the corresponding quantities obtained with the Yambo GW code228 is striking.
The real (Re) and imaginary (Im) parts of the self-energy matrix elements at the Gamma point for the valence (top) and conduction (bottom) bands of bulk silicon, evaluated with Yambo and GPAW. Both codes use full frequency integration with a broadening of 0.1 eV; Yambo uses norm-conserving pseudopotentials, while GPAW uses its standard PAW setup. For both codes, the k-point grid was 12 × 12 × 12, the plane-wave cutoff was 200 eV, and the number of bands was 200. The results are virtually indistinguishable.
F. Bethe–Salpeter equation (BSE)
In GPAW, the construction of the BSE Hamiltonian (73) proceeds in two steps.39 First, the static screened interaction is calculated at all inequivalent q-points in a plane-wave basis, and the kernel is then subsequently expressed in a basis of two-particle KS states. The first step is efficiently parallelized over either states or k-points, and the second step is parallelized over pair densities. The Hamiltonian elements are thus distributed over all CPUs, and the diagonalization is carried out using ScaLAPACK such that the full Hamiltonian is never collected on a single CPU. The dimension of the BSE Hamiltonian and memory requirements are, therefore, only limited by the number of CPUs used for the calculation. We note that the implementation is not limited to the Tamm–Dancoff approximation, but calculations become more demanding without it. The response function may be calculated for spin-paired as well as spin-polarized systems, and spin–orbit coupling can be included non-selfconsistently.45,229 In low-dimensional systems, it is important to eliminate spurious screening from periodic images of the structure, which is accomplished with the truncated Coulomb interaction in Ref. 208.
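The benefit of distributing the Hamiltonian can be seen from a quick back-of-the-envelope estimate of its size and memory footprint, using the k-point and band numbers of the WS2 example in Fig. 10 and assuming dense complex double-precision storage:

```python
# Dimension of the BSE Hamiltonian: N = (k-points) x (valence) x (conduction).
nk = 48 * 48      # uniform 2D k-point grid
nv, nc = 8, 8     # valence and conduction states (Tamm-Dancoff approximation)
N = nk * nv * nc  # = 147 456

# Dense storage of the N x N Hamiltonian with 16 bytes per complex128 element:
bytes_total = N**2 * 16
gib_total = bytes_total / 2**30
gib_per_cpu = gib_total / 240   # distributed block-cyclically over 240 CPUs
print(f"N = {N}, total = {gib_total:.0f} GiB, per CPU = {gib_per_cpu:.2f} GiB")
```

A few hundred GiB in total would not fit on a single node, but the per-CPU share of the ScaLAPACK-distributed matrix is modest, which is why the Hamiltonian dimension is limited only by the number of CPUs.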
The most important application of BSE is arguably the calculation of optical absorption spectra of solids, where BSE provides an accurate account of the excitonic effects that are not captured by semi-local approximations for KHxc (44). In 2D systems, the excitonic effects are particularly pronounced due to inefficient screening,209,229 and in Fig. 10, we show the 2D polarizability of WS2 calculated from BSE with q = 0. Comparing with Fig. 3, it is observed that the absorption edge is expected to be located at the K point, where spin–orbit coupling splits the highest valence band by 0.45 eV. This splitting is seen as two excitonic peaks below the bandgap, which are interpreted as distinct excitons originating from the highest and next-highest valence bands (the splitting of the lowest conduction band is negligible in this regard). For comparison, we also show the RPA polarizability, obtained with the BSE module by neglecting the screened interaction in the kernel, and this shows the expected absorption edge at the bandgap. This yields identical results to the Dyson equation approach of Sec. VI A but has the advantage that the eigenvalues and weights of Eq. (76) are obtained directly, such that the artificial broadening η may be varied without additional computational cost. The eigenstate decomposition also allows one to access “dark states,” and the BSE calculation reveals two (one for each valley) triplet-like excitons that are situated 70 meV below the lowest bright exciton in Fig. 10.
2D polarizability of WS2 calculated from the BSE and the RPA. For this calculation, we included spin–orbit coupling and used the 2D Coulomb truncation to eliminate screening from periodic images. A gamma-centered uniform k-point grid of 48 × 48 was applied, and eight valence states and eight conduction states (shifted by 1 eV to match the GW bandgap45) were included in the Tamm–Dancoff approximation. This yields a BSE Hamiltonian of size N × N with N = 147 456, which is easily diagonalized with ScaLAPACK on 240 CPUs.
G. Electron–phonon coupling
The electron–phonon coupling is the origin of several important material properties, ranging from electrical and thermal conductivity to superconductivity. In addition, it provides access to the deformation potential, which can be used to obtain transport properties for electrons and holes in solids.231
H. Raman spectrum
The predominant approach for calculating Raman tensors involves the Kramers–Heisenberg–Dirac (KHD) method. Within the KHD framework, the Raman tensor is determined by taking the derivative (using a finite-difference method) of the electric polarizability with respect to the vibrational normal modes. Alternatively, one can employ time-dependent third-order perturbation theory to compute Raman tensors. The two approaches are equivalent when local field effects are negligible.56 However, each approach comes with its own set of advantages and drawbacks. The KHD method is computationally more efficient but is limited to computing first-order Raman processes. The perturbative approach can be extended to higher-order Raman processes involving multiple phonons, but it is more computationally demanding, requiring a larger number of bands and a finer k-point mesh to achieve convergence. The perturbative approach has been implemented in GPAW and is elaborated below, while the KHD method has been implemented in the ASR package,22 utilizing GPAW as the computational backend.
Figure 11 shows the polarization-resolved Raman spectrum of bulk MoS2 in the 2H phase, computed for an incident laser wavelength of 488 nm. This example uses only the resonant term, as the other contributions are small in this case. These calculations used a 700 eV plane wave cutoff and a 2 × 2 × 2 k-point mesh in a 3 × 3 × 2 supercell. Force constants and potential changes were calculated simultaneously. The present implementation uses the ASE phonon module as the backend to perform the finite difference displacement and to obtain the phonon eigenvalues and eigenvectors. The convergence of the supercell size for phonons and potential changes must be checked carefully by the users for each given system to achieve the desired precision of line positions and intensities. As Raman spectrum calculations only use q = 0 phonons and electron–phonon matrices, relatively small supercells often yield satisfactory results.
Polarization-resolved Raman spectrum of bulk MoS2 in the 2H phase for an excitation wavelength of 488 nm. Phonons and potential changes were computed using a 700 eV plane wave cutoff and a 2 × 2 × 2 k-point mesh in a 3 × 3 × 2 supercell. Each peak has been labeled according to its irreducible representation.
It is worth noting that we have compared the calculated spectra with those obtained using the ASR package and observed a high level of agreement for several materials, for example, MoS2. The results from both methods closely align in terms of peak positions and dominant peaks, and minor disagreements can be attributed to differences in implementation details and the distinct approximations employed by each. Specifically, the ASR package uses the phonopy package236 to compute phonon frequencies and eigenvectors, whereas GPAW computes them directly using ASE’s phonon module. Furthermore, the ASR implementation rigorously enforces the symmetry of the polarizability tensor, and the ASR results therefore typically adhere more accurately to the required symmetry of the Raman tensor than the GPAW implementation.
I. Quadratic optical response functions
VII. REAL-TIME TDDFT
The real-time TDDFT (RT-TDDFT) scheme, also known as time-propagation TDDFT, is implemented in FD34 and LCAO52 modes. It requires non-periodic boundary conditions but is not restricted to the linear regime and can be applied to model molecules in strong fields. The method may be combined with hybrid quantum–classical modeling to simulate the dynamical interaction between molecules and plasmon resonances at metal surfaces.240 The LCAO-RT-TDDFT is the more recent implementation, supporting versatile analyses and enabling the modeling of large systems thanks to its efficiency.52–54,241–247 We focus on the capabilities of the LCAO version in this section, but some of the described functionalities are also available in FD mode.
Starting from the ground state, Eq. (91) is propagated numerically forward in time. After each time step, a new density is computed, and the Hamiltonian is updated accordingly. The user can freely define the external potential; the implemented standard potentials include the delta-kick and a Gaussian pulse coupled to the system through the dipole operator in a chosen direction x.
During the propagation, different time-dependent variables can be recorded, and after the propagation, they can be post-processed into quantities of physical and chemical interest. As a basic example, the time-dependent dipole moment recorded for a delta-kick perturbation can be Fourier-transformed to yield the photoabsorption spectrum.248 Observables are recorded by attaching observers to the calculation, and implemented observers include writers for the dipole moment, magnetic moment, KS density matrix in frequency space, and wave functions.
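The Fourier-transform step can be sketched with a toy signal. The dipole moment below is a hypothetical damped oscillation at 3.8 eV (illustrative parameters, not actual RT-TDDFT output); its transform to the frequency domain peaks at the transition energy, which is how the photoabsorption spectrum is obtained from the recorded dipole moment:

```python
import numpy as np

# Hypothetical dipole response after a delta kick: a single excitation
# at 3.8 eV with damping 0.1 eV (units: energies in eV, time in hbar/eV).
omega0, gamma = 3.8, 0.1
dt = 0.05
t = np.arange(0.0, 200.0, dt)
dipole = np.exp(-gamma * t) * np.sin(omega0 * t)

# Discrete Fourier transform on a frequency grid around the resonance;
# the absorption spectrum is proportional to omega * Im[alpha(omega)].
omega = np.linspace(2.0, 6.0, 401)
alpha = (dipole[None, :] * np.exp(1j * omega[:, None] * t[None, :])).sum(axis=1) * dt
spectrum = omega * alpha.imag
peak = omega[np.argmax(spectrum)]
print(f"absorption peak at {peak:.2f} eV")
```

In practice, the finite propagation time and the artificial damping set the spectral resolution and linewidth of the resulting spectrum.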
RT-TDDFT calculations can be started from the ground state or continued from the last state of a previous time propagation, and the time-limiter feature allows one to limit jobs to a predefined amount of wall time. Together with the continuation capability, this facilitates time propagation in short chunks, efficiently using shared high-performance resources.
A. Kohn–Sham decomposition
The Fourier transform of the induced density matrix can be built on the fly during time propagation through the density-matrix observer. Details of the implementation are described in Ref. 54. The KS density matrix in frequency space is related to the Casida eigenvectors and gives similar information to the solution of the Casida equation.54 Observables such as the polarizability can be decomposed into a sum over the electron–hole part of the KS density matrix ρnn′(ω), where fn > fn′. This enables illustrative analyses, e.g., by visualizing ρnn′(ω) on energy axes as a transition contribution map,249 from which the nature of the localized surface plasmon resonance can be understood (Fig. 12; see Ref. 54 for a detailed discussion).
Photoabsorption spectrum for an Ag147 icosahedral nanoparticle and the transition contribution map at 3.8 eV. The map reveals that transitions between KS states near the Fermi level contribute constructively to the plasmon resonance, while transitions from occupied states at the d-band edge contribute destructively (screening).
B. Hot-carrier analysis
The KS density matrix is a practical starting point for analyzing hot-carrier generation in plasmonic nanostructures. In the regime of weak perturbations, the KS density matrix resulting from arbitrary pulses can be obtained from delta-kick calculations by a post-processing routine.245,246 By decomposing the matrix into different spatial and energetic contributions, hot-carrier generation in nanostructures245,247 and across nanoparticle-semiconductor243 and nanoparticle-molecule242,246 interfaces can be studied. Computational codes and workflows for such hot-carrier analyses are provided in Refs. 245 and 246.
C. Circular dichroism for molecules
D. Radiation reaction potential
Plasmonic or collective molecular excitations are strongly susceptible to any kind of optical interaction. Induced currents couple via Maxwell’s equations to the optical environment and result in radiative losses, i.e., decay towards the ground state. The Maxwell problem can be solved formally by obtaining the Green’s tensor G⊥(ω). The dipolar interaction between the electric field and the electronic dipole can then be absorbed into a local potential suitable for Kohn–Sham TDDFT that depends on the total electronic dipole moment R(t). A detailed discussion can be found in Ref. 250.
For many simple structures, such as free-space, one-dimensional waveguides, or dielectric spheres, G⊥(ω) is analytically known, and radiative losses can then be included in TDDFT without additional computational cost. The tutorials on the GPAW webpage62 include an example of one-dimensional waveguides for which the user can specify the cross-sectional area and the polarization of the propagating modes. Extending the functionality of the radiation-reaction potential to 3D free-space and the collective interaction of large ensembles in Fabry–Pérot cavities from first principles251 is essential for the understanding of polaritonic chemistry. This functionality is currently under development.
E. Ehrenfest dynamics
Molecular dynamics (MD) simulations usually rely on the Born–Oppenheimer approximation, where the electronic system is assumed to react so much faster than the ionic system that it reaches its ground state at each time step. Therefore, forces for the dynamics are calculated from the DFT ground-state density. While this approximation is sufficiently valid in most situations, there are cases where the explicit dynamics of the electronic system can affect the molecular dynamics or the movement of the atoms can affect averaged spectral properties. These cases can be handled using so-called Ehrenfest dynamics, i.e., time-dependent density functional theory molecular dynamics (TDDFT/MD).
Ehrenfest dynamics is implemented in the FD mode.252 A description of the theory and a tutorial is available on the GPAW webpage.62 This functionality has been used to model the electronic stopping of ions including core-electron excitations,253 study charge transfer at hybrid interfaces in the presence of water,254 simulate the coherent diffraction of neutral atom beams from graphene,255 model the dependence of carbon bond breaking under Ar+-ion irradiation on sp hybridization,256 and reveal charge-transfer dynamics at electrified sulfur cathodes.257 An LCAO implementation, inspired by recent work in the Siesta code,258,259 is currently under development.
VIII. EXCITED-STATE DFT METHODS
A. Improved virtual orbitals
The linear response TDDFT approach generally provides reasonably accurate excitation energies for low-lying valence excited states, where the orbitals associated with the holes and excited electrons overlap significantly. However, it tends to fail for excitations involving spatial rearrangement of the electrons, such as charge transfer,115,260 Rydberg,261 and doubly excited states.262
Some of these problems can be alleviated by using range separated functionals (see Sec. III C 4). However, these functionals come with a significantly increased computational cost due to the evaluation of exchange integrals. Moreover, due to the missing cancellation of Coulomb and exchange terms for canonical unoccupied orbitals within Hartree–Fock theory, one obtains spurious unoccupied orbitals. This leads to a slow convergence of linear-response TDDFT calculations with respect to the number of unoccupied orbitals when hybrid and range-separated functionals are used.124,125
Substantial improvement in convergence with respect to unoccupied orbitals can be obtained using improved virtual orbitals, as devised by Huzinaga and Arnau.263,264 In this approach, a modified Fock operator is used for the unoccupied orbitals, which mimics the interaction between a hole arbitrarily chosen among the occupied ground-state orbitals and the excited electron. This leads to faster convergence in linear-response TDDFT calculations with hybrid and range-separated functionals and also makes it possible to evaluate excited state properties. For example, the energetics of long-range charge transfer can be obtained by means of ground state calculations because the difference between the energy of an improved virtual orbital and a hole tends to approximate the excitation energy. The improved virtual orbitals approach is available in GPAW, and details on the implementation are described in Ref. 125.
B. Variational excited-state calculations
GPAW also offers the possibility to perform excited-state calculations using an alternative time-independent density functional approach265 (sometimes referred to as the “ΔSCF” method), which does not suffer from the limitations of linear-response TDDFT mentioned in Sec. VIII A. The method involves variational optimization of the orbitals corresponding to a specific excited state by optimizing the electronic energy to a stationary point other than the ground state. The computational effort is similar to that of ground-state calculations, and the variational optimization guarantees that the Hellmann–Feynman theorem is fulfilled. Therefore, all the ground-state machinery available in GPAW to evaluate atomic forces can be used for geometry optimization and for simulating the dynamics of atoms in the excited state. Furthermore, coupling this time-independent, variational approach for excited-state calculations with external MM potentials (see Sec. IV) does not involve additional implementation efforts compared to ground-state calculations and provides a means for performing excited-state QM/MM molecular dynamics simulations that include the state-specific response of a solvent, i.e., the response due to changes in the electron density of the solute.
Variationally optimized excited states correspond to single Slater determinants of optimal orbitals with a non-aufbau occupation and are typically saddle points on the electronic energy surface. Hence, variational calculations of excited states are prone to collapsing to lower-energy solutions, which preserve the symmetry of the initial guess. A simple maximum overlap method (MOM)266,267 is available in GPAW to address this problem. At each SCF step, the MOM occupies those orbitals that overlap the most with the orbitals of a non-aufbau initial guess, usually obtained from a ground-state calculation. The MOM, however, does not guarantee that variational collapse is avoided, and convergence issues are common when using SCF eigensolvers with density-mixing schemes developed for ground-state calculations.
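The occupation step of the MOM can be sketched as follows: at each SCF iteration, the current orbitals with the largest projection onto the span of the initially occupied (non-aufbau) orbitals are occupied. The toy orbitals below live in a small orthonormal basis and are purely illustrative, not GPAW's internal representation:

```python
import numpy as np

def mom_occupations(C_init_occ, C_current, n_occ):
    """Occupy the n_occ current orbitals with the largest projection
    onto the span of the initially occupied orbitals.

    C_init_occ: (nbasis, n_occ) initial occupied orbital coefficients
    C_current:  (nbasis, nbands) current orbital coefficients
    Returns an occupation array of 0s and 1s.
    """
    # Overlaps O_ij = <psi_i^init | psi_j^current> (orthonormal basis assumed)
    O = C_init_occ.conj().T @ C_current
    projections = (np.abs(O)**2).sum(axis=0)     # P_j = sum_i |O_ij|^2
    occupied = np.argsort(projections)[-n_occ:]  # largest projections win
    f = np.zeros(C_current.shape[1])
    f[occupied] = 1.0
    return f

# Toy example: 4 basis functions; the non-aufbau initial guess occupies
# orbitals 0 and 2. Even if the SCF step reorders the orbitals, the MOM
# still tracks the initially occupied ones.
C_init_occ = np.eye(4)[:, [0, 2]]
C_current = np.eye(4)[:, [2, 1, 0, 3]]   # orbitals permuted by the SCF step
print(mom_occupations(C_init_occ, C_current, n_occ=2))  # occupies columns 0 and 2
```

As the text notes, this criterion does not guarantee that variational collapse is avoided: if the SCF step rotates an occupied orbital far from the initial subspace, the largest-projection choice can still drift toward a lower-energy solution.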
1. Direct orbital optimization
To alleviate the issues leading to variational collapse and achieve more robust convergence to excited-state solutions, GPAW contains two alternative strategies that are more reliable than conventional SCF eigensolvers with the MOM. They are based on direct optimization of the orbitals and use saddle point search algorithms akin to those for transition-state searches on potential energy surfaces of atomic rearrangements. These approaches also facilitate variational excited-state calculations of nonunitary invariant functionals, such as self-interaction corrected functionals (see Sec. III C 5).
2. Example applications of direct optimization
The efficiency and robustness of the direct-optimization approaches, combined with the possibility of choosing different basis-set types, make variational calculations of excited states in GPAW applicable to a great variety of systems ranging from molecules in the gas phase or solution to solids.
State-specific orbital relaxation enables the description of challenging excitations characterized by large density rearrangements. Figure 13 shows the error on the vertical excitation energy of a charge-transfer excitation in the twisted N-phenylpyrrole molecule49 obtained with direct optimization in GPAW using the LDA, PBE, and BLYP functionals and an sz+aug-cc-pVDZ53 basis set as compared to the results of linear-response TDDFT calculations with the same basis set and functionals, as well as the hybrid functionals PBE0 and B3LYP (results from Ref. 269). For the variational calculations, the energy of the singlet excited state is computed using the spin-purification formula Es = 2Em − Et, where Em is the energy of a spin-mixed state obtained by promoting an electron in one spin channel and Et is the energy of the triplet state with the same character. The variational calculations underestimate the theoretical best-estimate value of the excitation energy (5.58 eV) in Ref. 269 by 0.15–0.3 eV, an error that is significantly smaller than that of linear-response TDDFT calculations with the same functionals (−2.0 eV) or with the more computationally intensive PBE0 hybrid functional (−0.85 eV).269
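The spin-purification step used above is a one-line correction; a minimal numeric illustration with hypothetical energies (not the N-phenylpyrrole values):

```python
# Spin purification for a singlet excited state, E_s = 2*E_m - E_t, where
# E_m is the energy of the spin-mixed state obtained by promoting an
# electron in one spin channel and E_t is the energy of the triplet state
# of the same character (hypothetical values, eV):
E_m = 5.10
E_t = 4.85
E_s = 2 * E_m - E_t
print(f"spin-purified singlet energy: {E_s:.2f} eV")
```

The formula removes the spurious triplet contamination of the single-determinant spin-mixed state, assuming the mixed state is an equal-weight combination of singlet and triplet.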
Applications of time-independent, variational calculations of excited states to a molecule in vacuum (left), a molecule in solution (middle), and a solid-state system (right). Left: Deviation of the calculated excitation energy of a charge-transfer excited state in the N-phenylpyrrole molecule from the theoretical best estimate of 5.58 eV.269 The results of linear-response TDDFT calculations with hybrid functionals are from Ref. 269. Middle: Time-evolution of the interligand angles of the [Cu(dmphen)2]+ complex upon photoexcitation to the lowest metal-to-ligand charge-transfer (MLCT) state in acetonitrile. The experimental results from femtosecond x-ray scattering measurements270 are compared to the average over excited-state molecular dynamics trajectories obtained using the QM/MM electrostatic embedding scheme in GPAW167 (see Sec. IV) and convoluted with the experimental instrument-response function.270 Right: Vertical excitation energy for excitations in the negatively charged nitrogen-vacancy center in diamond obtained with the r2SCAN functional as compared to the results of previous calculations using an advanced quantum embedding approach.271 The orbitals involved in the electronic transitions are visualized in the inset (the C atoms are gray and the N atom is orange).
The method has also been used to simulate the photoinduced structural changes of photocatalytic metal complexes and concomitant solvation dynamics.167,183,184,270,272 Figure 13 shows an application to the prototypical copper complex photosensitizer [Cu(dmphen)2]+ (dmphen = 2,9-dimethyl-1,10-phenanthroline) in acetonitrile, where QM/MM molecular dynamics simulations elucidated an intricate interplay between deformation of the ligands and rearrangement of the surrounding solvent molecules following a photoexcitation to a metal-to-ligand charge-transfer state.167,270
The last example of an application shown in Fig. 13 is a calculation of the excited states of a solid state system,273 the negatively charged nitrogen-vacancy center in diamond, which comprises a prototypical defect for quantum applications. The system is described with a large supercell of 511 atoms, and the calculations use a plane-wave basis set. In contrast to previous reports, a range of different density functionals are found to give the correct energy ordering of the excited states, with the r2SCAN functional providing the best agreement with high-level many-body quantum embedding calculations with an error of less than 0.06 eV.271,273 This example shows that the direct optimization methods in GPAW are promising tools for simulating excited states in extended systems, where alternative approaches are either computationally expensive or lack accuracy.
IX. OTHER FEATURES
A. Electric polarization
In Fig. 14, we show an example of this for tetragonal KNbO3, which is a well-known ferroelectric.277 The polar structure was relaxed under the constraint of tetragonal symmetry using PBE (λ = 1) and then linearly interpolated to the inverted structure (λ = −1) passing through a centrosymmetric point at λ = 0. There are infinitely many polarization branches differing by the polarization quantum ec/Vcell (c being the lattice constant in the z-direction), and the spontaneous polarization is obtained by choosing a single branch and evaluating the difference in formal polarization at λ = 1 and λ = 0. Interestingly, the centrosymmetric point has a non-vanishing polarization given by half the polarization quantum. This is allowed due to the multi-valued nature of formal polarization, and such “topological polarization” in non-polar materials has been shown to yield gapless surface states.278,279 Here, however, we merely use the topological polarization to emphasize the importance of evaluating the spontaneous polarization as the change in PF along an adiabatic path.
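The branch-choosing step can be sketched numerically: the formal polarization is only defined modulo the polarization quantum, so one unwraps the folded values along the path into a single continuous branch before taking the difference. The polarization quantum and the linear model path below are hypothetical, not the KNbO3 values:

```python
import numpy as np

# Hypothetical polarization quantum e*c/V_cell (C/m^2) and a linear
# "true" branch along the adiabatic path lambda in [-1, 1]:
P_quantum = 0.80
lam = np.linspace(-1.0, 1.0, 9)
P_smooth = 0.40 + 0.55 * lam              # half a quantum at lambda = 0
P_folded = np.mod(P_smooth, P_quantum)    # what a formal-polarization
                                          # calculation returns (mod quantum)

# Unwrap onto a single continuous branch by removing jumps of one quantum:
P_branch = P_folded.copy()
for i in range(1, len(P_branch)):
    jump = np.round((P_branch[i] - P_branch[i - 1]) / P_quantum)
    P_branch[i] -= jump * P_quantum

# Spontaneous polarization: difference along the branch, P(1) - P(0).
P_spontaneous = P_branch[-1] - P_branch[len(lam) // 2]
print(f"spontaneous polarization: {P_spontaneous:.2f} C/m^2")
```

Taking differences on a single unwrapped branch is what makes the result independent of the arbitrary branch choice, which is why the spontaneous polarization must be evaluated as a change along an adiabatic path.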
Formal polarization and total energy along an adiabatic path connecting two states of polarization in tetragonal KNbO3. The spontaneous polarization is obtained as the difference in polarization between the polar ground state (λ = 1) and the non-polar reference structure (λ = 0).
We exemplify this in the well-known case of 2D ferroelectric GeS,281–283 where we obtain a spontaneous polarization of 490 pC/m44,284 in excellent agreement with previous calculations.283 The lattice and electronic 2D in-plane polarizabilities are Å and Å, respectively.45 For comparison, the non-polar case of 2D MoS2 yields Å and Å.45
B. Berry phases and band topology
The method has been applied to a high-throughput search for new topological two-dimensional materials,46 and in Fig. 15, we show the calculated Berry phases of 1T′-MoS2.287 Due to time-reversal symmetry, the Berry phases at Γ and Y are twofold degenerate, and therefore no perturbation that conserves time-reversal symmetry can open a gap in the Berry-phase spectrum. This property is diagnostic of the quantum spin Hall insulating state and is closely related to the presence of gapless edge states.285
Berry phases of the quantum spin Hall insulator 1T′-MoS2 obtained from PBE with non-selfconsistent spin–orbit coupling. The colors indicate the expectation value of Sz for each state as defined in Ref. 46.
The eigenvalues of Eq. (102) may also be used to calculate the electronic contribution to the formal polarization since the sum of all individual Berry phases yields the same value as one may obtain from Eq. (98). The present approach is, however, more involved since it requires the construction of a smooth gauge, which is not needed in Eq. (98).
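A gauge-invariant way to evaluate such Berry phases numerically is a discrete Wilson loop of wave-function overlaps. As a self-contained toy example (using the two-band SSH model rather than a GPAW calculation), the Zak phase of the lower band distinguishes the topological regime (phase π, hopping parameters v < w) from the trivial one (phase 0):

```python
import numpy as np

def zak_phase(v, w, nk=200):
    """Berry (Zak) phase of the lower SSH band from a discrete Wilson loop."""
    ks = np.linspace(0.0, 2.0 * np.pi, nk, endpoint=False)
    states = []
    for k in ks:
        q = v + w * np.exp(-1j * k)
        h = np.array([[0.0, q], [np.conj(q), 0.0]])  # SSH Bloch Hamiltonian
        _, vecs = np.linalg.eigh(h)                   # eigenvalues ascending
        states.append(vecs[:, 0])                     # lower band
    # product of overlaps around the closed loop; arbitrary eigenvector
    # phases from eigh cancel, so the total phase is gauge invariant
    prod = 1.0 + 0.0j
    for i in range(nk):
        prod *= np.vdot(states[i], states[(i + 1) % nk])
    return -np.angle(prod)

print(zak_phase(1.0, 2.0))  # topological: phase +/- pi
print(zak_phase(2.0, 1.0))  # trivial: phase ~ 0
```

Because the loop closes on itself, no smooth gauge has to be constructed, which is exactly the advantage of Eq. (98) noted above.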
C. Wannier functions
D. Point defect calculations with hybrid functionals
Point defects play a crucial role in many applications of semiconductors.293,294 First-principles calculations can be used to determine the atomic structure, formation energy, and charge-transition levels of point defects. It is well established that the best description of point defects in semiconductors and insulators is obtained using range-separated hybrids, such as the HSE06 xc-functional.5,295 To illustrate the use of GPAW for point defect calculations, we determine the formation energy diagrams of the CN and CB defects in the hexagonal boron nitride (hBN) crystal with the HSE06 functional. These defects have been proposed to be responsible for the deep-level luminescence signal with a zero-phonon line (ZPL) around 4.1 eV.295,296 The results are compared to similar results obtained with the Vienna Ab initio Simulation Package (VASP).
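For reference, the formation energy entering such diagrams is commonly written (in the standard notation of Refs. 297–300; the symbol choices below are ours) as

```latex
E_f\left[X^q\right](E_F) = E\!\left(X^q\right) - E(\mathrm{pristine})
  - \sum_i n_i \mu_i + q\,\left(E_{\mathrm{VBM}} + \Delta E_F\right)
  + E_{\mathrm{corr}} .
```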
Here, E(X^q) and E(pristine) are the total energies of the crystal with the point defect in charge state q and of the neutral pristine crystal, respectively. μi is the chemical potential of element i, while ni is the number of atoms added to (ni > 0) or removed from (ni < 0) the crystal to create the defect. EF is the chemical potential of the electrons, i.e., the Fermi level, which is written as EF = EVBM + ΔEF, where EVBM is the energy of the valence band maximum. Finally, Ecorr is a correction term that accounts for (i) the spurious electrostatic interaction between the periodic images of the defect and their interaction with the compensating homogeneous background charge and (ii) the potential shift between the pristine and defect systems.
For more details on the methodology of point defect calculations, we refer the reader to the excellent review papers on the topic.297–300
All calculations have been performed using the HSE06 functional with the default mixing parameter α = 0.25, a plane wave cutoff of 800 eV, and forces converged to 0.01 eV/Å. The lattice constants of the hBN crystal were fixed at the experimental values (a = 2.50 Å and c = 6.64 Å).301 The bandgap of the pristine crystal was determined to be 5.58 eV (using 8 × 8 × 4 k-points), in good agreement with the experimental bandgap of 6.08 eV.302 The structures of the point defects were relaxed in a 4 × 4 × 2 (128-atom) supercell using Γ-point k-point sampling. For each defect, three charge states (q = 1, 0, −1) were considered. The corrections (Ecorr) due to image charges and potential alignment are evaluated following Freysoldt, Neugebauer, and Van de Walle303 as implemented in GPAW.
Figure 16 shows the defect formation energies as a function of the Fermi level for CN and CB under N-rich and N-poor conditions, respectively. Under N-rich conditions, CB has the lower formation energy, whereas CN is favored under N-poor conditions. CB shows a 1/0 charge transition 3.73 eV above the VBM, whereas CN has a 0/−1 charge transition at 3.26 eV, deep inside the bandgap. We find good agreement with a similar study295 also employing plane waves and the HSE06 functional (VASP calculations). Minor discrepancies can be attributed to the use of a different supercell size and a slightly larger fraction of non-local exchange (α = 0.31) in that study.
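The charge-transition levels read off such diagrams follow from simple arithmetic on the formation energies: the q/q′ level is the Fermi energy at which the two charge states are degenerate. A sketch with made-up numbers (not the hBN values computed here):

```python
def transition_level(ef_q1, q1, ef_q2, q2):
    """Fermi level (relative to the VBM) at which charge states q1 and q2
    have equal formation energy: E_f(q1) + q1*EF = E_f(q2) + q2*EF."""
    return (ef_q1 - ef_q2) / (q2 - q1)

# hypothetical formation energies at E_F = E_VBM for two charge states:
# a +1 state at 2.0 eV and a neutral state at 5.5 eV (made-up values)
level = transition_level(2.0, +1, 5.5, 0)
print(f"(+1/0) transition at E_VBM + {level:.2f} eV")
```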
Defect formation energies for CN and CB under N-rich and N-poor conditions, respectively. The dashed lines are reproduced from Ref. 295.
E. Point-group symmetry representations
GPAW allows for the automated assignment of point-group symmetry representations to pre-computed Kohn–Sham wave functions. This can be used to determine the wave-function symmetry representations of both molecules304 and extended structures305 and to analyze, for example, symmetry-induced band degeneracies and selection rules for dipole transitions.
To perform the analysis, the user provides the coordinates of the center of symmetry (typically the coordinates of a single atom) and the point group for which the analysis is run. It is possible to analyze only a part of the wave function by selecting a cutoff radius from the center of symmetry beyond which the wave function is neglected. This enables the investigation of the purity of the local symmetry even if the symmetry of the Hamiltonian is broken far from the center of symmetry.304,305 To date, the point groups C2, C2v, C3v, D2d, D3h, D5, D5h, I, Ih, Oh, Td, and Th are implemented.
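The projection underlying such an analysis can be sketched in a few lines. The toy below (not the GPAW implementation) decomposes a px-like “wave function” on a 2D grid into the irreducible representations of the point group C2 using the standard character projection:

```python
import numpy as np

x = np.linspace(-3, 3, 201)
X, Y = np.meshgrid(x, x, indexing="ij")
psi = X * np.exp(-(X**2 + Y**2))       # p_x-like orbital: odd under C2

def c2(f):
    """Rotation by 180 degrees about the grid center."""
    return f[::-1, ::-1]

# character projection P = (1/|G|) sum_g chi(g) g with C2 = {E, C2}
# and characters chi_A = (1, 1), chi_B = (1, -1)
psi_A = 0.5 * (psi + c2(psi))
psi_B = 0.5 * (psi - c2(psi))
norm = (psi**2).sum()
print("weight in A:", (psi_A**2).sum() / norm)   # ~0 for an odd function
print("weight in B:", (psi_B**2).sum() / norm)   # ~1
```

The cutoff-radius option described above simply restricts the sums to grid points within a given distance of the symmetry center.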
F. Band-structure unfolding
When studying defect formation, charge-ordered phases, or structural phase transitions, it is often necessary to perform DFT calculations on a supercell (SC). Compared to a calculation in the primitive cell (PC), an SC calculation must account for many more electrons in the unit cell. Besides the increased computational effort, this implies that the band structure of an SC contains more bands in a smaller Brillouin zone. To compare electronic band structures between SC and PC, it is therefore convenient to unfold the band structure of the SC onto the Brillouin zone of the PC.
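The bookkeeping behind unfolding can be illustrated in 1D. The toy function below (not the GPAW unfolding API) lists the PC wavevectors that fold onto a given SC wavevector for a supercell of N primitive cells:

```python
import numpy as np

def folded_pc_kpoints(K, N, a=1.0):
    """Primitive-cell wavevectors that fold onto supercell wavevector K.

    The SC Brillouin zone is N times smaller, so each SC K point hosts
    bands from the N PC wavevectors k = K + m * 2*pi/(N*a), m = 0..N-1.
    """
    G_sc = 2.0 * np.pi / (N * a)          # supercell reciprocal vector
    ks = K + G_sc * np.arange(N)
    return ks % (2.0 * np.pi / a)         # map back into the PC zone

print(folded_pc_kpoints(0.0, 4))  # Gamma of a 4x SC collects 4 PC k points
```

A full unfolding scheme additionally assigns each SC band a spectral weight at every PC wavevector from the plane-wave components of its wave function.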
G. The QEH model
The quantum-electrostatic heterostructure (QEH)308 model is an add-on GPAW feature for calculating the dielectric response and excitations in vertical stacks of 2D materials, also known as van der Waals (vdW) heterostructures. The QEH model can be used independently from the GPAW code, but it relies on the GPAW implementation for the calculation of the fundamental building blocks used by the model, as elaborated below.
The dielectric screening in 2D materials is particularly sensitive to changes in the environment and depends on the stacking order and thickness of the 2D heterostructure, providing a means to tune the electronic excitations, including quasi-particle bandgaps and excitons. While the dielectric response of freestanding layers can be computed ab initio with GPAW at the linear-response TDDFT, GW, and BSE levels of theory, the lattice mismatch between different 2D layers often leads to large supercells that render these many-body approaches infeasible. Since the interaction between stacked layers is generally governed by van der Waals interactions, the main modification to the non-interacting layers’ dielectric response arises from the long-range electrostatic coupling between layers.
From the density-response function, the dielectric function is obtained on the basis of monopole/dipole perturbations on each layer in the heterostructure. While building blocks pre-computed with GPAW for a large variety of 2D materials are provided with the QEH package, GPAW offers the possibility of calculating custom building blocks for any 2D material, as explained in Ref. 310.
As an illustrative example of the QEH model, we show how the static dielectric function of a heterostructure can be engineered by multi-layer stacking. Figure 17 shows the static dielectric function of a vdW heterostructure made up of N MoS2 layers and N WSe2 layers. We see that the dielectric function increases significantly as a function of the number of layers, eventually approaching a bulk limit. The knowledge of the layer-dependence of the dielectric response for such a heterostructure could be further exploited to investigate inter- and intra-layer excitonic properties and band-edge renormalization effects.
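The qualitative trend in this example can be reproduced with a minimal electrostatic model in the spirit of QEH, though far cruder than the actual package: N identical polarizable sheets coupled only through the long-range 2D Coulomb kernel. All parameter values below are made up.

```python
import numpy as np

def stack_dielectric(N, q=0.2, d=6.0, alpha=6.0):
    """Effective dielectric function of N stacked layers (model units).

    q: in-plane wavevector, d: interlayer distance,
    alpha: in-plane sheet polarizability (illustrative values only).
    """
    z = d * np.arange(N)
    # 2D Coulomb interaction between layers i and j
    V = 2.0 * np.pi / q * np.exp(-q * np.abs(z[:, None] - z[None, :]))
    chi0 = -alpha * q**2 * np.eye(N)          # monopole response per layer
    # Dyson equation: chi = chi0 + chi0 V chi
    chi = np.linalg.solve(np.eye(N) - chi0 @ V, chi0)
    phi_ext = np.ones(N)                       # unit external potential
    phi = phi_ext + V @ (chi @ phi_ext)        # screened potential
    return 1.0 / phi.mean()

for N in (1, 2, 4, 8):
    print(N, round(stack_dielectric(N), 2))
```

Even this crude model shows the screening growing with the number of layers and saturating toward a bulk-like limit, as in Fig. 17; QEH replaces the model polarizability with ab initio monopole and dipole building blocks per layer.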
The static macroscopic dielectric function of a vdW heterostructure interface as a function of the number of layers. The heterostructure is made up of N MoS2 layers on one half and N WSe2 layers on the other half (see inset). Increasing the number of layers eventually leads to a bulk-like limit for the chosen stacking configuration.
H. Solvent models
The presence of a solvent has a large effect on the energetics and the electronic structure of molecules and extended surfaces. In particular, water, arguably the most important solvent, is able to stabilize ions or zwitterions that would not form in the gas phase. The main effect relates to the large permittivity of water (ɛr = 78), which effectively screens Coulomb interactions.
A convenient and computationally lean method to describe this effect is the inclusion of a position-dependent solvent permittivity ɛ(r) in the electrostatics via the Poisson solver.311 The solvent is represented solely as a polarizable continuum that averages out all movements and re-arrangements of the solvent molecules and their electrons. The computational cost is, therefore, practically the same as a calculation in a vacuum.
This implementation allows the calculation of the solvation free energies of neutral and ionic species in solution.311,312 The nature of the solvent is described by its permittivity, an effective repulsive potential characterizing the smooth solvent distribution around the solute, and an effective surface tension that summarizes dispersive and repulsive interactions between the solute and solvent.311 The description is, therefore, well suited for any solvent that can be effectively represented by a continuum, where usually high-permittivity solvents show the largest effects. Local solute–solvent interactions like hydrogen bonds can be considered by the explicit inclusion of single solvent molecules as part of the solute.313 The solvent model works with periodic as well as finite boundary conditions but is so far available for LCAO and FD modes only. The model can be further applied to periodic surfaces interfaced with an electrolyte to reproduce reasonable potential drops within the simulation of electrochemical reaction processes, as we will elaborate in the following.
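The core idea can be sketched with a 1D finite-difference Poisson solver in which the permittivity depends on position. This is a drastic simplification of the 3D solver in GPAW; the charge distribution and units are made up.

```python
import numpy as np

def solve_poisson_1d(eps, rho, h=0.1):
    """Solve d/dx[eps(x) dphi/dx] = -rho(x) with phi = 0 at both ends."""
    n = len(rho)
    A = np.zeros((n, n))
    for i in range(n):
        # permittivity at the half-grid points i-1/2 and i+1/2
        em = eps[i] if i == 0 else 0.5 * (eps[i - 1] + eps[i])
        ep = eps[i] if i == n - 1 else 0.5 * (eps[i] + eps[i + 1])
        A[i, i] = -(em + ep) / h**2
        if i > 0:
            A[i, i - 1] = em / h**2
        if i < n - 1:
            A[i, i + 1] = ep / h**2
    return np.linalg.solve(A, -rho)

n = 101
rho = np.zeros(n)
rho[n // 2] = 1.0                                    # a localized charge
phi_vac = solve_poisson_1d(np.ones(n), rho)          # vacuum
phi_sol = solve_poisson_1d(78.0 * np.ones(n), rho)   # water-like permittivity
print(phi_vac.max() / phi_sol.max())                 # screening reduces phi
```

With a uniform ɛr = 78, the electrostatic potential of the charge is reduced by a factor of 78 relative to vacuum; in the actual solvent model, ɛ(r) interpolates smoothly between 1 near the solute and the bulk value in the solvent.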
I. Charged electrochemical interfaces
Simulating atomistic processes at a solid–liquid interface held at a controlled electrode potential is most appropriately performed in the electronically grand-canonical ensemble.316,318,319 Here, electrons can be exchanged dynamically with an external electron reservoir at a well-defined electrochemical potential. In a periodic system, a non-zero net charge would lead to a divergence of the energy; therefore, any fractional electrons added to or removed from the system must be compensated by an equal amount of counter charge. Several approaches that account for this change in boundary conditions have recently been put forward.317–321 In GPAW, this is conveniently accomplished by introducing a jellium slab carrying a charge equal and opposite to the excess electronic charge; the jellium is embedded in an implicit solvent localized in the vacuum region above the simulated surface (cf. Sec. IX H). A particular highlight of this Solvated Jellium Method (SJM),322 as it is known in GPAW, is the ability to localize the excess charge on only the top side of the simulated atomistic surface, which occurs naturally by introducing the jellium region solely in the top-side vacuum and electrostatically decoupling the two sides of the cell via a dipole correction. Both a purely implicit and a hybrid explicit–implicit solvent can be applied.
In constant-potential mode, the potential is controlled by a damped iterative technique that varies the number of electrons Ne until the target electrode potential ϕe is reached; in practice, a trajectory (such as a relaxation or nudged elastic band) is run, where the potential equilibrates during a first series of SCF cycles. Upon reaching the target potential within the given threshold, the code conducts the chosen geometry-optimization routine under constant-potential control.
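A minimal sketch of such a damped constant-potential loop, with a made-up capacitor-like relation standing in for the actual SCF calculation (this is not the SJM code itself):

```python
def equilibrate_potential(phi_of_ne, ne0, target, damping=0.5, tol=1e-4):
    """Vary the electron number Ne until the potential hits its target."""
    ne = ne0
    for _ in range(1000):
        phi = phi_of_ne(ne)           # in practice: one SCF calculation
        if abs(phi - target) < tol:
            return ne, phi
        # adding electrons lowers the potential, hence the sign convention;
        # the damping factor keeps the iteration stable
        ne += damping * (phi - target)
    raise RuntimeError("potential did not converge")

# made-up model: potential of zero charge 4.4 V at Ne = 100,
# capacitance 10 electrons per volt
phi_model = lambda ne: 4.4 - (ne - 100.0) / 10.0
ne, phi = equilibrate_potential(phi_model, ne0=100.0, target=4.0)
print(f"Ne = {ne:.3f}, potential = {phi:.4f} V")
```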
J. Constrained DFT
Constrained DFT (cDFT)331–333 is a computationally efficient method for constructing diabatic or charge/spin-localized states. GPAW includes a real-space implementation of cDFT,334 which can be used in both the FD and LCAO modes. Compared to most cDFT implementations, in GPAW, the periodicity can be chosen flexibly between isolated molecules and periodic systems in one-, two-, or three-dimensions with k-point sampling.
cDFT has been widely used for computing transfer rates within Marcus theory, which depends on the reorganization and reaction (free) energies and the diabatic coupling matrix element; the GPAW-cDFT implementation includes all the needed tools for obtaining these parameters for bulk, surface, and molecular systems.334,335 Recently, the cDFT approach has been combined with molecular-dynamics methods to compute the reorganization energy at electrochemical interfaces336 as well as with grand-canonical DFT methods (see Sec. IX A) to construct fixed electron-potential diabatic states.337
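The cDFT outputs map directly onto the classical Marcus rate expression, sketched below with illustrative (made-up) parameter values:

```python
import numpy as np

HBAR = 6.582119569e-16   # hbar in eV*s
KB = 8.617333262e-5      # Boltzmann constant in eV/K

def marcus_rate(h_ab, lam, dg, T=300.0):
    """Nonadiabatic electron-transfer rate (1/s) from classical Marcus theory.

    h_ab: diabatic coupling, lam: reorganization energy,
    dg: reaction free energy (all in eV).
    """
    kt = KB * T
    prefactor = 2.0 * np.pi / HBAR * h_ab**2 / np.sqrt(4.0 * np.pi * lam * kt)
    return prefactor * np.exp(-(dg + lam) ** 2 / (4.0 * lam * kt))

# e.g. H_ab = 10 meV, lambda = 0.8 eV, dG = -0.3 eV (made-up values)
print(f"{marcus_rate(0.010, 0.8, -0.3):.3e} s^-1")
```

The rate is maximal in the activationless regime ΔG = −λ, and the three parameters are exactly the quantities the GPAW-cDFT implementation provides.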
K. Orbital-free DFT
The OFDFT scheme implemented in GPAW offers the advantage of accessing all-electron values while maintaining linear scaling of the computational time with system size. To achieve this, we employ the PAW method in conjunction with real-space methods, obtaining a mean absolute error of 10 meV per atom when compared to reference all-electron values.339
While OFDFT functionals perform better with local pseudopotentials in bulk materials, the OFDFT PAW implementation is interesting for assessing density functionals. For example, when studying the large-Z or semiclassical limits of density functionals, access to all-electron values makes it possible to identify high-performing OFDFT functionals.340
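To make the orbital-free idea concrete, the snippet below evaluates the simplest kinetic-energy density functional, the Thomas–Fermi functional, for the exact hydrogen 1s density on a radial grid. This is a pedagogical example (atomic units), not the GPAW implementation:

```python
import numpy as np

# In OFDFT the kinetic energy is an explicit functional of the density.
# Thomas-Fermi: T_TF[n] = C_F * integral n^(5/3) d3r
CF = 0.3 * (3.0 * np.pi**2) ** (2.0 / 3.0)  # Thomas-Fermi constant

r = np.linspace(1e-6, 30.0, 200_000)
n = np.exp(-2.0 * r) / np.pi                # exact hydrogen 1s density
dr = r[1] - r[0]
integrand = CF * n ** (5.0 / 3.0) * 4.0 * np.pi * r**2
t_tf = integrand.sum() * dr
print(f"Thomas-Fermi kinetic energy: {t_tf:.4f} Ha (exact: 0.5 Ha)")
```

The sizable error against the exact kinetic energy of 0.5 Ha illustrates why assessing improved kinetic functionals, as described above, is an active use case.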
L. Zero-field splitting
M. Hyperfine coupling
X. OUTLOOK
As described in this review, GPAW is a highly versatile code that is maintenance-, user-, and developer-friendly at the same time. The continued expansion of the code requires substantial effort and is possible only because of the dedicated team of developers contributing at all levels. There are currently a number of ongoing as well as planned developments for GPAW, which will further improve the performance and applicability of the code. We are currently finishing a major refactoring of the code, which will make it even more developer-friendly and facilitate the implementation of new functionality.
Another priority is to improve the parallelization of hybrid functional calculations in plane wave mode by enabling parallelization over bands and k-points. In the same vein, there is ongoing work to support LCAO-based hybrid functional calculations using a resolution-of-identity approach. A natural next step would then be LCAO-based GW calculations. Such a method could potentially be very efficient compared to plane wave calculations, but it is currently unclear if the accuracy can be maintained with the limited LCAO basis. In relation to quasiparticle calculations, there are plans to implement (quasiparticle) self-consistent GW and vertex corrected GW using nonlocal xc-kernels from TDDFT. Constrained RPA calculations that provide a partially screened Coulomb interaction useful for ab initio calculation of interaction parameters in low-energy model Hamiltonians are currently being implemented.
Underlying any GPAW calculation are the PAW potentials. The current potentials date back to 2009. A new set of potentials, including both soft and norm-conserving ones (the latter for response function calculations), is under development.
As described herein, GPAW already has an efficient implementation of real-time TDDFT in the LCAO basis, while Ehrenfest dynamics is supported only in the comparatively slower grid mode. Work to enable Ehrenfest dynamics in LCAO mode is ongoing.
The current version of GPAW supports GPU acceleration only for standard ground-state calculations. The CuPy library greatly simplifies the task of porting GPAW to GPU, and we foresee that large parts of the code, including more advanced features such as linear response and GW calculations, will become GPU compatible.
ACKNOWLEDGMENTS
K.S.T. acknowledges funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program Grant No. 773122 (LIMA) and Grant Agreement No. 951786 (NOMAD CoE). K.S.T. is a Villum Investigator supported by VILLUM FONDEN (Grant No. 37789). Funding for A.O.D., G.L., and Y.L.A.S. was provided by the Icelandic Research Fund (Grant Nos. 196279, 217734, and 217751, respectively). F.N. has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 899987. M.M.M. was supported by the Academy of Finland (Grant No. 338228). T.O. acknowledges support from the Villum Foundation Grant No. 00029378. S.K. and K.W.J. acknowledge support from the VILLUM Center for Science of Sustainable Fuels and Chemicals, which is funded by the VILLUM Fonden research (Grant No. 9455). T.B. was funded by the Danish National Research Foundation (DNRF Grant No. 146). J.S. acknowledges funding from the Independent Research Fund Denmark (DFF-FTP) through Grant No. 9041-00161B. C.S. acknowledges support from the Swedish Research Council (VR) through Grant No. 2016-06059 and funding from the Horizon Europe research and innovation program of the European Union under the Marie Skłodowska-Curie Grant Agreement No. 101065117. Partially funded by the European Union. The views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or REA. Neither the European Union nor the granting authority can be held responsible for them. T.S. received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant Agreement No. 756277-ATMEN). O.L.-A. has been supported by Minciencias and the University of Antioquia (Colombia). K.T.W. was supported by the U.S. 
Department of Energy, Office of Science, Office of Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division, Catalysis Science Program, and the SUNCAT Center for Interface Science and Catalysis. G.K. acknowledges funding from V-Sustain: The VILLUM Centre for the Science of Sustainable Fuels and Chemicals (Grant No. 9455). Additional funding: Knut and Alice Wallenberg Foundation (Grant No. 2019.0140; J.F. and P.E.), the Swedish Research Council (Grant No. 2020-04935; J.F. and P.E.), the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 101065117 (C.S.). Computations and code development work have been enabled by the resources provided by the Niflheim Linux cluster supercomputer installed at the Department of Physics at the Technical University of Denmark, CSC–IT Center for Science, Finland, through national supercomputers and through access to the LUMI supercomputer owned by the EuroHPC Joint Undertaking, and the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at NSC, PDC, and C3SE, partially funded by the Swedish Research Council through Grant Agreement No. 2022-06725. We gratefully acknowledge these organizations for providing computational resources and facilities.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Jens Jørgen Mortensen: Writing – original draft (equal). Ask Hjorth Larsen: Writing – original draft (equal). Mikael Kuisma: Writing – original draft (equal). Aleksei V. Ivanov: Writing – original draft (equal). Alireza Taghizadeh: Writing – original draft (equal). Andrew Peterson: Writing – original draft (equal). Anubhab Haldar: Writing – original draft (equal). Asmus Ougaard Dohn: Writing – original draft (equal). Christian Schäfer: Writing – original draft (equal). Elvar Örn Jónsson: Writing – original draft (equal). Eric D. Hermes: Writing – original draft (equal). Fredrik Andreas Nilsson: Writing – original draft (equal). Georg Kastlunger: Writing – original draft (equal). Gianluca Levi: Writing – original draft (equal). Hannes Jónsson: Writing – original draft (equal). Hannu Häkkinen: Writing – original draft (equal). Jakub Fojt: Writing – original draft (equal). Jiban Kangsabanik: Writing – original draft (equal). Joachim Sødequist: Writing – original draft (equal). Jouko Lehtomäki: Writing – original draft (equal). Julian Heske: Writing – original draft (equal). Jussi Enkovaara: Writing – original draft (equal). Kirsten Trøstrup Winther: Writing – original draft (equal). Marcin Dulak: Writing – original draft (equal). Marko M. Melander: Writing – original draft (equal). Martin Ovesen: Writing – original draft (equal). Martti Louhivuori: Writing – original draft (equal). Michael Walter: Writing – original draft (equal). Morten Gjerding: Writing – original draft (equal). Olga Lopez-Acevedo: Writing – original draft (equal). Paul Erhart: Writing – original draft (equal). Robert Warmbier: Writing – original draft (equal). Rolf Würdemann: Writing – original draft (equal). Sami Kaappa: Writing – original draft (equal). Simone Latini: Writing – original draft (equal). Tara Maria Boland: Writing – original draft (equal). Thomas Bligaard: Writing – original draft (equal). Thorbjørn Skovhus: Writing – original draft (equal). 
Toma Susi: Writing – original draft (equal). Tristan Maxson: Writing – original draft (equal). Tuomas Rossi: Writing – original draft (equal). Xi Chen: Writing – original draft (equal). Yorick Leonard A. Schmerwitz: Writing – original draft (equal). Jakob Schiøtz: Writing – original draft (equal). Thomas Olsen: Writing – original draft (equal). Karsten Wedel Jacobsen: Writing – original draft (equal). Kristian Sommer Thygesen: Writing – original draft (equal).
DATA AVAILABILITY
The data that support the findings of this study are openly available in the git-repository at https://gitlab.com/jensj/gpaw-paper-2023.