We present an overview of the onetep program for linear-scaling density functional theory (DFT) calculations with large basis set (plane-wave) accuracy on parallel computers. The DFT energy is computed from the density matrix, which is constructed from spatially localized orbitals we call Non-orthogonal Generalized Wannier Functions (NGWFs), expressed in terms of periodic sinc (psinc) functions. During the calculation, both the density matrix and the NGWFs are optimized with localization constraints. By taking advantage of localization, onetep is able to perform calculations including thousands of atoms with computational effort, which scales linearly with the number or atoms. The code has a large and diverse range of capabilities, explored in this paper, including different boundary conditions, various exchange–correlation functionals (with and without exact exchange), finite electronic temperature methods for metallic systems, methods for strongly correlated systems, molecular dynamics, vibrational calculations, time-dependent DFT, electronic transport, core loss spectroscopy, implicit solvation, quantum mechanical (QM)/molecular mechanical and QM-in-QM embedding, density of states calculations, distributed multipole analysis, and methods for partitioning charges and interactions between fragments. Calculations with onetep provide unique insights into large and complex systems that require an accurate atomic-level description, ranging from biomolecular to chemical, to materials, and to physical problems, as we show with a small selection of illustrative examples. onetep has always aimed to be at the cutting edge of method and software developments, and it serves as a platform for developing new methods of electronic structure simulation. We therefore conclude by describing some of the challenges and directions for its future developments and applications.

## I. PRINCIPLES OF LINEAR-SCALING DENSITY FUNCTIONAL THEORY

A common starting point for performing density functional theory (DFT) calculations with linear-scaling computational cost is to use the one-particle density matrix

which provides a complete description of the fictitious Kohn–Sham (KS) system, where {*ψ*_{i}} are the KS orbitals and {*f*_{i}} are their occupancies. KS DFT calculation methods working directly with the density matrix have been developed such as the approach by Li, Nunes, and Vanderbilt.^{1} The density matrix is a quantity that decays exponentially with the distance |**r** − **r**′| (for gapped systems at zero temperature and metals at a finite temperature), as formulated by Walter Kohn in his “nearsightedness of electronic matter” principle.^{2,3} To take practical advantage of this principle and develop methods with reduced or linear-scaling computational cost, we need to truncate the exponentially decaying tail of the density matrix. As the system size (number of atoms) increases, we will eventually reach the point where the remaining amount of information increases linearly with the size of the system. This is not straightforward to implement if we use the KS orbitals directly. A more practical approach is to work with a set of localized orbitals {*ϕ*_{α}} that, in general (but not necessarily), are non-orthogonal as non-orthogonality can result in better localization.^{4} These localized orbitals can be considered to be connected to the KS orbitals via a linear transformation **M** as follows:

where we have used Greek indices for non-orthogonal components and Latin indices for orthogonal components. We also use the Einstein implicit summation convention for repeated greek indices. We will follow these conventions throughout this paper. It is easy to show that the density matrix can be expressed in terms of the local orbitals in the following form:^{5,6}

where the matrix **K**, which we call the density kernel, is the representation of the density matrix in the basis of the localized orbitals as follows:

Linear-scaling schemes can be developed by following this framework. Calculations with such schemes need to obey the idempotency property for the density kernel

which is equivalent to the necessary requirements of KS orbital orthonormality and integer occupancies of 1 or 0 (for calculations in materials with a gap).

## II. ONETEP: LINEAR-SCALING DFT WITH LARGE-BASIS SET ACCURACY

One class of linear-scaling approach typically uses a fixed basis set of localized functions, such as Gaussian atomic orbitals or numerical atomic orbitals. However, it is difficult to systematically improve the quality of such basis sets toward convergence while simultaneously maintaining linear-scaling behavior, as this requires the inclusion of a large number of extended atomic basis functions, which can make it difficult to impose the necessary localization constraints on the density matrix.

To overcome this limitation, methods have emerged, which aim to achieve linear-scaling cost with near-complete basis set accuracy or “plane wave accuracy.” The main idea in these methods is to optimize both the non-orthogonal localized orbitals {*ϕ*_{α}} and the density kernel **K**, both subject to localization constraints, so as to achieve the variational freedom of a large basis set while retaining the localization and small matrix sizes of a minimal AO basis. onetep^{7} belongs to this class of methods; other codes in this category include conquest^{8} and bigdft.^{9} In onetep, the localized orbitals {*ϕ*_{α}} are called non-orthogonal generalized Wannier functions (NGWFs) and are optimized self-consistently during the calculation.^{10}

In onetep, we aim to achieve controllable accuracy equivalent to that of plane-wave pseudopotential calculations, so the basis set we use to expand the local orbitals is a set of periodic sinc (psinc) basis functions {*D*_{i}},^{10,11} which are unitary transformations of plane waves. The quality of the psinc set is controlled by a single parameter, the psinc kinetic energy cut-off, in analogy with plane wave codes. Thus, the psinc basis set provides a uniform description of space, similar to plane waves, and the kinetic energy cutoff controls the resolution of this description, providing a finer resolution with an increase in the cut-off value. Typical kinetic energy cutoff values in routine calculations are in the range of 500–1000 eV, although higher or lower values are possible and have been used in specialized cases.

Another advantage of using a psinc basis set is that many operations on the NGWFs that are needed for the construction of the Hamiltonian in the NGWF representation can be performed efficiently using Fast Fourier Transforms (FFTs), as in plane wave codes. This involves transforming the NGWFs to reciprocal space, applying the appropriate operator in diagonal form (e.g., the kinetic energy operator), and transforming back into real space. However, unlike in plane wave codes, these operations need to be performed with computational effort that is both relatively small and independent of the size of the system (i.e., independent of the number of atoms). To achieve this, we perform the FFTs on NGWFs in a miniature simulation cell, which we call the “FFT box.”^{12} In order to preserve important properties of the integrals [such as hermiticity and uniformity of the quantum mechanical (QM) operators] computed with the FFT box, particular care must be taken in how this box is defined and used, as we have described in early publications.^{13} These publications also present the use of the FFT box for the efficient construction of the electronic density and the NGWF gradient.^{10}

The ground state KS energy can now be considered as the minimum of a functional of the NGWFs and the density kernel

where the local potential $V^loc=V^ps,loc+V^H$ is the sum of the Hartree potential and the local pseudopotential, $V^nl$ is the non-local part of the pseudopotential in the Kleinman and Bylander representation, and $n=nr=\rho r,r$ is the electronic density.

As the NGWFs are optimized to minimize the ground state energy, it turns out that they can provide a good description of the valence bands, but a very poor description of the conduction bands. For this reason, in order to provide a set of functions suitable for describing the conduction bands (e.g., for excited state calculations), we can optimize a second set of NGWFs {*χ*_{α}}, after the completion of the ground state calculation, which provide an accurate description of a set of low-lying conduction bands (see Sec. IV E 1).

## III. PARALLELIZATION WITHIN ONETEP

onetep was designed from its inception as a parallel code,^{14} and the extent and efficiency of its parallelism have continued to improve during its development.^{15,16} Originally created within the paradigm of the Message Passing Interface (MPI), the code is now able to extensively harness “hybrid” parallelism enabled by OpenMP multithreading. MPI divides the available computational resources into processes, each of which can then be subdivided by OpenMP into threads, each occupying a single physical core. Each of the principal data structures and computational tasks is distributed in a manner designed to best exploit the locality on which the linear-scaling formalism relies. Data and tasks associated with atoms are distributed according to an algorithm designed to balance the competing demands of locality, aimed at ensuring nearby atoms are on the same or nearby processes to minimize communication, and load-balance, aimed at ensuring all processes have nearly equal numbers of NGWFs. This algorithm first sorts all the atoms based on a Peano space-filling curve^{17} and then assigns them to processes based on a modified greedy algorithm,^{18} looping over sets of atoms with decreasing numbers of NGWFs so that the largest are assigned first.

Data associated with each atom, particularly the coefficients of each atom-centered NGWF, are distributed and held locally on the process of that atom. The functions of each atom are then ideal for a second level of parallelism, implemented using OpenMP threading, whereby each thread treats a single function at a time, for example, when applying an operator to the set of functions. As demonstrated in Ref. 16, this leads to efficient scaling with respect to both MPI processes and threads, enabling the code to take advantage of very large-scale HPC (high-performance computing) architectures, with efficient runs on as many as 16 384 parallel cores. Subsequent work has extended the threaded parallelism to the implicit solvent model and exact exchange functionalities, detailed in Secs. IV D 1 and IV B 1. However, more optimization is always possible, and a current limitation is that MPI communications within threaded regions are handled with CRITICAL regions that are only entered by one thread at a time, which causes whole-cell grid operations to scale relatively poorly with thread count beyond around 8–10 threads.

Figures 1 and 2 demonstrate the linear-scaling capabilities of onetep in two very different systems. Figure 1 presents results for amyloid fibril segments of varying sizes using several different functionals, demonstrating linear-scaling behavior at least up to 14 000 atoms.^{19} Figure 2 presents results for slabs of anatase-structure TiO_{2} of varying sizes and shows linear-scaling behavior at least up to 3500 atoms.^{20}

## IV. FEATURES AND CAPABILITIES OF ONETEP

In this section, we describe the various capabilities of onetep. Each feature has its own subsection, which includes a brief introduction to the functionality as well as relevant citations to other publications for those interested in either the theory or the application of the functionality. The capabilities are grouped thematically—the themes are “Improvements to core algorithms and functionality,” “Beyond-DFT methods,” “Dynamics and structure optimization,” “Including environmental effects,” “Excitations and spectroscopy,” and “Partitioning the density/energy.”

### A. Improvements to core algorithms and functionality

#### 1. Projector augmented wave methods

Blöchl’s projector augmented wave (PAW) approach enables all-electron (AE) calculations with the efficiency, systematic accuracy, and transferability of plane-wave pseudopotential calculations.^{21} PAW works by relating the soft pseudo-(PS-)wavefunctions, easily represented by plane-waves or on a grid, to AE wavefunctions, varying rapidly near the nuclei, by means of a linear transformation augmenting the soft part of the wavefunction with partial waves near each atom. PAW increases the accuracy feasible at a given computational cost, enabling efficient treatment of, for example, transition metal oxides, lanthanides, and actinides, which would be highly challenging for norm-conserving pseudopotentials (NCPPs). PAW also allows calculation of quantities requiring knowledge of wavefunctions, densities, and electric fields in the regions near to the atomic nuclei, which are not correctly represented by norm-conserving methods. Hine^{20} introduced a novel adaptation of the PAW formalism suited to linear-scaling DFT within onetep. The nomenclature here follows that of Ref. 22, which also contains an introduction to the PAW method as applied to traditional DFT.

The fundamental PAW transformation relates AE orbitals |*ψ*_{n}⟩ to PS orbitals $|\psi \u0303n$,

$|\psi \u0303n$ are the projectors, and |*φ*_{ν}⟩ and $|\phi \u0303\nu $ are the AE and PS partial waves (see Refs. 22 and 20), respectively.

Within the linear-scaling DFT approach, we do not have direct access to the eigenstates, so we can instead use an equivalent transformation to relate the AE density matrix to the PS density matrix,

The NGWFs in onetep, $\varphi \alpha r$, which are constructed out of psinc functions and thus equivalent to the plane waves of standard PAW, are an ideal means to represent this “soft” part of the density matrix (DM), so one can define

Notably, this allows one to retain the same basic forms of almost all the optimization algorithms for the kernel and NGWFs, having first redefined the overlap matrix between NGWFs to account for the PAW overlap operator,

Further complications associated with the representation of the compensation density, non-local potential energy terms, NGWF gradient, forces, and preconditioning must also be dealt with: these are discussed in detail in Ref. 20. The result is a method that has demonstrably superior convergence properties compared to NCPPs (see Fig. 2 of Ref. 20) and thus can achieve greater accuracy for a given computational cost. At fixed parameters such as NGWF radius and cutoff energy, the overhead of a PAW calculation is relatively low, at around 10%–20% compared to NCPPs. Linear-scaling computational cost remains just as achievable as for NCPPs, since the majority of the terms in PAW appear naturally on a per-atom basis. Reference 20 also shows excellent parallel scaling, demonstrating efficient simulation of 1824 atoms on 1920 cores.

#### 2. Ensemble density functional theory

Ensemble density functional theory (EDFT)^{23–25} calculations are available in onetep. Within this formalism, the KS states have fractional occupancies, which can be determined in several different ways, including the Methfessel–Paxton,^{26} Gaussian smearing,^{27,28} and Fermi–Dirac^{24} schemes. In onetep, the occupancies are determined by the Fermi–Dirac distribution, regulated by a constant electronic temperature. The macroscopic state function is the Helmholtz free energy, which obeys the variational principle of quantum mechanics. onetep directly minimizes the Helmholtz free energy functional to self-consistently find the KS states and their fractional occupancies.^{29}

The onetep approach requires a Hamiltonian diagonalization step in each SCF (self-consistent field) iteration, which makes the overall cost scale with the cube of the system size. However, the cost of the Hamiltonian diagonalization is kept to a minimum as a consequence of using a minimal set of NGWFs. In this way, the dimension of the Hamiltonian matrix is reduced to the minimum possible, and the prefactor due to diagonalization is reduced significantly. The remaining steps of the method can be performed at a linear-scaling cost due to the sparsity of the matrices required in the NGWF representation.

Additionally, parallel diagonalization techniques can be used to scale the diagonalization of the Hamiltonian to many computational cores and to distribute the associated memory requirements. As a result, simulations on metallic systems with thousands of atoms can be achieved using the onetep EDFT implementation.

In free-spin EDFT calculations, the charge and net spin of a system do not necessarily have to be integers, and the net spin can additionally be relaxed during a calculation. This makes the exploration of systems with unknown magnetic properties much easier.

#### 3. Fermi operator expansion

As in linear-scaling methods for insulators, the idea behind the Fermi operator expansion (FOE)^{30,31} is to be able to perform an SCF calculation without using a cubic-scaling diagonalization step. FOE, however, also works for metallic systems where we want to apply fractional occupancies to the KS states.

In FOE, the density matrix, with finite-temperature electronic occupancies, is constructed as a polynomial expansion of the matrix form of the Fermi–Dirac occupancy formula,

where **I** is the identity matrix, **H** is the Hamiltonian matrix, **S** is the overlap matrix, *μ* is the chemical potential, and *β* = 1/*k*_{B}*T*. Equation (11) cannot be applied directly as the condition number of the matrix to be inverted will be too large in general. Many options for expansions have been proposed over the last three decades.^{32–38} In onetep, we have implemented the AQuA-FOE method described in detail in Ref. 39. AQuA-FOE takes advantage of the trigonometric mapping between the Fermi–Dirac occupancy formula and the hyperbolic tangent function. In this way, we can construct the density kernel at an electronic temperature many times that of the target temperature. The target temperature density kernel can then be recovered by repeatedly halving the temperature of the density kernel with the matrix analog of the hyperbolic tangent double angle formula. A fixed expansion length is always employed at the high temperature to generate the density matrix, requiring fewer terms to reach a given accuracy than at lower temperature.

To build the high temperature density kernel, we use a Chebyshev expansion of Eq. (11) using the fast re-summing algorithm of Liang *et al.*^{33} As an alternative option, this algorithm can be used as the sole FOE method in onetep without the annealing and quenching steps.

The AQuA-FOE method in onetep can also find the chemical potential that conserves electron number by using a safeguarded Newton’s algorithm and the analytic derivative of the number of electrons with respect to the chemical potential. The chemical potential of the density kernel is then updated using the addition theorem of the hyperbolic tangent function so that the expansion need not be recalculated from scratch.

The AQuA-FOE method is presently invoked within the EDFT method (Sec. IV A 2) in onetep in order to produce finite temperature density kernels for each trial Hamiltonian matrix. In the future, this method could be used within density-mixing type or direct minimization algorithms.

#### 4. Hybrid NGWFs

onetep exploits the nearsightedness of the electronic structure (Sec. I) by using NGWFs that are strictly localized within spherical regions and which give rise to sparse matrix representations that may be truncated in real space in a systematically controllable manner and enable scalability of the method to very large systems. There are cases, however, in which it is advantageous to use a hybrid approach, in which the representation of electronic structure is localized along certain directions (Wannier-like) and delocalized along others (Bloch-like). Use cases of such a hybrid approach include layered systems and surface slabs, in which localization is applied only in the direction normal to the layer or slab, and nanowires, in which localization is applied in the two directions normal to the nanowire axis.

The concept of such so-called “hybrid Wannier functions” was first introduced by Sgiarovello *et al.*^{40} This idea, within the formalism of maximally localized Wannier functions,^{41} was subsequently used, for example, to study the dielectric properties of layered materials^{42,43} and topological insulators.^{44,45} In these applications, the hybrid Wannier functions were obtained by post-processing the Bloch eigenstates from a traditional DFT calculation.^{46,47} In onetep, we have introduced an approach in which hybrid NGWFs are optimized directly *in situ*, avoiding the system-size bottleneck associated with the *O*(*N*^{3}) scaling of the traditional DFT calculation. We have also included the flexibility for the user to choose the lattice vector directions that are represented by a localized description and those that are extended, providing the full range of options in going from fully localized NGWFs to hybrid NGWFs that are localized in either two directions or one direction to a fully extended Bloch-like representation (see Fig. 3). Just as in the case of fully localized NGWFs, the hybrid NGWFs are constrained to be strictly zero outside of their localization regions.

Figure 4 shows the total energy per atom, calculated with hybrid NGWFs that are localized along the [001] direction (“2D-HNGWFs”) with different localization lengths *r*_{cut}, for a 2 × 2 × 10 slab supercell of an eight-atom cubic diamond carbon conventional unit cell. The slab is terminated with hydrogen atoms on its {001} surfaces, and periodic replicas in the [001] direction are separated by 15 Å vacuum regions. We use a kinetic energy cutoff of 1200 eV for the psinc basis, norm-conserving pseudopotentials for the C and H atoms, the local-density approximation (LDA) for exchange and correlation, and the Brillouin zone is sampled at the Γ-point only. The carbon diamond lattice parameter is set to its equilibrium bulk value within LDA-DFT, which we calculate to be 3.539 Å. The energies shown in Fig. 4 are referenced to an equivalent calculation with the traditional plane-wave code castep.^{49} As can be seen, with hybrid NGWFs, it is possible to attain equivalence with a traditional plane-wave approach using a localization length of around 5 Å.

While the relaxation of the localization constraints results in a less favorable scaling of computational effort with system-size than using fully localized orbitals, the computational cost still only scales linearly with the size of the system along the directions in which the localization constraints are still applied (e.g., the layer/slab normal in a layered/slab system). Furthermore, in our experience, there are additional advantages that offset the nominal increase in computational cost per self-consistent iteration: the use of hybrid NGWFs has a tendency to improve the conditioning of the energy minimization, resulting in faster convergence; and the accuracy of the ionic forces tends to be better with a reduction in the “egg-box” effect^{50–54} that is usually present in real-space local-orbital methods. For further details on these issues, refer to Ref. 48.

#### 5. Poisson equation multigrid solver (DL_MG)

dl_mg^{55} is a library that solves variants of the Poisson equation with the general form

in 3D over a cuboid domain with periodic, Dirichlet or mixed boundary conditions. A regular discretization grid is used over which the derivatives are approximated with finite differences of adjustable order. The library is developed in close coordination with onetep and is distributed as part of the package.

In Eq. (12), $\epsilon r$ is the relative permittivity, $vr$ is the electrostatic potential (ESP), and $ntotr$ is the fixed charge density. These terms feature in the generalized Poisson equation (GPE)

The second term on the right-hand side of Eq. (12) appears in the Poisson–Boltzmann equation (P–BE), representing the charge density of mobile ions (e.g., electrolyte). In this term, *z*_{i} and $c\xafi$ are the electric charge and the average bulk concentration (*N*_{i}/*V*) for the mobile ion of type *i*, respectively, *β* = 1/*k*_{B}*T* is the inverse temperature, and $0\u2264\lambda r\u22641$ is the ion accessibility function that accounts for the short range repulsion effect between mobile ions and fixed ionic cores. *α* and *γ* are constants that depend on the used unit systems.

In addition to solving the non-linear Poisson–Boltzmann equation [NLP–BE, Eq. (12)], dl_mg employs specialized algorithms to solve both the GPE [Eq. (13)] and linearized Poisson–Boltzmann equation (LP–BE),

In brief, dl_mg finds the solution for the GPE or P–BE in two stages:

A solution is found for the second order finite difference discretization using a multigrid algorithm for the GPE/LP–BE or a combination of the inexact Newton method and multigrid for the NLP–BE.

The solution corresponding to the chosen finite difference order (if larger than two) is found with a defect correction procedure that starts from the second order solution. The largest finite difference order implemented is 12.

See Ref. 55 for a detailed description of the theory and methods employed in dl_mg.

### B. Beyond-DFT methods

#### 1. Orbital-dependent exchange–correlation functionals

On the well-known Jacob’s ladder of density functional approximations^{56,57} (Fig. 5), orbital-dependent functionals occupy the higher rungs above the local density approximation (LDA) and the generalized gradient approximation (GGA). Dependence on the form of the KS orbitals is added on top of the usual dependence on density *n* in LDA and density gradient ∇*n* in GGA. This increases the flexibility available in the functional forms, enabling the satisfaction of additional physical constraints and/or better fitting of empirical data. The upshot of this is that improved accuracy is possible, though not guaranteed.

Two families of orbital-dependent functionals are currently supported in onetep: meta-GGAs and hybrids, which depend on the KS orbitals via the kinetic energy density, *τ*, and the Hartree–Fock exchange energy, $ExHF$, respectively. The primary challenge of supporting both families in onetep is maintaining the onetep formalism’s $O(N)$ scaling when evaluating quantities with dependence on the KS orbitals.

##### a. Meta-GGA functionals.

The orbital dependence of the meta-GGAs is via *τ*, i.e.,

with

where the summation is over the occupied KS orbitals. In onetep, *τ* is evaluated in terms of the NGWFs and density kernel,

This quantity has the same structure as the charge density *n*, but with ∇*ϕ*_{α} in place of *ϕ*_{α}, and can be evaluated using the same procedure,^{14} subject to the subtle approximation that extra overlapping pairs of NGWFs *αβ* that would result from the delocalization of {*ϕ*_{α}} upon application of the gradient operator are neglected.^{19}

Once *τ* is constructed, evaluation of $ExcMGGA$ is simple, since the semi-local exchange–correlation energy density *ϵ*_{xc}(**r**) depends only on *n*, ∇*n*, and *τ* at point **r**. The difficulty arises in evaluating the exchange–correlation potential,

and the quantities that depend on this, such as the gradient of *E*[{*ϕ*_{α}}, **K**] [Eq. (6)] with respect to **K** and the NGWFs. Since $ExcMGGA$ is a functional of *τ*, but the dependence of *τ* on *n* is unknown, evaluation of the functional derivative is not straightforward. onetep’s implementation of meta-GGAs sidesteps this issue by employing the “functional derivatives of *τ*-dependent functionals with respect to the orbitals” (FDO) method,^{58,59} which yields a generalized Kohn–Sham approach^{60} with an XC potential comprised of a local GGA-like part and a non-multiplicative orbital-specific *τ*-dependent part. This can be expressed in a “per-NGWF” form

which can be easily inserted wherever the product of the XC potential and an NGWF arises (e.g., in Hamiltonian matrix elements and energy gradient expressions).

In onetep v5.2, two meta-GGA XC functionals are available: PKZB^{61,62} and B97M-rV.^{63,64} The former is an early meta-GGA form (1999), implemented primarily for testing and validation. The latter is a modern and very accurate^{65} semi-empirical functional developed through a combinatorial approach, where the *τ*-dependent functional form itself was obtained by fitting to empirical data. B97M-rV additionally accounts for dispersion effects via a contribution from the (revised) VV10 non-local correlation functional^{66,67} (see also Sec. IV B 2).

Energies computed with meta-GGA XC in onetep using a minimal basis of NGWFs show excellent agreement with large basis set calculations in qchem, and the cost of these calculations scales linearly with system size, *N*.^{19} Figure 1 shows that single point energy calculations on amyloid fibril segments up to 13 696 atoms are only a factor of two to three times more costly for meta-GGAs compared to Perdew–Burke-Ernzerhof (PBE) (a GGA functional). Consequently, even very large systems (∼10^{4} atoms) are accessible with meta-GGA XC in onetep.

##### b. Hybrid functionals.

Hybrid functionals include some fraction of the Hartree–Fock exchange (HFx) energy, $ExHF$. For example, the widely used Becke three-parameter hybrids^{68,69} (e.g., B3LYP^{70}) have the form

mixing gradient-corrected exchange and correlation (from $ExLDA$, $\Delta ExB88$, $EcLDA$, and $\Delta EcGGA$) and HFx in fractions determined by the following three fitted parameters: *a*_{0}, *a*_{x}, and *a*_{c}.

Whereas *τ*(**r**) is a local quantity, depending on the orbitals at **r**, HFx is non-local, i.e.,

with an exchange energy density that depends on the form of the orbitals over all space,

The non-locality of HFx presents a challenge to onetep’s linear-scaling computational framework, which is premised on the “nearsightedness” of electron–electron interactions (Sec. I). Nevertheless, onetep is able to perform hybrid functional DFT with $O(N)$ scaling computational cost.^{71} This is achieved through a combination of a distance-based exchange interaction cutoff and the use of an auxiliary basis of atom-centered truncated spherical waves (SWs).

The HFx energy can be expressed in terms of the NGWFs and density kernel, i.e.,

with exchange matrix elements

where, for the sake of brevity, we have introduced the notation

The distance-based cutoff simply ensures that the number of non-zero elements in **X** increases linearly with system size by setting *X*_{αβ} = 0 where the distance between the centers of NGWFs *ϕ*_{α} and *ϕ*_{β} is greater than the cutoff, *r*_{X} (Fig. 6).

The SW basis reduces the cost of evaluating the electron repulsion integrals (ERIs) in Eq. (24) in two ways. First, inserting a resolution-of-the-identity in the SW basis (SWRI) into Eq. (24) converts the four-index ERIs into products of more manageable two- and three-index objects, i.e.,

where {*f*_{p}} are SWs, *V*^{pq} is the inverse of the Coulomb metric matrix in the SW basis, and (*ϕ*_{α}*ϕ*_{δ}|*f*_{p}) and (*f*_{q}|*ϕ*_{γ}*ϕ*_{β}) are electrostatic integrals over NGWFs and SWs. Second, the evaluation of the two- and three-index objects themselves is simplified by the availability of analytic expressions for the Coulomb potentials of the SWs (SWpots).^{72}

The SWs are products of spherical Bessel functions, *j*_{l}(*x*), and real spherical harmonics, $Zlm(r^)$, localized within spheres of radius *a*,

SWs are a convenient basis in which to expand quantities used in onetep, since, like NGWFs, they are both strictly localized within a sphere (radius *a*, set to the NGWF localization radius) and are systematically improvable with respect to a kinetic energy cutoff parameter.^{73} Since SWs are angular-momentum-resolved, they have also found utility as a basis for onetep’s distributed multipole analysis (DMA, Sec. IV F 1) and projected density of states (p-DOS, Sec. IV E 4) functionality.

Linear-scaling Hartree–Fock exchange in onetep was originally described in 2013.^{71} Although the initial implementation of ERI evaluation was not parallelized, $O(N)$ scaling of computational cost was demonstrated for short polyethylene and polyacetylene chains (∼ 10 s of atoms).

Recent developments have accelerated $O(N)$ Hartree–Fock exchange in onetep, making hybrid functional calculations feasible on much larger systems, with ∼ 2000 atoms becoming accessible with routinely available supercomputing hardware (Fig. 7). This dramatic improvement in performance was achieved by resolving bottlenecks in two major components of the implementation. First, a novel mixed numerical–analytic approach was developed for evaluating elements of the Coulomb metric matrix in the SW basis [**V** in Eq. (26)], reducing the computational cost (memory and execution time) of this step by orders of magnitude.^{75} Second, the parallel implementation of the evaluation of Hartree–Fock exchange was completely overhauled with efficient data distribution among MPI processes, extensive use of OpenMP threading, and intelligent caching of computed quantities in memory.^{76} The excellent parallel scalability of the overhauled implementation is illustrated in Fig. 8.

While great strides have been made in reducing the computational effort associated with hybrid XC calculations in onetep, using semi-local functionals remains considerably less costly. onetep can perform semi-local XC calculations on systems of >10^{4} atoms with routinely available computational resources (see, e.g., Fig. 1). However, applying hybrid XC to systems of this size remains prohibitively expensive. A pragmatic solution to this issue is to employ embedded mean-field theory (EMFT, Sec. IV D 3), wherein hybrid XC may be applied in a small active region (<1000 atoms) that is embedded in a larger environment, treated with a semi-local XC functional.

The release of onetep (v5.2) at the time of writing supports global hybrid functionals in ground-state DFT calculations and conduction NGWF optimization (Sec. IV E 1). The core framework for applying global hybrids in linear-response time-dependent DFT (LR-TDDFT) (Sec. IV E 2; see also Sec. II.C of Ref. 77) has been developed and will be included as a user-accessible feature in an upcoming release. Support for range-separated hybrid functionals in onetep is also planned, though this is currently at an early stage of development and is thus a longer term aspiration.

#### 2. Dispersion and van der Waals interactions

Properties and processes in many scientifically interesting and technologically important systems depend on weak dispersion interactions, such as London or van der Waals (vdW) interactions. Examples include molecular crystals, physisorption of molecules on surfaces, and inter-layer interactions in layered two-dimensional materials and heterostructures. Large-scale electronic structure methods are well-suited for studying such structurally complex systems; however, standard density functionals such as the LDA and GGA do not account for these non-local and long-range interactions. onetep includes two methods designed to alleviate this problem.

##### a. Empirical dispersion corrections.

A popular approach to remedying the problem of dispersion interactions is by including a damped London energy expression as part of the total DFT energy expression with perhaps the most well-known of these methods being the Grimme D2^{78} and D3^{79} variants.

In addition to the original D2 empirical dispersion correction of Grimme,^{78} the damping functions of Elstner^{80} and Wu and Yang^{81} have been implemented in the onetep code using new parameter sets reoptimized^{82} for biomolecular interactions. This reoptimization has proven critical for the accurate simulation of dispersion interactions, with the interaction energy root mean square error computed at the PBE level improving from 5.915 kcal/mol to 0.813 kcal/mol with respect to the literature values of the 60 complexes used to fit the parameters. Dispersion corrections have also been included in the calculation of forces, enabling accurate structures to be produced through geometry optimization (see Sec. IV C 1).

##### b. van der Waals density functionals.

Dion *et al.*^{83} developed a general non-local contribution to the correlation energy functional for describing vdW interactions within the framework of DFT, which takes the form

where *ϕ*(**r**_{1}, **r**_{2}) is a kernel that depends on the separation *r*_{12} = |**r**_{1} − **r**_{2}| and the electronic density *ρ*(**r**) at and in the neighborhood of **r**_{1} and **r**_{2} [see Eqs. (14)–(16) of Ref. 83 for the definitions of *ϕ*(**r**_{1}, **r**_{2})]. Equation (28) is a double integral over the whole simulation cell, i.e., over six spatial dimensions, and computing it directly is prohibitive for large-scale systems. Román-Pérez and Soler^{84} noted that the kernel *ϕ* may be written as

where *q*_{1} and *q*_{2} are the values of a universal function $q0\rho (r),|\u2207\rho (r)|$ (see Ref. 83 for details) at the points **r**_{1} and **r**_{2}, respectively, and the kernel is then represented in terms of a set of interpolating polynomials (cubic spline functions) *p*_{α}(*q*) for a specific set of values of the interpolation points *q*_{α} for each *r*_{12}. In Ref. 84, it was found that ∼20 interpolation points on a logarithmic mesh was sufficient to accurately describe the kernel up to a cut-off value of *q*_{0} ≤ *q*_{c} at which point *q*_{0} is smoothly saturated. The benefit of using the expression in Eq. (29) for the kernel *ϕ* is that Eq. (28) can be evaluated using Fourier methods as follows:

*θ*_{α}(**r**) ≡ *ρ*(**r**)*p*_{α}[*q*_{0}(**r**)], *ϕ*_{αβ}(*r*) ≡ *ϕ*(*q*_{α}, *q*_{β}, *r*), and $\theta \u0303\alpha (k)$ and $\varphi \u0303\alpha \beta (k)$ are their respective Fourier transforms. In practice, *ϕ*_{αβ}(*k*) is computed and tabulated once and for all on a radial grid, and the main computational effort during a calculation is associated with the evaluation of the Fourier transforms $\theta \u0303\alpha (k)$. Expressions for the non-local potential corresponding to the non-local energy (required for self-consistent calculations and computing Hellmann–Feynman forces) may also be evaluated (see Refs. 84 and 85 for details).

In the case of the original vdW functional of Ref. 83 (known as “vdW-DF”), the total exchange and correlation functional is given by

which is the sum of the exchange term from revPBE,^{86} the correlation term from PW92 LDA,^{87} and the non-local correlation discussed above. In onetep, we have also implemented a number of further vdW density functionals within the same implementational framework, including vdW-DF2,^{88} optPBE-vdW, optB88-vdW and optPBE_{κ=1}-vdW,^{89} optB86b-vdW,^{90} and C09x-vdW.^{91}

#### 3. Hubbard-corrected density functional theory: DFT+*U* and beyond

DFT+*U*,^{95–97} occasionally known as LDA+*U* or LSDA+*U*, is a technique that can be used to significantly improve on conventional DFT’s description of strongly correlated systems and also many systems not normally thought of as strongly correlated. Local and semi-local functionals often fail to qualitatively reproduce the physics associated with localized, partially filled electronic subspaces of 3*d* and 4*f* character, including the emergence of Mott–Hubbard insulating gaps and magnetic order. Challenging systems in this regard include the first-row transition metals and lanthanoids (and the oxides of both), but also certain magnetic semiconductors, organometallic molecules, and even biomolecules.^{98} Other localized subspaces, including those with 2*p* and 4*d* character, or those that are not partially filled, spin-polarized, or spherically symmetrich or atom-centered, are increasingly being successfully targeted using flexible DFT+*U* implementations.

Several varieties and generalizations of DFT+*U* are implemented in onetep,^{99–101} including the corresponding contributions to the ionic forces and phonon energies. Extensions to conduction band optimization and the excited state regime (TDDFT+*U*^{102}), as well as combinations with other functionalities such as PAW (see Sec. IV A 1), are also implemented. The DFT+*U* method preserves linear-scaling with respect to system size, and in its conventional fixed-projector form, it introduces only a small increase in the computational pre-factor.^{99} A key strength of the onetep DFT+*U* code is its modular nature and the ability to test diverse population analysis schemes when defining the DFT+*U* target subspaces. It provides a convenient starting point for the implementation of constrained DFT^{103} (Sec. IV B 4) and DMFT^{104,105} (Sec. IV B 5) in onetep. The onetep DFT+*U* code has been used for fundamental developments in strong electronic correlation effects and self-interaction error correction,^{106–108} and it has been extended to calculate Hubbard *U* and Hund’s *J* parameters from first principles^{109} and to approximately enforce Koopman’s condition.^{110}

DFT+*U* works by adding a corrective term to the total energy for each subspace selected.^{95–97} This is commonly interpreted as a correction for the self-interaction error exhibited by the approximate XC functional,^{111,112} applied only to the subspaces where it is thought to be most problematic. Available in onetep is the DFT+*U*+*J* energy correction functional,^{97,111,113} given by

Here, *U*^{I} represents the net, occupancy–occupancy type Hartree + XC self-interaction of subspace *I*, taking account of the spin *σ* and screening by the remainder of the system appropriately.^{109}^{,}*J* is a counterpart to *U* that quantifies subspace self-interaction of magnetization–magnetization type. We simply have DFT+*U* either when *J* = 0 or when the last, unlike-spin term is neglected (giving an effective *U*: *U*_{eff} = *U* − *J*.) Another *J* term can be optionally added (see Ref. 113), and separate Hubbard parameters for the linear (*U*_{1}) and quadratic (*U*_{2}) energy terms and spin channels (*U*^{σ}) can be used.^{110} $nmI\sigma m\u2032$, the density matrix of subspace *I* and spin *σ*, is defined through the set of non-orthogonal projector functions ${|\phi mI}$ by

$\rho ^\sigma $ is the global KS density matrix, and **O**^{I} is the inverse of the projector overlap matrix $Omm\u2032I=\u27e8\phi mI|\phi m\u2032I\u27e9$.

Different Hubbard corrections can be applied to atoms of the same chemical species, which is helpful if their chemical environments are not equivalent. The suggested population analysis for DFT+*U* is based on the pseudo-atomic orbitals used to initialize the NGWFs at the beginning of a onetep calculation. Hydrogenic atomic orbitals of a chosen effective nuclear charge *Z* may alternatively be used, as can a subset of the optimized NGWFs from a previous calculation, in a projection update procedure, which can be iterated to self-consistency.^{101} The projection of the local density of states (LDOS) (Sec. IV E 4) onto the union of the DFT+*U*+*J* subspaces can also be calculated. With *U* and *J* set to zero, the functionality can still prove useful, e.g., for its pseudo-atomic population analysis or for breaking spin and spatial symmetries.^{107,109}

#### 4. Constrained density functional theory

Constrained density functional theory (cDFT)^{114–118} is an established variational^{103,119} generalization of DFT that enables practical simulation of charge (spin) excitation energies and transfer, as well as exchange coupling parameters in magnetic systems. cDFT also provides a practical and computationally convenient way of correcting intrinsic deficiencies of standard DFT, such as self-interaction errors (SIEs) and static electronic correlation.^{120–122} For an extensive review of the wide range of applications of cDFT, see Ref. 118 and references therein.

cDFT finds ground state solutions of a Kohn–Sham DFT-like Hamiltonian, augmented with an additional (external) potential: the constraining potential(s), *V*_{c}. The energy *W* of a DFT system (of energy *E*[*n*]) subject to a constraint (*N*_{c}) on the electronic occupation of a suitably defined subspace can be written as^{123}

where $\rho ^$ is the electronic density operator and $P^$ is the projection operator for the cDFT-subspace(s). Maximization of *W* with respect to *V*_{c} determines the *V*_{c} needed to enforce the targeted electronic occupation (*N*_{c}) in the cDFT-subspace.^{103,119} Atomic forces can then be calculated,^{100,123} allowing geometry optimization and molecular dynamics (MD) simulations.

The cDFT implementation in onetep rests on the use of non-orthogonal, atom-centered projectors (|*φ*_{n}⟩) to calculate the electronic occupation of the given cDFT-subspace, needed in Eq. (35), as

where *O*^{mn} is an element of the matrix **O**^{−1} and *O*_{mn} = ⟨*φ*_{m}|*φ*_{n}⟩. The mathematical framework underlying the tensorially invariant approach to cDFT-subspace occupancies, energies, and forces is extensively discussed in Refs. 100 and 123.

In onetep, the cDFT problem is solved via nested conjugated gradients,^{14} with the *W* maximization loop nested between the NGWF (*ϕ*_{α}) and density kernel (**K**) loops. Figure 9(a) illustrates this procedure for a nitrogen molecule, where an electronic occupancy difference of one electron between the two atoms is enforced, demonstrating excellent convergence within 10 NGWF iterations.

In analogy with DFT+*U*^{99–101} (Sec. IV B 3), the cDFT problem can be solved self-consistently by using the NGWFs (*ϕ*_{α}) optimized for a given cDFT constraint as cDFT-projectors (*φ*_{n}) for a successive cDFT-optimization, as shown in Figs. 9(b) and 9(c), providing rapid convergence, as shown in Fig. 9(d). In a cDFT geometry optimization, a ground state geometry can be obtained initially using cDFT-projectors obtained self-consistently for the initial geometry. New cDFT-projectors can then be found for the current geometry, and the process iterated until *W* is self-consistent. Figure 9(d) shows that this converges extremely rapidly—within one or two steps. Figures 9(e) and 9(f) report the rate of convergence of this fully self-consistent *W* and the optimized *N*–*N* bond-distance (*d*_{NN}), respectively, with respect to the localization radius of the cDFT projectors and NGWFs for two different cDFT-subspaces. *W* and *d*_{NN} are different for the two subspaces for small radii, but as the radii grow, increasing the variational freedom of the problem, the values of *W*_{SC} and *d*_{NN} for the subspaces converge. This indicates that the self-consistent projector implementation in onetep is relatively insensitive to the choice of the cDFT-subspace representation, demonstrating its robustness.

cDFT in onetep has been applied to photovoltaic materials,^{124,125} to interfaces in graphene-based composite materials,^{123} and to the limitations of cDFT in constraining SIE-free solutions in standard DFT exchange–correlation functionals.^{110} Going forward, we expect applications of cDFT to large-scale systems (e.g., biomolecules, nanoparticles, nanotubes, etc.) currently not amenable to simulation by standard DFT or beyond DFT approaches.

#### 5. Dynamical mean field theory

Dynamical mean field theory (DMFT) is a Green’s function method tailored for treating systems with strong local electronic correlation. Formally, correlation can be defined as the physics/chemistry due to multi-determinantal wavefunctions (that is, beyond Hartree–Fock theory). The effects of correlation encompass dispersion interactions, particle lifetimes, magnetism, satellites, collective excitations, the Kondo effect, Mott insulators, and metal–insulator transitions. Many (but not all) of these phenomena appear in systems containing transition metals or rare-earth atoms whose 3*d*- or 4*f*-electron shells are partially filled. Electrons in these shells are in especially close proximity with one another, and thus, their interaction is too pronounced to be adequately described by semi-local DFT, which can even provide qualitatively incorrect descriptions of the electronic structure.

DMFT maps the many-body electronic problem to an Anderson impurity model (AIM) Hamiltonian with a self-consistency condition.^{126,127} This model Hamiltonian includes Hubbard and Hund’s-like terms, and local quantum fluctuations are fully taken into account —that is, the self-energy is dynamic but is assumed to be local. Consequently, DMFT can be expected to yield improved results where strong correlation is expected to be (a) important and (b) localized (cf. GW, which can account for non-local self-energies but only includes correlation effects perturbatively). Like DFT+*U* (Sec. IV B 3), DMFT is typically used to treat localized regions where correlation is important, as defined by some atom-centered Hubbard projectors.^{128} In the case of DFT+*U*, each correlated subspace is subjected to an additional potential; in DFT+DMFT, these subspaces are subjected to a full Green’s function treatment.

The AIM that the system is mapped on to contains both impurity and bath sites. The impurity sites typically correspond to the 3*d*/4*f* orbitals, while the bath sites collectively represent the rest of the physical system. The AIM Hamiltonian is parameterized by inter-site hopping terms, which are determined via a fitting procedure, and Hubbard *U* and Hund’s *J* parameters, which are external user-defined parameters that can be determined using linear response or cRPA.^{109,111,129} Alternatively, they can be treated as variational parameters.

Once the AIM Hamiltonian has been constructed, it is solved using exact diagonalization, continuous-time quantum Monte Carlo, or some other method, thereby obtaining the impurity Green’s function and self-energy. This is typically the most computationally expensive step in DMFT and places a severe restriction on the total number of sites (impurity plus bath) one can use. The self-energies from each correlated subspace are then “upfolded” using these localized quantities to construct the total self-energy. During this upfolding, one must correct for the exchange and correlation already accounted for at the DFT level. The whole process is then repeated until the total self-energy is converged.

DMFT has proved capable of capturing complex electronic behavior where semi-local DFT fails. For example, DMFT corrects the lattice constants of strontium vanadate, iron oxide, elemental cerium, and delta-phase plutonium, which semi-local DFT substantially underestimates.^{130–132} Similar improvements have been observed for spectral properties such as densities of states and optical absorption.^{133} DMFT also captures important dynamic properties that are enhanced by strong correlation, such as satellite peaks in the photoemission spectra of cerium and nickel.^{134,135} It can capture magnetic behavior both above and below the Curie temperature,^{135} has been used to study high-temperature superconductivity in cuprates^{136,137} and other materials,^{138,139} and can predict mass renormalization in heavy fermion materials.^{140–144}

The implementation of DFT + DMFT using onetep involves an interface with the toscam code toolbox.^{105} onetep and toscam are responsible for separate sections of the DMFT loop: onetep performs much of the initialization and mappings, while toscam is responsible for solving the AIM. As the calculation proceeds, control alternates between these two programs, with the entire procedure being driven by an overarching script. toscam will be freely available for download in the near future. For more details, refer to Ref. 105. onetep + toscam has already been used to explain the insulating *M*_{1} phase of vanadium dioxide,^{104} to demonstrate the importance of Hund’s coupling in the binding energetics of myoglobin,^{145,146} and to study the super-exchange mechanism in the dicopper oxo-bridge of hemocyanin and tyrosinase.^{147}

### C. Dynamics and structure optimization

#### 1. Geometry optimization

onetep can relax ionic positions with respect to forces either in Cartesian coordinates using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm (following the original castep implementation^{49}) or in the limited memory l-BFGS version;^{148} or in delocalized internal coordinates using the approach of Andzelm *et al.*^{149}

The forces in onetep are calculated as the sum of local and non-local pseudopotential terms with Ewald and, where applicable, non-linear core correction terms.^{53} Since onetep’s NGWF basis is atom centered and local, the forces also include Pulay terms due to the change in basis functions with respect to atomic positions.^{54} The total forces in onetep are calculated as

Pre-conditioners are used to speed up the geometry optimizers in onetep. By default, (l-)BFGS uses the physics-based pre-conditioner suggested by Pfrommer,^{150} which takes an estimate for the average optical phonon frequency at the Γ-point as input. The algorithm also estimates this quantity throughout the calculation in order that restarts and future calculations may be sped up. onetep’s geometry optimization performance may be improved dramatically^{151} by using the force-field based pre-conditioners of Packwood *et al.*^{152} These methods take simple interatomic potentials to construct initial sparse approximations to the Hessian matrix. These pre-conditioners are available with the l-BFGS geometry optimizer.

onetep does not perform any extrapolation of NGWFs or density kernels between geometry optimizer iterations; this helps prevent biasing the end result. For computational expediency, however, onetep instead takes the final NGWFs from the previous geometry optimizer iteration as initial NGWFs to the next iteration. The NGWFs are shifted with respect to the simulation grid to keep the ions at their center. On restarted calculations onetep can read the Hessian matrix or update vectors along with the final NGWFs and density kernel from the previous run.

onetep supports constraints on individual atoms in all geometry optimizer modes. Constraints may be to fix an atom in place, to keep it on a vector, or to keep it on a plane. Relaxation of the lattice parameters, however, is not possible within onetep due to the limited demand for the majority of applications of onetep. Although onetep does not calculate stresses for periodic systems, pressure can be applied to finite systems via the electronic enthalpy method, demonstrated in Sec. V C.

#### 2. Phonons and thermodynamic properties

The calculation of phonon frequencies is implemented through the well-established finite-displacement methodology (FDM);^{153,154} in our case, the dynamical matrix is calculated using a central difference formula with a choice of either a three- or five-point stencil in each Cartesian direction (i.e., 6*N*/12*N* displacements in total for an *N*-atom system).

For crystalline materials, the FDM requires a supercell geometry to capture a sufficiently dense grid of *q*-points in the vibrational Brillouin zone; this is a natural choice for a linear-scaling code such as onetep. If a crystalline material is specified, only the atoms in the unit cell are displaced, and either a real-space truncation of the force constant matrix or a Slater–Koster style interpolation is used to obtain the phonon frequencies at arbitrary *q*-points. In order to significantly speed up the calculation, the converged self-consistent density kernel and NGWFs from the unperturbed system are taken as the starting guess for each displacement; it is also possible to limit the degrees of freedom to reduce the number of displacements required and to use a task-farming approach to run the displacements in batches as independent onetep calculations that can be run in parallel.

In addition to the phonon band structure and DOS, the code calculates a number of thermodynamic properties: the zero-point energy, and the free energy, entropy, internal energy, and specific heat in a given temperature range. These are calculated on a regular grid of *q*-points, which can be chosen independently of the original supercell repetitions. onetep does not calculate the effect of the long-range electric fields associated with longitudinal phonons near the Γ point (LO–TO splitting). Tests on bulk silicon have shown that well-converged quantities can be obtained using a 5 × 5 × 5 cubic supercell (1000 atoms) and a 10 × 10 × 10 *q*-point grid for the thermodynamic properties, while the phonon DOS requires a significantly denser *q*-point grid.^{155}

#### 3. First principles molecular dynamics

First principles molecular dynamics (MD) simulations constitute a powerful tool to study systems that are inherently dynamical, such as in conformational analysis of (bio)molecules at finite temperatures, bond breaking in chemical reactions, adsorption, and solvation phenomena. MD simulations are also employed to search for equilibrium structures under different external conditions^{156–158} as well as for computing vibrational, IR, and Raman spectra^{159} (see Sec. IV E 6). Two main paradigms exist to perform MD simulations within the DFT framework: Born–Oppenheimer MD (BOMD) and Car–Parrinello MD (CPMD).^{160,161} The Car–Parrinello scheme involves direct propagation of the electronic degrees of freedom along with the nuclear degrees of freedom. On the other hand, in the Born–Oppenheimer MD (BOMD) scheme, one solves the static electronic problem to find the KS energy self-consistently [*E*_{SCF}({**R**_{J}})] at every MD step. Hence, the electronic wavefunctions are never propagated, since electrons are assumed to instantaneously adjust to the motion of the nuclei^{162} (Born–Oppenheimer approximation). This makes BOMD more suitable to linear-scaling approaches since the localization constraints are more easily imposed during the KS energy minimization. The other main advantage of BOMD over CPMD is that a larger time step can be chosen, which, in turn, results in fewer MD steps and reduced simulation times. Moreover, no extra ambiguous parameters, such as fictitious electronic masses, are required to stabilize the dynamics. On the other hand, in BOMD, the stability of the generated trajectory is quite sensitive to the accuracy of the ionic forces and more generally to the degree of convergence of the SCF loop. This is due to the well-known fact that the error in the forces is first order with respect to the error in the incompletely converged wave-function, even though the error in the Kohn–Sham energy is second order.^{163}

Within the onetep framework, the BOMD equations assume the following form:

where **R**_{I} is the position of the *I*th ion at time *t* and *M*_{I} is its mass. $FISCF$ is the force acting on ion *I* (for a detailed description of the computation of ionic forces in onetep, see Ref. 53).

The BOMD equations of motion are integrated using a finite difference scheme (which usually preserves the symplectic structure of phase space)—in onetep, we employ the Velocity-Verlet (VV) algorithm.^{164}

It is possible to couple different thermostats to the BOMD equations in Eq. (38) in order to sample ensembles other than the microcanonical (also known as *NVE*). In particular, the following thermostats are available in onetep: Andersen,^{165} Langevin,^{166} Nosé–Hoover chains,^{167} Berendsen,^{168} and Bussi.^{169}

For each MD step, the dominant computational effort is associated with solving the KS equations self-consistently, whereas the computation of the ionic forces and the integration of the BOMD equations amount to a small fraction of the walltime. The number of SCF cycles required to reach a given level of self-consistency can be substantially reduced by using improved initial guesses for the electronic degrees of freedom. Various approaches have been implemented, which exploit the information included in the MD history to build more efficient initial guesses and speed up the SCF loop. These methods can be grouped into two major categories: extrapolation methods and propagation methods. Various flavors of linear and polynomial extrapolation are available in onetep. Here, the electronic degrees of freedom are represented as polynomials with respect to time *t*. All these schemes assume that the NGWFs and the density kernel can be treated as independent degrees of freedom and therefore can be independently extrapolated. However, the density kernel can be considered as the representation of the density matrix in the NGWF basis. Therefore, one can also first extrapolate the NGWFs and then project the density kernel onto this new basis. Accordingly, two density kernel projection methods have been implemented: naïve projection and tensorial-compliant projection.^{170} The latter takes into account the tensorial nature of the density kernel in the NGWF basis.

For the propagation schemes, we have implemented the approach developed by Niklasson *et al.* based on an extended Lagrangian (EL) propagation.^{171} These schemes have shown superior performances in terms of energy conservation compared to simpler schemes (by alleviating the issue of the breaking of time-reversal symmetry). In practice, three flavors of EL propagation can be chosen in onetep: the naïve approach (deprecated), EL with dissipation (dEL),^{171} and inertial EL with a thermostat (iEL).^{172} The latter two methods show similar performance in terms of energy conservation and speed-up of the SCF cycle.^{173}

It should be stressed that the self-consistent solution is independent of the initial guess only at exact convergence. In practice, some breaking of time-reversal symmetry is to be expected, resulting from memory effects due to the failure of the Hellmann–Feynman theorem, as the electronic system is never exactly converged to its ground state.^{163}

#### 4. Nudged elastic band calculations

Given decent initial guesses, geometry optimization (Sec. IV C 1) is a staple in locating useful minimum-energy structures. Transition states, often just as important as the endpoints in chemical reactions, diffusion problems, and solid state transitions, are less straightforward to locate. They are the maximum-energy points on the minimum-energy paths connecting structures. Several approaches exist to approximate these saddle points, with the nudged elastic band (NEB)^{174} approach being among the most robust. NEB attempts to minimize several intermediate structures, or “beads,” onto the minimum-energy path from which the maximal point can be approximated or directly measured. The need for a strong transition state searching functionality in linear-scaling DFT packages is highlighted in a recent review of the opportunities and challenges facing this community.^{175} To this end, a robust and scalable implementation of NEB has been designed for onetep.

NEB approximates the minimum energy path by connecting several intermediate structures, known as beads, typically initially guessed from linear interpolation, to their neighbors with springs of zero natural length. In order to ensure the beads lie equally distributed and on the minimum energy path, certain components of both the real and spring forces are projected out. This chain-of-states is then evolved until the “nudged” force on each bead is below some tolerance. Given a perfect approximation of the path tangent at each point on the path, the beads will lie on the minimum energy path, regardless of bead count. In practice, the path tangent at each bead’s position is approximated,^{176} requiring a few beads to resolve minimum energy paths.

The implementation of NEB in onetep handles each bead in parallel with an internal driver approach. Based on a comparison of NEB minimizers,^{177} both FIRE and the chain-global l-BFGS (GL-BFGS) minimizers were selected for the implementation in onetep. A climbing-image mode^{178} is available in which the highest-energy bead moves in a modified potential energy surface (PES) in which the saddle point becomes an energy minimum, allowing the bead to climb the path to sample the transition state exactly.

NEB is not a global search—the path it finds depends on the initial guess path. For this reason, significant control is given to the user over the initial definition of the path; an optional intermediate structure can be defined in the input file or each bead’s position can be directly read from an input XYZ file.

### D. Including environmental effects

#### 1. Implicit solvent model

onetep’s $O(N)$ formalism and scalable implementation make it an excellent tool for simulating large and complicated systems, such as biomolecules, nanomaterials, and surfaces. The environment in which these types of systems exist—often some kind of liquid solvent—can intimately affect their behavior and properties. Some means of representing solvent environments is therefore vital for realistic and informative electronic structure calculations on these systems.

A direct approach to solvent modeling would be to simply include a number of solvent molecules in the electronic structure simulation of the system of interest. This explicit quantum mechanical approach is superficially very simple, as it does not require the system to be partitioned into solute and solvent regions and treats internal solute and solute–solvent interactions on an equal footing. However, this approach is generally prohibitively expensive, even with an $O(N)$ method. In addition, even if sufficient resources are available to run an electronic structure calculation on the entire solute–solvent system, realistic results require many calculations in order to statistically average over solvent degrees of freedom.

Explicit atomistic solvent models can be made cheaper by treating the solvent environment at a lower level of theory, as in embedded mean-field theory (EMFT, Sec. IV D 3) and QM/MM, Sec. IV D 2). A further simplification is to represent the solvent as an unstructured dielectric medium. In this case, solvent–solvent interactions and solvent degrees of freedom are accounted for implicitly, and we only need to be concerned with the atomistic details of the solute.

In onetep’s implicit solvent model, the solute is embedded in a cavity within the solvent, which is represented as a polarizable dielectric medium. The polarization of the dielectric medium by the solute charge induces a reaction potential that is incorporated into the Kohn–Sham Hamiltonian. The result is a self-consistent reaction field (SCRF) method in which the electrostatic interactions between the solute and implicit solvent are captured within the SCF procedure.

onetep’s implicit solvation model shares the essential features of the continuum dielectric SCRF method with other widely adopted models, such as COnductor-like Screening MOdel (COSMO)^{179} and PCM,^{180} but differs in the specifics of its implementation. In particular, the model has the following key features:

The generalized Poisson equation [Eq. (13) with

*α*= −4*π*, since onetep uses atomic units] is solved for the total electrostatic potential, $v$(**r**), in 3D using a high-order defect-corrected multigrid method [using the dl_mg library (see Sec. IV A 5)].^{55}The dielectric permittivity,

*ε*(**r**), is defined in terms of the electron density,*n*_{elec}(**r**), and smoothly varies between vacuum [*ε*(**r**) = 1] and bulk solvent [*ε*(**r**) =*ε*_{∞}] values based on the form proposed by Fattebert and Gygi,^{181}

which has only two parameters: *n*_{0}, the electron density value at the center of the solute–solvent interface, and *β*, which determines the width of the solute–solvent transition [between *ε*(**r**) = 1 and *ε*_{∞}].

Non-electrostatic contributions to the free energy of solvation are computed by scaling the surface area of the vacuum cavity surrounding the solute,

*S*, by a surface tension,*γ*, as used by Scherlis*et al.*^{182}to extend Fattebert and Gygi’s electrostatic model, i.e.,

*S*[*n*_{elec}] is evaluated by integration over a thin film centered on *n*_{0},^{182,183} and *γ*_{eff} is an effective surface tension, scaled to take account of solute–solvent dispersion–repulsion effects.^{184},onetep’s minimal parameter implicit solvent model (MPSM) enables the computation of free energies of solvation of small molecules with accuracy comparable to or better than other widely used implicit solvent models, but with only two empirical parameters (e.g., Fig. 10—see Ref. 184 for further details). The model has also proven to be very useful for large-scale DFT calculations, where screening due to the polarizable dielectric medium corrects for spurious electrostatic effects that arise at artificial surface–vacuum interfaces. This effect has been very convincingly demonstrated for proteins and water clusters, where unscreened dipole moments across the system *in vacuo* can artificially close the HOMO–LUMO gap (e.g., Fig. 11—see Ref. 192).

onetep now also includes the functionality to calculate energies in electrolyte solutions in both open boundary conditions (OBCs) and periodic boundary conditions (PBCs).^{193} The source term in the GPE now also includes the charge density of mobile electrolyte ions, i.e.,

with *n*_{ions}[$v$](**r**) dependent on the electrostatic potential, $v$. The charge density of mobile electrolyte ions can be written in terms of their space-dependent concentration, *c*_{i}[$v$](**r**), as

where *z*_{i} is the charge on ion *i*. The concentrations can be described in terms of the bulk concentrations of ions ($ci\u221e$) as

where $\beta =1kBT$ and *λ*(**r**) is the electrolyte accessibility function, which smoothly varies from zero in the solute to one in the bulk electrolyte solution and prevents electrolyte charge accumulation near the solute. The expression in OBCs is the well-known Boltzmann concentration of ions. The expression in PBCs includes a renormalization factor in order to conserve the ions. onetep also includes the functionality for a linear approximation of the above concentrations when the electrostatic potential is small or when *z*_{i}$v$ ≪ *k*_{B}*T*. The above set of equations is solved self consistently within dl_mg (Sec. IV A 5).

#### 2. QM/MM and polarizable embedding

Due to the long-ranged nature of Coulombic interactions, it is often necessary to include hundreds of atoms, if not more, in DFT calculations before the relevant properties are sufficiently converged.^{194} Suitable truncation of larger systems to computationally affordable sizes usually proves to be far from trivial.^{192} Hybrid (QM/MM) approaches offer one way of addressing this problem, embedding a quantum mechanical (QM) calculation within an environment described at the molecular mechanical (MM) level of theory. This is well justified in systems whose properties of interest are spatially localized, with the quality of obtained results depending on the physical soundness of the interface and the accuracy of the MM description.

onetep implements a fully self-consistent interface to the MM suite tinker,^{195} which implements the state-of-the-art polarizable force-field AMOEBA^{196,197} and standard fixed-point-charge force-fields, such as GAFF.^{198} In the case of AMOEBA, the two subsystems (QM and MM) interact via multipolar electrostatics and are fully mutually polarizable with direct energy minimization encompassing the entire (QM + MM) system. Distributed multipole analysis (DMA,^{199–201} Sec. IV F 1) is used to represent the QM charge density in the form of point multipoles (up to a quadrupole) that are compatible with the AMOEBA formalism. In order to prevent unphysical charge spilling from the QM subsystem onto the point charges, which is an artifact of the point multipole model, damping of electrostatic interactions is used. Additionally, we employ a simple model for Pauli repulsion between QM and MM subsystems, yielding a suitable repulsive potential that ensures a physically sound localization of the QM charge density.

Our implementation^{202,203} is the only one to date enabling *in situ* minimization of localized orbitals in the presence of a polarizable MM embedding and the only one where the cost of the QM calculation is linear-scaling. This model has been demonstrated to yield remarkably good energetics across the QM/MM interface (as compared to fully QM calculations) even for minimal QM subsystems.

#### 3. Embedded mean-field theory

In many potential applications of *ab initio* atomistic modeling, the important physics or chemistry of the system of interest is associated with a small “active region,” which would ideally be described using a high accuracy QM method. However, the rest of the system, or the “environment,” is still significant through its interaction with the active region. QM/MM (Sec. IV D 2) and implicit solvent (Sec. IV D 1) methods are two ways of approaching this problem; however, if the environment also requires QM treatment, a quantum embedding approach is required. Quantum embedding schemes enable an expensive high accuracy method to be used for the active region and a cheaper lower accuracy method for the environment.^{204,205}

The quantum embedding formalism implemented within onetep is the embedded mean field theory (EMFT) scheme, first proposed by Fornace *et al.*^{206,207} In general, the density matrix or kernel is partitioned into regional sub-blocks,

where A labels the active region and B labels the environment. Each atom and its associated NGWFs are assigned to a region.

For DFT-in-DFT embedding, the only difference between the lower and higher accuracy methods will be in the exchange–correlation functional—*E*_{xc} in Eq. (6). The other terms in Eq. (6), which we will collectively call *E*_{0}[{*ϕ*_{α}}, **K**] here for brevity, are the same at both levels of theory. In EMFT, the total energy for the system is then written as

where $Exclow$ and $Exchigh$ are the exchange–correlation functionals of the lower and higher accuracy methods, respectively, and *n* and *n*_{AA} are the total and active region electron density, respectively.

The block orthogonalization procedure used in previous implementations of EMFT to avoid charge spilling^{208} interferes strongly with the ability to optimize the basis functions (the NGWFs) in onetep. To avoid this, all NGWFs are optimized at the lower level of theory and then fixed for the optimization of **K** within EMFT. Results on a variety of test systems show that the error associated with this approximation is 1% or less of the difference between the full high- and low-level calculations.^{207} Embedding a hybrid functional calculation within a semi-local DFT environment enables us to extend the use of HFx to periodic structures containing a subsidiary active region. Our implementation of EMFT has been successfully applied to the calculation of excited state and encapsulation energies for a variety of systems consisting of several thousand atoms^{207} and is capable of performing hybrid-in-semi-local ground state total energy calculations. Forces are also available within the EMFT framework. Future developments will seek to make this compatible with time-dependent DFT (TDDFT) (Sec. IV E 2), enabling the use of time-dependent EMFT (TD-EMFT).^{209}

#### 4. Open boundary conditions

The psinc basis functions employed in onetep are unitary transformations of plane waves (cf. Sec. II). This enables onetep to leverage Fourier transforms for parts of the calculation that are cumbersome in real space, such as evaluating the Hartree potential, but also implies an underlying periodicity of the system. This can become problematic for systems that are not genuinely periodic, particularly when they are not charge-neutral or possess a significant dipole moment—in such cases, non-negligible artificial interactions between the periodic images arise. The traditional approach of vacuum padding and correcting^{210} for the dependence of the energy on the size of the simulation cell has since been improved upon with a number of techniques for implementing open boundary conditions (OBCs) (see Ref. 211 and references therein). In onetep, the artificial periodicity can be abrogated (thus modeling OBCs) using three different approaches: cut-off Coulomb (CC),^{212,213} Martyna–Tuckerman (MT),^{214} and direct real-space calculations using a multigrid solver (MG)^{55} to solve the Poisson equation.

##### a. Cut-off Coulomb.

The standard periodic approach evaluates the Coulomb integral over all space, assuming an infinite periodic repetition of the simulation cell. This leads to significant errors in isolated systems with net charges or large multipole moments, as the periodic images artificially interact with one another via long-range electrostatics. The cut-off Coulomb approach counters this by only allowing Coulombic interactions within a specified region, thereby emulating the electrostatics of an isolated system with a periodic representation of the charge density.^{212} By selecting an appropriate region, $D$, one can eliminate spurious electrostatic interactions between charges in the periodic replicas and the home cell while also retaining the correct description of the potential between charges in the isolated system,

The modified Coulomb interaction term can then be convolved with the charge density to give the Hartree potential

Here, point **r** experiences Coulomb potentials from all *n*(**r**′) in region $D$, and potentials from charge densities outside this region are set to 0. $VHCC(r)$ is evaluated in reciprocal space through a Fourier transform of (48). The form of the expression for $VHCC(G)$ is determined by the shape of the region $D$, which also reflects the periodicity of the system. The simplest of these is the 3D Coulomb cut-off type, which truncates the Coulomb interaction to a sphere of radius *R*_{c}, thereby eliminating periodicity in the system. This can be extended to 2D (slab) and 1D (wire) geometries, maintaining periodic behavior in a desired plane or axis. The modified derivations of $VHCC(G)$ for each cut-off type are given in the work of Rozzi *et al.*^{213} and Sohier *et al.*^{215}

To ensure sufficient space between the charge densities of the home cell and those of the periodic images, onetep places the unit cell within a larger, padded cell where *n*(**r**) = 0 for all **r**. Calculations of $VHCC(r)$ on a grid are performed by placing the charge density of the unit cell within the padded simulation cell. This is Fourier transformed to give *ñ*(**G**) and multiplied with the appropriate values of $v$^{CC}(**G**) to give $VHCC(G)$. These values are subsequently reverse Fourier transformed, and the real-space Hartree potential is extracted from the padded grid to give $VHCC(r)$. A similar approach is used for the local pseudopotential energy term, while for the core–core interaction, the usual Ewald sum is replaced with a direct Coulombic sum of interactions computed in real space. For more details about the implementation in onetep, refer to Ref. 211.

##### b. Martyna–Tuckerman approach.

This approach retains Fast Fourier Transforms (FFTs) for the calculation of Hartree and local pseudopotential energy terms. The Coulomb operator is still periodic but is modified in such a way that the interactions between neighboring cells are eliminated.

Under PBCs, the real-space representation of a function *f*(**r**) is

where $f\u0303(G)$ is the Fourier transform of *f*(**r**), and the reciprocal space vectors **G** are chosen to be commensurate with the simulation cell. The resultant function *f*_{per}(**r**) has the periodicity of the cell and is a superposition of periodically repeated functions *f*(**r**), one in each cell. This is illustrated on the example of a Gaussian function in Fig. 12 (top).

The Martyna–Tuckerman approach^{214} replaces *f*_{per}(**r**) with a different function *f*_{MIC}(**r**),

which is also periodic but is not constructed as a superposition of periodically repeated functions; rather, the minimum image convention (MIC) is used to construct it from a *single* function residing in the simulation cell while still satisfying periodicity. This is illustrated in the bottom panel of Fig. 12.

A detailed description of the form of $f\xaf(G)$ required to accomplish the above, and of the treatment of the **G** = 0 term, can be found in Ref. 211 and in the original paper by Martyna and Tuckerman.^{214} The modified Hartree potential $V\xafH(G)$ and the modified local pseudopotential $V\xaflocps(G)$ are obtained according to Eqs. (22)–(25) of Ref. 211. The core–core interaction is evaluated using a direct Coulombic sum rather than via Ewald summation.

An important consequence of the MIC approach is that the length of the simulation cell must be at least twice that of the molecule in any dimension; otherwise, unphysical interactions occur between the molecule and the nearest periodic image.

##### c. Direct real-space calculations using a multigrid solver.

In this approach, FFTs are not used in the evaluation of the Hartree, local pseudopotential, and core–core energy terms. Instead, these calculations are performed in real space.

The Hartree potential under OBCs is obtained using a multigrid solver^{55} (also see Sec. IV D 1) to solve the Poisson equation,

on a real-space grid. Defect correction^{216} is used to improve the accuracy of the second-order solution to a higher order (up to 12th). Dirichlet boundary conditions of the form

must be provided on the faces of the simulation cell *∂*Ω. To mitigate the computational cost of evaluating Eq. (52), charge coarse-graining and bilinear interpolation are used.

The local pseudopotential term is obtained directly in real space according to Eq. (B4) of Ref. 216. As in our implementation of the Martyna–Tuckerman method, the core–core interaction is evaluated using a direct Coulombic sum rather than via Ewald summation.

One important limitation of this technique is that the multigrid solver used in onetep only supports orthorhombic (cuboid) grids.

### E. Excitations and spectroscopy

#### 1. Conduction NGWFs and band structure calculations

As mentioned in Sec. II, while the NGWF basis is optimized to accurately represent the occupied states, the unoccupied states are typically over-estimated in energy and are in some cases missing.^{217,218} As described in Ref. 218, a second set of ‘conduction NGWFs’ may therefore be optimized to represent a select number of low energy conduction states using a projection operator-based approach. Aside from being non-self-consistent, the calculation proceeds along similar lines to a valence onetep calculation. The optimized conduction NGWFs are then combined with the valence NGWFs to form a joint NGWF basis. Provided that one uses sufficiently large conduction NGWF radii (typically larger than valence NGWF radii), it is possible to generate a joint NGWF basis that is capable of accurately representing both the occupied states and a given number of (bound) unoccupied states.^{218,219} One can use this basis to calculate optical absorption spectra using Fermi’s golden rule,^{218,219} which has been employed to study energy transport in the Fenna–Matthews–Olson light-harvesting pigment–protein complex,^{220,221} or TDDFT^{77,222} (Sec. IV E 2), or to calculate electron energy loss spectra^{223} (EELS) (Sec. IV E 5). Alternatively, the joint NGWF basis may be directly used to calculate band structures.

As described in Ref. 224, two possible approaches to calculating band structure are implemented in onetep. The first approach, denoted the tight-binding (TB) style approach, is related to both TB and Wannier interpolation, while the second approach, denoted the **k** · **p** style approach, is related to **k** · **p** perturbation theory. In the TB approach, **k**-dependent phase factors are directly built into the construction of the Hamiltonian *matrix* at a given **k**-point, while in the **k** · **p** approach, a **k**-dependent Hamiltonian *operator* is derived, with the Hamiltonian matrix being the NGWF matrix elements of this operator.^{225} For both approaches, the Hamiltonian is constructed in the joint basis at each of the requested **k**-points and diagonalized to get the corresponding energies. A method for unfolding supercell band structures has also been implemented.

For a well converged NGWF basis, the two methods give very similar band structures. However in cases where the NGWFs are not well converged, e.g., due to too small localization radii or too loose convergence criteria, the **k** · **p** approach tends to result in unphysical gaps at the Brillouin zone boundary with corresponding discontinuities in the unfolded band structure.^{224} It is therefore generally recommended to use the TB style approach. As an example, Fig. 13 shows the calculated band structure for a carbon nanotube (CNT) using the TB style approach. Although for this system the low energy conduction states are already relatively well represented, it is necessary to use a joint basis in order to accurately reproduce the band structure over the energy range considered.

#### 2. Linear-response time-dependent DFT

Many potential applications of linear-scaling electronic structure methods involve the prediction of optical properties of large-scale systems, ranging from nanostructured materials to pigment–protein complexes. Often, as in the prediction of UV–vis absorption spectra, one is mainly interested in a small number of low-lying excited states. These states can be efficiently obtained in the framework of linear-response time-dependent DFT (LR-TDDFT), where the problem of calculating excitation energies is recast into solving for the lowest eigenvalues of an effective two-particle Hamiltonian.^{226,227} In onetep, the resulting non-Hermitian eigenvalue problem is solved by exploiting an equivalence of the problem to that of a set of classical harmonic oscillators.^{228}

To achieve linear-scaling computations of excitation energies, in onetep, the transition density matrix *ρ*^{{1}}(**r**, **r**′) describing the lowest excitation of the system is expanded in terms of valence and conduction NGWFs optimized in a ground-state DFT calculation

where **K**^{{X}} and **K**^{{Y}} are the excitation and de-excitation contributions to the transition density matrix, respectively. The lowest excitation energy of the system can then be written as a functional of **K**^{{X}} and **K**^{{Y}}, and higher excitations can be computed by enforcing an effective orthogonality constraint between the transition density matrices of individual excitations. Both the full LR-TDDFT method and the simplified Tamm–Dancoff approximation,^{229} corresponding to setting **K**^{{Y}} to zero, are implemented in the onetep code.^{77,222}

Linear-scaling computational effort with system size for calculating a fixed number of low-lying excited states is then achieved by exploiting sparsity in **K**^{{X}} and **K**^{{Y}}. However, unlike for a ground-state DFT calculation, where the sparsity of the density kernel is defined by a simple cutoff radius, the sparsity patterns of **K**^{{X}} and **K**^{{Y}} can be chosen in a physically motivated way to restrict the space of possible solutions to the TDDFT eigenvalue problem. By dividing the system into smaller subsystems, and setting *K*^{{X}αβ} and *K*^{{Y}αβ} to zero when NGWFs *χ*_{α}(**r**) and *ϕ*_{β}(**r**) are associated with different subsystems, it is possible to suppress all contributions to *ρ*^{{1}}(**r**, **r**′) arising from charge transfer between subsystems^{230,231} (see Fig. 14). Since TDDFT systematically underestimates excitation energies of charge-transfer states,^{232} especially when using semi-local functionals, the subsystem approach is an efficient way to remove unphysical low-lying excited states. The approach has been used to study excitations of dyes in explicit solvent clusters^{231,233,234} and the computation of excitons in multi-chromophore pigment–protein complexes.^{230}

##### a. Excited state forces.

The ability to calculate excited state forces in onetep is not yet available in onetep v5.2, but it will be released in a future version. The implementation of excited state forces in onetep takes the approach of Furche and Ahlrichs,^{235} solving a Z-vector equation

where *F*_{direct} is the partial derivative of the TDDFT excitation energy with respect to atomic positions and *F*_{Pulay} is the excited state equivalent of the ground state Pulay term (see Sec. IV C 1). *F*_{Z} is a Lagrange multiplier compensation force contribution, required because the KS orbitals of the system will change with the atomic positions but are also constrained to be orthonormal.

The Z-vector term is calculated by solving a system of linear equations.^{235} We map into a spherical wave basis before projecting out valence contributions because the valence and conduction NGWFs in onetep are not optimized to represent the Lagrange multiplier term. As with PDOS calculations (Sec. IV E 4), using the spherical wave basis introduces very little error, but using an alternative basis optimized to represent the Z-vector is the subject of active development.

#### 3. Electronic transport

The continued miniaturization of electrical devices has led to an increasing interest in modeling charge transport at the atomic scale. Modeling electronic transport using first-principles quantum mechanics is challenging, however, as experimentally relevant systems typically require a simulation of not only the device but also substantial amounts of bulk material to model the semi-infinite leads that contact and connect with the device. Such calculations, therefore, generally require the consideration of hundreds if not thousands of atoms. onetep is therefore an ideal platform for calculating electronic transport. The optimized NGWFs provide a near-minimal yet highly accurate localized basis set, resulting in an efficient but unbiased calculation of electronic transport on a first-principles quantum-mechanical footing.

We have implemented within onetep methods to calculate zero-bias electronic transport, within the Landauer–Büttiker formalism.^{236–238} We refer the reader to Ref. 239 for the details; in short, however, the calculation involves embedding the desired device geometry within a (potentially larger) “auxiliary” ground-state DFT calculation, then extracting the Hamiltonian and overlap matrix elements (within the NGWF basis) to calculate the electronic transport coefficients as a post-processing step. The implementation exploits the sparsity of the Hamiltonian and overlap matrices and uses an optimized block-tri-diagonal matrix inversion algorithm to compute the exact transmission spectrum.^{240,241} For quasi-one-dimensional systems, the computational complexity scales linearly with the number of atoms; for systems with higher dimensions, the algorithm performs considerably better than a naïve algorithm using dense matrix inversions. Typically, calculating the transport properties of a system corresponds to only a few percent of the total computational time associated with the electronic structure calculation.

The functionality we have implemented (for both spin-restricted and spin-polarized DFT calculations) includes electronic transmission between an arbitrary number of semi-infinite leads; bulk transmission and band structures within the leads; density of states for the device and leads; and eigenchannel decompositions, defined as the eigenvectors of the transmission probability matrix,^{242} for transmission between arbitrary pairs of leads. Transmission eigenchannels are particularly useful for interpreting transport spectra as they provide a spatial representation of the non-mixing scattering states within the system.

#### 4. Local and angular-momentum projected density of states

onetep has an extensive set of density of states (DOS) methods. Plots may be made directly from onetep or by outputting OptaDOS^{243} compatible data-files. A raw DOS may be calculated from the eigenvalues of the Hamiltonian matrix

In practice, the delta function is replaced with a Gaussian with broadening *σ*, typically of the order of 0.1 eV. The local density of states (LDOS) is the projection of the DOS onto atom-local functions. The LDOS in a given region *I* is calculated in onetep by projecting each eigenstate onto the local orbitals of region *I* as

This provides a series of functions *ρ*_{I}(*ϵ*) for each of the chosen regions *I*, which may include only the NGWFs of a single atom or those of a group of atom types.

onetep can also project the DOS onto angular momentum resolved functions to produce a projected DOS (PDOS). To calculate the PDOS, an additional resolution of the identity is inserted into Eq. (56) using a basis of angular momentum resolved functions |*χ*_{l,m}⟩ onto which the NGWFs are projected,

where we need to include the inverse overlap matrix of angular momentum resolved functions, Λ, since this basis is also non-orthogonal.

There is a considerable amount of choice in the basis used for the angular momentum resolved functions, which are effectively a set of spherical harmonics multiplied by some radial term. In onetep, we have implemented two options for the radial term: spherical waves and pseudo-atomic orbitals (PAOs). The PAOs are the same as are used to initialize the NGWFs before optimization. For more details about the theory behind these options as well as tests and comparisons, refer to Ref. 244.

In using an angular momentum resolved basis, which is non-optimal for representing the orbitals, we inevitably lose something in the projection from NGWFs. This loss can be quantified through a spilling parameter,^{245} which is calculated after each PDOS run,

where *W*_{αlm,i} is the PDOS weight matrix,

We have found that PAOs are sufficient for almost all applications with less than ≈2% spilling and qualitatively correct PDOS profiles. The PAO basis is computationally fast and memory efficient; the spherical wave basis may be used to reduce the spilling to almost zero, but at the expense of more memory usage and slower evaluation.

The PDOS implementation in onetep can also produce band-centers, which are used extensively as descriptors for catalytic activity.

#### 5. Core loss spectroscopy

Experimental characterization of nanomaterials relies on a wide range of imaging and spectroscopy techniques. Prominent among these are techniques based on core loss: the concept that incident energy in the form of electrons [in electron energy loss spectroscopy (EELS)] or x rays [in X-ray absorption spectroscopy (XAS)] can cause characteristic electronic transitions from the occupied core levels to unoccupied conduction band levels. Given sufficiently high resolution in energy and space, these methods can even allow the local structure of the conduction band to be probed and elucidate properties such as coordination, bonding, and surface reconstruction. However, such experimental results are not always easy to interpret, and simulations capable of a realistic treatment of the material are highly complementary to experiment in distinguishing between possible structures. Simulation of EELS and XAS relies on the Fermi golden rule to give the imaginary dielectric function via dipole matrix elements,

onetep implements the calculation of these expressions as a post-processing tool^{223} capability, linked to the use of the projector augmented wave method^{20} (Sec. IV A 1). Care is taken to ensure robust evaluation of matrix elements of the form ⟨*ψ*_{i}|**r**|*ψ*_{c}⟩, as detailed in Ref. 246. Post-processing of the optical matrix elements to calculate EELS and XAS spectra can be carried out through an interface to the OptaDOS package.^{247}

#### 6. Anharmonic vibrational spectra and IR spectra

Density functional theory molecular dynamics (DFT-MD) provides an efficient framework for accurately computing several kinds of spectra, e.g., vibrational, infra-red (IR), and Raman, since it gives direct access to quantum time correlation functions of various dynamical variables from which spectra can be computed via time-dependent perturbation theory (or equivalently Green–Kubo formalism).^{248} The major advantage of DFT-MD approaches lies in its capability of inherently describing effects of polarization, temperature, and anharmonicity of the potential energy surface (PES) without having to rely on *a posteriori* corrections. In particular, DFT-MD in onetep (Sec. IV C 3) can provide a valuable tool for the interpretation of vibrational and IR signatures of large biological systems in both the gas-phase and the condensed-phase.

It can be shown that within linear response theory,^{248} the spectrum associated with a given perturbation, such as the spectrum associated with infrared radiation (IR), can be obtained from the Fourier transform of an autocorrelation function of the dynamical variable conjugate to the external field (in the case of IR spectra, this is the total dipole moment). This can be easily computed from a MD calculation of the unperturbed system via the fluctuation–dissipation theorem.^{249} For instance, it can be shown^{201} that in the case of IR spectra, what needs to be computed is

where *ω* is the frequency of the incoming radiation, *t* is time, $\beta =(kBT)\u22121$, *c* is the speed of light, *V* is the volume of the sample, and $\mu ^(t)$ is the total dipole moment operator. The integrand in Eq. (61) is the so-called Kubo autocorrelation function

*C*^{kubo}(*t*) is a real function, and at high temperature (*β* → 0), it approaches the classical autocorrelation function $Ccl(t)\u2261\mu (0)\mu (t)$, where the quantum operators are replaced by their classical vectorial quantities $\mu ^\u2192\mu $. In the case of vibrational spectra, one has a similar formula to the one in Eq. (61) with a different prefactor and using the autocorrelation function of the atomic velocities. As mentioned in Sec. IV C 2, onetep does not take into account the effect of LO–TO splitting.

In onetep, it is also possible to compute the IR spectrum of a subsystem (e.g., a solute molecule) within a larger system (e.g., a solvent). To compute the electronic dipole moment of a non-isolated system *A* within a larger environment *B*, a partitioning scheme is required. In the onetep framework, NGWFs are easily assigned to a given subsystem, as they are centered on atoms. However, non-orthogonality means that the density matrix *ρ*_{A} associated with subsystem *A* cannot be built from NGWFs centered on atoms belonging to *A* only. One way to overcome this obstacle, i.e., distributed multipole analysis, is described in Sec. IV F 1. A simpler method, which can be performed on-the-fly and offers results similar to DMA for IR, is given by the density kernel partitioning (DKP).^{201} In essence, the DKP method works by including all contributions from NGWFs centered on atoms in subsystem *A* and half of the contributions from NGWFs centered on atoms in *B*, which have a non-zero overlap with NGWFs centered on atoms in *A*.

Examples of IR spectra for simple molecules and small peptides in both the gas phase (no partitioning required) and the aqueous phase can be found in Ref. 201.

### F. Partitioning the density/energy

#### 1. Distributed multipole analysis

Distributed multipole analysis^{199,200,250} (DMA) is a technique for representing a continuous charge distribution in terms of a set of point multipoles. Atomic centers are usually, although not universally, used as the multipole sites. While DMA is typically performed in a Gaussian basis set,^{251,252} onetep implements it using the localized NGWF basis following a density-fitting approach.

In the first step, the electronic density is decomposed into two-center contributions from overlapping NGWFs. Each contribution is subsequently expanded in terms of auxiliary basis functions—truncated spherical waves (SWs) originating on both contributing atoms. The electrostatic (Coulomb) metric is used in the expansion, which corresponds to minimizing the self-interaction energy of the density difference between the original contribution and its SW expansion. As a result, the electronic density is decomposed into a sum of atom-centered contributions, each of which is a linear combination of SWs. Finally, spherical multipoles centered on atomic sites are obtained as linear combinations of SW expansion coefficients and simple radial integrals involving Bessel functions that can be calculated analytically.

The maximum angular momentum *ℓ* used in the SW basis determines the order of the expansion. onetep’s implementation supports orders up to hexadecapole (*ℓ* = 4), but values printed from the DMA procedure are restricted to monopoles, dipoles, and quadrupoles. A reference point can be specified for the multipoles, and in addition, atom-centered multipoles can be printed out, both electronic-only and total (valence-electronic + core). Additionally, onetep can output total multipoles in a commonly used format compatible with the gdma^{251} program. For more details, refer to Ref. 202 (Sec. II.D.2).

#### 2. Energy decomposition analysis

Molecular interactions arise from the balance of many physically unique components such as electrostatics, polarization, and charge transfer. It is the aim of energy decomposition analysis (EDA) to partition the molecular interaction energy into these individual components.^{253} The onetep EDA scheme^{254} is a hybrid approach based on the absolutely localized molecular orbital (ALMO) method of Khaliullin *et al.*^{255} and the localized molecular orbital EDA of Su and Li.^{256} Specifically, our approach separates the interaction energy into electrostatic, exchange, correlation, Pauli repulsion, polarization, and charge transfer terms, which allows full analysis of the interaction energy. A notable benefit of our method is that the polarization and charge transfer basis set dependence problem seen in the original ALMO EDA approach is remedied through exploiting the strictly localized property of the onetep orbitals.^{254}

The onetep EDA implementation features a number of modifications that make it highly suitable for the analysis of large-scale systems. This scalability is especially important where long-range interactions can dominate, e.g., electrostatics in biomolecular assemblies. This capability has been demonstrated through performing calculations with complete basis set accuracy on an entire thrombin protein–inhibitor complex comprising 4975 atoms.^{254}

The method is also capable of generating powerful electron density difference (EDD) plots^{254} that show electron density redistributions occuring on molecular polarization and charge transfer. These plots reveal key orbital features such as acceptor–donor pairs and have a high potential value for the pharmaceutical industry through suggesting regions on which to focus optimization efforts. EDA has been applied in combination with EDD plots to small, drug-like fragment interactions with the S1 pocket of the thrombin protein,^{257} highlighting a number of important interactions for thrombin–drug binding. Here, the analysis indicated the thrombin salt bridge interaction with charged amidine functional groups of drug molecules as being a key target for drug optimization efforts. This application of the onetep EDA to a pharmaceutical problem demonstrates the high level of insight that can be provided through visual EDD plots and quantitative EDA energies.

#### 3. Population analysis and classical force field derivation

In order to enhance the information content of the ground state electron density, which can be particularly crucial for the large system sizes that are commonly used in onetep, various methods of electronic population analysis are available. Natural population analysis (NPA) and the associated natural bond orbitals (NBOs) are commonly used in quantum chemistry in conjunction with atom-centered basis sets to provide a chemical picture of bonding in terms of localized Lewis-type bond and lone pair orbitals.^{258,259} The density matrix formalism, and atom-centered NGWF basis, lends itself naturally to this scheme; the transformation of the ground state single particle density matrix to natural atomic orbitals (NAOs), and hence NPA charges, is implemented in the onetep code.^{260} This functionality has allowed us to quantify metal–oxygen charge transfer in very large (1000+) atom models of the oxygenated myoglobin metalloprotein, even taking dynamical mean field theory effects into account at the metal center (see Sec. IV B 5).^{146}

Through an interface with the NBO 5 analysis package,^{259} the set of atom-centered NAOs may be further transformed, via natural hybrid orbitals, into NBOs for each pair of atoms in the system.^{260} The populated NBOs form the chemically intuitive, natural Lewis structure of the molecule (bond and lone pair orbitals) and are complemented by the unoccupied antibonding and Rydberg orbitals. This analysis allows us to build up chemically familiar pictures of the nature of bonding in complex systems. For example, we were able to reconcile ligand–metal back charge transfer processes in the aforementioned myoglobin complex with experimental L-edge x-ray absorption spectroscopy by measuring metal *d* orbital occupancies.^{146} An additional useful feature of the NBO analysis is that energetic lowering due to electronic delocalization to anti-bonding orbitals, which represents deviation from ideal Lewis description, may be estimated via second order perturbation theory. For example, we have used the relationship between this variational lowering of the energy and hydrogen bond strengths to estimate the contributions of various active site amino acids in the enzyme, chorismate mutase, to its catalytic rate enhancement.^{261}

A second technique commonly employed for the study of the chemical nature of molecules and materials is atomic partitioning of the electronic density among the constituent atoms in the system. In general, the partitioned atomic electron density is given by

where *n*(**r**) is the total electron density at position **r** and $w$_{A}(**r**) is a weighting factor. *A* and *B* label atoms. There is no unique weighting scheme to perform this partitioning, but common methods include Hirshfeld, iterative Hirshfeld (IH),^{262} and iterated stockholder atoms (ISA).^{263} Although Hirshfeld and ISA analyses are both available in onetep, our particular focus has been on density derived electrostatic and chemical (DDEC) analysis, which was first developed by Manz and Sholl,^{264} and has undergone continual development in recent years.^{265,266} We refer the reader to Refs. 264–266 for a full description, but, in brief, the DDEC method aims to optimize atomic weighting factors $w$_{A}(**r**) such that the derived atomic electron densities yield net atomic charges (obtained by integrating the atomic electron densities over all space) that are chemically meaningful while reproducing the electrostatic potential (ESP) surrounding the molecule or material. Earlier versions of DDEC (up to DDEC/c3) are implemented in onetep,^{267,268} while the latest versions may be accessed through an interface between onetep and the chargemol program.^{266}

DDEC charges have been computed for the 1000+ atom model of the myoglobin metalloprotein described above, and charge transfer to the oxygen molecule has been shown to agree well with the chemically intuitive value of −1*e*.^{268} Furthermore, analysis of DDEC charges of several small proteins reveals a number of desirable properties. Not only do the charges reproduce the ESP of the underlying onetep calculation well, but they are still accurate when transferred to conformations close to the native state of the protein, even for buried atoms (a feature that traditional charge derivation schemes struggle with).^{267} These features make DDEC electron density partitioning well suited to molecular mechanics force field design, uniquely providing a route between static large-scale electronic structure calculations and long time scale dynamics of biological molecules. To facilitate this, we have additionally implemented methods to derive off-site charges in onetep to effectively model electron anisotropy in force fields (such as lone pairs and *σ*-holes)^{269} as well as dispersion (or van der Waals) C_{6} coefficients using both the widely used Tkatchenko–Scheffler method^{270,271} and a new method to rigorously account for environmental screening.^{272}

Through an interface between onetep and the qubekit software,^{269} molecular mechanics force fields (named QUBE) for both small molecules and entire proteins may be readily computed using the electron density partitioning methods described above. For example, QUBE force fields have been derived for the alanine pentapeptide and two proteins, ubiquitin and GB3. Microsecond molecular dynamics simulations using the derived force fields are in good agreement with experiment [as measured by comparisons with nuclear magnetic resonance (NMR) J coupling data].^{273} Furthermore, the QUBE force field outperforms the established Optimized Potentials for Liquid Simulations (OPLS) force field in the prediction of binding free energies of six benzene analogs to the L99A mutant of T4 lysozyme (mean unsigned error of 0.85 kcal/mol).^{274} With several avenues for further improvement, there is hence significant potential for force fields derived from large-scale electronic structure calculations in the simulation of biomolecular dynamics and the design of future medicines.^{275,276}

#### 4. Electron localization descriptors

Electron localization descriptors provide a quantum valence shell electron pair repulsion (VSEPR) representation of molecules, crystalline solids, surfaces and coordination compounds. They are a useful visualization tool for indicating regions of lone pairs and distinguishing between bond orders. The electron localization function (ELF) was first introduced by Edgecombe and Becke in 1990 for Hartree–Fock theory and later extended to DFT by Becke *et al.*^{277–279} It is based on the conditional probability of an electron existing in the proximity of a like-spin reference electron, derived from the Hartree–Fock probability of finding two same-spin particles at two positions. As this quantity does not have an upper bound, it is inverted for more convenient graphical interpretation such that the localization of the reference electron is represented at unity. This visualization tool is applicable to large complex systems using the linear-scaling capabilities of onetep, particularly for biomolecular simulations, catalysis, and the design of nanostructured materials.

The exact kinetic energy density as defined in onetep, $\tau \sigma exact$, is extended into a *non-negative* probability density, the Pauli kinetic energy density, to describe electron localization,

where

where *β* runs over NGWFs that overlap with *ϕ*_{α}. *n*_{σ}(**r**) is the charge density for each spin *σ*, and ∇*n*_{σ}(**r**) is its gradient. To make $\tau \sigma P$ dimensionless, it is compared with the kinetic energy density of the uniform electron gas as a reference, i.e., the local spin density approximation (LSDA), $\tau \sigma LSDA(r)=356\pi 223n\sigma 53(r)$. This is reformulated to avoid open bounds, limiting the ELF to a more desirable finite range of values from zero to one,

The code has also been extended to include the simpler localized orbital locator (LOL). Here, only the exact kinetic energy is required instead of $\tau \sigma P$, again with a range of zero to one,

## V. APPLICATIONS OF onetep: EXAMPLES

Primarily due to its favorable scaling with system size, onetep has found widespread application to studies of nanomaterials, particularly nanocrystals,^{280,281} nanorods,^{282,283} and 2D materials.^{92–94} In all these cases, the combination of scaling to models of hundreds or thousands of atoms with plane-wave accuracy has allowed a quantitative prediction of optical and electronic properties. In this section, we present several applications of onetep to such systems as well as other problems of interest, spanning a wide range of topics.

### A. Conductance between carbon nanotubes

The electronic transport functionality of onetep (Sec. IV E 3) has been used to carry out calculations of conductance between two semi-infinite carbon nanotubes (CNTs): a fullerene-capped (5, 5) CNT and a zig-zag terminated (9, 0) CNT with two hydrogen atoms passivating dangling bonds on the terminal carbon atoms. Figure 15 shows the geometry of the auxiliary system for which the ground-state DFT calculation is performed. The device, indicated by the atoms within the box, consists of a left and right lead and a central region where electrons can tunnel between the leads. Outside the device, a number of buffer atoms are included in the ground-state electronic structure calculation only to ensure that the matrix elements at the extreme left/right of the device region are bulk-like, i.e., they are not directly influenced by the presence of the terminated ends of the CNTs.

Figure 16 shows the conductance spectra between the two CNTs, which are generally found to be two orders of magnitude lower than the maximum possible conductance for CNTs (4*e*^{2}/*h*). This is because the two CNTs have different chirality, suppressing inter-tube conductance.^{284} Three sharp peaks are observed at −0.5 eV, −0.2 eV, and 0.16 eV and can be explained by considering the eigenchannel decompositions. Inset (a) shows the eigenchannel with largest conductance close to a resonance. It can be seen that a significant eigenchannel weight has transferred across the junction to the second CNT. The sharp peaks suggest the presence of localized states. Indeed, by considering the (9, 0) CNT from the original auxiliary geometry in isolation and examining the eigenstates, a localized end-state is discovered at approximately the same energy as the conductance resonance. Inset (b) shows this end state, demonstrating that it has the same form as the eigenchannel in inset (a) and is responsible for the resonance in the conductance. The asymmetry in the line shape of the resonances is typical of Fano resonant scattering between bulk states and localized defect states.^{285} In contrast, inset (c) shows the same eigenchannel as shown in inset (a), but away from a resonance where the conductance is low, and the eigenchannel is seen to be almost entirely localized within the source CNT.

### B. Supported nanoparticles in heterogeneous catalysis

Metal oxide supports often play an active part in heterogeneous catalysis by moderating both the structure and the electronic properties of the metallic catalyst particle. In order to provide some fundamental understanding on these effects, onetep has been used in collaboration with Johnson Matthey for an investigation of the binding of O and CO on Pt nanoparticles supported on titania (anatase) surfaces^{287} (see Fig. 17). They have found that there is strong interaction of the Pt nanoparticles with the support so it is important to include both in the simulations: simpler models where either isolated nanoparticles are used or slab surfaces are used instead of nanoparticles cannot capture these crucial effects. Furthermore, they were able to separate geometric and electronic effects of the support and found that a larger contribution to ligand binding energy arose from the former. They also made use of electronic descriptors, including the d-band center and an electronic density based descriptor in order to further rationalize their results in these complex systems. This pioneering work provided an approach for creating realistic models for the study of supported metal catalysts, which are ubiquitous in many industrial applications ranging from batteries and fuel cells to the bulk synthesis of industrial chemicals.

### C. Piezochromic properties of nanocrystals

onetep has been used to study the effect of pressure-induced deformations on the optoelectronic properties of CdS nanocrystals coated with organic ligands.^{281} The application of pressure to these finite systems employs the electronic enthalpy method,^{183,288} and optical absorption spectra were calculated using the linear-response implementation of time-dependent DFT in onetep (Sec. IV E 2). Whereas in the bulk, the application of pressure increases the bandgap, for these nanocrystals, the sign of the shift depends upon the choice of ligand, as shown in Fig. 18. Hole-accepting ligands such as phenyl groups hybridize strongly with the HOMO causing it to be localized at the surface, in contrast to the LUMO that is confined within the core, leading to suppression of the oscillator strength for transitions between these states. Increasing pressure causes the HOMO to spread into the core and increasing overlap with the LUMO and the associated oscillator strength, resulting in an overall red shift of the absorption. This dependence upon the choice of ligand, which derives from the environment of the nanocrystal, suggests new possibilities for controlling the optoelectronic properties of nanomaterials for applications such as pressure sensors.

### D. Band structure projection in 2D material heterostructures

Heterostructures are ubiquitous in real applications of 2D materials. Any 2D material that is incorporated into any kind of device automatically becomes, in effect, part of a heterostructure, since the atomic-scale widths of these materials mean that their “bulk” regions are never really independent of their surroundings. Furthermore, novel physics often emerges from the hybridization of the band structure that results when materials are combined. Controlling the properties of layered material heterostructures is therefore crucial to the success of devices based on the novel capabilities of 2D materials.

However, while monolayer 2D materials are very well-studied, the theoretical insight into heterostructures available from traditional DFT is limited by the large system sizes required to study the interfaces between pairs of materials, which, in general, will be incommensurate and may be rotated with respect to one another. Furthermore, while the effect of interlayer interactions is weak, given that the layers are bonded by only van der Waals interactions, there can be subtle, yet crucial, effects from hybridization. The combination available in onetep of non-local vdW functionals and band structure unfolding techniques with large-scale models of heterostructures is thus of great interest for applications in this field.

onetep has been applied to calculations on a range of heterostructure combinations including MoS_{2}/MoSe_{2},^{92} MoSe_{2}/WSe_{2},^{94} and hBN/phosphorene.^{93} Band structure changes caused by stacking and rotation of the layers have been observed and characterized in all these cases. Changes in spectral weight and band structure between the monolayers and heterostructured interfaces show how lattice mismatch (MoS_{2}/MoSe_{2}) or spacer layers (phosphorene/hBN/phosphorene) can allow the component monolayers to retain more independence in heterostructures than in homo-stacks. Significantly, comparison of band structures with ARPES results for aligned and non-aligned stack models (see Fig. 19) led to the confirmation that there can be co-existence of commensurate and incommensurate domains within the interface between a single pair of monolayer transition metal dichalcogenide flakes.^{94}

## VI. COMMUNITY ACTIVITIES

onetep was developed from the beginning to be an academic platform to enable cutting edge developments in large-scale electronic structure theory. At the same time, the philosophy behind the actual software has been to make it as efficient and simple (user friendly) to use as possible, even if this requires significantly more software engineering effort than a traditional academic code that is often targeted toward a small number of expert, highly specialized users. The reasoning behind this choice is that the new electronic structure capabilities that are developed in onetep are meant to be available long term for use in the future for application in challenging problems in industry and academia. This has led to the early adoption of the code by industry via commercialization by Accelrys (since acquired by Dassault Systèmes BIOVIA) in late 2004. This relationship has grown over the years, and onetep is currently seen as a powerful computational simulation tool by industry, which is marketed currently by Dassault Systèmes BIOVIA.^{289}

The popularity and usage of the code has also increased in academia with a large number of groups that are not associated with the developers using it for a multitude of applications. In 2017, onetep became the Flagship Project of CCP9, the EPSRC-funded UK Collaborative Computational Project for the Study of the Electronic Structure of Condensed Matter.^{290} An academic license for the code covers an entire research group and provides access to the full source code and no limitations on the size of calculations that can be run, the use of HPC resources, or the number of calculations that can be run simultaneously. An academic license is obtained from the onetep website^{291} and is currently free to UK academics members of CCP9 (all UK academics can join). For non-UK academics, there is a modest administration fee for the license.

The community of onetep users is growing. The leading events are the onetep Masterclasses, which provide training to both academic and industrial users of the code. A distinct feature of these events is that, unlike usual code training schools where the participants train on set exercises, each participant is allocated two personal tutors who help them work directly on the problems of their interest so that they are ready to continue work with the code after the Masterclass. Apart from this hands-on training, the Masterclass consists of lectures on the general theory of onetep and specialized lectures of specific topics voted for by the participants. Each Masterclass has about 20 participants who are a mixture of academic and industrial users of the code.

Another more specialized community activity are the onetep coding retreats. These last for about a week and bring together people who have considerable experience of developing the code so that they can work together intensively to develop specific functionalities, achieve inter-operability between existing capabilities, eliminate known bugs, clean up or refactor parts of the code, and write documentation.

onetep is ultimately a community code and as such it is one of the codes of the CCP9 and UKCP^{292} consortia for the electronic structure community. Through these consortia, the community can learn about the most recent developments and applications of the code and also influence the developments according to its needs. On the international side, onetep is one of the codes of the Psi-k European network for electronic structure;^{293} it is part of activities of this network and its large community, which extends worldwide and provides a forum for interaction between the users and developers of electronic structure codes via conferences, workshops, mailing list, and other resources. onetep is also one of the community code partners of The Molecular Sciences Software Institute (MolSSI)^{294} in the US, a nexus for science, education, and cooperation serving the worldwide community of computational molecular scientists.

## VII. CONCLUSIONS AND FUTURE DIRECTIONS

We have presented an overview of the onetep program for linear-scaling density functional theory (DFT) calculations with large basis set (plane-wave) accuracy on parallel computers. The DFT energy is computed from the density matrix, which is constructed from spatially localized orbitals we call non-orthogonal generalized Wannier functions (NGWFs). During the calculation, the density matrix and the NGWFs are optimized with localization constraints. A basis set of periodic sinc (psinc) functions, which is equivalent to plane waves, is used to expand the NGWFs. By taking advantage of density matrix localization, onetep is able to perform calculations with thousands of atoms with computational effort, which scales linearly with the number or atoms. We have described the large and diverse range of capabilities of the code, which include different boundary conditions, many types of DFT exchange–correlation functionals, and finite electronic temperature DFT methods for metallic systems. The code also has a range of methods that go beyond DFT such as non-local exchange and strong correlation. There are advanced capabilities for geometry relaxation, molecular dynamics, phonons, and transition states. Excited state calculations are available through time-dependent DFT as well as relevant experimental observables such as electronic transport, UV/VIS spectra, core loss spectroscopy, and anharmonic vibrational spectra. The code is able to include the effect of the chemical environment in calculations via a solvation model, QM/MM embedding, and QM-in-QM embedding approaches. An extensive variety of electronic properties are also available ranging from density of states to distributed multipole analysis and methods for partitioning charges and interactions between fragments. Calculations with onetep provide unique insights into large and complex systems that require an accurate description at the atomic level ranging from biomolecular to chemical, to materials and physical problems, as we showed with a small selection of illustrative examples.

The first version of the onetep code was completed in summer 2004.^{7} While able to run calculations with thousands of atoms and demonstrate linear-scaling behavior with plane-wave basis set accuracy, the only type of calculation it was able to do was single-point energy. Since then there has been a tremendous increase in the capabilities of the code that now contains an enormous and varied range of functionalities, as we have described in this paper, and the code is being used for applications that cover all possible areas of applicability of electronic structure theory.

A community code and project with unique capabilities such as onetep needs to remain at the cutting edge of physical theory method developments as well as software developments. The future directions for the developments of the code will depend on the needs of the users and the applications that they wish to do with the code. It is likely that the range of capabilities will continue to grow. It is also expected that more developments toward multiscale methods will be done in order to include environment (e.g., solvent, support, etc.) effects in the calculations. This will likely involve implementations of computationally cheaper semi-empirical variants of DFT that will take advantage of the linear-scaling framework of onetep to perform extensive configurational sampling, as in molecular dynamics simulations. Developments are likely to take place also at the other end of the spectrum of accuracy with the implementation of more sophisticated exchange–correlation functionals and the implementation of quantum chemistry wavefunction-based methods, which will be based on the two electron repulsion integral engine for NGWFs that we have developed for hybrid functionals. Another interesting direction would be to develop machine-learning methods to estimate certain quantities and hence reduce the calculational cost.

Finally, a code such as onetep needs to keep up with progress in supercomputing resources. In particular, it is imperative to be able to take advantage of the new disruptive technologies that are emerging and will make exascale computing a reality. Due to its linear-scaling computational cost, onetep has unique potential for simulations of complex materials with high industrial relevance. Often, a common feature of these challenging problems is that they require very large scales of simulation with many thousands of atoms and also a large number of these simulations (configuration sampling). Thus, to make such simulations realistic enough to be able to have predictive power exascale computing is absolutely necessary. While we have made excellent progress in developing onetep to run efficiently on “conventional” large multicore HPC resources, we have a lot of work to do to meet the challenge of exascale computing. Radically different approaches are needed for the redesign of our core algorithms to run efficiently on these resources that will require significant software engineering work over the coming years. This will involve also taking advantage of GPUs, an area where onetep already has some capability^{295} and can be significantly improved in terms of performance. Overall, a code and project such as onetep needs to be always under development in order to remain useful for its target applications.

## AUTHOR’S CONTRIBUTIONS

J.C.A.P, J.A., and J.C.W. contributed equally to this work.

## ACKNOWLEDGMENTS

Computing resources were provided by a number of facilities including the Scientific Computing Research Technology Platform of the University of Warwick, the IRIDIS-5 supercomputer of the University of Southampton, the Imperial College High Performance Computing Service, the University of Cambridge High Performance Computing Service, and the TASK Academic Computer Centre (Gdańsk, Poland). The use of the ARCHER supercomputer was supported by both the UKCP consortium, funded by EPSRC Grant No. EP/P022030/1, and the Materials Chemistry Consortium, funded by EPSRC Grant Nos. EP/L000202/1 and EP/R029431/1. We acknowledge the use of Athena at HPC Midlands+, which was funded by the EPSRC through Grant No. EP/P020232/1, as part of the HPC Midlands+ consortium, and the UK Materials and Molecular Modeling Hub for computational resources, which was partially funded by EPSRC (Grant No. EP/P020194/1). We acknowledge funding under the embedded CSE program (Project Nos. eCSE01-004, eCSE07-006, and eCSE08-15) of the ARCHER UK National Supercomputing Service, and the distributed CSE program of HECToR, the former UK National Supercomputing Service, which supported the development of dl_mg. J.A., J.C.A.P., J.C.W., J.D., C.-K.S., P.D.H., A.A.M., and N.D.M.H. acknowledge the support of EPSRC (Grant Nos. EP/P02209X/1, and previously EP/F010974/1, EP/G05567X/1, and EP/J015059/1), which supported earlier development of onetep . J.C.A.P., L. Andrinopoulos, R.A.B., R. J. Charlton, F.C., A.G., D.D.O., L.E.R., V.V., T.J.Z., P.D.H., N.D.M.H., and A.A.M. all acknowledge support from the Thomas Young Centre for Theory and Simulation of Materials through Grant No. TYC-101. J.C.A.P. also acknowledges support from St Edmund Hall, University of Oxford. L. Andrinopoulos was supported by a studentship from the EPSRC Grant No. EP/G05567X/1. R. J. Charlton, A.G., and T.J.Z. were supported through studentships in the Centre for Doctoral Training in Theory and Simulation of Materials at Imperial College London, funded by the EPSRC through Grant Nos. EP/G036888/1 and EP/L015579/1. T.J.Z. acknowledges funding under the embedded CSE program of the ARCHER UK National Supercomputing Service (eCSE02-15). A.E.A.A., K.K.B.D., and E.L. were supported through studentships in the Centre for Doctoral Training in Computational Methods for Materials Science at the University of Cambridge, funded by the EPSRC through Grant No. EP/L015552/1. E.W.T. was supported through a studentship at the Cambridge Centre for Doctoral Training in Nanoscience and Nanotechnology, funded by the EPSRC through Grant No. EP/G037221/1. V.V. and A.A.M. were supported by the EPSRC through Grant No. EP/S025324/1. N.D.M.H. was supported by the EPSRC through Grant No. EP/P01139X/1. V.V. was also supported through a studentship at the Doctoral Training Centre of the Institute for Complex System Simulations, funded by the EPSRC through Grant No. EP/G03690X/1. G.A.B. acknowledges the EPSRC and Pacific Northwest National Laboratory (PNNL) for Ph.D. funding. Q.H. also acknowledges the EPSRC for research studentship funding. M.J.S.P. would like to thank the BBSRC and Boehringer Ingelheim for an industrial CASE studentship. R. J. Clements was supported through a studentship in the Centre for Doctoral Training in Next Generation of Computational Modelling, funded by the EPSRC through Grant No. EP/L015382/1. A.B., J.D., and J.C.W. acknowledge funding from the Faraday Institution (faraday.ac.uk; EPSRC Grant No. EP/S003053/1) through Grant No. FIRG003. A.R.S. acknowledges the support of the EPSRC for a high end computing studentship, funded by Grant No. EP/F038038/1, through the UKCP consortium. M.C.P., J.M.E., and T.J.Z. acknowledge support from the EPSRC through Grant No. EP/J017639/1. M.C.P. and J.M.E. acknowledge support from the EPSRC through Grant Nos. EP/J015059/1 and EP/P034616/1. M.C.P., N.D.M.H., and S.M.-M.D. acknowledge support from the EPSRC through Grant No. EP/G055904/1. J.M.E. also acknowledges support from Ministerio de Ciencia e Innovación of Spain through the María de Maeztu grant MDM-2017-0767. D.J.C. acknowledges support from the EPSRC through Grant No. EP/R010153/1. G.T. acknowledges support from the EPSRC through Grant Nos. EP/I004483/1, EP/P022189/1, and EP/P022189/2. D.D.O.R. acknowledges the support of Science Foundation Ireland (SFI) through AMBER (The Advanced Materials and Bioengineering Research Centre, Grant Nos. SFI/12/RC/2278 and SFI/12/RC/2278_P2), the European Regional Development Fund (ERDF), and the National University of Ireland. M.C.P. and D.D.O.R. acknowledge support from the EPSRC through Grant No. EP/F032773/1. D.D.O. and G.T. also acknowledge support by the Royal Irish Academy-Royal Society International Exchange Cost Share Programme (No. IE131505). Structures of the PBTZT-stat-BDTT-8 polymer (Figs. 7 and 8) were kindly provided by Gabriele Boschetto.

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

## REFERENCES

_{2}Se

_{3}, Bi

_{2}Te

_{3}and Sb

_{2}Te

_{3}with a single Dirac cone on the surface

_{2}/MoSe

_{2}heterostructures