A review of the present status, recent enhancements, and applicability of the Siesta program is presented. Since its debut in the mid-1990s, Siesta’s flexibility, efficiency, and free distribution have given advanced materials simulation capabilities to many groups worldwide. The core methodological scheme of Siesta combines finite-support pseudo-atomic orbitals as basis sets, norm-conserving pseudopotentials, and a real-space grid for the representation of charge density and potentials and the computation of their associated matrix elements. Here, we describe the more recent implementations on top of that core scheme, which include full spin–orbit interaction, non-repeated and multiple-contact ballistic electron transport, density functional theory (DFT)+U and hybrid functionals, time-dependent DFT, novel reduced-scaling solvers, density-functional perturbation theory, efficient van der Waals non-local density functionals, and enhanced molecular-dynamics options. In addition, a substantial effort has been made in enhancing interoperability and interfacing with other codes and utilities, such as wannier90 and the second-principles modeling it can be used for, an AiiDA plugin for workflow automatization, interface to Lua for steering Siesta runs, and various post-processing utilities. Siesta has also been engaged in the Electronic Structure Library effort from its inception, which has allowed the sharing of various low-level libraries, as well as data standards and support for them, particularly the PSeudopotential Markup Language definition and library for transferable pseudopotentials, and the interface to the ELectronic Structure Infrastructure library of solvers. Code sharing is made easier by the new open-source licensing model of the program. This review also presents examples of application of the capabilities of the code, as well as a view of on-going and future developments.
I. INTRODUCTION
The possibility of treating large systems with first-principles electronic structure methods has opened up new research avenues in many disciplines. The Siesta method and its implementation have been key in this development, offering an efficient and flexible simulation paradigm based on the use of strictly localized basis sets. This approach enables the implementation of reduced scaling algorithms, and its accuracy and cost can be tuned in a wide range from quick exploratory calculations to highly accurate simulations, matching the quality of other approaches, such as plane-wave (PW) methods.
The Siesta method has been described in detail in Ref. 1 with an update in Ref. 2. In this paper, we shall describe its present status, highlighting its strengths and documenting the steps that have recently been taken to improve its capabilities, performance, ease of use, and visibility in the electronic structure community.
As we shall see, the improvements touch many areas. We can underline the implementation of new core electronic structure features [density functional theory (DFT)+U, spin–orbit interaction, and hybrid functionals], modes of operation [improved time-dependent density functional theory (TD-DFT), density functional perturbation theory (DFPT)], and analysis methods and procedures to access new properties. A major effort has been spent in enhancing the interoperability of the code at various levels [sharing of pseudopotentials (PPs), a new wannierization interface opening the way to sophisticated post-processing, and an interface to multiscale methods]. Very significant performance enhancements have been made, notably to the TranSiesta module through improved algorithms and to the core electronic structure problem through the development of interfaces to new solvers. These advances have put Siesta in a prominent place in the high-performance electronic structure simulation scene, a role reinforced by its participation in important international initiatives and by its new open-source licensing model.
This manuscript is organized as follows: We provide an overview of the underlying methodology and the capabilities of Siesta in Sec. II, which serves to place the code in the wider ecosystem of electronic structure materials simulation. Section III presents the recent developments in and around the code, which are covered in Subsections III A–III O. To demonstrate Siesta’s utility in the context of electronic structure calculations, we briefly present some relevant applications and survey a few areas in which Siesta is being profitably used in Sec. IV. Plans for the future evolution of Siesta are outlined in Sec. V.
II. KEY CONCEPTS OF Siesta
A. Theory background and context
Siesta appeared as a consequence of the push for linear-scaling electronic structure methods of the mid-1990s, which has been reviewed, for example, in Refs. 3 and 4. Siesta was the first linear-scaling self-consistent implementation of density functional theory (DFT).5,6
The Siesta method relies on atomic-like functions of finite support as basis sets7,8—of arbitrary number, angular momentum, radial shape, and centers—combined with a discretization of space for the computation of the Kohn–Sham (KS) Hamiltonian terms that involve more than two centers. The electron–ion interaction is represented by norm-conserving pseudopotentials. These key ingredients, through the optimized handling of sparse matrices, are used to compute the self-consistent Hamiltonian and overlap matrices with a computational expense that scales linearly with system size. The method is completed with a choice of solvers for that Hamiltonian, from optimized (but cube-scaling) diagonalization methods to reduced-scaling solvers of different flavors.
The orbitals in the Siesta basis set are made of the product of a real spherical harmonic and a radial function, which is numerically tabulated in a grid. The shape of the radial part is, in principle, totally arbitrary, but the experience accumulated has proven that the numerical solution of the Schrodinger equation for a confined isolated atom with the corresponding pseudopotential is a very good choice in terms of accuracy vs computational cost. Fuller descriptions of the mechanisms to generate and optimize these pseudo-atomic orbitals (PAOs) are given in Refs. 8–10.
The auxiliary real-space grid is an essential ingredient of the method as it allows the efficient representation of charge densities and potentials as well as the computation of the matrix elements of the Hamiltonian that cannot be handled as two-center integrals. This grid can be seen as the reciprocal space of a set of plane waves, and its fineness is most conveniently parameterized by an energy cutoff (the “density” cutoff of plane-wave methods). There are limits to the softness of the functions that can be described with such a grid, so core electrons are not considered (although semi-core electrons usually are), and their effect is incorporated into pseudopotentials. The real-space grid is also used to solve the Poisson equation involved in the computation of the electrostatic potential from the charge density through the use of a fast-Fourier-transform method. This means that Siesta uses periodic boundary conditions (PBC). Non-periodic systems, such as molecules, tubes, or slabs, are treated using appropriate supercells.
Siesta is now a mature code with more than 20 years of existence. In this period, the most important algorithms behind our implementation have been already fully described and documented in a series of papers. Readers interested in the details of how the basic elements defining the method are combined, as well as other relevant implementation details that make the method practical, can find them in Ref. 1 and in the update with the new capabilities of the code in Ref. 2.
We note that the term Siesta is regularly used to describe both the method (as outlined in Refs. 5 and 6) and its implementation in a computer program. The Siesta method is at the basis of later independent implementations, such as OpenMX11 and QuantumATK.12 Other subsequent codes are built on the method by revising some of the fundamental ingredients. This is the case of FHI-aims,13 which uses a more sophisticated real-space grid (atom-centered), thus extending the core scheme to all-electron calculations.
In this paper, we describe new additions to the Siesta code based on independent methodological advances, either pre-existent or specifically developed for Siesta, as specified and cited in the appropriate sections below.
B. Overview of Siesta capabilities
As a general purpose implementation, Siesta can provide the standard functionality available in mainstream DFT codes (energies, forces, molecular-dynamics simulations, band structures, and densities of states) and shares with those codes the basic current limitations of DFT (notably the description of strongly correlated systems).
What makes Siesta different from most other codes is the atomic-like and strictly localized character of its basis set, which is at the root of its key strengths. The use of a “good first approximation” to the full problem implies, first, that a much smaller number of basis functions are needed. Second, the finite-support of the orbitals leads to sparsity and the possibility to use reduced-scaling methods. Thus, high performance emerges almost by default.
First, consider the basis cardinality: the number of basis orbitals per atom in a typical Siesta calculation is of the order of 10–20. This is to be compared with a few hundred in the typical plane-wave (PW) calculation. Furthermore, for systems whose description needs a vacuum region (e.g., slabs for surface calculations and 2D monolayers), empty space is essentially “free” for Siesta, whereas PW codes still need a basis set determined by the total size of the simulation cell. Siesta is then quite capable of dealing with systems composed of dozens to hundreds of atoms on modest hardware, even when using cubic-scaling diagonalization solvers, which are the default as they are universally applicable.
Electronic structure solvers with a more favorable size-scaling can be applied to suitable systems. For example, one of Siesta’s earlier calculations, in 1996, was a linear-scaling run for a strand of DNA with 650 atoms, performed on a desktop workstation of the era.6 Reduced size-scaling is also a feature of the PEXSI solver described in Sec. III G 1 and of the NTPoly solver mentioned in Sec. III G 2. In addition to time-to-solution efficiency, these solvers have a smaller memory footprint than diagonalization, as the relevant matrices are kept in a sparse form rather than converting them to a dense format.
Crucially, Siesta’s baseline efficiency can be scaled up to ever-larger systems by parallelization. Both distributed [message-passing (MPI)] and shared-memory (OpenMP) parallelization options are implemented in the code. As will be shown by some examples in Sec. IV, non-trivial calculations with thousands of atoms are used in applications in different contexts from molecular biology to electronic transport.
Work on the performance aspects of the code is continuous, mostly on the solvers, which usually consume most of the computer time due to the very high efficiency of the Hamiltonian setup module in Siesta. This task is facilitated (see Sec. III O) by leveraging external libraries and developments generated by a number of international initiatives in which Siesta participates. The code can still run efficiently in modest hardware while also being able to exploit massive levels of parallelism in large supercomputers (see Fig. 1).
Parallel strong scaling of SIESTA-PEXSI and the (Scalapack) diagonalization approach for a DNA chain and a graphene–boron nitride stack, prototypes of large (hundreds of thousands of orbitals) quasi-one-dimensional (1D) and quasi-two-dimensional (2D) systems. “ppp” stands for the number of MPI processes used in each pole computation, and “sp” the sparsity of the Hamiltonian (for more details, see Sec. III G 1).
Parallel strong scaling of SIESTA-PEXSI and the (Scalapack) diagonalization approach for a DNA chain and a graphene–boron nitride stack, prototypes of large (hundreds of thousands of orbitals) quasi-one-dimensional (1D) and quasi-two-dimensional (2D) systems. “ppp” stands for the number of MPI processes used in each pole computation, and “sp” the sparsity of the Hamiltonian (for more details, see Sec. III G 1).
It is also worth noting that the atomic character of the basis set enables the use of a very intuitive suite of analysis tools since most of the concepts relating to chemical bonding use the language of atomic orbitals. Hence, Siesta has a natural advantage in this area. Partial densities of states and atomic and crystal populations (COOP/COHP) are routinely used to gain insights into the stability and other properties of materials. For a recent example, see Ref. 14. Similarly, an atomic basis provides a very natural and adequate language for the first-principles simulation of electronic ballistic transport in nanosized systems via the Green’s function-based Keldysh formalism implemented in TranSiesta,15 a part of the Siesta package.
A very high number of citations of the Siesta papers testify to the successful application of the code to widely different systems. With regard to specific capabilities and the levels of accuracy achievable, we can distinguish several levels. First, Siesta implements DFT, one of the most versatile materials simulation frameworks. DFT has its shortcomings, notably in regard to the description of strongly correlated systems, but these are being addressed (see Secs. III C and III E). Second, Siesta uses pseudopotentials to represent the electron–ion interaction. The pseudopotential approach is firmly rooted in a sound physical approximation (that bonding effects depend mostly on the valence electrons); however, it is at a disadvantage when core-electron effects are important (but see Sec. III N 6). Third, Siesta employs periodic boundary conditions (PBC) for the solution of the Poisson problem, sharing with plane-wave codes the need to resort to repeated supercells for the study of low-dimensional systems and to special techniques for the treatment of charged systems. However, it is important to note that, unlike plane-wave codes, Siesta is only bound to PBC because of the present treatment of the Hartree term of the single-particle Hamiltonian. This limitation is addressed by the incorporation of alternative Poisson solvers, as described in Sec. III O, which allow for open boundary conditions, as for isolated nano-systems, and hybrid open/periodic boundary conditions in different dimensions, as for isolated wires and slabs. It should be remembered that the three approximations mentioned in this paragraph are very widely used in the community, shared by some of the most popular electronic structure codes.
Fourth, with regard to Siesta-specific approximations, particularly the basis set, it should be stressed that Siesta is limited to basis sets composed of functions that are the product of a radial part and spherical harmonics, but it does not constrain on how many, where such functions are centered, and the size of their finite-support region. Calculations can flexibly range from quick exploration to very high-quality simulations (one may recall that accuracy gold standards in the electronic structure are provided by quantum-chemistry methods based on LCAO).
The use of an atomic-orbital basis set implies however the limitation of non-uniformity of convergence. As opposed to plane-wave methods in which a single energy cutoff parameter monotonically determines the quality of the calculation, there is no univocal procedure for the choice of an appropriate basis set. This is a well-known problem shared by the whole quantum-chemistry community, and there is widely used and tested know-how. As shown in Fig. 2, it is possible to attain, in practice, accuracy comparable to that of well-converged plane-wave calculations. The reader is also referred to Secs. IV B and IV C 1 for the demonstrated examples of the accuracy of the code, among many others in the literature.
Basis set convergence for the binding energy (Eb) of a water dimer. Details can be found in Ref. 16. The horizontal dotted line represents the converged plane-wave (PW) calculation (1300 eV cutoff) for the same system (dimer geometry and box), pseudopotentials and density functional, using the ABINIT code.17 Inset: deviation of Eb vs the PW reference. The deviation for the last point is of 10 μeV.
Basis set convergence for the binding energy (Eb) of a water dimer. Details can be found in Ref. 16. The horizontal dotted line represents the converged plane-wave (PW) calculation (1300 eV cutoff) for the same system (dimer geometry and box), pseudopotentials and density functional, using the ABINIT code.17 Inset: deviation of Eb vs the PW reference. The deviation for the last point is of 10 μeV.
To close this section, we stress that it has been a traditional and deliberate attitude by the Siesta team that, although proposing sensible starting points to users as defaults, the choice of fundamental approximations and inputs to the program (not only basis sets but also density functionals and pseudopotentials) is a responsibility of the users, who retain full control and the flexibility to adapt the code to their specific needs. Nevertheless, tools for basis optimization are provided with the program, new curated databases of pseudopotentials are coming online, and new ways to ameliorate the correlation problem are being implemented. Some of these developments are described in subsequent sections.
III. RECENT DEVELOPMENTS IN Siesta
A. New distribution model and development infrastructure
A few years ago, in 2016, a decision was made to change the licensing model for Siesta: traditionally, it had always been free of charge to academics, but non-academic use required a special license and redistribution was not permitted. Now, Siesta is formally an open-source program distributed according to the terms of the GNU General Public License (GPL).18 At the same time, the development infrastructure was made more transparent and scalable using the Launchpad platform19 earlier and the Gitlab service now.20 The net effect of the changes has been a more fertile and dynamic development, with more contributors who can have direct access to the various branches of development and a better experience for users who can download codes and raise issues in an integrated platform.
These changes have been substantial for the core developers, and the transitory period is still being felt. The main code base is gradually gaining new developments that were planned long in advance, and new ones made possible by the greater openness and fluidity of the development model. Most of the new features described below are already part of public releases, but a few are undergoing the last stages of testing before release. The work-flow is also moving from long-lived releases, hard to maintain with bug-fixes, to more frequent releases that will be maintained for a shorter time.
B. New pseudopotential format for interoperability
PSML (for PSeudopotential Markup Language)21,22 is a file format for norm-conserving pseudopotential data, which is designed to encapsulate as much as possible the abstract concepts involved in the domain and to provide appropriate metadata and provenance information. This extra level of formalization aims at removing the interoperability problems associated with bespoke pseudopotential formats, which usually were designed to serve the needs of specific generators and client codes and thus contain implicit assumptions about the meaning of the data or lack information not considered relevant.
PSML files can be produced by the oncvpsp23 and atom24 pseudopotential generator programs and are a download-format option in the Pseudo-Dojo database of curated pseudopotentials.25,26
The software library libPSML21,22 can be used by electronic structure codes to transparently extract the information in a PSML file and incorporate it into their own data structures or to create converters for other formats. It is currently used by Siesta and abinit,17,27 making possible a full pseudopotential interoperability and facilitating comparisons of calculation results.
The use of this new format opens the door to benefit from the availability of a Periodic Table of reliable and accurate norm-conserving pseudopotentials, easing in most cases the task of pseudopotential quality control.
C. DFT+U for correlated systems
The LDA+U method, initially developed by Anisimov and co-workers28 with the objective to improve the treatment of the electron–electron interaction for localized electrons within the bare LDA description, has been implemented in Siesta. The idea behind the LDA+U consists in describing the “strongly correlated” electronic states of a system (typically, localized d or f orbitals) using the Hubbard model, whereas the rest of valence electrons are treated at the level of “standard” approximate DFT functionals.29 In the current version of Siesta, the implementation is based on the simplified rotationally invariant functional proposed by Dudarev and co-workers.30 Here, the corrections are made invariant under rotation of the atomic orbitals used to define the occupation number of the correlated subspace, at the cost of retaining only the lowest order Slater integrals in the factorization of the integrals of the Coulomb kernel of the electron–electron interaction and neglecting the higher order ones (i.e., taking the exchange interaction J as 0). The expression of the corrective term as a function of the occupation number of the localized correlated orbital ℓm with spin σ within the atom I is given by
where only one interaction parameter UIℓ is needed to specify the interaction per atom and ℓ-shell. In the practical Siesta implementation, the populations on the correlated orbitals are computed using non-overlapping (i.e., orthogonal) localized projectors. They can be generated using either (i) the same algorithm used to produce the first-ζ orbitals of the basis set, but with a larger energy shift, or (ii) cutting the exact solution of the pseudoatom with a Fermi function.
The results of the LDA+U method are sensitively dependent on the numerical value of the effective on-site electronic interaction, the Hubbard U. Although, in principle, the value of U can be computed from first principles using linear response methods,31 a common practice is to tune it semiempirically, seeking agreement of certain properties (for instance, bandgaps or magnetic moments) with the available experimental measurements. Then, the fitted U is used in subsequent calculations to predict other properties.
The LDA+U corrects localized states for which the self-interaction correction is expected to be stronger and is an effective method to improve the description of the (underestimated) bandgap of insulators, as shown in Fig. 3 for the case of NiO. Once the Hubbard correction is switched on, the optical bandgap increases up to 3.08 eV [from the bare Generalized-Gradient-Approximation–Perdew–Burke–Ernzerhof (GGA-PBE) value of 1.08 eV], very close to the experimental value for the onset of optical absorption in NiO32 (3.10 eV). The magnetic moment on the Ni atom is also properly described with a value of 1.67 μB, which lies well within the experimental range of values (between 1.64 μB33 and 1.9 μB34), and improves on the result of 1.39 μB obtained with a bare GGA–PBE functional.
Band structure of NiO in the undistorted rock-salt type structure with rhombohedral symmetry introduced by a type-II antiferromagnetic order. The experimental lattice spacing is used. The bands obtained within the GGA–Perdew–Burke–Ernzerhof functional [panel (a)] and with a Hubbard U correction of 4.6 eV applied on the d-orbitals of Ni [panel (b)], as in Ref. 31, are shown. The zero of the energy is set at the top of the valence band.
Band structure of NiO in the undistorted rock-salt type structure with rhombohedral symmetry introduced by a type-II antiferromagnetic order. The experimental lattice spacing is used. The bands obtained within the GGA–Perdew–Burke–Ernzerhof functional [panel (a)] and with a Hubbard U correction of 4.6 eV applied on the d-orbitals of Ni [panel (b)], as in Ref. 31, are shown. The zero of the energy is set at the top of the valence band.
D. van der Waals functionals
An efficient calculation of van der Waals (vdW) functionals35,36 was developed and first implemented in Siesta using a polynomial expansion in the local variables (q1, q2) of the nonlocal interaction kernel Φ(q1, q2, r12) and a Fourier expansion in the relative position r12.37 As a result, the scaling of the vdW computation decreases from O(N2) to O(N log N), and it becomes marginal within the overall cost. This scheme was later extended16 to a more complex kernel38 of the form Φ(n1, |∇n1|, n2, |∇n2|, r12), and it has been applied to a large variety of systems, such as carbon nanotubes,37 hydrogen adsorption,39,40 or liquid water.41
E. Hybrid functionals
The screened hybrid functional HSE0642–44 has been implemented in Siesta building on the work of Ref. 45. This functional is the result of adding nonlocal Hartree–Fock type exact exchange (HFX) into semilocal density functionals. The Coulomb potential that appears in the exchange interaction is screened, so it has a shorter range than 1/r. Here, to reduce the big prefactor involved in the computation of the HFX potential matrix elements, we fit the NAO of the basis set with Gaussian-type orbitals, especially suited to computing the four center electron repulsion integrals (ERIs) in a straightforward and efficient analytical way. An example of this fitting for the 2s and 2p atomic orbitals basis set of oxygen is shown in Fig. 4. The libint package46 is required to calculate primitive ERIs, where recursive schemes of the Obara–Saika47 method and the Head-Gordon and Pople’s variation48 thereof are implemented. ERIs are calculated in the first self-consistent-field (SCF) cycle and then stored on disk. Only the ERIs with non-negligible contributions are calculated, keeping the HFX Hamiltonian also sparse.
Gaussian fits of the radial part of oxygen 2s (a) and 2p (b) orbitals using six Gaussian functions. The orbitals to fit are represented by blue dots and the corresponding Gaussian expansions by green continuous lines. Dashed vertical lines represent the standard deviations of individual Gaussians, and a red continuous line marks their upper limit. The orbitals are set to zero in the yellow area, marking their cutoff radii.
Gaussian fits of the radial part of oxygen 2s (a) and 2p (b) orbitals using six Gaussian functions. The orbitals to fit are represented by blue dots and the corresponding Gaussian expansions by green continuous lines. Dashed vertical lines represent the standard deviations of individual Gaussians, and a red continuous line marks their upper limit. The orbitals are set to zero in the yellow area, marking their cutoff radii.
This HSE06 functional has been used to compute the band structure of bulk Si [diamond structure, Fig. 5(a)] and BaTiO3 [cubic structure, Fig. 5(b)] with a double-zeta polarized basis set at the equilibrium lattice constant of the Perdew–Burke–Ernzerhof functional49 within the generalized gradient approximation (5.499 Å for Si and 4.033 Å for BaTiO3). In both cases, the gap is opened with respect to the value obtained with the semilocal functional. In bulk Si, the bandgap is indirect: the top of the valence band is located at Γ and the bottom of the conduction band at a point along the Γ → X high-symmetry line. It increases from 0.64 eV within GGA to 1.00 eV with the hybrid functional, in good agreement with the experimental value of 1.17 eV.50 For the case of the perovskite oxide BaTiO3, the bandgap is also indirect, from R to Γ, and its value increases from 1.87 eV with the GGA to 3.28 eV with the HSE06 functional, almost agreeing the experimental value of 3.2 eV estimated by Wemple in the cubic phase.51
Band structure of (a) bulk Si in the diamond structure and (b) bulk BaTiO3 in the cubic structure obtained with the Perdew–Burke–Ernzerhof functional (red lines) and with the HSE06 hybrid functional (black lines). The zero of energies has been set to the valence band maximum.
Band structure of (a) bulk Si in the diamond structure and (b) bulk BaTiO3 in the cubic structure obtained with the Perdew–Burke–Ernzerhof functional (red lines) and with the HSE06 hybrid functional (black lines). The zero of energies has been set to the valence band maximum.
F. Spin–orbit coupling
The capability to include the spin–orbit (SO) interaction in Siesta and in the analysis tools is seen as a strategic asset for the project in view of the recent interest in topological insulators and quasi-two-dimensional systems with important spin–orbit effects, such as some of the transition metal dichalcogenides. It also brings the possibility to obtain the magnetic crystalline anisotropy (MCA) (change in the total energy of the system upon changing the spin quantization axis).
In a standard collinear-spin DFT calculation, the total KS Hamiltonian is represented by two independent spin-blocks, [σ = ↑, ↓]. However, when the SO coupling is included, off-diagonal spin blocks arise (i.e., there are non-zero couplings between the two spin components). Therefore, similar to the non-collinear spin case, the Hamiltonian becomes a full 2 × 2 matrix in spin space,
where μν subindexes refer to the Siesta basis orbitals. The fully relativistic (FR) Hamiltonian ĤKS is expressed as a sum of the kinetic energy , the scalar-relativistic pseudopotential part in the form of Kleinman–Bylander projectors , the spin–orbit term, and the Hartree and exchange–correlation potentials,
The first three terms of the right-hand side do not depend on the charge density, ρ(r), and therefore do not change in the self-consistent cycle, while and are the only spin-dependent terms that couple both spin components.
In order to compute the MCAs, different orientations of the spin quantization axis need to be considered. This may be done by rotating either (as done by Cuadrado and Cerdá52) or the density matrix, which is the approach currently followed by Siesta for compatibility with the non-collinear case.
In the current implementation, the SO term is included non-perturbatively so that the fully relativistic Hamiltonian is solved self-consistently after extending the Kohn–Sham wavefunctions to full spinors. The following two different approaches have been implemented in Siesta to account for the SO term, :
on-site approximation:
Based on the work of Fernández-Seivane et al.,53,54 only the intra-atomic SO contributions within each l-shell of each atom are considered. In this approach, the SO terms are obtained from analytically simple expressions for the angular integrals, while the radial integrals are computed numerically.
off-site approach:
Here, is built following the Hemstreet formalism52,55 whereby a fully relativistic pseudopotential (FR-PP) operator is constructed in a fully separable form, i.e., non-local in the radial part as well as in the angular variables, in order to substantially reduce the computational cost. The necessary lj Kleinman–Bylander projectors may be either constructed by Siesta itself from relativistic semilocal PPs or directly read from appropriately generated PSML files, as provided by the Pseudo-Dojo project.25,26 Moreover, we note that the FR-PP formalism (as well as the original one implemented in Ref. 52) uses the correct normalization constants Cl±1/2, in contrast with what was erroneously stated in Ref. 56.
Although we consider the off-site approach more accurate, as it includes inter-shell and inter-atomic SO couplings, both approximations yield very similar results in most of the tested systems with relevant qualitative differences only found in a few specific cases. Furthermore, the construction of the matrix is very fast under both schemes and involves a tiny fraction of the entire self-consistent calculation.
G. New electronic structure solvers
For most problems, Siesta spends the largest fraction of CPU-time in the solver stage (solution of the generalized eigenvalue problem HΦ = εSΦ). The stage devoted to the calculation of the Hamiltonian H and overlap S is typically much lighter weight, as those matrices are intrinsically sparse due to the use of a finite-support basis set. Accordingly, Siesta’s performance is almost completely linked to the use of appropriate external solver libraries.
Over the past few years, we have expanded the choices available to users and refined the relevant interfaces. Initially, we added support for new individual solvers as detailed below, but recently we have consolidated some of the most important functionality under a new common interface to the ELSI (ELectronic Structure Infrastructure) library of solvers.57,58
1. Solvers with a native interface
Diagonalization (solution of the generalized eigenproblem appropriate for non-orthogonal orbitals) is the default method for obtaining the density-matrix in Siesta. A number of standard routines are contained in the ScaLAPACK library,59 but more efficient alternatives are possible. In particular, the ELPA library60–62 uses an extra intermediate step in the tridiagonal conversion of the matrices to obtain better scalability and significant speedups over ScaLAPACK. An interface to ELPA is offered in Siesta, so this solver can be used as a drop-in replacement for ScaLAPACK throughout the code.
In addition, Siesta has implemented interfaces to several methods not based on diagonalization. In most cases, the use of a finite-support basis set, leading to the appearance of sparse matrices, is a significant factor to achieve good performance:
The Fermi Operator Expansion (FOE) method63 uses the formal relationship between Hamiltonian and density-matrix, , where fFD is the Fermi–Dirac function. A simple polynomial expansion of fFD can then be used to obtain without diagonalization. This method is implemented in the CheSS library,64 developed within the BigDFT project.65
The PEXSI method66,67 uses a pole expansion of fFD to get in the following form:
where and zl are the weights and poles for the corresponding expansion of the Fermi–Dirac function, respectively. The number of poles needed is significantly smaller than for the polynomial version of the FOE, as its dependence on the spectrum size is only logarithmic.
It would appear that having to invert matrices would still render this approach cubic-scaling, but, in fact, only selected elements of have to be actually computed. This “pole expansion and selected inversion” method offers a reduced complexity (at most for dense systems and for quasi-one-dimensional systems) and trivial parallelization over poles, so it is well-suited for very large problems on large machines. For example,68 computed the electronic structure of large (up to 11 700 atoms) graphene nanoflakes using Siesta-PEXSI.
The electronic structure problem can also be cast as a minimization problem (of an extended functional) without orthogonalization. When additional localization constraints are put in place, the original linear-scaling method in Siesta results. Without the extra localization constraints, the cubic-scaling Orbital Minimization Method (OMM)69 can be competitive with respect to diagonalization, as data can be reused across SCF-cycle steps.
2. The ELSI interface
We have considerably extended the range of solver choices and the performance enhancement possibilities of the code with the integration of the open-source ELSI library (https://elsi-interchange.org), which provides a unified software interface that connects electronic structure codes to various high-performance solver libraries to solve or circumvent eigenproblems encountered in electronic structure theory.57 ELSI also ships with its own tested versions of the individual solver libraries, but additionally, linking against already compiled upstream versions from each solver library is supported as much as possible.
The ELPA, OMM, and PEXSI solvers, which had their own ad hoc interfaces as described in Sec. III G 1, are now available through ELSI, which also supports other conventional dense eigensolvers (EigenExa70 and MAGMA71), sparse iterative eigensolvers (SLEPc72), and linear scaling density matrix purification methods (NTPoly73). As sketched in Fig. 6, an electronic structure code interfacing to ELSI automatically has access to all the eigensolvers and density matrix solvers supported in ELSI. In addition, the ELSI interface is able to convert arbitrarily distributed dense and sparse matrices to the specification expected by the solvers, taking this burden away from the electronic structure code. A comprehensive review of the capabilities in the latest version of ELSI, including parallel solution of problems found in spin-polarized systems (two spin channels) and periodic systems (multiple k-points), scalable matrix I/O, density matrix extrapolation, iterative eigensolvers in a reverse communication interface (RCI) framework, has recently been completed.58
Interaction of the ELSI interface with electronic structure codes. ELSI serves as a bridge between electronic structure codes and solver libraries. An electronic structure code has access to various eigensolvers and density matrix solvers via the ELSI API. Whenever necessary, ELSI handles the conversion between different units, conventions, matrix formats, and programming languages.
Interaction of the ELSI interface with electronic structure codes. ELSI serves as a bridge between electronic structure codes and solver libraries. An electronic structure code has access to various eigensolvers and density matrix solvers via the ELSI API. Whenever necessary, ELSI handles the conversion between different units, conventions, matrix formats, and programming languages.
With the common interface in place, any additions and enhancements to the supported solvers can be used in Siesta with almost no code changes. This is particularly relevant for performance enhancements, for example,
Further levels of parallelization: A feature common, in principle, to all solvers is that the Siesta–ELSI interface can exploit the full parallelization over k-points and spins mentioned above. This means that these calculations can use two extra levels of parallelization in the solver step beyond the standard one of parallelization over orbitals (see Fig. 7).
The new version of the PEXSI solver integrated in ELSI can achieve the same level of precision with fewer poles and offers an extra level of parallelization over trial points for the determination of the chemical-potential.
Mixed-precision support: The ELPA solver can be invoked in single-precision mode, which can speed up the initial steps of the electronic self-consistent-field (SCF) cycle.
Accelerator offloading: The ELPA library offers Graphical-Processing-Unit (GPU) support in some kernels,62 and there is scope for extending it to more kernels. ELSI also offers an interface to the accelerator-enabled MAGMA library. Finally, the PEXSI developers are working on adding GPU support to the solver.
Performance improvement from the use of the extra level of parallelization over k-points in Siesta using the ELSI interface with the ELPA solver, compared to the previous diagonalization scheme (using both the standard ScaLAPACK solver and the existing ELPA interface in Siesta). The system is bulk Si with H impurities, with 1040 atoms, 13 328 orbitals, and a sampling of 8 k-points. The multi-k scheme is able to stay closer to ideal scalability for larger numbers of MPI processes.
Performance improvement from the use of the extra level of parallelization over k-points in Siesta using the ELSI interface with the ELPA solver, compared to the previous diagonalization scheme (using both the standard ScaLAPACK solver and the existing ELPA interface in Siesta). The system is bulk Si with H impurities, with 1040 atoms, 13 328 orbitals, and a sampling of 8 k-points. The multi-k scheme is able to stay closer to ideal scalability for larger numbers of MPI processes.
H. Time dependent DFT
Time-dependent density-functional theory (TD-DFT) was first implemented in Siesta in its real-time propagating form. It was first described in Ref. 74 and then briefly in Ref. 2. It was based on the Crank–Nicolson algorithm by which the effect of the evolution operator for an infinitesimal time step
on the wavefunction coefficients matrix at a given time t0, c(t0) is approximated by
where Δt represents the finite time step resulting from time discretization, and S and H represent the overlap and Hamiltonian matrices, respectively, in the representation given by a non-orthogonal basis set, as used by Siesta. That expression is obtained from equating the first-order evolution of the coefficients forward, from t0 to t0 + Δt/2, to the backward evolution from t0 + Δt to the same intermediate step.
It can be further simplified to
for a smooth-enough variation of the Hamiltonian itself and a small enough Δt, thereby avoiding the self-consistency implied in propagation using Eq. (6). In Sec. III H 3, recent developments on efficient treatments of Eq. (6) beyond Eq. (7) are presented. Here, we describe the parallelization and related features in the TD-DFT implementation now found in standard Siesta releases.
The implemented propagation is based on Eq. (7) (with the improvement possibilities described in Sec. III H 3), but proper consideration must be taken of the fact that not only the coefficients change in time but also the basis set and the Hilbert space spanned by it when the atoms move. An analysis of the geometrical implications of this fact is presented in Ref. 75. The time-dependent Kohn–Sham equation
becomes
where H, S, and c are the Hamiltonian, overlap, and coefficients matrices, respectively, as before, and the D matrix is the connection in the manifold given by the evolving Hilbert space,75 Dμν = ⟨ϕμ|∂tϕν⟩, for ϕμ and ϕν basis functions.
A way of taking such evolution into account in the discretized implementation was proposed by Tomfohr and Sankey76 and is based on a Löwdin orthonormalization. The scheme consists of two steps. First, the wavefunctions are propagated using both S and H at time t0 using Eq. (7), but to an auxiliary coefficient matrix ,
Then, the propagation is followed by a change of basis operation (only needed if the ionic positions have changed),
This algorithm is unitary by construction, and so the preservation of orthonormality is guaranteed, regardless of the size of Δt. As discussed in detail in Ref. 75, this algorithm can be shown not to be entirely consistent with the connection represented by the D matrix defined above. Nevertheless, the discrepancies due to the mentioned inconsistency have been shown to be small in a series of studies using this formalism,77–80 at least for low atomic velocities. The practical benefit of separating the two procedures is to perform the change of basis only when necessary, allowing for many electronic steps per atomic motion step, if the nuclei are still significantly slower than electrons, for instance. The implementation of the Crank–Nicolson part is the same for both the fixed and moving basis.
The square root and inverse square root are calculated by first computing its eigenvalues and eigenvectors,
where E is a diagonal matrix with the eigenvalues of S. U is a square matrix with the eigenvectors of S as its columns. Then,
where E1/2 and E−1/2 are obtained by replacing diagonal elements of E with their square root and inverse square root (in the latter case, neglecting those eigenvalues below a certain threshold value), respectively.
The two-stage algorithm has been implemented in Siesta in parallel, allowing for k-point sampling and for collinear spin. The initial occupied states to be propagated are read from a file. Siesta is prepared to run a conventional DFT calculation of whatever relevant initial state and write a wavefunction continuation file that acts as initialization of the ulterior Siesta run in real-time TD-DFT mode. As it stands, Siesta evolves states defined as fully occupied; partial occupations are not currently supported.
1. Parallelization
The two-step procedure described above requires matrix–matrix and matrix–scalar multiplication, matrix addition, and matrix inversion, plus the diagonalization of the overlap matrix for the Löwdin step. Since only the occupied states are propagated, the c matrix is rectangular , that is, the number of propagating states × the number of basis functions, while S and H are square, . The computation of the overlap and Hamiltonian matrices is handled by pre-existing Siesta routines, which are already parallelized and well-optimized for High-Performance-Computing (HPC) environments.69,81
The parallelization of the propagation following Eqs. (10) and (11) is done simply by exploiting the MatrixSwitch library,82 which allows for an abstracted manipulation of matrices, the details of parallelization, data formats, conversions, etc., being taken care of underneath. In this case, MatrixSwitch is called to use the BLACS83 and ScaLAPACK84 libraries, meaning that this part of the code is run on the dense-matrix infrastructure, as already done with conventional diagonalization solvers. As for the latter, although the H and S matrices are sparse, the c matrix is dense.
Conversion between storage formats is an important consideration here. The native matrix storage format employed by Siesta is a compressed sparse column (CSC) scheme with a one-dimensional block-cyclic distribution (1D-BCD) over MPI processes. A block-cyclic distribution is needed by the BLACS and ScaLAPACK package. The matrices can therefore be temporarily converted from sparse to dense using the same parallel distribution; this is a very efficient operation, since no MPI communication is necessary. It should be noted that a two-dimensional (2D) BCD is known to be more efficient in terms of parallel scaling.69 The conversion from 1D to 2D does however carry a heavier cost, as MPI communication is inevitable.
The parallel efficiency of our implementation is therefore chiefly determined by that of the underlying ScaLAPACK drivers. The matrix inversion in Eq. (10) is performed using LU factorization. For the diagonalization of the overlap matrix, we have implemented the option of using either a standard diagonalization approach (tridiagonal reduction followed by the implicit QR algorithm) or a divide-and-conquer algorithm, as described in Ref. 85. The latter is known to scale better with system size.
The scaling with the number of processors is very similar to the scaling of a conventional DFT Siesta run using diagonalization as the solver option, since both procedures are run on routines of analogous scaling within the same dense-matrix-algebra library. Figure 8 shows the relative share in the total running time of the three main procedures involved: the Crank–Nicolson algorithm, the change of basis, and the calculation of the SCF Hamiltonian plus other minor processes in Siesta such as building the density matrix. This was performed for a system of 5000 Ge + 1 He atoms described with a single-zeta polarized basis set. The Crank–Nicolson algorithm takes about 18% of the total time on 30 processors, which increases to 25% on 316 processors. Instead, the change of basis procedure takes about 38% of the total time on 30 processors, which decreases as the parallelization increases, reflecting its better scaling properties. The Löwdin step is the most expensive operation on all numbers of processors, which affects TD-DFT simulations (and only those steps) involving atomic motion.
The relative share of the total running time for the Crank–Nicolson algorithm, the Löwdin step, and the rest of the program operations (including the building of the SCF Hamiltonian) for a system of 5000 Ge atoms and one He projectile using 30–316 processors.
The relative share of the total running time for the Crank–Nicolson algorithm, the Löwdin step, and the rest of the program operations (including the building of the SCF Hamiltonian) for a system of 5000 Ge atoms and one He projectile using 30–316 processors.
2. TD-DFT beyond the released version
There are many possible (and feasible) improvements on what has been described above, some of them in the pipeline. From a fundamental point of view, the Löwdin step will be replaced by another basis-changing step in the direction of what was proposed in Ref. 75. It is needed if atoms move at velocities of around 1 atomic unit or more (1 a.u. ∼ c/137, with c being the speed of light). In that case, the diagonalization step may be avoided (or replaced by the N × N diagonalization of the overlap matrix for the evolving states, instead of the for the basis set overlap).
For fast moving atoms within Ehrenfest dynamics, there is also the need to implement correction terms to the forces related to both the change of basis and the rotation of the time-dependent Hilbert space (the intrinsic curvature of the manifold). These terms are well known,86 and their geometrical meaning in terms of the relevant curvature75 will be the subject of a future publication. They are being tested and should be incorporated shortly to a visible branch in the open source repository, to be later merged into the trunk, and further incorporated into Siesta releases.
For efficiency, iterative inversion options will be explored replacing the present LU implementation in ScaLAPACK, and quite a few possibilities exist to incorporate more sophisticated algorithms to the described operations. What has been described is robust and quite transparent, but the MatrixSwitch abstraction should allow easy implementation of other techniques.
3. Improved real-time propagators
Equation (7) represents a fast approach of electronic propagation in real-time TD-DFT, especially suited to study systems where the perturbation of the electronic density is relatively small (e.g., optical linear response74). If one is interested in simulating systems with heavily perturbed electronic densities by external forces (such as those exerted by intense laser fields or fast atom collisions for instance), one should choose a more elaborate propagation scheme that preserves better the time-reversibility of the propagator operator. Some of the authors introduced in Ref. 79 an extrapolation algorithm to study the stopping power of prototype semiconductors. Briefly, the method uses Eq. (7) with an extrapolated Hamiltonian,
where the extrapolated Hamiltonian Hext reads
and
Additionally, the user is given the option to divide each propagation step Δt into n sub-steps in an effort to increase the accuracy of the first-order expansion underlying the derivation of Eq. (13). In this case, the final equation reads
with
Recently, we introduced a third algorithm for propagation. Leaving aside in this description the complications associated with the possible subdivision of each time step, the new algorithm is based on a two-step scheme where the electronic wavefunction is first propagated until half of the step, Δt/2, using extrapolation as in Eq. (13),
then an explicit calculation of the half-step Hamiltonian, H(t0 + Δt/2), is performed using the coefficients c(t0 + Δt/2) obtained from Eq. (18). In a second step, the coefficients are evolved from the beginning of the step, c(t0), to the full step, c(t0 + Δt), using the half-step Hamiltonian,
This approach, although increasing the CPU time by around ∼35%, as compared to the two previous schemes, allows for better energy conservation for highly perturbed systems where the Kohn–Sham potential heavily varies in time.
In order to provide a more quantitative comparison between the three schemes, namely, the default propagation of Eq. (7), the extrapolation propagation of Eq. (13), and the two-step propagation of Eqs. (18) and (19), we compare their performance vs the P-TDDFT implementation87 of the CPMD code88 in the case of a double ionization of a uracil molecule in the gas phase.89,90 Simulations of this type provide access to the ultrafast electronic dynamics that occurs at the atto and femto timescales in the ionized genetic material (DNA and RNA) as a consequence of collisions with proton or carbon beams.91 This particular simulation addresses the fragmentation pattern of a doubly ionized uracil molecule (its deepest Kohn–Sham orbital is empty) using the Becke- Lee-Yang-Parr (BLYP) density functional.92,93 The technical details for the simulation with the CPMD code used as a reference can be found in Refs. 89 and 90. Siesta calculations using the same density functional and the integrators described above use a double-zeta-polarized (DZP) basis set. As shown in Fig. 9, the standard Siesta implementation cannot properly deal with such a highly excited system. The extrapolation scheme in Eq. (13) already provides a large improvement and gives an energy conservation similar to the CPMD simulations in Refs. 89 and 90. Finally, the two-step algorithm further improves the energy conservation. For smaller time steps, the improvements given by the two-step scheme are even more clear, as shown in Fig. 10.
Energy drift in an energy conserving TDDFT simulation of ionized-core uracil for the three propagation methods described here, as compared with the CPMD implementation, for a time step of Δt = 0.24 as (0.01 a.u.).
Energy drift in an energy conserving TDDFT simulation of ionized-core uracil for the three propagation methods described here, as compared with the CPMD implementation, for a time step of Δt = 0.24 as (0.01 a.u.).
Energy drift comparison as in Fig. 9 for time steps Δt = 0.24 and 1.2 as.
4. Electronic stopping of atomic projectiles
Let us finish the TD-DFT section with a brief mention of its successful application to the problem of simulating the excitation of the electrons of a condensed matter system when traversed by a high-energy atomic projectile (so-called electronic stopping, since the electrons slow down the projectile). This physical problem is very relevant to questions of interest to the nuclear and aerospace industries, as well as to the treatment of cancer. Despite its great relevance and having been researched since Rutherford’s experiment in 1911, the understanding of electronic stopping processes has been essentially limited to either weak effects in the linear-response regime or beyond linear but only for target systems close to the homogeneous electron liquid (jellium).
An earlier version of the TD-DFT implementation in Siesta74 allowed the first explicit first-principles simulation of electronic stopping for protons and antiprotons in LiF, a wide-bandgap insulator, which was quite successful.94 The difference of sign between protons and antiprotons produces a significant difference in the stopping power (rate of energy excitation) beyond the linear-response paradigm (the Barkas effect), and the insulating character of the target makes it inaccessible to the jellium paradigm. The success stimulated further studies along this line77–80 using improved versions of TD-DFT in Siesta, as described here. Figure 11 displays the electron deformation density around a proton displacing in a bulk Ge target.79 They were also followed by analogous simulations using plane-wave codes by an increase in the number of groups (for a review, see Ref. 95), although the latter calculations do demand considerably larger computational resources.
Electron deformation density isosurfaces (blue positive and red negative) for a proton displacing leftwards at a velocity of 1 a.u. in the bulk of a Ge crystal.
Electron deformation density isosurfaces (blue positive and red negative) for a proton displacing leftwards at a velocity of 1 a.u. in the bulk of a Ge crystal.
I. Density functional perturbation theory
The original implementation of density functional perturbation theory, as a post-processing and independent code (linres96), has been recently merged into Siesta. It allows us to compute the phonon dispersions using a supercell approach (Γ-point phonons). Both LDA and GGA functionals can be used (through LibXC). Calculation of the perturbed Hamiltonian and overlap matrix elements follows the same methodology as for ground-state calculations with similar computational costs, which are comparable to those obtained with a finite difference approach. Figure 12 shows a comparison between both methods for model fullerene-type systems of different sizes.
(Left panel) Comparison of the performance of the new DFPT approach with the conventional finite-differences method in Siesta. The time required to compute a whole row of the dynamical matrix (derivative of the forces on all atoms when one atom is displaced in x, y, z directions) is plotted as a function of the number of orbitals in carbon fullerenes of different sizes. (Right panel) Performance of the alternative algorithm described in the text [based on Eq. (22), blue circles], as compared to the original implementation based on Eq. (20) (black diamonds). The initialization (computing Ξ and Ω) is the most time-consuming step, although it has to be performed only once, and can be used for all perturbations (each atomic displacement). The new algorithm becomes more efficient for system sizes larger than the threshold value (green dashed line).
(Left panel) Comparison of the performance of the new DFPT approach with the conventional finite-differences method in Siesta. The time required to compute a whole row of the dynamical matrix (derivative of the forces on all atoms when one atom is displaced in x, y, z directions) is plotted as a function of the number of orbitals in carbon fullerenes of different sizes. (Right panel) Performance of the alternative algorithm described in the text [based on Eq. (22), blue circles], as compared to the original implementation based on Eq. (20) (black diamonds). The initialization (computing Ξ and Ω) is the most time-consuming step, although it has to be performed only once, and can be used for all perturbations (each atomic displacement). The new algorithm becomes more efficient for system sizes larger than the threshold value (green dashed line).
The solution of the Sternheimer equation and calculation of the perturbed density matrix are the most demanding step. It requires the perturbed coefficients of the electronic wavefunctions to be obtained,
where and . The change in the density matrix is then given by
and a similar expression applies to the change in the energy-density matrix.
The change in the occupation of the electronic state can be computed from the change in its eigenenergy , and it is relevant in metals for states close to the Fermi level. The Fermi level can also be shifted by the perturbation, and it can be determined through the conservation of the number of electrons in the system, Ne.
Obtaining ∂ρμν is the most computationally expensive part of the code. While the computation of ∂Hμν basically has linear scaling with the system size, the matrix Aij scales as , where Nb is the number of basis functions and M is the maximum number of neighbor orbitals for any orbital in the system. Equation (20) then requires operations for each atomic perturbation, and the change in the density matrix requires loops. An alternative approach that offers a better computational scaling for systems with a gap has also been tested. If we define , we obtain
where is a smooth function of εi and can be described by Chebyshev’s expansion with a few selected energy points and their corresponding weights,
Note that is perturbation-independent and could be computed only once and used for all the possible atomic displacements with a cost that scales as , with Nl being the number of Chebyshev’ polynomials. The change in the electronic density is then given by
where only the central term requires self-consistency and . Although computing the change in the density scales as , most of the computational cost is required in an initialization step to obtain Ξ and Ω that are perturbation-independent, enabling the extraction of the whole dynamical matrix with operations. A preliminary serial calculation for Cn fullerenes shows that the threshold system size for the new algorithm to become more efficient than the original implementation lies at around 650 atoms. This value can be conveniently reduced by an efficient parallelization of the initialization step.
J. TranSiesta
The transport code TranSiesta, initially developed by Brandbyge and co-workers,15 enables open-boundary condition calculations by extending periodic regions with bulk electrodes. It is based on the non-equilibrium Green function formalism, which allows biased calculations. TranSiesta has been completely re-written and now uses advanced inversion algorithms, enables electrodes, allows thermo-electric calculations, performing real-space calculations (without k-points), and adds phonon transport calculations using the Hessian,97,98
The non-equilibrium Green function formalism can be summarized in the following equations, which are generalized for electrodes:
where is the equilibrium density matrix for electrode , G is the retarded Green’s function matrix, and is the correction to the equilibrium part. The spectral function and carries electrons from the electrode . Finally, is the Fermi function with the chemical potential corresponding to electrode . It is evident that the Fermi functions depend on the chemical potential and the electronic temperature in the associated electrodes. By using different temperatures for each electrode, one can calculate thermoelectric effects due to different reservoirs having separate electronic temperatures self-consistently.
We note that TranSiesta uses a multiple complex energy-contour algorithm to more accurately describe the total density ρ. It does this by weighing each using a simple scheme97 (Sec. III B). So far, few multi-electrode calculations have been performed, so the importance of the multiple contour algorithm is currently unknown.99–101 However, for the well-known 2-electrode problem, it allows smoother convergence properties.15
In the latest TranSiesta, we implement three different inversion algorithms: (i) a block-tri-diagonal (BTD) algorithm, (ii) a MUMPS sparse algorithm, and (iii) a dense algorithm (LAPACK). The performance of these (speedup compared to TranSiesta 3.1) is summarized in Fig. 13. Since the BTD algorithm is linear scaling for constant width, it can easily outperform the dense algorithm by a factor of 100. This performance gain is also important for the memory footprint, enabling even larger systems. The BTD algorithm favors long and narrow systems but uses less memory for all types of systems.
Performance characterization of TranSiesta using a pristine graphene cell (24 atoms wide). Speedup for (a) EGF and (b) NEGF calculations of pristine graphene compared against the dense implementation. The BTD method exhibits more than 40 times the speed of the LAPACK implementation for the largest size. MUMPS gains speed after 5000 orbitals.
Performance characterization of TranSiesta using a pristine graphene cell (24 atoms wide). Speedup for (a) EGF and (b) NEGF calculations of pristine graphene compared against the dense implementation. The BTD method exhibits more than 40 times the speed of the LAPACK implementation for the largest size. MUMPS gains speed after 5000 orbitals.
A recent addition to the TranSiesta package is the use of real space self-energy terms.98,102 These self-energies are semi-infinite in more than one direction and can thus be used as surrounding electrodes for, e.g., single defects in 2D or 3D structures or line defects. Real space self-energies are superior to BZ integrated quantities since they correctly describe the infinite bulk by leaving out image couplings and also removing the need for k-point sampling. When taking into account the complete procedure for a TranSiesta calculation, the real space self-energies provide an increased throughput since the SCF k-point sampling and the subsequent k-point sampled transport calculation are completely removed.98
Additionally, tbtrans enables the calculations of user defined tight-binding models and also interfaces to phonon transport using the Hessian matrix (program named phtrans). The phonon Green function is similar to the electron one,
where D is the Hessian and ω is the phonon frequency. Finally, inelastic transport involving phonon-excitation can be treated with perturbation theory in a post-processing step with the inelastica package.103,104
K. Wannierization
The interface between Siesta and wannier90105,106 (version 3.0.0) has been implemented, so the latter code can be called directly from Siesta as a library or used as a post-processing tool. wannier90 is an open-source code for generating maximally localized Wannier functions (MLWFs)107,108 and using them to compute the advanced electronic properties of materials with high efficiency and accuracy.
The Wannier functions can be considered as a unitary transformation (more precisely, a Fourier transformation) of a set of Bloch functions associated with a given manifold of bands. We can view the Bloch and Wannier functions as providing two different basis sets describing the same manifold of states associated with the electron band manifold in question. The Wannier functions display a number of very interesting properties.109 Among them, we can enumerate the following: (i) they are localized in real space, each of them concentrated around a given unit cell (see Fig. 14); (ii) Wannier functions centered on different cells are translational images of one another; (iii) they form an orthonormal basis set; and (iv) they span the same subspace of the Hilbert space as is spanned by the Bloch functions from which they are constructed. Because of the gauge freedom in the definition of the phases of the Bloch functions, the Wannier functions are not unique. However, the location of their centers in the home unit cell is unique to within a lattice vector, i.e., they are gauge invariant.109 The high degree of arbitrariness in the definition of the phases can be exploited to produce unitary transformation matrices between Bloch and Wannier functions in such a way that a localization functional that measures the sum of the quadratic spreads of the Wannier functions in the home unit cell around their centers is minimized.107 In a practical procedure to construct Wannier functions, a set of localized functions is used to generate an initial guess for the unitary transformations. These localized functions should be roughly located on sites where Wannier functions are expected to be centered and have appropriate angular character. In our implementation, we can directly use the localized atomic orbitals of the basis or the hydrogenoid localized functions (including hybrid orbitals), as suggested in wannier90.
Maximally localized Wannier functions (MLWFs) for graphene. Panel (a) displays the character of σ-bonded combinations of sp2 hybrids. Panel (b) displays the π character of the bands with weight on the pz orbitals. Isosurfaces of different colors correspond to two opposite values for the amplitudes of the real-valued MLWFs. Yellow spheres represent the position of the C atoms, while smaller blue spheres mark the center of the bonding.
Maximally localized Wannier functions (MLWFs) for graphene. Panel (a) displays the character of σ-bonded combinations of sp2 hybrids. Panel (b) displays the π character of the bands with weight on the pz orbitals. Isosurfaces of different colors correspond to two opposite values for the amplitudes of the real-valued MLWFs. Yellow spheres represent the position of the C atoms, while smaller blue spheres mark the center of the bonding.
The Wannier functions provide an exact tight-binding representation of the dispersion of the Bloch bands. This property will be exploited to extract in an automatic and user blind way the parameters required to run multiscale simulations, as described in Sec. III L.
Currently, wannier90 can be used as a post-processing tool or it can be directly called from Siesta in a library mode. Within this latter approach, the unitary matrices that transform the Bloch states into Wannier functions are directly accessible in Siesta, allowing a clear and straightforward interconnection between the two alternatives to span the Hilbert space. Besides, the use of Wannier functions opens the door to a wide range of potential applications. Already implemented in Siesta is the possibility of performing SCF convergence under the constraint of a rigid shift on the energy associated with a given Wannier function to be used to calculate electron–electron interactions for multiscale simulations, as detailed in Sec. III L. The interface with the self-consistent dynamical mean field theory dmftwdft code110 using MLWF has been already implemented.111 In addition, alternative approaches to compute the exact Hartree–Fock exchange in extended insulating systems with a linear scaling computational cost using MLWFs have been proposed, being another interesting research line for the future.112
L. Multiscale methods
Density functional theory can be used as the basis for parameterized multiscale methods that can be used to carry out simulations including tens or even hundreds of thousands of atoms.113 First-principles methods are used to produce detailed models that are subsequently used to predict properties that require large-scale simulations. The models are created for specific materials, and their accuracy can be systematically improved to converge toward DFT precision. Given the dependence on first-principles, we refer to these methods as second-principles DFT (SPDFT) and run them on an independent code called scale-up.113
SPDFT is based on a division of the total electronic density, , into reference [] and deformation [] contributions,
where is considered as a small perturbation with respect to that, in non-magnetic cases, represents the ground state of the system.113 This division is then used113 to expand the DFT energy with δn finding that the zeroth order term, E(0), corresponds to the full DFT energy for the reference density. The corrections to this reference energy only depend on δn (and parametrically on n0) which, given its smallness, can be efficiently calculated leading to a fast and accurate approximation of the full DFT energy. The expansion is usually taken to second-order,
resulting in a stationary problem that is equivalent to Hartree–Fock with the important distinction that the interactions are screened by the exchange–correlation potential. In order to keep δn small, the application of the method is restricted to problems where atomic bonds are not created or destroyed, i.e., to processes that display an invariant bond topology.
The E(0) term represents the exact DFT energy for the reference density. We represent it for a variety of geometries with an accurate force-field114 that allows for fast evaluation. The E(1) and E(2) terms account for the changes in the electronic structure that are represented by geometry-dependent Wannier functions. Under this basis, E(1) becomes a tight-binding model, while E(2) represents electron–electron interactions.
The interconnection between the first (Siesta) and second (scale-up) principles simulations is carried out through a python script, Modelmaker. Taking a few cutoff distances, Modelmaker is able to produce a model’s terms and automatically carry out DFT simulations with Siesta to determine the force field, a Wannier Hamiltonian to represent the bands, electron-lattice terms to account for how the bands change with geometry, and electron–electron interactions to describe, for example, magnetism.
While, so far, few publications with SPDFT methods include explicit treatment of electronic degrees of freedom, the lattice part has successfully been used in several applications. One of the main fields of research has been thermal conductivity in perovskites. In particular, it was employed to study the electrophononic coupling in SrTiO3115 and PbTiO3116 and the proposal of a thermal switch in PbTiO3.117 It has also been used to study the competition between various ferroelectric domain structures in PbTiO3/SrTiO3 superlattices as a function of strain.118 As a result, it was found that tensile strains lead to the appearance of chiral ferroelectric vortices, while ferroelectric skyrmions were predicted and experimentally observed for more compressive strain values.118 The calculated dielectric properties of these superlattices119 are in very good agreement with measured values and show very large electric susceptibility consistent with regions of negative, static electric permittivity situated at the core of the vortices and the PbTiO3/SrTiO3 interfaces.
M. Scripting and integration in external frameworks
An ongoing trend in many areas of computational science is to move away from rigid and monolithic codes, favoring instead a more flexible approach in which the internal functionality of a program is somehow exposed to the outside world. If done in a proper and well-documented way, this can serve to enhance the interoperability of codes with different functionalities, playing to the relative strengths of each, and/or to implement new functionalities by combining the available basic blocks. In Siesta, we have followed two different but complementary routes to these ends: the development of an internal scripting framework based on the Lua language, which enables new functionality without code recompilation, and the implementation of a formal interface to the AiiDA platform.
1. Lua interface
Lua120 is an easy-to-learn and fast scripting language built for embedding. It is very lightweight (its memory footprint is less than 300 kB) and provides very simple ways to interface to the data structures and routines of a host program. A Lua script, interpreted by the Lua interpreter embedded in the program, can then control the flow of execution and the data. Different user-level scripts can implement new functionalities, without recompilation of the host code. The strategy we have followed in Siesta is based on handling control to the Lua interpreter at specific relevant points in the program flow (e.g., at the beginning of a geometry step and at the end of a SCF step). Lua scripts implement handlers appropriate to the point they want to hook into and can request access to specific data structures. For example, a script intended to implement a better SCF mixing algorithm would be executed after every SCF step, inspecting the convergence data and changing mixing parameters or schemes, as appropriate. As another example, convergence checks over mesh-cutoffs and k-point sampling can be performed automatically.
The above mixing scenario exemplifies an important area of usefulness of the approach: the prototyping in Lua (followed eventually by a full implementation) of new ideas and algorithms. We have implemented a number of custom molecular dynamics (MD) modes, geometry relaxation algorithms, and advanced optimization schemes in a pure Lua library flos.121 The code in the library can be re-used or taken as a starting point for other implementations by users. These user-level scripts can, in turn, be shared, opening the way to the development of new functionality with faster turnaround that the traditional approach that needs a careful integration into the program’s code base.
As a specific demonstration of the power of the Lua embedding, we have developed a number of variations of the nudged-elastic band (NEB) method122,123 for transition-state search. Previously proposed implementations in Siesta involved significant, hard to maintain code changes, and did not make into the mainstream version. With Lua, we have been able to implement, non-intrusively, not only the standard algorithm, but a Double Nudged Elastic Band (DNEB)124 variation, and also another version that treats atomic coordinates and lattice variables on an equal footing (the variable-cell NEB or VC-NEB method125).
The integration of Lua functionality in Siesta has been made possible by the development of an intermediate layer, flOOK126 (for “fortran-Lua-hook”), which provides wrappers for access to Fortran data structures and subroutines.
2. AiiDA plugins and workflows
The AiiDA framework127–129 provides support for high-throughput computations in materials science, keeping full provenance of the calculations and facilitating data handling and sharing. The framework is open-source, written in Python, and designed to support arbitrary codes via a plugin interface. A plugin for Siesta has been implemented and is distributed as the open-source package aiida-siesta.130 The plugin provides the basic operations of preparing the input files for a calculation using AiiDA-specific input objects, parsing the results and generating the AiiDA output objects. The AiiDA data are stored in a graph database that keeps a permanent record of the inputs and outputs of the calculation and is fully searchable for, e.g., data analytics purposes.
AiiDA also provides robust support for the creation of workflows that incorporate all the necessary steps in the calculation of potentially complex properties, including the proper heuristics and fail-safe features. The aiida-siesta package provides a base workflow and a few workflows for standard materials properties, such as band structures. Figure 15 shows the execution graph of a workflow designed to generate a synthetic scanning tunneling microscopy (STM) image from a given structure. Work is ongoing to implement more complex ones.
Automatically generated graph for the execution of an AiiDA workflow for the simulation of STM images.
Automatically generated graph for the execution of an AiiDA workflow for the simulation of STM images.
In addition to an interface to the computational capabilities of the Siesta code via the plugin and workflows, the aiida-siesta package also provides an implementation of basic objects representing pseudopotential files, notably one for PSML. Families of pseudopotentials can be uploaded to an AiiDA database and shared via the provided mechanisms for data export and import, facilitating the interoperability of different codes.
N. Utilities for post-processing and supplementary features
Siesta offers several features beyond the core functionality of solving the electronic structure problem and performing optional geometry relaxations and molecular dynamics runs. It is worth noting, in particular, that the atomic character of the basis set enables the use of a very intuitive suite of analysis tools, which take advantage of the fact that most of the concepts relating to chemical bonding use the language of atomic orbitals.
The (partial) density of states, atomic and orbital populations, and other useful outputs can be obtained directly from the program. The Siesta distribution also includes several tools in the Util directory for band-structure and wavefunction plotting, bonding analysis, etc. Beyond these, special tool packages that implement a specific feature that extends the functionality of the main program or that provide extra options for visualization or post-processing, in general, are available in alternate distribution points. We describe in what follows the most relevant developments.
1. Updates to core utilities
A number of improvements, enhancements, and additions have been made to the core utilities shipped with the Siesta distribution.
There is now a “fat-bands” feature by which bands can be decorated with information about the relative weight of the given orbitals in each state. The wavefunction-related analysis tools have been extended to the non-collinear and spin–orbit cases. This includes the COOP/COHP bonding analysis, band-structures, and a new tool for spin-texture calculation. There have also been improvements to band-structure plotting utilities and to the visualization of charge densities, potentials, and other magnitudes represented in a real-space grid.
A band unfolding utility has been added. Based on the Fourier decomposition of the Bloch wavefunctions, it allows us to perform a “full unfolding” even for non-periodic systems (e.g., liquids) calculated with a large simulation cell. By refolding the fully unfolded bands from the reciprocal supercell of a perturbed or defective crystal to the reciprocal unit cell of the primitive crystal, one recovers the conventional “unfolded” bands.131
2. sisl
sisl is a Python toolbox that was initially conceived to handle and manipulate the Siesta/TranSiesta output.102 It has since been extended to support other DFT codes with the aim of offering equivalent operations for them.
By reading the LCAO outputs from Siesta, one can post process the Hamiltonian and calculate, e.g., Brillouin zone integrated DOS, wavefunctions expanded on grids, eigenvalues, and band velocities. sisl can process nearly all the Siesta output files. In particular, it is also able to post process the data on the real-space grid. Its command line interface allows data format changes, e.g., conversion of Siesta XV files to xyz/xsf files or Siesta binary grid data (VH, VT, etc.) to cube/xsf files.
As it can process density matrices from Siesta, one can also use sisl to prepare an input electronic structure for new calculations, which may be helpful to reduce initial SCF steps.
sisl also allows the creation of custom tight-binding models (both orthogonal and non-orthogonal), and since it extracts the DFT Hamiltonian matrix, one can manipulate the Hamiltonian to retain certain band-structure features and thus perform large-scale simulations.132 This allows calculating far-field currents using reduced basis-sets with a very little loss of accuracy.
The Atomic Simulation Environment (ASE)133 and sisl have a certain degree of overlap in terms of geometry handling functionality. One can easily convert to and from ASE objects in sisl, thus allowing seamless interaction.
3. Other post-processing and visualization utilities
The body of utilities contributed by non-core developers and other Siesta users has continued to expand. In particular, we feature in this section two suites of utilities: one dealing with alternate visualization tools for some Siesta results and another one specifically dealing with lattice dynamics.
For structures, the xv2xsf and xv2vesta converters process data from the Siesta .XV file into the native formats of XCrySDen134 and VESTA,135 respectively. Each of these two codes offers many options of graphical representation of structures, adding translations, clipping fragments, etc. Three-dimensional spatial functions (e.g., charge density and local density of states integrated throughout the chosen energy range) are computed by Siesta on a real-space grid. Tools are provided for interpolating the data from the Siesta output grid (fixed by the unit cell dimensions and the MeshCutoff parameter) onto an arbitrarily cut (and possibly rotated or resampled) parallelepipedic box. XCrysDen provides a number of display options, including contour lines over grid planes or isosurfaces. A special feature available in XCrySDen is plotting the Fermi surfaces. A special script, eig2bxsf, serves to analyze the list of k-points handled by Siesta, expanding it onto a regular sequence and writing the respective band energies in the necessary format.
The tools concerning the lattice dynamics have been developed having in mind the Γ phonons calculated for a large enough supercell, that is a typical case in a simulation of molecular crystals or disordered substitutional alloys. For visualization, vib2xsf and vib2vesta place arrows at the atoms according to the vibration pattern stored in the eigenvectors file (.vectors), produced by the core Vibra utility, and can also be used to make animations (sequences of snapshots) of the selected vibration modes. Both vib2xsf and vib2vesta tools allow the selection of a part of the system to be exposed.
The phdos tool is designed for analyzing the zone-center vibration results. As the system is supposed to be large (e.g., a supercell chosen for a periodic crystal), the (artificially broadened, for convenience) discrete spectrum may serve as a fair approximation to the total density of modes, and if weighted with (squared) components of eigenvectors at different atoms, it provides a decomposition into contributions of different atoms in the total density of vibration modes.
A more sophisticated option is the projection of different eigenvectors according to various criteria. The typical system under study is a supercell in which, e.g., an alloying, or some kind of deformation, breaks the underlying perfect periodicity. Still, some trends related to the latter can be revealed by appropriate projections. The two obvious cases are the projections onto (1) q-vectors of the underlying lattice and (2) irreducible representations of the space group of the underlying lattice; the corresponding formulas and some results can be found in Ref. 136. The first type of projection, if done for a sequence of q values, helps to reveal “phonon dispersions,” obviously blurred by the broken periodicity, also making distinction between transversal and longitudinal modes (see Ref. 137 for an example of use). To make the trends more pronounced, the supercell needs to be sufficiently long in the direction concerned (see, e.g., Fig. 16). The simplest case, a projection onto a single q = 0 value, may also be of interest, since it enhances the modes that are expected to dominate the infrared or Raman spectra and thus facilitates their comparison with the experiment.
Left panel: a 192-at. quasirandom supercell representative for the Be1/3Zn2/3Se solid solution and right panel: density of modes within the frequency range of Zn–Se vibrations, extracted with phdos and projected onto different values of qz and different polarizations, parallel (labeled LO) and perpendicular (TO) to q. These results were partially shown in Fig. 4 of Ref. 137 and discussed in that work.
Left panel: a 192-at. quasirandom supercell representative for the Be1/3Zn2/3Se solid solution and right panel: density of modes within the frequency range of Zn–Se vibrations, extracted with phdos and projected onto different values of qz and different polarizations, parallel (labeled LO) and perpendicular (TO) to q. These results were partially shown in Fig. 4 of Ref. 137 and discussed in that work.
The symmetry projection may help to isolate in a possibly complex spectrum those modes which are expected to dominate according to a given selection rule, again in view of their verification against the experiments. The group-symmetry information needed for the projections is available, e.g., from the Bilbao Crystallographic Server,138 and the technical details are explained in the documentation included in the tools.
The vibent tool performs a straightforward calculation (see, e.g., Sec. II.C in Ref. 139 or Sec. 5.3 in Ref. 140) of temperature-dependent vibration contributions to the free energy and entropy (see Fig. 17 as an example). The necessary input information is the vibration spectrum, originating from the Vibra frozen phonon calculation on a sufficiently large system.
Vibration properties of Cu2ZnSnS4 (CZTS) with substitutional impurities used in Ref. 139. Left panel: densities of modes (extracted with phdos) and right panel: vibration contributions to the free energy and entropy (calculated with vibent). Adapted from Figs. 5.3 and 5.4 of Ref. 140.
The velcf tool calculates the velocity autocorrelation function and its Fourier transform from a (presumably sufficiently long) molecular dynamics (MD) history, recorded in the .MD or .ANI file. This technique141 can be used to obtain phonon frequencies and was applied along with a Siesta calculation in Ref. 142. An example of such simulation (1000 MD steps at 600 K) is shown in Fig. 18 in comparison with frozen phonon results, revealing similarities of the spectra obtained.
Vibration properties of “Ni4” molecular magnet [Mo12O30(μ2-OH)10H2{Ni(H2O)3}4]·14H2O. Left panel: density of modes from frozen phonon calculation and right panel: velocity autocorrelation function, its Fourier transform, and the resulting density of vibration modes.
Vibration properties of “Ni4” molecular magnet [Mo12O30(μ2-OH)10H2{Ni(H2O)3}4]·14H2O. Left panel: density of modes from frozen phonon calculation and right panel: velocity autocorrelation function, its Fourier transform, and the resulting density of vibration modes.
4. Optical properties of finite systems: Linear response TDDFT starting from Siesta orbitals
The Siesta package offers at least two ways of obtaining the optical properties of finite systems. The first way uses real-time TD-DFT propagation by applying an external electric field with a simple time dependence (e.g., a Heaviside step-function).74 The second way is by computing the non-interacting dielectric function.1,2 Both methods are implemented in Siesta and can be employed without any external tools. However, they are limited in different aspects. The non-interacting dielectric function often underestimates the HOMO–LUMO gaps and calls for the use of the phenomenological scissor-shift operator. Real-time propagation makes cumbersome the analysis of the optical response properties in the frequency domain. Furthermore, the frequency resolution scales with the duration of the real-time simulation. Thus, accurate spectra require long simulations.
Fortunately, there are two efficient implementations of linear-response TDDFT that use the Kohn–Sham orbitals from Siesta as a starting point and are available for the open-source community.143,144 In both packages, the linear density response δn(r, ω) is obtained directly in the frequency domain, which makes the analysis of derived properties straightforward. However, there are differences between both implementations on the construction of the auxiliary basis necessary to expand the orbital products. These differences can severely affect the computational cost of the calculation.
The linear-response TDDFT is built on the concept of the induced electronic density δn(r, ω) in response to a small perturbation of the external potential δVext(r, ω). The integral operator connecting δn(r, ω) to δVext(r, ω) is the interacting density response function χ(r, r′, ω). By virtue of the KS equations, χ(r, r′, ω) can be connected to the non-interacting density response function χ0(r, r′, ω)145 with a Dyson equation,
where the interaction kernel K(r, r′) contains the bare Coulomb interaction and the so-called exchange and correlation kernel Kxc,which is a known operator for simple functionals such as LDA and GGA. The non-interacting response function χ0(r, r′, ω) can be expressed as a sum over electron–hole excitations within the basis formed by the KS orbitals Ψn(r),144–146
where fn are occupations of the KS orbitals and En are their energies.
The optical polarizability tensor α(ω) is related to the induced density by α(ω) = ∫rδn(r, ω)dr or, alternatively,
where due to Eq. (29) and using the dipole approximation for the electron–photon coupling, the screened effective perturbation δVs(r′, ω) satisfies the linear integral equation,
The efficiency of the methods presented in Refs. 143 and 144 comes from solving iteratively Eq. (32) for δVs(ω) instead of using standard matrix inversion to obtain χ(ω) from Eq. (29). Once δVs(ω) is known, Eqs. (30) and (31) allow the computation of the optical properties of the system. Furthermore, it is also possible to perform different types of analyses. For example, it is easy to partition the polarizability tensor α(ω) in terms of electron–hole contributions146,147 due to the existence of the sum over the electron–hole pairs in Eq. (30). Similarly, one can achieve other types of Mulliken-like analyses144,146,148 of the optical polarizability tensor α(ω) or the induced density δn(r, ω).
The Python implementation of linear response TDDFT in the PySCF-NAO package as described in Ref. 144 is convenient to use and rather potent. It is capable of computing the optical properties of compact metallic objects containing up to several hundreds of atoms.146,149,150 For example, we were able to track down the different size-dependences of the plasmon resonance in sodium and silver clusters due to the screening effect of silver d-orbitals in the latter case.151 In those calculations, using an optimized version that incorporates some additional memory-saving features not present in the currently distributed version of PySCF-NAO, icosahedral silver and sodium clusters containing up to 5043 atoms were studied.
In Figs. 19 and 20 we show the photo-absorption cross sections of a series of compact silver clusters146 and the real part of the induced density change in the cluster Ag147 close to its surface-plasmon frequency (3.4 eV), respectively.
The absorption cross sections of silver clusters of icosahedral shape. One can recognize sharp surface-plasmon resonances around 3–4 eV and a broad resonance at 6–7 eV.
The absorption cross sections of silver clusters of icosahedral shape. One can recognize sharp surface-plasmon resonances around 3–4 eV and a broad resonance at 6–7 eV.
The isosurfaces of density change Re(δn(r, ω)) of the Ag147 cluster close to the frequency of the surface-plasmon resonance of the cluster (3.4 eV).
The isosurfaces of density change Re(δn(r, ω)) of the Ag147 cluster close to the frequency of the surface-plasmon resonance of the cluster (3.4 eV).
5. Thermal transport by the AEMD method
The approach to the equilibrium molecular dynamics (AEMD) method152 has been implemented to obtain the thermal conductivity. In the first stage of the method, the system is decomposed into two different regions, each one equilibrated to a different initial temperature (canonical run with Bose or Anneal MD). Then, a microcanonical run (Verlet) is carried out for the whole system, and the average temperature of each subsystem is monitored. This temperature transient regime is then used to extract the thermal conductivity from the exact solution of the heat transport equation.153
6. Core-level shifts
Core-level shifts can serve to analyze changes in the local and chemical environment of atoms of a given species. Density-functional-theory calculations have proved to be quite useful in complementing the experimental information, which is sometimes hard to interpret. Two schemes have been implemented in Siesta for the calculation of core-level shifts within a pseudopotential approach.154
In the so-called initial-state approximation, the electronic relaxation in the presence of the core hole is neglected, and the photo-electron’s binding energy is directly related to the eigenvalue of the core level. A pseudopotential calculation obviously cannot compute the latter, but differences in core eigenvalues in different environments can be estimated by the changes in the expectation value of the crystal potential using the core state’s atomic wavefunctions at different sites. These can be extracted from the matrix elements,
with a further step of averaging to remove the splittings stemming from the loss of spherical symmetry.
In the final-state approximation, the relaxation is explicitly taken into account, and the experimental shifts (measured via the kinetic energy of an existing electron) are correlated with the differences in the energy of the crystal with a “core-hole” in different sites. For this, a special pseudopotential with a missing core electron has to be generated, and a full Siesta calculation is needed for each different site.
The implemented methodology has been used to study, for example, the shifts induced by hydrogen bonding in organic molecules.155
O. Software-engineering advances and partnerships
The traditional development model for scientific codes in academic settings has been typically based on multiple contributions with various levels of programming competence and with very little time to plan ahead in the face of pressing scientific demands. Siesta has been no exception and has grown in features and complexity over the years. It is very important to keep complexity under control or else a project becomes un-maintainable and cannot survive. It is not simple, however, to balance the need of incorporation of new features and the need to increase the computing performance in a landscape of constantly evolving hardware and programming models. One essential route is modularization, which allows the separation of concerns at various levels. In the context of a code such as Siesta, this means that the scientific ideas and algorithms should be handled at a high level, calling on lower-level modules for specific functionality (domain-specific libraries, mathematical libraries, communication protocols, etc.). These lower-level modules can hopefully be re-used by different codes and, most importantly, can be focused on by highly skilled programmers for optimization on relevant architectures.
Another important method of taming complexity involves the streamlining of the data structures of the code. This is an ongoing process (see Sec. V) but has already taken a very significant step by the introduction of reference-counted data structures. They build on a well-known and not particularly advanced technique of memory-handling,156 but in Siesta they have enabled a much simpler bookkeeping of the data structures needed for a richer control of molecular-mechanics (MM) and SCF iterations.
Regarding performance-oriented developments, in the recent past we have implemented a mixed MPI/OpenMP programming model, which allows, for suitable systems, to better balance arithmetic intensity and communications needs. The deployment of this model is more advanced in the TranSiesta module, and significant speedups have been obtained for large systems.
Some of the above software-engineering developments have been enabled and strengthened by the participation of Siesta in a number of international partnerships, notably the MaX (Materials at the eXascale) EU center of excellence157 and the Electronic Structure Library initiative.158 The “separation of concerns” described above in the context of modularization is an example of the so-called “open-innovation” paradigm, at the foundation of the ESL strategy for code reusability, and is also a cornerstone of MaX’s efforts to achieve exascale-readiness for its flagship materials science codes (with Siesta among them): performance-enhancement efforts are to be focused on relevant domain-specific modules.
A number of modules from Siesta have been turned into stand-alone libraries that now feature in the ESL: libGridXC for exchange and correlation calculations, libPSML as a handler of PSML files, xmlf90 for general purpose handling of XML files, etc. Conversely, Siesta uses some of the libraries offered by the ESL, notably the ELSI library of electronic structure solvers mentioned in Sec. III G 2, whose development, including its API design and internal data organization, has been, in turn, influenced by contributions and feedback from the Siesta project, among others. There are also plans to incorporate the PSolver library159 for the solution of the Poisson problem, a contribution to the ESL from the BigDFT project.
We should mention that the renewed dynamism of Siesta development and the advances made possible by the interaction with community initiatives are both a blessing and a challenge. It is non-trivial, for example, to handle the building process of a code that relies on a number of different external libraries, programming models, and special features such as the embedded Lua interpreter. Fortunately, as will be discussed in Sec. V, these are issues that are being addressed in wider contexts, and Siesta is well placed to take advantage of it.
IV. APPLICATIONS
We present here a few demonstrated applications that illustrate the capabilities of Siesta in breadth, efficiency, and accuracy.
A. 4 terminal NEGF on germanium surface
Breakthrough simulations using the new multi-terminal implementation in TranSiesta were fundamental to elucidate the electronic transport mechanism on a novel and complex experiment.100 For the first time, two-probe scanning tunneling microscopy/spectroscopy (STM/STS) with probes operating in tunneling conditions over the same atomic-scale system was used to extract detailed information on in-plane electronic transport. The addressed system was the reconstructed (001) surface of germanium, where electrons injected from one STM tip at a position determined with atomic precision were collected at the same Ge dimer row at a distance as short as 30 nm. The experiment was theoretically modeled by a system composed of a 12-layer Ge(001)-c(4 × 2) slab contacted by Au tips oriented along the (100) direction (Fig. 21). On this self-consistent 4-terminal treatment, two Ge electrodes were connected at each slab termination and other two at the Au model tips. The whole system was defined by 4924 atoms (36 442 atomic orbitals) in a supercell of dimensions ∼32 × 160 × 80 Å3, where five different tip-to-sample distances were considered. Besides the large dimensions of the system, another important challenge of such simulation was the level alignment between the metallic and semiconducting leads and the scattering region for which a method had to be devised. A remarkable agreement was found between the calculated transmission function and the experimental transconductance spectra, allowing the identification and assignment of the observed resonances to transport channels existing along the surface Ge dimer rows. Moreover, the simulations elucidated the transport directionality of the injected hot electrons, revealing a transition from the 2D to quasi-1D coherent transport regime as a function of the carrier’s energy. This work shows that complex experimental setups combined with advanced calculations can provide new insights into transport properties at the nanoscale.
First-principles transport simulations for the two-probe experiments. Representation of the four-terminal setup. The electrode regions are highlighted by blue boxes, two of them located at each Ge(001)-c(4 × 2) slab termination (leads: left and right) and the other two at each Au model tip (leads: tip1 and tip2). The 50 Ge atoms closest to each tip were allowed to fully relax. Adapted from Ref. 100.
First-principles transport simulations for the two-probe experiments. Representation of the four-terminal setup. The electrode regions are highlighted by blue boxes, two of them located at each Ge(001)-c(4 × 2) slab termination (leads: left and right) and the other two at each Au model tip (leads: tip1 and tip2). The 50 Ge atoms closest to each tip were allowed to fully relax. Adapted from Ref. 100.
B. Novel topological phases in ferroelectric materials
In material systems with several interacting degrees of freedom (such as spin, charge, and lattice distortions), the complex interplay between these factors can give rise to exotic phases. Prototypical examples are the superlattices of alternating lead titanate and strontium titanate layers. Simulations on such PbTiO3/SrTiO3 heterostructures, consisting of n unit cells of PbTiO3 and n unit cells of SrTiO3 stacked along the [001] direction, were carried out with Siesta. As a function of the periodicity, the superlattices undergo a phase transition from a monodomain configuration (small periodicity, n ≲ 3–4) with a normal component of the polarization that is preserved throughout the structure, to a multidomain configuration (large periodicity, n ≳ 3–4) with alternating up and down domains.160 In order to further reduce the electrostatic energy costs, the local dipoles within the PbTiO3 layer continuously rotate forming a sequence of clock-wise/counter-clockwise array of vortices along the [100] direction. The theoretical predictions, done with Siesta161 after the relaxation of supercells of up to 1000 atoms, were experimentally confirmed five years later by atomic-scale mapping of the polar atomic displacements by scanning transmission electron microscopy162 (Fig. 22). Moreover, the appearance of an axial component of the polarization pointing in the direction of the vortices makes the systems chiral and optically active, as recently confirmed by circular dichroism experiments.163
Panel (a): local polarization profile of polydomain structures in (PbTiO3)n/(SrTiO3)n with n = 6 obtained from an atomic relaxation with Siesta. The PbTiO3 and SrTiO3 are depicted as gray and white regions, respectively. Clockwise and counterclockwise vortices within the PbTiO3 are clearly visible. Red dashed square in the SrTiO3 layers mark the position where antivortices are formed. Reprinted with permission from Aguado-Puente and Junquera, Phys. Rev. B 85, 184105 (2012). Copyright 2012 American Physical Society. Panel (b): experimental observation of vortex–antivortex structures in a cross-sectional high-resolution scanning transmission electron microscopy image with an overlay of the polar displacement vectors for a (SrTiO3)10/(PbTiO3)10 superlattice, showing that an array of vortex–antivortex pairs is present in each PbTiO3 layer. Courtesy of R. Ramesh, adapted from Ref. 162.
Panel (a): local polarization profile of polydomain structures in (PbTiO3)n/(SrTiO3)n with n = 6 obtained from an atomic relaxation with Siesta. The PbTiO3 and SrTiO3 are depicted as gray and white regions, respectively. Clockwise and counterclockwise vortices within the PbTiO3 are clearly visible. Red dashed square in the SrTiO3 layers mark the position where antivortices are formed. Reprinted with permission from Aguado-Puente and Junquera, Phys. Rev. B 85, 184105 (2012). Copyright 2012 American Physical Society. Panel (b): experimental observation of vortex–antivortex structures in a cross-sectional high-resolution scanning transmission electron microscopy image with an overlay of the polar displacement vectors for a (SrTiO3)10/(PbTiO3)10 superlattice, showing that an array of vortex–antivortex pairs is present in each PbTiO3 layer. Courtesy of R. Ramesh, adapted from Ref. 162.
C. 1D and 2D systems
Siesta is particularly well suited to study low dimensional nanostructures, such as 1D and 2D systems, where a large vacuum region is needed within the simulation cell. When, in addition, a large number of atoms are required to study particular physical effects, Siesta could excel with respect to other methods. There is an extensive literature on simulations of graphene and other exfoliated materials, where the properties of point defects, edges, or grain boundaries are of much relevance. We list a few examples the magnetic properties of impurities164,165 and edges,166 electronic properties, including transport characteristics, in grain boundaries,167,168 ribbons,169 nanoporous graphene,170 large graphene flakes,68,171 or the effect of substrates.172 Other materials, such as mono- and multi-layered dichalcogenides173,174 or phosphorene,175,176 are also being widely studied, including optical properties in nanoflakes with up to a few thousand atoms.177
1. CDWs
A number of recent studies on charge density waves (CDW) in low dimensional materials illustrate the impressive accuracy that can be obtained with Siesta for systems with very subtle electronic structures.178 For example, in 2H–NbSe2, Siesta calculations were able to predict the existence of six different atomic structures within a narrow energy range of a few meV, all of them compatible with the experimental 3 × 3 CDW modulation. Careful analysis of theoretical and experimental STM images for different bias potentials allowed us to identify two of these structures that can coexist in the same image.179 In a different work,180 the temperature dependency of the electronic Lindhard response function in blue bronze K0.3MoO3 was studied. This system has a rather complex monoclinic structure with 20 formula units per unit cell where MoO6 octahedra form chains along one direction (b-axis). The Lindhard function shows well decoupled sharp responses that correspond to intra- and interband Fermi surface nesting. By fitting these peaks, one can obtain the coherence length of the fluctuating 1D electron–hole pair (that determines the length scale of the experimental intrachain CDW correlations) and the intrachain modulation of the response (that determines the shape of the Kohn anomaly measured in experiments), providing, for the first time, quantitative evidence of the weak electron–phonon coupling scenario for the Peierls transition.
D. Siesta in biology: Pilin proteins as conductors
Siesta’s efficiency and the clear bandgaps of biomolecules, in general, have made molecular biology a very suitable field for Siesta since the beginning181 and have stimulated the targeted developments of the code for the field, such as the hybrid Quantum Mechanics/Molecular Mechanics approach (QM/MM).182,183 An interesting illustration of its suitability in an all-quantum biological problem is the study of the electrostatics around the pilin protein in aqueous solution.184 The pilin considered here is the main protein in the pili (external filaments) of the Geobacter sulfurreducens bacterium, which have been shown to be able to transmit electronic current, allowing the microbe to feed by remote redox reactions on ferrous mineral particles in the soil. As a nanowire designed by natural evolution, understanding the mechanism for charge transport is of obvious interest.
Peculiar to this protein is the fact that its main alpha helix, the main feature of this elongated protein, is singly oriented, that is, there is no back alpha helix (as in a common hairpin configuration) that would counter the polarization of the single alpha helix. In an alpha helix, all peptide-bond dipoles point in the same direction along the axis of the helix, which, in solid-state parlance, represents a polarization with clear electrostatic implications. Indeed, a DFT calculation of the molecule in vacuum shows a well-defined electrostatic potential ramp along the protein, which tends to close the effective bandgap. The question is then, how does an aqueous environment affect this depolarizing field.
Long molecular mechanics (MM) simulations were performed for the protein in a suitable solution of NaCl at a concentration of 0.1M. The protein’s residues had charge states corresponding to pH = 7, and the MM field was validated with Siesta calculations in vacuum (944-atom dynamic relaxation in a 104.43 Å3 box). The wet system contained 4580 atoms, and the statistical average of the electrostatic potential around the molecule (see Fig. 23) was obtained from a sample of full Siesta calculations of statistically independent snapshots, taken every 50 ps during the last 0.5 ns of the simulation.
Color coded electrostatic potential on a plane cutting along the main axis of the G. Sulfurreducens pilin molecule in wet conditions. A perspective ball rendering of the atomic structure of the protein is superposed. For the meaning and details of this figure, see Ref. 184 (Figure courtesy of Gustavo T. Feliciano).
Color coded electrostatic potential on a plane cutting along the main axis of the G. Sulfurreducens pilin molecule in wet conditions. A perspective ball rendering of the atomic structure of the protein is superposed. For the meaning and details of this figure, see Ref. 184 (Figure courtesy of Gustavo T. Feliciano).
Figure 23 shows how the aqueous environment kills the quite homogeneous potential ramp along the protein axis that appears in vacuum and replaces it with long wavelength, slow, but quite significant fluctuations. The gap remains sizable, and coherent transport is not likely. However, the frontier orbitals evolve in a very suggestive way for enhanced diffusive electron transport.184
E. Use of Siesta in other fields
Although an exhaustive summary of all the recent results obtained with Siesta is out of the scope of this work, we would like to refer the reader to a sample of recent reviews in various fields in which the program is featured. These cover biological sciences185 (including interaction between organic and inorganic materials186,187), geology and materials under high-pressure,188 isotopic fractionation predictions for Martian geochemistry,189 the engineering of typical core structural materials used in nuclear reactors,190 or even in astrophysical and atmospheric systems.191 The reactivity of metallic nanoparticles for catalysis was studied by Viñes, Gomes, and Illas,192 and the role of Siesta in the computation of the kinetic and dynamics of the catalytic reaction at surfaces (including adsorption and desorption of reactants or products) was explored in Chap. 8 of Ref. 193 by Catapan and co-workers.
V. FUTURE EVOLUTION
Work on enhancing Siesta’s capabilities, performance, and robustness is continuing, driven by a good number of developers and collaborators. A mature and flexible development platform and practices are essential to keep it productive. Our recent platform changes have forced developers to shift workflows twice in the past four years. Through the changes, we have learned a lot but also spent a significant amount of time on ensuring Siesta’s continuous development. At the current state, we believe we have stabilized the development platform on GitLab, while we will add more integrated development features in the coming years, e.g., continuous integration (CI) and source code checks. Using CI will also enable easier code-style checks to conform to coding standards. We hope that our open-platform initiative will keep external contributions coming into the program.
Our basic development plans include refactoring, apparently unexciting but essential to streamline the code base to enable further implementations. In addition, we foresee a change in the release model, moving away from coexisting long-lived release branches whose maintenance takes up a lot of time and offering instead more frequent and short-maintenance releases.
We plan to exploit the idea of modularization, continuing the abstraction of relevant reusable pieces, but also dealing with a higher level, exposing the core electronic structure capabilities of Siesta to other programs. It will be necessary to redesign some of the internal data structures to remove global variables and encapsulate them into objects or derived types associated with particular configurations and stages of the calculations. This encapsulation will be matched with the streamlining of the input/output operations. This work will open the door to the creation of complex workflows leveraging the strengths of various codes.
Accelerated hybrid architectures (including, for example, GPUs) are very likely going to feature prominently in the upcoming exascale machines. In the case of Siesta, the data indirection associated with the handling of sparse matrices limits the acceleration possibilities of the section of the code that builds the Hamiltonian and overlap matrices, but the solver stage is more amenable to porting, and in fact, several solver libraries used by Siesta are being enhanced to offer the GPU support, as mentioned in Sec. III G 2.
Modularization and the use of new programming models result in an increase in the complexity of the building and deployment of the code. We will leverage the ESL bundle created to facilitate the use of the modules in the ESL collection, to streamline Siesta’s building process, and to explore containerization as an option for the deployment of the code.
The “pseudopotential barrier to entry” has been lowered by the availability of curated databases supporting the PSML format. Basis sets are a perennial challenge, but new tools and ideas are being explored to provide users with appropriate basis sets: high-throughput workflows for optimization, “tiers” of quality/cost, but perhaps not just of a simple “Periodic Table” form, as offered by other codes (e.g., FHI-aims13), but with a possible dependence on an approximate characterization of the chemical environment in which a given atom finds itself.
Complementary to the underlying basis set optimization that focuses on providing an adequate variational freedom, an on-the-fly contraction of the basis set, which results in a set of lower-cardinality adapted to the description of the occupied subspace, can be exploited for increased efficiency. This is particularly relevant for FOE methods (see Sec. III G 1) in which the number of polynomial terms depends on the extent of the spectrum.
The original claim to fame of Siesta was based on its linear-scaling solver. We are in the process of re-designing the code with a new, more efficient backend based on the DBCSR library for handling distributed block-sparse matrices194,195 with the MatrixSwitch library82 acting as an intermediary interface between it and high-level physical ideas and algorithms. A connection between the internal Siesta formats and MatrixSwitch itself has been recently provided, using initially the cubic-scaling libOMM library196 as a test bed, hence still using a dense coefficient matrix, as it corresponds to the case without localization constraints in the solution of the electronic structure problem. The implementation of a sparse coefficient matrix will make it possible to perform efficient calculations. The computational effort can be further reduced through the analysis of sparsity of the Hamiltonian and overlap matrices and their re-organization in the block-compressed sparse form.
Other developments in the pipeline are linear-response calculations for arbitrary distortions, electronic transport calculations with spin–orbit coupling, thermal transport with the Green–Kubo formalism, as described in Ref. 197, a redesign of the molecular dynamics subsystem, and the development of workflows for the generation of data for scale-up.
ACKNOWLEDGMENTS
Siesta development was historically supported by different Spanish National Plan projects (Project Nos. MEC-DGES-PB95-0202, MCyT-BFM2000-1312, MEC-BFM2003-03372, FIS2006-12117, FIS2009-12721, FIS2012-37549, FIS2015-64886-P, and RTC-2016-5681-7), the latter one together with Simune Atomistics Ltd. We are thankful for financial support from Grant PGC2018-096955-B funded by MCIN/AEI/ 10.13039/501100011033 and by ERDF A way of making Europe, by the European Union.
We acknowledge the Severo Ochoa Center of Excellence Program [Grant Nos. SEV-2015-0496 (ICMAB) and SEV-2017-0706 (ICN2)], the GenCat (Grant No. 2017SGR1506), and the European Union MaX Center of Excellence (EU-H2020 Grant No. 824143).
P.G.-F. acknowledges support from Ramón y Cajal (Grant No. RyC-2013-12515). J.I.C. acknowledges Grant No. RTI2018-097895-B-C41.
R.C. acknowledges the European Union’s Horizon 2020 Research and Innovation Program under Marie Skłodoswka-Curie Grant Agreement No. 665919.
D.S.P, P.K., and P.B. acknowledge Grant No. MAT2016-78293-C6, FET-Open No. 863098, and UPV-EHU Grant No. IT1246-19.
V. W. Yu was supported by a MolSSI Fellowship (U.S. NSF Award No. 1547580), and V.B. and V.W.Y. were supported by the ELSI Development by the NSF (Award No. 1450280). We also acknowledge Honghui Shang and Xinming Qin for giving us access to the honpas code, where a preliminary version of the hybrid functional support described here was implemented.
We are indebted to other contributors to the Siesta project whose names can be seen in the Docs/Contributors.txt file of the Siesta distribution, and we thank those, too many to list, contributing fixes, comments, clarifications, and documentation for the code.
JF and VMGS acknowledge financial support from grant PGC2018-094783 funded by MCIN/ AEI /10.13039/501100011033/ and by ERDF A way of making Europe, by the European Union.
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.