We review recent advances in the capabilities of the open source ab initio Quantum Monte Carlo (QMC) package QMCPACK and the workflow tool Nexus used for greater efficiency and reproducibility. The auxiliary field QMC (AFQMC) implementation has been greatly expanded to include k-point symmetries, tensor-hypercontraction, and accelerated graphical processing unit (GPU) support. These scaling and memory reductions greatly increase the number of orbitals that can practically be included in AFQMC calculations, increasing the accuracy. Advances in real space methods include techniques for accurate computation of bandgaps and for systematically improving the nodal surface of ground state wavefunctions. Results of these calculations can be used to validate application of more approximate electronic structure methods, including GW and density functional based techniques. To provide an improved foundation for these calculations, we utilize a new set of correlation-consistent effective core potentials (pseudopotentials) that are more accurate than previous sets; these can also be applied in quantum-chemical and other many-body applications, not only QMC. These advances increase the efficiency, accuracy, and range of properties that can be studied in both molecules and materials with QMC and QMCPACK.
Quantum Monte Carlo (QMC) methods are an attractive approach for accurately computing and analyzing solutions of the Schrödinger equation.1–3 The methods form a general ab initio methodology able to solve the quantum many-body problem, applicable to idealized models such as chains or lattices of atoms to complex and low-symmetry molecular and condensed matter systems, whether finite or periodic, metallic or insulating, and with weak to strong electronic correlations. Significantly, the methods can naturally treat systems with significant multi-reference character and are without electron self-interaction error, which challenges many quantum chemical approaches and density functional theory (DFT) approximations, respectively. The methods continue to be able to take advantage of improvements in computational power, giving reduced time to solution with new generations of computing. Due to these features, usage of QMC methods for first principles and ab initio calculations is growing.
Compared to traditional deterministic approaches, QMC methods are generally distinguished by (1) the use of statistical methodologies to solve the Schrödinger equation. This allows the methods to not only treat problems of high-dimensionality efficiently but also potentially use basis, wavefunction, and integral forms that are not amenable to numerical integration. (2) Use of few and well-identified approximations that can potentially be quantified or made systematically convergeable. (3) A low power scaling with system-size, but a large computational cost prefactor. (4) High suitability to large scale parallel computing owing to lower communications requirements than conventional electronic structure methods. Scaling has been demonstrated to millions of compute cores.4
Modern applications of QMC have expanded to cover many of the same systems studied by density functional theory (DFT) and quantum chemical approaches and, in many cases, also at a similar atom and electron count, although at a far greater computational cost. Besides those described in the following, recent molecular applications of QMC include studies of the nature of the quadruple bond in C2,5 acenes,6 physisorption of water on graphene,7 binding of transition metal (TM) diatomics,8 and DNA stacking energies.9 Materials applications include nitrogen defects in ZnO,10 excitations in Mn doped phosphors,11 and the singlet–triplet excitation in MgTi2O4.12 Methodological improvements include reducing the sensitivity of pseudopotential evaluation,13 extensions to include linear response,14 density functional embedding,15 excited states including geometry optimization,16 improved twist averaging,17 and accurate trial wavefunctions via accurate densities.18 Importantly, for model systems such as the hydrogen chain, the methods can be used to benchmark themselves as well as other many-body approaches.19 This partial list of developments and applications from the last two years alone indicates that the field is growing and maturing.
In this article, we describe recent updates to the QMCPACK code and its ecosystem of wavefunction converters and workflow tools. These updates have aimed to expand the range of systems, properties, and accuracies that can be achieved with both QMCPACK and QMC techniques in general. For a description of the underlying methodology, see Refs. 1–4, and 20. In particular, a thorough description of real space QMC methods is given in Sec. 5 of Ref. 4. For an extensive introduction to AFQMC, see Ref. 20.
QMCPACK is a fully open source and openly developed QMC package, with 48 coauthors on the primary citation paper4 published in 2018 and an additional five contributors since then. The main website for QMCPACK is https://qmcpack.org and the source code is currently available through https://github.com/QMCPACK/qmcpack. QMCPACK aims to implement state-of-the-art QMC methods, be generally applicable, easy to use, and high-performing on all modern computers. Since the publication of Ref. 4, the range of QMC calculations that are possible has been expanded by significant enhancements to the Auxiliary-Field QMC (AFQMC) solver. This orbitally based method is distinct from and complementary to the longer-implemented real space methods of variational and diffusion QMC (VMC and DMC, respectively). The AFQMC implementation can fully take advantage of graphics processing units (GPUs) for a considerable speedup and, unlike the real-space methods, can also exploit k-point symmetries. It shares the same workflow tool, Nexus, which helps simplify and ease application of all the QMC methods by new users and aids in improving reproducibility of complex multi-step research investigations. To our knowledge, this is currently the only AFQMC code designed for large scale research calculations that is open source. To help guarantee the future of the code, it is undergoing rapid development and refactoring to target the upcoming Exascale architectures as part of the U.S. Exascale Computing Project,21 which also entails major updates to the testing, validation, and maintainability.
The electronic structure and quantum chemical codes that QMCPACK is interfaced to for trial wavefunctions have been expanded to include Qbox,22 PySCF,23 Quantum Espresso,24 Quantum Package,25 and GAMESS.26 Additional codes such as NWCHEM27 can be interfaced straightforwardly.
In the following, we first review in Sec. II the open development principles of QMCPACK. In Sec. III, we discuss updates to the Nexus workflow package. This integrates entire research electronic structure workflows for greater productivity and reproducibility than by-hand invocation of individual calculations. Due to the infeasibility of performing QMC calculations for general systems using an all electron approach, use of effective core potentials (ECPs), or pseudopotentials, is essential. To improve the accuracy obtainable, we have developed a new approach and a set of “correlation consistent” ECPs. These can be used in all ab initio calculations, not only QMC, and are described in Sec. IV. Advances in the AFQMC implementation are described in Sec. V. Turning to real-space QMC methods in Sec. VI, algorithms and multiple determinant trial wavefunctions can now be used to obtain improved ground state energies as well as bandgaps in solid-state materials. As a result, it is now possible to begin to test the accuracy of the nodal surfaces that have long been used in these calculations. Finally, in Sec. VII, we give three applications: first, application to non-valence anions, which challenge all electronic structure and quantum chemical techniques, second, application to excitations of localized defects in solids VIIB, and third, the ability to obtain the momentum distribution has recently been improved, motivated by recent experiments on VO2. A summary is given in Sec. VIII.
II. OPEN DEVELOPMENT AND TESTING
Fully open source development is an important core value of the QMCPACK development team. Besides improving the quality of the software, anecdotally, it also improves the on-boarding experience for new users. While the developers of many electronic structure packages now practice some degree of open development, QMCPACK has seen very significant benefits from this in the past few years. We expect that other packages would also benefit from full adoption and therefore give details here.
QMCPACK is an open source package, with releases and the latest development source code available through https://github.com/QMCPACK/qmcpack. QMCPACK is written in C++14, with MPI parallelization between compute nodes and OpenMP threading used for multicore parallelism. CUDA is used for NVIDIA accelerators. Options to support CUDA, complex valued wavefunctions, and to adjust the numerical precision used internally are currently compile time options.
Besides adoption of a distributed source code control system, we have found that development productivity can be further increased by adoption of code reviews and continuous integration (testing). To maximize the efficiency of both contributors and reviewers and shorten the development cycle of new features, work-in-progress pull requests are encouraged for early engagement in the process. The early review allows guidance to be given, e.g., whether the algorithms are clear enough to other developers and whether the coding guidelines are being followed. At the same time, continuous integration is applied to the proposed code change. This process routinely catches cases that developers may not have considered or tested against, e.g., the complex-valued build of QMCPACK or accelerated graphical processing unit (GPU) support, which are compile time options. This period of comment while the work is being completed also helps advertise the work to other developers and minimizes the risk of duplicated work. Our experience strongly suggests that this process reduces bugs, reduces potential developer’s effort, and saves reviewer’s time compared to a late engagement with an unexpected pull request. All the discussions around the code change become archived searchable documentation and potential learning materials.
Testing of QMCPACK has been significantly expanded. Two years ago, QMCPACK had limited unit, integration, and performance testing categories: unit tests that run quickly on individual components, integration tests that exercise entire runs, and performance tests for monitoring relative performance between code changes. However, due to the stochastic nature of QMC, as the number of tests and build combinations increased, it became impractical to run the integration tests long enough to obtain a statistically reliable pass/fail: The smallest (shortest) integration test set currently takes around one hour to execute on a 16 core machine and must necessarily suffer from occasional statistical failures. Thus, a new category of tests was needed for quickly examining full QMC execution with a reproducible Monte Carlo trajectory. The new deterministic integration tests are modified QMC runs with only a few steps, very few Monte Carlo walkers, and fixed random seeds for absolute reproducibility. All the major features of QMCPACK are covered by this new of category tests. Running all the unit and deterministic integration tests takes approximately one minute, which is fast enough for iterative development and fast enough to be used in continuous integration. This speed to run a set of tests facilitates significant changes and refactoring of the application, which otherwise would be far more difficult to test and unlikely to be attempted by non-experts without long experience with the codebase. All the deterministic tests are accompanied by longer running statistical tests that can be used to verify a new implementation when changes alter a previous deterministic result. Combinations of these tests are run automatically on a nightly basis and reported to a public dashboard https://cdash.qmcpack.org. At the time of writing, around 25 different machine and build combinations are used to run around 1000 labeled tests each and most of these cover multiple features.
Improving source code readability is critical for both new and experienced developers. In the past, misleading variable or class names and confusing function names have confused developers and resulted in subtle bugs, e.g., due to similarly named functions, only one of which updates internal state in the Monte Carlo algorithm. For this reason, coding standards including naming conventions have been added in the manual and are enforced on newly contributed codes. Existing codes are updated to follow the standards as they need other modifications. Automatic source formatting is also applied with the help of the clang-format tool. Concomitantly, both developer sections in the manual and source code documentation are significantly expanded.
As a result of the above changes, new contributors with a basic theoretical background can connect source code with textbook equations with much less difficulty than in the past. These efforts are clearly bringing long term benefit to QMCPACK and hopefully can be transferred to other scientific applications as well.
III. IMPROVING QMC WORKFLOWS WITH NEXUS
QMC techniques are progressing from methods under research toward more routine application. In this transition, usability of QMC becomes an important factor. A mature, usable computational method transfers responsibility for correct execution from users to the code. Major factors determining overall usability include the ease of requesting a desired result (in the form of input), robustness of the code in obtaining the desired result, and complexity of the overall calculation process. All of these contribute to the effort required by the user to obtain desired results. In essence, higher required effort translates directly into lower productivity of the user base. Lower productivity in turn risks a lower overall adoption rate and thus blunts the overall impact of the method. It is therefore important to seek to understand and minimize barriers to the practical use of QMC.
To illustrate the complexity of the QMC calculation process, we describe below a basic but realistic sequence of calculations (a scientific workflow), which is required to obtain a final fixed node DMC total energy per formula unit for a single crystalline solid with QMCPACK. In this workflow, we suppose that self-consistent (SCF) and non-self-consistent (NSCF) calculations are performed with Quantum Espresso24 and wavefunction optimization (OPT), variational Monte Carlo (VMC), and diffusion Monte Carlo calculations are performed with QMCPACK. SCF/NSCF calculations might be performed on a workstation or a few nodes of a cluster, VMC/OPT calculations on a research-group sized cluster (∼30 nodes), and DMC on high performance computing resources (∼1000 nodes).
Converge DFT orbitals with respect to plane wave energy cutoff (4–6 SCF calculations ranging from 300 to 800 Ry for the energy cutoff).
Converge B-spline orbital representation with respect to B-spline mesh spacing (1 NSCF, ∼5 VMC calculations in a small supercell over a series of finer mesh spacings).
Converge twist grid density (∼5 VMC calculations in a small supercell for a series of increasingly dense supercell Monkhorst–Pack twist grids).
Determine the best optimization process [∼6 optimization (OPT) calculations in a small supercell over varying input parameters and Jastrow forms].
Obtain the fixed node DMC total energy (∼3 NSCF, ∼3 OPT, and ∼9 DMC calculations, three successively smaller time steps for time step extrapolation, and three successively larger supercells for finite size extrapolation).
This basic workflow process is to be compared with the much reduced complexity for obtaining a single converged total energy with DFT, which typically requires only a single input file and single program execution to perform a single SCF calculation for the final energy. The complexity intrinsic to the basic workflow translates into a large degree of effort on the part of the user and limits the accessibility of the method for new users or for experienced users pursuing ambitious projects comprised of a large number of DMC calculations.
Scientific workflow tools make the QMC process more accessible in multiple ways: (1) bringing the constellation of electronic structure codes needed to produce a single QMC result under a single framework, (2) reducing the number of inputs required to request a desired result to a single user-facing input file, (3) reducing overall complexity by abstracting the execution process, and (4) minimizing the direct effort required to execute the workflow process by assuming the management of simulation execution and monitoring from the user. Workflow tools have been applied with significant benefits to related electronic structure methods such as DFT28–31 and also to QMC.32,33
The Nexus workflow automation system34 was created to realize these advantages for users of QMCPACK. Nexus is a Python-based, object oriented workflow system that can be run on a range of target architectures. Nexus has been used successfully on simple workstations and laptops, small group or institutional computing clusters, university level high performance computing centers in the U.S. and internationally, and Leadership Computing Facilities supported by the U.S. Department of Energy. Nexus has been used in a growing number of QMC studies involving QMCPACK and its uptake by new users is high.
Nexus abstracts user’s interactions with each target simulation code, which are components of a desired simulation workflow. Access to each respective code is enabled through single function calls that only require the user to specify a reduced set of important input parameters. Each function call resembles a small input block from a standard input file for an electronic structure code. Taken together, a sequence of these blocks comprises a new meta-input file that represents the data flow and execution pattern of the underlying simulation codes as a combined workflow.
Nexus assumes the responsibility of initiating and monitoring the progress of each simulation job in the workflow. Nexus generates expanded input files to each code based on the reduced inputs provided by the user. It also generates job submission files and monitors job execution progress via a lightweight polling mechanism. Apart from direct execution of each workflow step, Nexus also automates some tasks that previously fell to users. One example is that Nexus selects the best wavefunction produced during the non-linear statistical optimization process employed by QMCPACK and automatically passes this wavefunction to other calculations (such as diffusion Monte Carlo), which require it.
In the future, additional productivity gains might be realized with Nexus by further abstracting common workflow patterns. For example, convergence studies for orbital parameters (k-points, mesh-factors, and source DFT functional) often follow similar patterns that could be encapsulated as simple components for users. Additionally, more of the responsibility for obtaining desired results, e.g., total energies to a statistically requested tolerance, could be handled by Nexus through algorithms that create and monitor dynamic workflows.
IV. EFFECTIVE CORE POTENTIALS
All-electron (AE) QMC calculations become inefficient and eventually infeasible with the increase in atomic number Z since the computational cost grows roughly as35,36 Z6. Since our primary interest is in valence properties, pseudopotentials and/or effective core potentials (ECP) are commonly employed to eliminate the atomic cores leading to valence-only effective Hamiltonians. Unfortunately, the existing tables and ECP generating tools have proved to exhibit somewhat mixed fidelity to the true all-electron calculations, especially in high accuracy QMC studies. In order to overcome this limitation, we have proposed and constructed a new generation of valence-only Hamiltonians called correlation consistent ECPs (ccECP).37–40 The key feature of this new set is the many-body construction of ccECPs from the outset: in particular, (i) we have emphasized and put upfront the accuracy of many-body valence spectra (eigenvalues and eigenstates) as a guiding principle in addition to the well-known norm conservation/shape consistency principles, (ii) we have opted for simplicity, transparency, and eventual wide use in addition to offering several choices of core sizes or even smoothed-out all-electron nuclear Coulomb potentials, (iii) we have used a set of tests and benchmarks such as molecular bonds over a range of distances in order to extensively probe for the quality and transferability of the ccECPs, and (iv) we have established reference datasets for the exact/nearly exact atomic total energies, kinetic energies, and single-reference and multi-reference fixed-node DMC energies. At present, this covers elements H–Kr with subsequent plans to fill the periodic table.
B. ccECP atomic and molecular properties
The construction of ccECPs builds in electron correlations obtained from the accurate coupled-cluster singles doubles with perturbative triples [CCSD(T)] method. By doing so, ccECPs achieve very high accuracy and enjoy spectral properties on the valence subspace, which are in close agreement with the scalar relativistic all-electron (AE) Hamiltonian. The agreement is often within chemical accuracy over a large range of atomic excitations and ionizations that often spans hundreds of eV energy windows. Molecular properties such as binding energies in multiple geometries, equilibrium bond lengths, and vibrational frequencies were also considered in the development, mostly examining oxides, hydrides, and homonuclear dimers. Especially, compressed bond length properties were given priority as this corresponds to high-pressure applications and probes for the proper behavior of the valence charge in the core region. These atomic and molecular tests provide a direct and comprehensive comparison of ccECP and other core approximations, such as BFD,41 STU,42 eCEPP,43 CRENBL,44 SBKJC,45 UC (uncorrelated, self-consistent, and all-electron core), and ccECP.S (optimization including only atomic spectrum). Here, we illustrate some of these results for the selected cases.
Figure 1 shows the molecular binding energy discrepancies for FeH, FeO, VH, and VO molecules relative to all-electron CCSD(T) where we observe that some previous ECPs display significant errors. In addition, Table I lists a more comprehensive comparison by tabulating the average of mean absolute deviations (MAD) of molecular binding properties relative to all-electron CCSD(T) for all 3d transition metal (TM) molecules. Similarly, Fig. 2(a) presents the MAD of a large valence spectrum for all 3d TM atoms. In both atomic and molecular tests, we see that ccECP achieves smaller or on par average errors with regard to the other ECPs. In addition, Fig. 1 shows these improvements to be consistent for different elements and varying geometries. Hence, we believe that ccECP accomplishes the best accuracy compromise for atomic spectral and molecular properties. Furthermore, ccECPs are provided with smaller cores than conventionally used ones in some cases where large errors were observed. This includes Na–Ar with [He] core and H–Be with softened/canceled Coulomb singularity at the origin [ccECP(reg)]. Selected molecular test results for these are shown in Fig. 3.
|.||UC .||BFD .||STU .||eCEPP .||ccECP.S .||ccECP .|
|.||UC .||BFD .||STU .||eCEPP .||ccECP.S .||ccECP .|
For reference, we also provide accurate total and kinetic energies for all ccECPs46 using methods such as CCSDT(Q)/FCI (FCI, full configuration interaction) with DZ-6Z extrapolations to estimate the complete basis set limit. These data, for instance, are useful in the assessment of fixed-node DMC biases. Figure 2(b) shows the summary of single-reference (HF) fixed-node DMC errors for ccECP pseudo atoms.
C. ccECP database and website
In order to facilitate the use ccECPs, we have provided basis sets and a variety of ECP formats available at https://pseudopotentiallibrary.org, shown in Fig. 4. Each ccECP is presentedin a quantum chemistry format for direct use in various codes, including Molpro, GAMESS, NWChem, and PySCF, which uses the NWChem format. We also provide an XML format, which can directly be used in QMCPACK.
In addition to the ccECPs themselves, we have also provided basis sets appropriate for correlated calculations in each code format. Specifically, we have provided Dunning style48 correlation consistent basis sets from the DZ to 6Z and, in most cases, have also provided an augmented version. For use in solid state applications using a plane wave basis, we have also transformed the semi-local potentials into fully nonlocal Kleinman–Bylander potentials49 using the unified pseudopotential format. This allows the ccECPs to be directly used in codes such as Quantum Espresso. A report file is included, giving detailed information about the quality of the Kleinman–Bylander version of the potential and recommended plane wave energy cutoff energies.
D. Status and future developments
The ccECP table and construction principles aim at improved account of systematic errors built into effective valence Hamiltonians in a wide variety of correlated calculations (see the encouraging feedback so far50,51). Further effort is focused on adapting ccECPs for efficient calculations with plane wave basis set (ccECPpw versions). This requires modifying the deep potentials of the late 3d elements Fe–Zn in particular. The goal is to enable calculations with plane wave cutoffs not exceeding ∼600 Ry to 800 Ry. Plans for the near future involve ccECPs for selected 4d and 5d elements, which include a number of technologically important elements and require explicit treatment of the spin–orbit interactions. Additional improvements such as core polarization and relaxation corrections can be added per specific, application driven needs. Further plans include seeking feedback from the electronic structure community, collecting the data for reference and validation, and adjustments per need for use in a broad variety of ab initio approaches.
V. AUXILIARY FIELD QUANTUM MONTE CARLO
The latest version of QMCPACK offers a now mature implementation of the phaseless auxiliary-field quantum Monte Carlo (AFQMC) method20,52 capable of simulating both molecular53,54 and solid state systems.55–57 AFQMC is usually formulated as an orbital-space approach in which the Hamiltonian is represented by the second-quantized form as
where and ĉi are the fermionic creation and annihilation operators, hij and are the one- and two-electron matrix elements, and EII is the ion–ion repulsion energy. Key to an efficient implementation of AFQMC is the factorization of the 4-index electron-repulsion integral (ERI) tensor , which is essential for the Hubbard–Stratonovich (HS) transformation.58,59
QMCPACK offers three factorization approaches, which are appropriate in different settings. The most generic approach implemented is based on the modified-Cholesky factorization60–64 of the ERI tensor,
where the sum is truncated at Nchol = xcM, xc is typically between 5 and 10, M is the number of basis functions, and we have assumed that the single-particle orbitals are in general complex. The storage requirement is thus naively although sparsity can often be exploited to keep the storage overhead manageable (see Table II). Note that QMCPACK can accept any 3-index tensor of the form of so that alternative density-fitting based approaches can be used. Although the above approach is efficient for moderately sized molecular and solid-state systems, it is typically best suited to simulating systems with fewer than 2000 basis functions.
|Method .||Memory .||Propagation .||Energy .||Setting .||GPU .|
|Dense 3-index||xcM3||M ≤ 1000||Yes|
|Sparse 3-index||sxcM3||M ≤ 2000||No|
|THC||M ≤ 4000||No|
|k-point||Nkm ≤ 6000||Yes|
|Method .||Memory .||Propagation .||Energy .||Setting .||GPU .|
|Dense 3-index||xcM3||M ≤ 1000||Yes|
|Sparse 3-index||sxcM3||M ≤ 2000||No|
|THC||M ≤ 4000||No|
|k-point||Nkm ≤ 6000||Yes|
To reduce the memory overhead of storing the three-index tensor, we recently adapted the tensor-hypercontraction65–67 (THC) approach for use in AFQMC.56 Within the THC approach, we can approximate the orbital products entering the ERIs as
where are the one-electron orbitals, rμ are a set of specially selected interpolating points, ζμ(r) are a set of interpolating vectors, and Nμ = xμM. We can then write the ERI tensor as a product of rank-2 tensors,
To determine the interpolating points and vectors, we use the interpolative separable density fitting (ISDF) approach.68–70 Note that the storage requirement has been reduced to . For smaller system sizes, the three-index approach is preferred due to the typically larger THC prefactors determined by xμ ≈ 15 for propagation and xμ ≈ 10 for the local energy evaluation. The THC approach is best suited to simulating large supercells and is also easily ported to GPU architectures due to its smaller memory footprint and use of dense linear algebra. Although the THC-AFQMC approach has so far only been used to simulate periodic systems, it is also readily capable of simulating large molecular systems using the advances from Ref. 71.
Finally, we have implemented an explicitly k-point dependent factorization for periodic systems,72
where now i runs over the number of basis functions (m) for k-point ki in the primitive cell, Q = ki − kk + G = kl − kj + G′ is the momentum transfer vector (arising from the conservation of crystal momentum), and G, G′ are reciprocal lattice vectors. Although explicitly incorporating k-point symmetry reduces the scaling of many operations and the storage requirement by a factor of 1/Nk (see Table II), perhaps, the most significant advantage is that it permits the use of batched dense linear algebra and is thus highly efficient on GPU architectures. Note that the THC and k-point symmetric factorization can be combined to simulate larger unit cells and exploit k-point symmetry; however, this has not been used to date. We compare the three approaches in Table II and provide guidance for their best use.
In addition to state-of-the-art integral factorization techniques, QMCPACK also permits the use of multi-determinant trial wavefunction expansions of the form
We allow for either orthogonal configuration interaction expansions where ⟨DI|DJ⟩ = δIJ and also for non-orthogonal multi Slater determinant expansions (NOMSD) where ⟨DI|DJ⟩ = SIJ. Orthogonal expansions from the complete active space self-consistent field (CASSCF) or selected CI methods allow for fast overlap and energy evaluation through Sherman–Morrison based techniques and thus do not typically incur a significant slowdown. However, they often require a large number of determinants to converge the phaseless error. NOMSD expansions do not benefit from fast update techniques but often require orders of magnitude fewer determinants than their orthogonal counterparts to achieve convergence in the AFQMC total energy53 (see Fig. 5).
QMCPACK also permits the evaluation of expectation values of operators that do not commute with the Hamiltonian using the back propagation method.59,81,82 In particular, the back-propagated one-particle reduced density matrix (1RDM) and components or contracted forms of the two-particle reduced density matrix are available. As an example, we plot in Fig. 6 the natural orbital occupation numbers computed from the back-propagated phaseless AFQMC 1RDM.
Tools to generate the one- and two-electron integrals and trial wavefunctions for molecular and solid state systems are also provided through the afqmctools package distributed with QMCPACK. To date, these tools are mostly dependent on the PySCF software package;23 however, we provide conversion scripts for FCIDUMP formatted integrals and simple Python routines to convert factorized integrals or trial wavefunctions provided from any source to our internal HDF5 based file format. Detailed tutorials on how to run AFQMC in QMCPACK are also provided. Nexus (Sec. III) can be used to drive the process of mean-field calculation, wavefunction conversion, and AFQMC calculation. We are using this integration to perform a study of the relative strengths of AFQMC and real space QMC methods.
Over the next year, we plan to extend the list of observables available as well as complete GPU ports for all factorization and wavefunction combinations. In addition, we plan to implement the finite temperature AFQMC algorithm84–88 and spin–orbit Hamiltonians with non-collinear wavefunctions. We will also release our ISDF-THC factorization tools and our interface to Quantum Espresso.24 We hope that our open-source effort will enable the wider use of AFQMC in a variety of challenging settings.
VI. TOWARDS SYSTEMATIC CONVERGENCE OF REAL-SPACE QMC CALCULATIONS
The key factor in reaching high accuracy using QMC is the choice of trial wavefunction ΨT. For all-electron DMC calculations, the nodes of the wavefunction are the only factor in determining the error in the computed energy, while the bulk of the wavefunction affects the statistical efficiency and time step error of the calculation. For calculations involving pseudopotentials, high accuracy of the trial wavefunction is also needed around the atomic cores to minimize the approximations in evaluating the non-local energy. Single Slater determinant (SD) wavefunctions built with Hartree–Fock or Kohn–Sham orbitals supplemented by a Jastrow correlation factor generally give good results [see Ref. 89] and are used almost exclusively today in solid-state calculations.
Reaching systematic convergence of the trial wavefunction and its nodal surface for general systems has been a key challenge to real space QMC methods since their invention. Besides increasing accuracy in calculated properties, this is also required to remove the starting point dependence and allow for the use of QMC where all potential sources of trial wavefunction are unreliable. In 2008, this was performed for first row atoms and diatomic molecules [Ref. 90], and improved algorithms aid calculations on larger systems.91 For general systems with many electrons, the overall challenge remains. Furthermore, if the wavefunction is to be used in DMC, commonly used optimization techniques only optimize the nodal surface indirectly by improving the VMC energy and/or variance. Minimization of the objective function is therefore not guaranteed to minimize the fixed-node energy. Consistently, high accuracy wavefunctions are also needed around atomic cores to minimize the locality error in pseudopotential evaluation, posing a challenge to trial wavefunction optimization with a large number of coefficients.
One possible step along the way would be to optimize all the orbital coefficients in a single determinant wavefunction, but due to the limited flexibility in describing the (3N − 1) dimensional nodal surface, this protocol cannot give exact nodes for general systems. This approach could represent a useful starting point independent step while keeping a simple form for the trial wavefunction. Other possibilities for improving the trial wavefunction while retaining simplicity include techniques such as backflow and iterative backflow92,93 and antisymmetrized geminal product wavefunctions (see Ref. 94). However, more flexible and complex trial wavefunctions are required to achieve systematic convergence of the nodal surface and to approach exact results for general systems.
The most straightforward method to improve the quality of the trial wavefunction nodes in a convergeable manner is to increase its complexity via a multi-Slater determinant (MSD) or configuration-interaction (CI) expansion,
where ΨT is expanded in a weighted (ci) sum of products of up and down spin determinants Di and J is the Jastrow correlation factor. In the limit of a full configuration interaction calculation in a sufficiently large and complete basis set, this wavefunction is able to represent the exact wavefunction. However, direct application of configuration interaction is prohibitively costly for all but the smallest systems because very large numbers of determinants are usually required. To speed application, an efficient selection procedure for the determinants is needed. This can be combined with efficient algorithms for evaluating the wavefunction in QMC.95–97
A. Ground state calculations
Multiple variants of selected Configuration Interaction (sCI) methods have recently demonstrated significant success in reaching high accuracy for the ground state and excited states of molecular systems with tractable computational cost. Within the class of sCI methods, the CIPSI98 method has proven to be practical in providing high accuracy wavefunctions for QMC for both molecular systems and for solids.99–105 sCI methods enable unbiased construction of the trial wavefunction using only a single threshold parameter and therefore avoid the complexities of CASSCF techniques, which require expert selection of the active space. CIPSI algorithms are implemented in the Quantum Package 2.0 code25 and fully interfaced with QMCPACK and Nexus.
For systems where CIPSI can be fully converged to the FCI limit and reliably extrapolated to the basis set limit, QMC is not required, but for any reasonable number of electrons, QMC can be used to further improve the convergence. The wavefunctions produced from CIPSI can be either used directly, the case in which the nodal error is determined by the CIPSI procedure, used to provide an initial selection of determinants whose coefficients are subsequently reoptimized in the presence of a Jastrow function, or used within DMC where the projection procedure will improve on the CIPSI wavefunction. This procedure is equally applicable to solids as well as molecules, provided that k-points and their symmetries are fully implemented.
In the following, we illustrate these techniques by application to molecular and solid-state lithium fluoride. In both cases, we use Linear Combination of Atomic Orbitals (LCAO) and different Gaussian basis set sizes to generate the trial wavefunctions. CIPSI energies refer to the variational energy corrected with the sum of energies from second order perturbation theory (PT2) of each determinant, i.e., E + PT2, at convergence in energy with the number of determinants. Since the sizes of both systems are small enough to reach CIPSI convergence with, for the largest case, less than 5M determinants, for the DMC calculations, the coefficients of the determinants are not reoptimized in the presence of a Jastrow function. The cost of DMC with a CIPSI trial wavefunction scales as , where N is the number of determinants in the expansion and VarRatio is the ratio between the variance of a system at one determinant and the same system at N determinants (). In the case of the molecular systems, VarRatio varies between 0.7 (for cc-pcVDZ and N = 1.6M) and 0.8 (cc-pcVQZ and N = 5.2M) or an increase in cost ranging from 620 to 1500 times the cost of a single determinant DMC run. In the case of the solid, VarRatio varies between 0.83 (for cc-pVDZ and N = 700k) and 0.56 (cc-pcVQZ and N = 9M) or an increase in cost ranging from 570 to 940 times the cost of a single determinant DMC run. The main difference in the change of cost between the molecular system and the solid system is the use of ECPs, reducing significantly the variance of the calculations for both DMC(SD) and DMC(CIPSI)
B. Molecular lithium fluoride
Lithium fluoride is a small molecule for which the multi-determinant expansion and therefore trial wavefunction can be fully converged to the FCI limit using the CIPSI method. Moreover, CCSD(T) calculations of the Vertical Ionization Potential (VIP) are feasible and its experimental value is known,106 providing reliable reference data. Care has to be taken in the comparison to experimental values, as experimental measurements include intrinsic uncertainties and environmental parameters such as temperature. These effects, including zero-point motion, are not included in our study and might explain the remaining discrepancies between our calculations and the experimental ionization energy value given by Berkowitz et al.106
While the total energies at each basis set of ground state calculations and cation calculations are different, their trends are identical and we therefore only show figures representing the ground state. Figure 7 shows the ground state DMC total energies of LiF computed using various trial wavefunctions; single-determinant such as Hartree–Fock (HF), DFT’s PBE0, and B3LYP hybrid functionals and multi-determinant using the converged CIPSI trial wavefunction. CCSD(T) and CIPSI energies are added to the figure for reference and all calculations are performed for three basis-sets increasing in size (cc-pCVNZ, N = D, T, Q) and extrapolated to the complete basis-set limit (CBS). The trends of CCSD(T) and CIPSI total energies are in agreement with each other to the CBS limit. CIPSI calculations recover more correlation energy ∼0.24 eV for the ground state and ∼0.13 eV for the cation. This is to be expected as CCSD(T) includes singles, doubles, and perturbative triples excitations, while CIPSI wavefunction includes up to 9th order excitations with more than 70% describing quadruple excitations (10% describing higher order excitations) for both ground and cation states. Interestingly, at the CBS limit, CIPSI total energy converges to the same limit as the DMC(CIPSI) energy, while the CCSD(T) converges to the same energy as the SD-DMC energies.
At the DMC level of theory, the dependence on basis-set is rather weak; less than ∼10 meV in the worst case. In the LiF molecular case, the nodal surfaces of all tested single-determinant trial wavefunctions are within error bars of each other, meaning that they are essentially the same. Such weak dependence on the starting method and on the basis set are a significant advantage and strength of the method when compared to other methods such as sCI or even AFQMC. The use of CIPSI-based trial wavefunctions in DMC allows the recovery of 0.24 eV for the ground state and 0.5 eV for the cation. This difference underlines the different sensitivity of the nodal surface to excited and charged states. The vertical ionization potential (VIP), EVIP = Ecation − Eground, DMC performed using CIPSI wavefunctions shows almost no dependency to the basis set size (Fig. 8), while DMC(CIPSI), CIPSI, and CCSD(T) are in perfect agreement with the CBS limit, demonstrating good error compensation for the latter method.
C. Solid-state lithium fluoride
Solid LiF is a face centered cubic material with a large gap, used mainly in electrolysis for its role in facilitating the formation of an Li–C–F interface on the carbon electrodes.107 The purpose of this example is to demonstrate basis set effect and the systematic convergence of the DMC energy with the number of determinants (a paper demonstrating convergence to the thermodynamic limit is in preparation). We simulated a cell of (LiF)2 (4 atoms per cell) at the Gamma point using correlation consistent electron-core potentials (ccECP) (described by Bennett et al.37 and Wang et al.,40 Sec. IV) and the cc-pVDZ, cc-pVTZ and cc-pVQZ basis sets associated with the ccECPs. PySCF, Quantum Package, and QMCPACK are able to simulate all shapes of cells with both real and complex wavefunctions, corresponding to any possible k-point. In this case, running at the Gamma point is simply for convenience.
For such a small simulation cell, it is possible to converge the sCI wavefunction to the FCI limit with a reasonable number of determinants, as shown in Fig. 9. The number of determinants needed to reach approximate convergence remains important: around 700 K in cc-pVDZ, 6M in cc-pVTZ, and 9M in cc-pVQZ. Similar to the molecular case, the converged energies for the cc-pVTZ and cc-pVQZ basis sets are in agreement, indicating that the basis set is sufficiently converged. Interestingly, in the cc-pVTZ case, the DMC energy converges significantly faster with the number of determinants (700 K instead of 6M). The slower convergence of the cc-pVQZ curve indicates that important determinants describing relevant static correlations are introduced later in the selection process. Using natural orbitals or in general, an improved choice of orbitals or selection scheme could accelerate the convergence.
D. Solid-state bandgap calculations
The bandgap of a solid is a critical and fundamental property of a material to predict accurately. QMC calculations for solids have traditionally used completely independent calculations for ground and excited states and single determinant calculations. This approach can be accurate, but it relies on good error cancellation between the calculated total energy for each state, making the selection of consistently accurate trial wavefunctions critical. Improved methods are needed to enforce good error cancellation including approaches that can be systematically converged to give, in principle, exact results.
As discussed above, convergent wavefunctions and energies can be constructed using sCI techniques. However, even for small primitive cells with relatively uncorrelated electronic structures, this approach quickly requires millions of determinants, making it expensive to apply today. We have developed theories, methods, and implementations to obtain the band edge wavefunctions around the fundamental gap and their relative energies efficiently and to a high accuracy. Error cancellation is built into the methodology so that simpler trial wavefunctions are effective and the scheme is substantially more efficient to apply. Surprisingly, for the systems examined so far, only single and double excitations need to be considered to obtain accurate bandgaps, even using the simple VMC method. This makes the technique comparatively cheap to apply.
To compute the optical bandgaps of insulators and semi-conductors, we use the energy difference of optimized wavefunctions that describe the valence band maximum (VBM) and the conduction band minimum (CBM). Optimizations use the recently developed excited state variational principle,108–110
whose global minimum is not the ground state but the eigenstate with energy immediately above the chosen value ω, which could be placed within the bandgap to target the first excited state and thus predict the optical gap. QMCPACK evaluates Ω via the variational Monte Carlo (VMC) method to avoid explicit dealing with the H2 term and minimizes it using the linear method. For the ground state, we include the closed-shell determinant built from Kohn–Sham (KS) orbitals, plus all single-particle–hole excitations, which represents the leading-order terms of orbital rotation that transforms KS orbitals to the ones that minimizes Ω in the presence of the Jastrow factor. For the excited state, we include all the single-particle–hole excitations as in Bethe–Salpeter equation (BSE)111 methods and selected double excitations to capture the re-polarization of the electron cloud in the vicinity of the exciton. We use the variance of the wavefunctions as a proxy for accuracy and, by varying the number of determinants, choose ground and excited states with consistent variance.
We have used this approach in Ref. 112 to study optical gaps of a variety of solids ranging from small-gap semi-conductors to large gap insulators and compare our results to the commonly used GW approach based on many-body perturbation theory (MBPT). As detailed in the supplemental information of Ref. 112, the bandgaps were extrapolated using calculations on 8, 16, and 24 atom supercells. Figure 10 shows that the predicted optical gaps are in excellent agreement with experimental values and the mean absolute deviation (MAD) is just 3.5%, compared to MADs more than twice this large for the optical gaps obtained by subtracting the known exciton binding energy from G0W0 and self-consistent GW gaps. These calculations were able to run of departmental level computing and did not require supercomputers due to the use of VMC and moderate sized supercells utilized. Remaining errors that have yet to be investigated include the size of the CI expansion, the limited correlation energy obtained with VMC, the pseudopotentials, and residual finite size error. DMC may further improve the VMC results.
In order to further show the method’s advantage, we performed a thorough analysis of zinc oxide, a material that is particularly challenging for MBPT.115 As shown in Fig. 10, the perturbative nature of G0W0 makes its prediction highly sensitive to the amount of exact exchange included in the DFT reference. As G0W0 assumes a zeroth-order picture in which electronic excitations are simple particle–hole transitions between the one-particle eigenstates of DFT, such a sensitivity indicates the breakdown of this assumption, and then, G0W0 becomes unreliable. On the other hand, our VMC approach is designed to be insensitive to the DFT choice for two reasons: (1) its ability to include approximate orbital relaxation counteracts the shortcomings of the DFT orbitals and, (2) unlike G0W0, it does not require orbital energies as the input. From Fig. 10, we do find that its prediction to optical gap is both accurate and independent of the choice of DFT functionals.
Using DMC as a post-sCI method is very promising to systematically improve molecular and solid-state calculations beyond the single-determinant picture. It converges faster than the sCI methods used on their own. From a practical perspective, PySCF, Quantum Package, and QMCPACK are fully interfaced with each other through the Nexus workflow automation system. The necessary multi-step workflow to run the above examples is fully implemented. In the case of solids, Nexus can automatically manage finite-size scaling calculations by setting the size of the super-cells and the number of twists angles and drive PySCF, Quantum Package, and QMCPACK appropriately and automatically.
To illustrate the application of recent developments in QMCPACK, in Sec. VII A, we give an example of using real space QMC to study non-valence anions, which are particularly challenging systems. Section VII B gives an example of computing the excited states of localized defects, which is a challenge for all electronic structure methods. In Sec. VII C, we give an example of computing the momentum-distribution and Compton profile from real space methods. The necessary estimators have recently been specifically optimized for these tasks.
A. Applications of DMC to non-valence anions
In addition to valence anions, molecules and clusters can possess non-valence anions in which the binding of the excess electron is dominated by a combination of long-range electrostatics and long-range dispersion-type correlation effects. The best known class of non-valence anions are dipole-bound anions in which the binding of the excess electron is driven by the dipole field of the neutral.116–121 Non-valence anions, regardless of the nature of the long-range interaction responsible for the electron binding, are challenging to treat using traditional electronic structure methods due to the large, highly diffuse basis sets required. Both DMC and AFQMC methods have been demonstrated to be useful in characterizing dipole-bound anions.122,123 Non-valence anions in which electron correlation effects dominate the binding of the excess electron pose an additional challenge, namely, by definition, they do not bind the excess electron in the Hartree–Fock (HF) approximation. In fact, HF calculations on such excess electron systems collapse on to the neutral molecule, leaving the excess electron in a discretized continuum orbital.124 Hence, methods that start from the HF wavefunction, e.g., MP2 and CCSD(T),125 also fail to bind the excess electron. Many-body methods, such as equation-of-motion coupled cluster (EOM-CC),126–129 have proven successful in treating these species but still face the problem of requiring very large basis sets.
This raises the question of whether DMC calculations using a single Slater determinant trial wavefunction to define the nodal surface can accurately describe non-valence correlation-bound (NVCB) anions. To investigate this, we have undertaken DMC calculations on a (H2O)4 model with the monomers arranged so that the net dipole is zero.124 This model has been studied previously using EOM-CC methods. In the present work, we focus on a geometry at which the excess electron does not bind in the HF approximation, but for which EOM-EA-CCSDT129 calculations using the aug-cc-pVDZ basis set130,131 augmented with a 7s7p set of diffuse functions located at the center of the molecular cluster give an electron binding energy (EBE) of 174 meV.124
For the QMC calculations, Slater–Jastrow trial wavefunctions that are products of a single Slater determinant comprised of HF or DFT orbitals and a Jastrow factor were employed. All calculations were carried out using the ccECP pseudopotentials with the corresponding aug-cc-pVDZ type basis sets37,39 augmented with the same 7s7p set of diffuse functions, as employed in Ref. 124. The DMC calculations were carried out at three imaginary time steps (0.001 Ha−1, 0.003 Ha−1, and 0.005 Ha−1), and a linear extrapolation was performed to obtain the zero time step limit. HF calculations with this basis set fail to bind the excess electron, and a plot (see Fig. 11) of the singly occupied orbital of the excess electron system reveals that it is very diffuse because it has collapsed onto the lowest “continuum” solution as described by the discrete basis set. The QMC calculations were carried out with the QMCPACK code.
The Jastrow factors used in the trial wavefunctions included electron–electron, electron–nuclei, and electron–electron–nuclei terms. Normally, one would optimize the parameters in the Jastrow factors separately for the neutral and for the anion. However, this approach would not give meaningful results for an unbound anion, and as a result, we adopted the strategy of using the Jastrow factor optimized for the neutral in the subsequent DMC calculations of the anion. The DMC calculations using trial wavefunctions determined in this manner give an EBE of 183 ± 10 meV, in good agreement with the previous EOM-EA-CCSDT result (174 meV).124 This demonstrates that DMC calculations can recover from the use of a trial wavefunction for the anion that has collapsed onto a discretized continuum solution.
Even so, there remains the question of whether the EBE obtained from DMC calculations that use an unphysical (i.e., collapsed onto the continuum) trial wavefunction could incur an appreciable error due to an inadequate description of the nodal surface of the anion. To address this question, we also carried out VMC and DMC calculations on the neutral and anion of the (H2O)4 cluster model using orbitals from B3LYP132–134 calculations employing the same pseudopotential and basis sets as used for the HF calculations. The anion is bound by 395 meV in the B3LYP calculations, and the singly occupied orbital of the excess electron system, while still diffuse, is localized much closer to the molecule than in the HF calculations. A comparison of the charge distributions of the singly occupied orbitals from the HF and DFT calculations (see Fig. 11) shows that the DFT charge distribution is much less spatially extended. Moreover, it more closely resembles the charge distribution of the relevant natural orbital from the EOM calculation of Ref. 124. (The over-binding of the anion in the B3LYP calculations is likely due to the finite extent of the integration grid.) The DMC calculations using trial wavefunctions derived from B3LYP orbitals give an EBE of 212 ± 11 meV, ∼29 meV larger than the DMC result obtained using HF orbitals. This increase indicates that employing a trial wavefunction with a more physical charge distribution for the singly occupied orbital of the anion does have an impact on the nodal surface for the exchange of the electrons. The EOM-EA-CCSDT result obtained using the aug-cc-pVDZ+7s7p basis set is 38 meV smaller than the DMC EBE obtained using the trial wavefunction employing B3LYP orbitals. This suggests that the EBE from EOM calculations may not be fully converged in these large basis sets. Overall, these results demonstrate that DMC is a viable approach for the characterization of NVCB anions.
B. Excitation energies of localized defects
For defects and interfaces, most ab initio methods can only achieve qualitative agreement on the optical properties. We have recently studied emission energies of Mn4+-doped solids using DMC, which is chosen as proof of principle.11 We show that our approach is applicable to similar systems, provided that the excitation is sufficiently localized. In support of this work, Nexus scripts and new tutorials on excited state calculations were developed, which can be applied to any gapped system (Lab 5 in the QMCPACK manual).
Multivalent ionic defects, such as Mn4+, can create multiple localized electronic states that are trapped within the bandgap of wide gap materials. Thus, luminescent centers are created in the dopant sites through radiative recombination. Mn4+ has the d3 electronic configuration all on the t2g orbitals. The ground state is in the (4A2g) configuration due to Hund’s rules, but the excited state is found to be as (2Eg).135 Therefore, the emission energy is simply defined as Eem = E(2Eg) − E(4A2g).
Figure 12(a) shows that DMC can reproduce experimental emission energies of Mn4+ doped insulating host compounds.136–138 DFT + U and hybrid-DFT, however, substantially underestimate. Relative quantitative success of HSE with respect to PBE (or DFT + U) indicates that emission energies might be reproduced with a larger portion of exact exchange. However, this would worsen the accuracy of the computed bandgaps in the host compounds.11 In Fig. 12(b), we show the spin flipped electrons per radial volume , which is spherically integrated around the Mn4+ atom. n(r) approaches to zero with the increase in radius, indicating that the excited electron density is strongly localized on the impurity atom. DMC and LDA + U n(r) densities are almost identical to each other despite the large difference in their emission energies, which underscore the difficulties that are needed to be overcome for better DFT functionals.139
C. Calculation of the many-body properties: The momentum distribution
As full-many body methods, QMC can be used to calculate many-body properties that cannot be readily obtained from single-particle or mean-field techniques. We have recently updated and optimized the calculation of the momentum distribution.
Experimentally, the momentum distribution function (MDF) can be accessed, e.g., via angle-resolved photoemission spectroscopy and scattering methods such as Compton scattering, positron annihilation, the (e, 2e) process, and high energy electron scattering.140–143 In general, the differential cross sections of the scattering can be related to the momentum distribution. These experimental techniques are powerful probes for understanding subtle details on the ground state properties of materials, which are manifested in the MDF.
In normal Fermi liquids, the electron MDF has a discontinuity at the Fermi momentum pF. In three-dimensional systems, this discontinuity defines the shape of the Fermi surface, which is also related to the screening properties of the electrons.144 The Fermi surface can be extracted from the p-space MDF via back-folding.145 This leads to occupation density within the first Brillouin zone from which the Fermi surface topology can be considered.146
The magnitude of the discontinuity at the Fermi surface, however, quantifies the strength of a quasiparticle excitation and is generally referred to as the renormalization factor.147,148 For strongly coupled systems, the renormalization factor tends to zero as the coupling strength increases, and thus, it provides an estimate for the strength of electron correlations. Interestingly, the discontinuity at the Fermi momentum disappears for superfluids or superconducting materials. For insulators, the discontinuity is absent and the sharp drop is noticeably broadened, which also holds true for some semi-metals.149,150 Even small scale charge density oscillations lead to clear signatures in the MDF.151 Therefore, the momentum distribution function provides complementary and informative knowledge to other characterizations of many-body systems.
The MDF, n(p), is obtained by taking the Fourier transform of the one-body density matrix,
where Ω is the volume containing Ne electrons, , s = r1′ − r1, and
In variational Monte Carlo, this is expressed as
where . Thus, we get, for the MDF,
In practice, the Monte Carlo estimate for the MDF with Ns samples is given by
where R includes the coordinates of all the electrons and is a displacement vector acting on the jth electron of the ith sample. In diffusion Monte Carlo calculations, the mixed distribution replaces |Ψ(R)|2, and additional measures must be taken to calculate or estimate the density matrix. Note that the momentum distribution normalizes to the number of electrons,
in which d refers to dimensionality. In Eq. (12), a finite system and a system at the thermodynamic limit are described by summation and integration, respectively.
The MDF estimator is a one-body density matrix based estimator with a very high computational cost resulting from the large number of wavefunction evaluations required. A naive implementation can easily double the cost of a QMC calculation. Thus, efficient algorithms and implementations are critical, and similar techniques can be used for related estimators.
In Eq. (11), the computation of both wavefunction ratios and phase factor is expensive. Direct NeNs times of calculation is easy to implement but has a lot of repeated effort. In the term, the evaluation of single particle orbitals at dominates the cost. In fact, its call count can be reduced from NsNe to Ns by making
where r′i is the electron coordinates of sample i. Although the leading cost of is optimized away, its remaining terms still scale as and the computational cost should be comparable to the non-local pseudopotential calculation.
Unlike the wavefunction ratios that need to be computed only once for all the p, phase factors are computed for each p and also take significant portion of time. Similarly, for each p, the number of evaluations can be reduced to Ns + Ne times by separating indices i and j in two terms such as in Eq. (13). The calculation of and can be efficiently vectorized using the single instruction multiple data (SIMD) unit in modern processors.
By applying the above techniques and ensuring vectorization of all operations, the overhead for evaluating the MDF in a 48 atom cell VO2 was reduced from additional 150% to only 50% cost increase compared to a DMC run without any estimators.
Within the so-called impulse approximation (IA), the Compton profile and the dynamical structure factor are proportional to the projection of n(p) onto a scattering vector.149,152 In this case, directional Compton profile in the z-direction would be expressed as
The IA is especially appropriate for x-ray Compton scattering from electronic systems,142,149 and thus, it is capable of providing a unique perspective for understanding the electronic structure of materials; bulk properties in particular.
In Ref. 151, QMCPACK was used in obtaining the MDFs and Compton profiles for VO2 across its metal–insulator transition from the metallic rutile (R) phase to the insulating monoclinic (M1) phase. There, the analysis of the MDF shows signatures of the non-Fermi liquid character of the metallic phase of vanadium dioxide. Moreover, findings therein provide an explanation for the experimentally observed anomalously low electronic thermal conductivity,153 which manifests as back scattering characteristics within the momentum distribution function. Figure 13 shows some examples of MDF differences across the phase transition in two planes as well as for a few different directions.151
We have described recent enhancements to the open source QMCPACK package. Besides increases in capability for both real space and auxiliary field Quantum Monte Carlo (QMC) methods, the surrounding ecosystem has also been improved. These enhancements include the workflow system Nexus, which aims to reduce the complexity of performing research studies and the tens to hundreds of individual calculations that might be entailed. A new set and open database of effective core potentials has also been established at https://pseudopotentiallibrary.org, and we expect that these will be of interest for other quantum chemical and many-body calculations due to their increased accuracy, including for stretched bonds and excited states. We have also described how improvements in open software development have benefitted the project. Besides the activities described in this article, we note that there is substantial ongoing work to enhance the architecture of QMCPACK for GPU accelerated machines and to obtain portable performance from a single code base. Once the new design is proven on diverse GPUs, it will be described in a future article.
Overall, the applicability of QMC continues to expand, it is becoming easier to apply, and there are many systems and phenomena where the higher accuracy and many-body nature of QMC are both warranted and can now be applied. We hope that this article will help encourage these new applications.
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Methodological development and scientific applications of QMCPACK are currently primarily supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division, as part of the Computational Materials Sciences Program and the Center for Predictive Simulation of Functional Materials. Software developments focused on future Exascale architectures are supported by the Exascale Computing Project (Grant No. 17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. S.U. and K.D.J. acknowledge the support of NSF (Grant No. CHE-1762337). B.R. acknowledges previous support for the work described from the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract Grant Nos. DEAC52-07NA27344 and 15-ERD-013 and NSF (Grant No. DMR-1726213). L.Z. acknowledges additional funding from the Dalton Fellowship at the University of Washington. An award of computer time was provided by the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under contract Grant No. DE-AC02-06CH11357. This research also used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract Grant No. DE-AC05-00OR22725. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International, Inc. for the U.S. Department of Energy’s National Nuclear Security Administration under Contract No. DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
This manuscript has been authored by UT–Battelle, LLC, under Contract No. DE-AC0500OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for the United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).