The Massively Parallel Quantum Chemistry (MPQC) program is a 30-year-old project that enables facile development of electronic structure methods for molecules for efficient deployment to massively parallel computing architectures. Here, we describe the historical evolution of MPQC’s design into its latest (fourth) version, the capabilities and modular architecture of today’s MPQC, and how MPQC facilitates rapid composition of new methods as well as its state-of-the-art performance on a variety of commodity and high-end distributed-memory computer platforms.
I. INTRODUCTION
Massively Parallel Quantum Chemistry (MPQC) is an open-source software suite for molecular electronic structure (“quantum chemistry”) computations. As suggested by the first half of the name (“Massively Parallel”), the objective of MPQC is first, and foremost, to enable deployment of quantum chemistry methods on large-scale parallel computers. Another key objective of MPQC is to serve as a research platform for the development of electronic structure methods. Thus, MPQC was designed from scratch to emphasize both high performance and composability. To achieve these objectives, MPQC was implemented almost exclusively in the C++ programming language. In fact, MPQC was the pioneering example of large-scale deployment of C++ in the domain of electronic structure; its use of “modern” programming styles (object-oriented, generic, functional, etc.), its object model of this domain, its design patterns (smart pointers, factories), and its techniques (serialization, runtime class reflection) influenced the designs of many other software packages that followed. In its present incarnation as version 4, MPQC continues the pursuit of these goals by introducing the following: the ability to compose fine-grained task decomposition of algorithms, high-level domain-specific languages for block-sparse tensor algebra and many-body electronic structure, and support for portable execution on heterogeneous platforms.
As suggested by the second half of the name (“Quantum Chemistry”), the original and primary focus of MPQC is on electronic structure methods applicable to molecules. Thus, MPQC utilizes exclusively the traditional Linear Combination of Atomic Orbitals (LCAO) technology. Until ∼5 yr ago, MPQC was focused on the workhorse methods of quantum chemistry [Hartree–Fock (HF), density functional theory (DFT), and second-order perturbation theories], whereas the current MPQC focuses primarily on accurate many-body methods such as coupled-cluster. MPQC now also includes emerging support for computations on periodic solids.
The goal of this article is to highlight the innovative design of MPQC, which supports facile composition of high-performance parallel implementations of electronic structure methods, as well as the unique methods available in MPQC. Although the latest public release of MPQC has very little source code left from the first public release, it is instructive to briefly review the history of MPQC in the context of the evolution of both the electronic structure formalisms and the high-performance computer platforms (Sec. II). We will then discuss the design of the current version of MPQC and how it supports deployment to modern high-performance computing platforms and composability (Sec. III). Finally, we will discuss the current and emerging capabilities and recent performance highlights (Sec. IV).
II. HISTORY
The origins of MPQC go back to the pioneering efforts to deploy quantum chemistry methods on the emerging distributed memory parallel computers in the early 1990s, pursued first by the group at Sandia National Laboratories (Curtis Janssen, Michael Colvin, and Robert Whiteside) and subsequently in collaboration with the National Institutes of Health (Edward Seidl). The roots of MPQC (the name coined by Curt Janssen) go back to the early parallel quantum chemistry developments by Colvin and Whiteside in Fritz Schaefer’s group at Berkeley1 and to the parallel quantum chemistry code based on the distributed matrix package DMT written in CWEB (a literate programming extension of C)2 at Sandia. Curt Janssen joined the efforts at Sandia and led the design and implementation of what became the MPQC program written largely in C++. Although the revision history in the public repository3 only contains code from December 29, 1993, onward, the initial commits of MPQC include C++ code from at least 1991 and perhaps even earlier.
A. Versions 1 and 2
The first official release of MPQC (version 1.0, repository tag release-1-0) occurred on December 20, 1996. Its features included energies and forces for closed- and open-shell (including low-spin and two-configuration) Hartree–Fock (HF)4 and closed-shell second-order Møller–Plesset perturbation theory (MP2).5,6 It could also compute energies with several spin-restricted open-shell second-order perturbation theories.5 This version was the only official release that included the legacy CWEB-based DMT package and the mean-field methods based on it. The rest of the code was the “new” MPQC developed in C++ since Curt Janssen’s arrival at Sandia. It included a robust object model of quantum chemistry (classes such as Molecule, GaussianBasisSet, and abstract Wavefunction with self-evident purposes). The essential technical features of MPQC 1.0 included the following:
robust object-oriented design in C++ with support for automated lifetime management via reference-counting smart pointers (RefBase and Ref), object serialization (SavableState), and runtime class reflection (DescribedClass);
portable abstractions for message passing (MessageGrp) and partitioned global memory (MemoryGrp) implemented for shared-memory multiprocessors using System V interprocess communication and distributed-memory multiprocessors using Message Passing Interface (MPI), Parallel Virtual Machine (PVM), and the NX communication library for Intel Paragon;
matrix package (SCMatrix), which supported plain and blocked matrices stored locally, in a replicated or distributed fashion; and
modular design whereby the application was decomposed into a hierarchy of 25 libraries (broken into util, math, and chemistry families); the top MPQC application was free of domain logic and only parsed the input, set up the environment, and, optionally, loaded the needed modules dynamically if the domain-specific libraries were not linked in statically.
Some of these features had contemporary analogs, e.g., the MemoryGrp abstraction is related to the Global Arrays toolkit and the Aggregate Remote Memory Copy Interface (ARMCI) library.7,8 However, these features became common in quantum chemistry, and in scientific applications in general, only much later, e.g., smart pointers only became common in the late 1990s and were eventually included in the C++ Library TR1 (drafted in 2005) and the official C++ ISO standard in 2011. This was partially due to the immaturity of the C++ programming language itself and the associated tools; many features were not widely supported, so MPQC, until version 2.0.0, implemented C++ templates using preprocessor macros. Similarly, the in-source documentation was extracted using custom scripts since standard documentation tools such as Doxygen9 only became available much later.
The emphasis on internal (intra-MPQC) and external (outside of MPQC) reuse via the use of object-oriented and generic programming styles and modular design allowed facile composition of new methods in standalone programs with the existing MPQC code. This could be achieved by implementing existing abstract C++ APIs (e.g., by deriving new concrete wave function method classes from the abstract Wavefunction class) while keeping the rest of the application logic intact and by composing new program logic completely. The modular design of MPQC also facilitated the interoperability between packages, e.g., it included an interface to the Gaussian 92 program10 and to the Schaefer group’s Psi package (MPQC served as a driver for both, with energies and forces computed in Gaussian 92 or Psi used for geometry optimization in MPQC; a deeper integration with Psi was developed later in MPQC version 3). The emphasis on external reuse outside of the package is a relatively recent phenomenon in quantum chemistry, which has been (and still is) dominated by monolithic frameworks. This is partially due to the changing technical landscape: the use of modules and modular design only became common in quantum chemistry in the 2010s with the widespread adoption of languages that support modules natively (e.g., Python) as well as build frameworks such as CMake that support a module-like import of build dependencies for languages such as C++ that lack modules (fully featured modules will be included in the upcoming 2020 ISO C++ standard).
Several recursive Gaussian integral-based codes were included, all utilizing some form of code generation to achieve high performance. Collaboration between Justin Fermann and the MPQC team was a key influence on the development of the Libint library.11,12
The development of MPQC continued with public release 1.1 in which the legacy CWEB code was removed. Portable abstraction for thread pools (ThreadGrp) was also introduced in this release. Version 1.2 added Kohn–Sham density functional theory energies and gradients.
The first public release of MPQC version 2.0 occurred on October 4, 2001. It included support for simplified (non-object-oriented) input formats as well as an extensive use of threads for shared-memory parallelism throughout the code.13 Support for explicitly correlated MP2-R12 energies, including the improved resolution-of-the-identity approximations,14,15 was added in version 2.2.16 Deeper interoperability with other packages (NWChem)17 was made possible by integrating MPQC’s components into the Common Component Architecture (CCA) framework; this allowed, as an example, for the exchange of integrals with NWChem,18 and the CCA integrals standard19 developed in the course of this work is the basis for the ongoing standardization of integral interfaces.
B. Version 3
Efforts to release the next major version of MPQC started in 2006, but most of the effort was refocused on the complete reengineering of the package for version 4. Thus, an official release of version 3 never occurred. Nevertheless, since version 3.0 (repository branch v3) included some unique capabilities, it is useful to summarize its essential features here:
Explicitly correlated MP2-F12 (including properties20 and multiple correlation factors21) and coupled-cluster singles and doubles with perturbative triples, CCSD(T), methods22–25 [the latter utilized the CCSD(T) code from the PSI3 package26 and thus was not scalable beyond one node].
Local MP2 energy implementation based on projected atomic orbitals (PAOs).27
Universal perturbative explicitly correlated F12 correction.28–32
Simple evaluation of charge-transfer couplings.33
Reduced-scaling exchange matrix evaluation using concentric atomic density fitting.34,35
During the development of these features, the need for reengineering of the MPQC infrastructure became clear. Thus, instead of continuing to invest in version 3, the team switched efforts toward the modernization of the infrastructure by redesigning the package almost from scratch. The resulting new incarnation of MPQC, version 4, is described in Sec. III.
III. DESIGN
The development of the latest (fourth) version of MPQC started in earnest in 2014, although the development of its keystone TiledArray (TA) framework had begun in 2008. The primary objective of this new version of MPQC was to allow the deployment of reduced-scaling highly accurate many-body methods on commodity and high-end computer platforms. Several shortcomings in the existing codebase, detailed below, made achieving this goal difficult; to address them, a near-complete reengineering of MPQC was undertaken.
Tensor algebra. Although MPQC version 3 included efficient implementations of distributed tensors with up to four dimensions, both in memory and on disk, notably utilized to perform integral transformations and evaluate F12 intermediates efficiently,16,21 composing algorithms over such data structures was laborious and error-prone. Similarly, the limited functionality for block-sparse tensors developed for the PAO-based local MP2 method27 was not easily extendable to more complex theories. The need for efficient and composable dense and sparse tensor algebra motivated the development of the TiledArray tensor framework.36 MPQC version 3 was an early test platform for deployment of TiledArray as the implementation vehicle for new methods.20,31 TiledArray is a key building block of today’s MPQC, serving as the main engine for distributed-memory tensor computation.
Higher-level parallel programming models. MPQC version 3 supported several paradigms for writing parallel programs, namely, message passing, multithreading, active messaging, and partitioned global address space (PGAS). To support the development of reduced-scaling many-body methods that involve algorithms on sparse irregular tensorial data structures, more powerful forms of composing parallel programs were needed. These needs motivated the choice of the MADNESS parallel runtime as the implementation basis for the TiledArray framework. The MADNESS runtime (a part of the MADNESS package37 for numerical calculus in many dimensions) supports the aforementioned programming paradigms as a basis for higher-level concepts such as futures, task queues, and distributed object messaging. These concepts are not novel, e.g., futures have been a part of standard C++ since 2011; similarly, task-based algorithm composition is available in Intel Threading Building Blocks (TBB38) and OpenMP39 (versions 3 and higher), among many others. What makes the MADNESS runtime unique is how the known features combine to provide useful ways to compose parallel programs. For example, unlike the C++ futures, the futures in MADNESS are global, i.e., they can refer to (potentially unevaluated) results on remote processes; this feature greatly enhances the ability to decouple placement of data and work from each other. These features make the MADNESS runtime a powerful platform for developing fine-grained task-parallel algorithms, which hide latency by overlapping communication and computation, are more tolerant of load imbalance by decoupling work scheduling and execution, and are easy to compose by automating details of the data movement and work scheduling. The MADNESS runtime is the main implementation vehicle for today’s MPQC and thus has replaced all legacy parallel constructs in earlier MPQC versions.
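The future-based style of composition described above can be illustrated with standard C++ alone. The sketch below is not MADNESS code: it relies on std::async and std::future, which are local to one process, so it shows only how futures separate the submission of fine-grained tasks from the consumption of their results. The names tile_sum and sum_tiles_async are hypothetical.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Sum the elements of one tile; stands in for an arbitrary fine-grained task.
double tile_sum(const std::vector<double>& tile) {
  return std::accumulate(tile.begin(), tile.end(), 0.0);
}

// Submit one task per tile, then combine the partial results.
// Each std::future is a handle to a value that may not have been computed yet.
double sum_tiles_async(const std::vector<std::vector<double>>& tiles) {
  std::vector<std::future<double>> partial;
  partial.reserve(tiles.size());
  for (const auto& tile : tiles) {
    // Default launch policy: the library may run the task on another thread
    // or defer it until get() is called.
    partial.push_back(std::async([&tile] { return tile_sum(tile); }));
  }
  double total = 0.0;
  for (auto& f : partial) {
    total += f.get();  // waits only for this particular task
  }
  return total;
}
```

Calling sum_tiles_async on three tiles holding the numbers 1 through 6 returns 21. A global future, as provided by MADNESS, would additionally allow the result to live on a remote process, which this local sketch does not attempt to model.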
Excessive technical debt. By 2014, many features in the legacy MPQC codebase had become widely available in either standard C++ (smart pointers) or in high-quality community codebases such as Boost40 (serialization). This offered a great opportunity to reduce the effort of maintaining the MPQC codebase by reusing standard and external components. Similarly, the custom structured key-value format used to specify inputs for legacy MPQC codes was replaced by industry-standard JavaScript Object Notation (JSON) and Extensible Markup Language (XML) formats. This also offered an opportunity to tweak the design to eliminate legacy elements such as overuse of (single) inheritance that produced deep class hierarchies in favor of compile-time polymorphism (templates) and multiple inheritance. Finally, this made it possible to eliminate abstractions that in practice were no longer relevant, e.g., abstract integral engines could be eliminated with almost no loss of functionality by making the Libint2 library the standard integrals engine.
As a result of the near-complete reengineering of MPQC, very few legacy codes (13 out of 427 files) remain in today’s MPQC. Despite radical technical changes, the essential elements of the MPQC design are intact, such as the emphasis on modularity and reuse, the focus on high performance via powerful parallel abstractions, and the MPQC object model of the quantum chemistry domain.
MPQC version 4 was publicly released in beta on August 14, 2018. Active development continues with a stable release slated for 2020.
A. Architecture
Just like the original MPQC codebase, today’s MPQC (nicknamed MPQC4) is designed as a collection of libraries. What is different about MPQC4 is how much functionality it outsources to the external packages. The major components, such as parallel runtime, linear/tensor algebra, and Gaussian integrals, are provided by external packages (Fig. 1). Some functionality of the legacy MPQC has also been replaced by features of the standard C++ language.
Since the C++ language does not yet support modules, code reuse in C++ puts a heavy burden on the user to ensure the correct order of header file inclusion, proper compiler and linker command line arguments, etc. This is a major reason why the majority of electronic structure software packages developed in compiled languages become rigid monoliths, with code reuse limited to the interior of each package (relatively few exceptions exist). Languages with module support, such as Python, allow for easier reuse of functionality outside of each package’s boundaries, and this is a major contributor to the popularity of Python-based packages (Psi4,41 PySCF,42 Horton,43 adcc,44 etc.) and Python-based interfaces for standalone packages (ASE45). For more discussion of this, including the emerging Python bindings to the MPQC API, see Sec. V A.
Since its inception, MPQC has emphasized the reuse of code within the package as well as outside of the package. To make such reuse possible, legacy MPQC used custom scripting around the legacy build system (GNU Autotools). Luckily, modern build systems such as CMake,46 Bazel,47 build2,48 and Meson49 make the reuse of compiled software components easier by supporting module-like concepts (targets) whose transitive dependencies can be managed mostly automatically.50 Thus, to make today’s MPQC libraries maximally reusable, they are “modularized” as CMake targets. Reuse of the libraries, both inside and outside of MPQC, uses the same CMake-based machinery. The modular architecture of MPQC is illustrated in Fig. 2. Each module x has the corresponding CMake target named MPQCx (except module libmpqc).
B. Core
The core MPQC components (labeled “Util” in Fig. 2) include libraries that, in combination with the MADNESS parallel runtime and the TiledArray framework, interact with the runtime environment (MPI, threads, compute unified device architecture (CUDA) devices, floating-point exceptions, and signal handling), provide debugging support (e.g., generate human-readable backtraces and launch debugger programmatically or upon exceptional conditions), and provide domain-neutral abstractions used across the entire MPQC. Among key abstractions is the hierarchical key-value data structure KeyVal used to represent MPQC input/output in industry-standard formats JSON/XML. As our community is engaged in active efforts to standardize the representation of various data (geometries, basis sets, individual computations, and workflows), it is useful to briefly review the novel features of KeyVal and how it supports various aspects of composability.
1. The KeyVal library
Input to an electronic structure program such as MPQC conveys all information necessary to specify the initial state of the program. Traditionally, the conversion of input specified in a (typically) custom format into a program state is performed by a dedicated module, which by its very purpose becomes a single point of coupling of all modules in the program; namely, it must know how to convert each “keyword” or each key-value pair in the input to the corresponding elements of the program state. Thus, the input processor typically contains the logic related to every module whose state it controls. Such a design is difficult to maintain and extend. Therefore, many modern electronic structure packages, in particular, the Python-based codes, eschew the need for custom input and a single point of logic coupling in favor of direct programmatic construction of the program state. Such a solution is not practical unless all of the functionality is exposed in an interpreted language. Although C++ can be interpreted,51,52 this is not very common.
MPQC’s solution, since its very beginning, has been for the program input to directly map to the program state. This solution allowed full control of the program’s state using an input specified in a domain-neutral structured text format. In MPQC4, this mechanism was modernized by using the industry-standard domain-neutral formats JSON and XML. The idea is best demonstrated by an example (for simplicity, we will only demonstrate examples in JSON, although the XML format is also accepted by MPQC). Figure 3 shows an input for computing the MP2/STO-3G energy of a water molecule. Named keyword groups, e.g., property, mp2, water, and sto-3g, represent objects to be created at runtime. Nesting of objects naturally represents containment, e.g., the mp2 object representing the MP2 wave function contains the ref object representing the RHF wave function. However, containment is only sufficient to represent a tree of objects. To represent a directed acyclic graph (DAG) of objects, MPQC input can encode references from one part of the input to another. For example, the keyword atoms of object sto-3g refers to the water object located in the top keyword group. The references can point not only to objects but also to values of arbitrary types. The ad hoc syntax used to encode references in the MPQC input was inherited from legacy MPQC. Eventually, the need for the custom syntax may go away. The ability to encode references is already supported in XML via the standardized XML Path (XPath) Language.53 JSON supports absolute references via standardized JSON Pointers,54 with JSON relative references only proposed for standardization.55
The use of references makes it possible to specify a complete DAG of objects needed to execute a task. The DAG is instantiated by starting at the root object. For example, the MPQC program instantiates its graph by starting from the property object, whose constructor will instantiate the mp2 object and so on. The ability to create a DAG of objects is important to be able to precisely model the dependencies between objects in the program. For example, both the sto-3g and wfn_world objects refer to the same water object, rather than each containing its own copy of the water object. By avoiding data duplication, the maintenance of valid program states is greatly simplified; for example, changes in the state of the single water object, e.g., during geometry optimization, are mirrored in the state of both the sto-3g and wfn_world objects. This is also essential for optimal efficiency: for example, by sharing the same wfn_world object, the ref and mp2 objects can reuse intermediate quantities, such as, in this particular case, the AO integrals or, in more general scenarios, various artifacts of the density-fitting approximations.
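The sharing semantics described above can be sketched in a few lines of standard C++. This is an illustration only, not MPQC code: the struct names (ToyMolecule, ToyBasis, ToyWorld, ToyGraph) and the function make_graph are hypothetical, and std::shared_ptr stands in for whatever mechanism resolves a reference in the input to a single shared object.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for the objects named in the input.
struct ToyMolecule { std::vector<double> z_coords; };
struct ToyBasis    { std::string name; std::shared_ptr<ToyMolecule> atoms; };
struct ToyWorld    { std::shared_ptr<ToyMolecule> atoms; std::shared_ptr<ToyBasis> basis; };

struct ToyGraph {
  std::shared_ptr<ToyMolecule> water;
  std::shared_ptr<ToyBasis> sto3g;
  std::shared_ptr<ToyWorld> wfn_world;
};

// Build a small DAG: both the basis and the world refer to the SAME molecule,
// rather than each holding its own copy.
ToyGraph make_graph() {
  ToyGraph g;
  g.water = std::make_shared<ToyMolecule>(ToyMolecule{{0.0, 0.5, -0.5}});
  g.sto3g = std::make_shared<ToyBasis>(ToyBasis{"STO-3G", g.water});
  g.wfn_world = std::make_shared<ToyWorld>(ToyWorld{g.water, g.sto3g});
  return g;
}
```

Because the molecule is shared, a geometry update made through g.water is immediately visible through g.sto3g->atoms and g.wfn_world->atoms; with containment alone, each holder would have to be updated separately.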
Another essential feature of the input is polymorphism. The MPQC program does not know the exact type of property object a priori; instead of computing the energy, the user may want to compute excitation energies or electron attachment/detachment energies. Thus, MPQC can only know that the property object will model a particular abstract interface, which in C++ is specified by the mpqc::Property class. To be able to construct an object of a particular concrete type, it will read its type from the input (specified by the type keyword), map it to the class constructor corresponding to the desired object type, and invoke the constructor with the contents of the property keyword group. The polymorphic input makes it possible to completely decouple the domain-specific logic from the domain-neutral core of MPQC while allowing the user to easily compose complex program states.
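The type-keyword dispatch described above is commonly implemented as a registry that maps a type name to a constructor. The following is a minimal sketch of that general technique under stated assumptions, not MPQC's actual mechanism: ToyParams (a flat string map standing in for a keyword group), ToyProperty, ToyRegistry, and the two concrete classes are all hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Stand-in for a keyword group: flat key/value strings (an assumption).
using ToyParams = std::map<std::string, std::string>;

// Abstract interface known to the domain-neutral core.
struct ToyProperty {
  virtual ~ToyProperty() = default;
  virtual std::string describe() const = 0;
};

struct ToyEnergy : ToyProperty {
  explicit ToyEnergy(const ToyParams& p) : wfn(p.at("wfn")) {}
  std::string describe() const override { return "energy of " + wfn; }
  std::string wfn;
};

struct ToyExcitationEnergy : ToyProperty {
  explicit ToyExcitationEnergy(const ToyParams& p) : wfn(p.at("wfn")) {}
  std::string describe() const override { return "excitation energies of " + wfn; }
  std::string wfn;
};

// Maps the value of the "type" keyword to a constructor of a concrete class.
class ToyRegistry {
 public:
  using Ctor = std::function<std::unique_ptr<ToyProperty>(const ToyParams&)>;

  template <typename T>
  void add(const std::string& type) {
    ctors_[type] = [](const ToyParams& p) {
      return std::unique_ptr<ToyProperty>(new T(p));
    };
  }

  // Read the type from the group, look up the constructor, and invoke it
  // with the contents of the group.
  std::unique_ptr<ToyProperty> construct(const ToyParams& group) const {
    const auto it = ctors_.find(group.at("type"));
    if (it == ctors_.end()) throw std::invalid_argument("unknown type");
    return it->second(group);
  }

 private:
  std::map<std::string, Ctor> ctors_;
};
```

The core only ever sees ToyProperty; adding a new concrete property requires one add call and no change to the code that parses the input.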
The ability to express precisely the relationships of objects in a program composed in an object-oriented style is quite important for being able to specify very complex workflows. MPQC is able to accomplish this by essentially providing a substantial percentage of the programmatic control via simple text input. Established56 and recently introduced57,58 standard schemas for capturing results of quantum chemistry computations in JSON/XML and more generally for providing package-portable programmatic input are motivated by the same ideas that have driven the design of MPQC’s input since its very beginning. Our experience with MPQC suggests that such efforts should strive for (1) an explicit object-oriented structure mirroring the logical state of objects in a program and (2) a precise description of the object model of the domain of quantum chemistry and the more general domain of atomistic simulation.
The KeyVal library provides the programmatic representation and control of the structured key/value documents in MPQC. More precisely, it provides the Document Object Model (DOM) of the MPQC input (and output), similar to the standard DOM implementations for XML and JSON. The KeyVal objects can be constructed from existing XML/JSON documents and exported to the XML/JSON form. The KeyVal objects can also be constructed purely programmatically. Figure 4 illustrates a programmatic construction of the KeyVal object corresponding to the JSON in Fig. 3. Instead of creating a single KeyVal object and filling it with a hierarchy of key-value pairs, temporary KeyVal objects are created for each constructor call, thus emulating the named parameter syntax (supported by many languages, e.g., Python and Fortran) in C++.
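The named-parameter idiom mentioned above can be sketched with a small chained-assignment class. This is not the KeyVal API: MiniKeyVal, its assign and value members, the colon-separated path convention, and ToyWavefunction are hypothetical names chosen for illustration, and the type strings are illustrative values only.

```cpp
#include <cassert>
#include <map>
#include <string>

// Minimal hierarchical key/value store; nested groups are flattened into
// colon-separated paths (an assumption made to keep the sketch short).
class MiniKeyVal {
 public:
  // Assign a simple value; returning *this allows calls to be chained.
  MiniKeyVal& assign(const std::string& key, const std::string& value) {
    data_[key] = value;
    return *this;
  }

  // Nest a whole group under `key`; its entries get the prefix "key:".
  MiniKeyVal& assign(const std::string& key, const MiniKeyVal& group) {
    for (const auto& kv : group.data_) data_[key + ":" + kv.first] = kv.second;
    return *this;
  }

  // Look up a value by path; throws std::out_of_range if it is missing.
  const std::string& value(const std::string& path) const { return data_.at(path); }

 private:
  std::map<std::string, std::string> data_;
};

// A consumer whose constructor reads "named parameters" from the store.
struct ToyWavefunction {
  explicit ToyWavefunction(const MiniKeyVal& kv)
      : type(kv.value("type")), ref_type(kv.value("ref:type")) {}
  std::string type;
  std::string ref_type;
};
```

A call such as ToyWavefunction w(MiniKeyVal().assign("type", "RMP2").assign("ref", MiniKeyVal().assign("type", "RHF"))); then reads much like a constructor call with named arguments, with a temporary store created for each call.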
C. Math
The numerical foundation for MPQC is provided by the TiledArray (TA) tensor framework, which provides for scalable and composable block-sparse tensor algebra. By outsourcing most of the math capability to TA, MPQC’s math libraries (labeled “Math” in Fig. 2) are relatively lean. Thus, it is reasonable to list the features of TA that are essential for supporting the rest of MPQC:
Support for dense and block-sparse tensor algebra, efficiently scalable both within a node and across many nodes. Scalability is achieved by the task decomposition and asynchronous execution of most of the algorithms. For the all-important tensor contraction, the use of the communication-minimizing SUMMA algorithm,59 implemented in a task-based form, makes it possible to attain a high percentage of the peak performance on commodity and high-end platforms.60
High-level domain-specific language for tensor algebra, embedded in C++, providing not only facile composability, which is common, but also extensive programmatic control of sparsity and parallel distribution of intermediates. For example, the expressions for the MP1 amplitude residual and the MP2 energy can be written in the TA DSL in a form that closely mirrors the equations. [Displayed equations and the corresponding TA DSL listing are missing from this text.]
TA distributed arrays can be equipped with custom tile types. This makes it possible, for example, to implement integral-direct algorithms by instantiating arrays with custom tile types that compute their data (e.g., AO integrals) as needed and then discard it after use. This also makes it possible to implement arrays whose tiles are compressed, e.g., as in compressed low-rank format,61 and to provide support for heterogeneous machines [e.g., graphics-processing units (GPUs)].62 This capability is crucial for making TA more than just a tool for conventional (dense) many-body quantum chemistry: competitive integral-direct reduced-scaling density-fitting Fock matrix builders can be composed using a few lines of TA DSL.
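The idea behind tiles that compute their data on demand, as in the integral-direct algorithms mentioned in the last item, can be sketched without any tensor library. The code below is an assumption-laden illustration, not the TiledArray tile interface: LazyTile and direct_matvec are hypothetical names, and an arbitrary element function stands in for an integral engine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

using ElementFn = std::function<double(std::size_t, std::size_t)>;

// A tile that stores only its index ranges and a recipe for its elements.
struct LazyTile {
  std::size_t row_lo, row_hi, col_lo, col_hi;  // half-open ranges
  ElementFn element;

  // Materialize the tile (row-major); nothing is kept after the caller is done.
  std::vector<double> evaluate() const {
    std::vector<double> data;
    data.reserve((row_hi - row_lo) * (col_hi - col_lo));
    for (std::size_t r = row_lo; r < row_hi; ++r)
      for (std::size_t c = col_lo; c < col_hi; ++c)
        data.push_back(element(r, c));
    return data;
  }
};

// y = A x for an n-by-n matrix A that is never stored as a whole:
// each tile is computed, used, and discarded.
std::vector<double> direct_matvec(std::size_t n, std::size_t tile_size,
                                  const ElementFn& element,
                                  const std::vector<double>& x) {
  if (tile_size == 0 || x.size() != n) throw std::invalid_argument("bad input");
  std::vector<double> y(n, 0.0);
  for (std::size_t r0 = 0; r0 < n; r0 += tile_size) {
    for (std::size_t c0 = 0; c0 < n; c0 += tile_size) {
      const LazyTile tile{r0, std::min(r0 + tile_size, n),
                          c0, std::min(c0 + tile_size, n), element};
      const std::vector<double> data = tile.evaluate();
      const std::size_t ncols = tile.col_hi - tile.col_lo;
      for (std::size_t r = tile.row_lo; r < tile.row_hi; ++r)
        for (std::size_t c = tile.col_lo; c < tile.col_hi; ++c)
          y[r] += data[(r - tile.row_lo) * ncols + (c - tile.col_lo)] * x[c];
    }
  }
  return y;
}
```

The peak memory of direct_matvec is one tile plus the two vectors, at the price of recomputing elements whenever they are needed again; this is the same trade-off that integral-direct methods make.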
Note that the TA framework supports multiple programming styles, ranging from the aforementioned math-like tensor DSL to Standard Template Library (STL)-like algorithms on ranges of tiles, to arbitrary loop nests over computation on remote and local data, to direct manipulation of data via pointers to local tiles. Thus, the supported functionality is much richer than just math-like dense/block-sparse tensor algebra and cannot be discussed in full here due to space constraints.
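To make concrete what a tensor expression for the MP2 energy has to compute, the explicit loop nest below evaluates the standard closed-shell formulas for canonical orbitals. With $G^{ij}_{ab} = (ia|jb)$ and orbital energies $\epsilon$, the canonical MP1 residual $R^{ij}_{ab} = G^{ij}_{ab} + (\epsilon_a + \epsilon_b - \epsilon_i - \epsilon_j)\,T^{ij}_{ab}$ vanishes for $T^{ij}_{ab} = G^{ij}_{ab}/(\epsilon_i + \epsilon_j - \epsilon_a - \epsilon_b)$, and the energy is $E^{(2)} = \sum_{ijab} T^{ij}_{ab}\,(2G^{ij}_{ab} - G^{ij}_{ba})$. These are textbook expressions and are assumed here; they are not necessarily the exact form used in the authors' listing. The code is plain serial C++, not TA DSL, and Tensor4, mp1_amplitudes, and mp2_energy are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense rank-4 array indexed as (i, j, a, b): two occupied, two virtual indices.
class Tensor4 {
 public:
  Tensor4(std::size_t nocc, std::size_t nvir)
      : nocc_(nocc), nvir_(nvir), data_(nocc * nocc * nvir * nvir, 0.0) {}
  double& operator()(std::size_t i, std::size_t j, std::size_t a, std::size_t b) {
    return data_[((i * nocc_ + j) * nvir_ + a) * nvir_ + b];
  }
  double operator()(std::size_t i, std::size_t j, std::size_t a, std::size_t b) const {
    return data_[((i * nocc_ + j) * nvir_ + a) * nvir_ + b];
  }
  std::size_t nocc() const { return nocc_; }
  std::size_t nvir() const { return nvir_; }

 private:
  std::size_t nocc_, nvir_;
  std::vector<double> data_;
};

// Canonical MP1 amplitudes: T(i,j,a,b) = G(i,j,a,b) / (e_i + e_j - e_a - e_b).
Tensor4 mp1_amplitudes(const Tensor4& G, const std::vector<double>& e_occ,
                       const std::vector<double>& e_vir) {
  Tensor4 T(G.nocc(), G.nvir());
  for (std::size_t i = 0; i < G.nocc(); ++i)
    for (std::size_t j = 0; j < G.nocc(); ++j)
      for (std::size_t a = 0; a < G.nvir(); ++a)
        for (std::size_t b = 0; b < G.nvir(); ++b)
          T(i, j, a, b) = G(i, j, a, b) / (e_occ[i] + e_occ[j] - e_vir[a] - e_vir[b]);
  return T;
}

// Closed-shell MP2 energy: sum over T(i,j,a,b) * (2 G(i,j,a,b) - G(i,j,b,a)).
double mp2_energy(const Tensor4& G, const Tensor4& T) {
  double energy = 0.0;
  for (std::size_t i = 0; i < G.nocc(); ++i)
    for (std::size_t j = 0; j < G.nocc(); ++j)
      for (std::size_t a = 0; a < G.nvir(); ++a)
        for (std::size_t b = 0; b < G.nvir(); ++b)
          energy += T(i, j, a, b) * (2.0 * G(i, j, a, b) - G(i, j, b, a));
  return energy;
}
```

A tensor DSL replaces each of these four-deep loop nests by a single expression and, in addition, takes care of tiling, sparsity, and distribution, none of which the serial loops address.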
The TA framework does not provide native linear algebra capability, and thus, interfaces to common linear algebra libraries are used (Eigen,63 Elemental,64 and, recently, ScaLAPACK65). TA-native implementations of iterative linear and nonlinear solvers, such as DIIS,66 the Davidson algorithm,67 the matrix inverse square root,60 and the preconditioned conjugate gradient, are also available.
D. Quantum chemistry
The bulk of modern MPQC consists of the infrastructure for molecular and, recently, solid-state quantum chemistry. This includes components for representing a molecule/unit cell as a clustered set of atoms, (clustered) basis sets of Gaussian AOs, molecular integral evaluations, Hartree–Fock, and a variety of many-body methods (based on both wave functions and propagators). Compared to the legacy MPQC’s QC component, today’s package features a significantly different set of methods: there is no support for DFT, and there is a major focus on accurate many-body methods. Implementation of a relatively large number of new methods in MPQC4 was facilitated by the high-level TA DSL discussed previously, which makes the composition of one- and many-body methods relatively straightforward, once the relevant Hamiltonian tensors are available. In most cases, the code looks as close to the math as possible and the details of thread and distributed-memory parallelism are (almost) completely abstracted out; the main complication is the auxiliary logic and the need to maintain a number of code variants, e.g., there are several algorithms for CCSD that differ in whether all or some integrals are stored and whether density fitting is used. Some choices require refactorization of the equations and thus must be purpose-encoded.
A significant technical simplification of the QC component in MPQC4 is the removal of the support for abstract integral interfaces; the Libint engine’s API and classes, which follow the established domain model of legacy MPQC, are used directly. The CCA integral standard is adopted as the default.19
One particular set of modules worthy of more detailed discussion deals with getting the Hamiltonian tensors into the TA distributed arrays. These tasks are (relatively) difficult for beginners since they involve dealing with the low-level details of tasking and the management of integral engines’ resources. First, in order to support efficient intra-node parallelization of the AO integral evaluation, it is necessary to manage integral engine resources (scratch space) on a per-thread basis, i.e., additional abstractions such as engine pools are needed. Second, the Libint library does not, for the sake of simplicity, provide any interfaces for computing integrals over clusters of shells and/or higher-level screening; the support for computation of the Hamiltonian tensors over clustered basis sets, with proper screening at the level of tile sets and shell sets, and the precomputation of the shell pair data are all implemented in the lcao_integrals module of MPQC. Users of TiledArray who are interested in efficient evaluation of the AO integrals are encouraged to try this module of MPQC4 before deciding to work on their own.
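The per-thread management of engine resources described above can be sketched as a pool that clones one engine per worker. This is a simplified illustration under stated assumptions, not the lcao_integrals implementation: ToyEngine (with a made-up compute function), EnginePool, and sum_over_shells are hypothetical, and plain std::thread workers stand in for the task runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical engine: it owns scratch memory, so one instance must not be
// used by two threads at the same time.
class ToyEngine {
 public:
  explicit ToyEngine(std::size_t scratch_size) : scratch_(scratch_size, 0.0) {}

  // Fill the scratch buffer for a given shell and reduce it to one number.
  double compute(std::size_t shell) {
    for (std::size_t k = 0; k < scratch_.size(); ++k)
      scratch_[k] = static_cast<double>(shell + k);
    double sum = 0.0;
    for (double v : scratch_) sum += v;
    return sum;
  }

 private:
  std::vector<double> scratch_;
};

// One engine per worker, each cloned from a prototype.
class EnginePool {
 public:
  EnginePool(const ToyEngine& prototype, std::size_t nworkers)
      : engines_(nworkers, prototype) {}
  ToyEngine& local(std::size_t worker) { return engines_.at(worker); }

 private:
  std::vector<ToyEngine> engines_;
};

// Distribute shells round-robin over workers; each worker touches only its
// own engine and its own slot of the partial sums.
double sum_over_shells(std::size_t nshells, std::size_t nworkers) {
  if (nworkers == 0) throw std::invalid_argument("need at least one worker");
  EnginePool pool(ToyEngine(4), nworkers);
  std::vector<double> partial(nworkers, 0.0);
  std::vector<std::thread> workers;
  for (std::size_t w = 0; w < nworkers; ++w) {
    workers.emplace_back([&pool, &partial, w, nshells, nworkers] {
      ToyEngine& engine = pool.local(w);
      for (std::size_t s = w; s < nshells; s += nworkers)
        partial[w] += engine.compute(s);
    });
  }
  for (auto& t : workers) t.join();
  double total = 0.0;
  for (double p : partial) total += p;
  return total;
}
```

Because every worker has a private engine, no locking is needed around compute; the only synchronization point is the final join. Building this example may require enabling thread support in the compiler (for instance, a threading flag on some toolchains).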
Finally, to automate the AO → MO integral transformation and various types of approximate reconstructions of the AO/MO integral tensors, we use a set of abstractions initially introduced in MPQC version 3 for the purpose of composing F12 methods more easily. Together with the TA tensor DSL, these provide a mini-DSL for electronic structure that we found useful not only for composing new methods but also for education. Of course, this is a very primitive DSL since parsing and validation are all done at runtime; nevertheless, it is very powerful. For this reason, it may be useful to discuss it briefly here.
The basic idea of the electronic structure DSL is illustrated by Fig. 5. Module lcao_factory provides abstractions {AOFactory, LCAOFactory} that can compute {AO, MO} integrals, both exactly and using DF, while managing partially transformed and intermediate integral reuse. Each AO/LCAO wave function object (RHF, RMP2, etc.) is associated with a unique LCAO computation context (WavefunctionWorld), each of which provides the requisite basis set objects (orbital, density fitting, etc.), an orbital space registry, and one integral factory of each kind. Thus, the wave function objects that belong to the same WavefunctionWorld can collaborate on the integral evaluation by reusing integrals/intermediates. The meaning of index labels (μ, ν, etc.) in formulas such as (μ ν |G| κ λ) is defined in the orbital space registry object that can be programmatically controlled, thus making the DSL grammar extensible. The mapping of the operator labels, e.g., G in (μ ν |G| κ λ), to the operator type (in this case, the Coulomb operator 1/r12) is currently hardwired. Multiple contexts can coexist so that wave functions with different basis sets and even for different molecules can be used cooperatively; for example, an RHF wave function computed in a small basis can be used to provide an initial guess for the large-basis counterpart. Similarly, complex scenarios such as embedding or fragment decomposition can be implemented trivially.
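To make the idea of runtime parsing and registry-based label resolution concrete, a minimal sketch follows. It is not the lcao_factory code: ParsedFormula, parse_formula, OrbitalSpaceRegistry, and resolve are hypothetical names, the formula grammar is reduced to the single pattern "(bra |Op| ket)", and ASCII labels (i, a, etc.) stand in for the Greek and Latin index labels used in the text.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Result of parsing a formula such as "(i a |G| j b)" (hypothetical type).
struct ParsedFormula {
  std::vector<std::string> bra;  // index labels left of the operator
  std::string op;                // operator label, e.g. "G"
  std::vector<std::string> ket;  // index labels right of the operator
};

// Splits a string on whitespace.
inline std::vector<std::string> split_ws(const std::string& s) {
  std::istringstream in(s);
  std::vector<std::string> out;
  for (std::string tok; in >> tok;) out.push_back(tok);
  return out;
}

// Parses "(bra labels |Op| ket labels)"; all validation happens at runtime.
inline ParsedFormula parse_formula(const std::string& formula) {
  const auto npos = std::string::npos;
  const auto open = formula.find('(');
  const auto close = formula.rfind(')');
  const auto bar1 = formula.find('|');
  const auto bar2 = formula.rfind('|');
  const bool well_formed = open != npos && close != npos && bar1 != npos &&
                           open < bar1 && bar1 < bar2 && bar2 < close;
  if (!well_formed) throw std::invalid_argument("malformed formula: " + formula);
  const auto op_tokens = split_ws(formula.substr(bar1 + 1, bar2 - bar1 - 1));
  if (op_tokens.size() != 1)
    throw std::invalid_argument("expected exactly one operator label");
  ParsedFormula parsed;
  parsed.bra = split_ws(formula.substr(open + 1, bar1 - open - 1));
  parsed.op = op_tokens.front();
  parsed.ket = split_ws(formula.substr(bar2 + 1, close - bar2 - 1));
  return parsed;
}

// Maps index labels to orbital-space names; adding entries extends the
// set of labels that formulas may use (hypothetical class).
class OrbitalSpaceRegistry {
 public:
  void add(const std::string& label, const std::string& space) {
    assert(!label.empty());
    spaces_[label] = space;
  }
  const std::string& space_of(const std::string& label) const {
    const auto it = spaces_.find(label);
    if (it == spaces_.end())
      throw std::out_of_range("unknown index label: " + label);
    return it->second;
  }

 private:
  std::map<std::string, std::string> spaces_;
};

// Resolves every label of a parsed formula, bra first, then ket.
inline std::vector<std::string> resolve(const ParsedFormula& f,
                                        const OrbitalSpaceRegistry& registry) {
  std::vector<std::string> spaces;
  for (const auto& label : f.bra) spaces.push_back(registry.space_of(label));
  for (const auto& label : f.ket) spaces.push_back(registry.space_of(label));
  return spaces;
}
```

With labels i, j registered as occupied and a, b as unoccupied, resolving "(i a |G| j b)" yields the four spaces of an (ov|ov) integral, whereas a formula with an unregistered label is rejected only when it is evaluated, which is the price of runtime validation noted above.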
IV. CAPABILITIES OF MPQC
A. Application highlights
MPQC currently features a very limited set of methods (closed-shell, energies only), focused on high-accuracy methods and their efficient implementation on distributed-memory machines:
Ground states: MP2,68 CCSD,69 CCSD(T),70 CCSDT (as well as its iterative approximations),71–73 explicitly correlated doubles correction,22 and distinguishable cluster methods (DCSD and DCSDT74).
Excited states: Configuration interaction singles (CIS)75 and equation of motion coupled-cluster singles and doubles (EOM-CCSD).76
Ionized- and electron-attached states: EOM-{IP, EA}-CCSD77,78 and standard and explicitly correlated GF2 (Ref. 79) and NR2 (Refs. 80 and 81) propagator methods.
Linear response: CCSD.82
Many items on this list should only be viewed as building blocks for future work due to two developments:
the emergence of efficient reduced-scaling many-body formalisms, such as pair natural orbital (PNO)-based coupled-cluster methods,83–87 and
the ongoing shift to heterogeneous hardware, such as general-purpose graphics processing units (GPUs).
Although for some methods it remains to be seen whether the reduced-scaling formalisms will be competitive with the conventional formulations (e.g., for off-resonance linear response CCSD88), efficient reduced-scaling formulations of many methods become superior to the conventional approaches even for very small systems. Thus, addressing these and other challenges will be a major part of the short-term agenda for our team.
The extensive work that went into the development of the TiledArray framework should help us with both the transition to the reduced-scaling paradigm and the transition to heterogeneous platforms. This is partially due to its design, which was motivated by the beliefs that (1) the reduced-scaling methods were the future and that (2) the importance of concurrency is only going to grow and will have a major impact on which methods map better on the hardware; both turned out to be good bets. Thus, we would like to highlight recent work involving TiledArray and MPQC4 relevant to these challenges.
Efficient explicitly correlated DF-CCSD.89 To support both memory-limited platforms and high-end platforms, several closed-shell CCSD formulations were implemented (conventional, DF, and hybrid DF-AO; later we added the direct-DF variant that avoids persistent storage of integrals with three- and four-unoccupied indices62). Several perturbative explicitly correlated CCSD methods were also implemented. The code exhibited excellent strong parallel scaling on a commodity cluster (≈65% and ≈51% parallel efficiency, respectively, on 16 and 32 nodes relative to 1 node) and excellent absolute efficiency (the calculation of the particle-particle-ladder, or ABCD, term utilized 85% of the machine peak on 1 node and 70% of the machine peak on 32 nodes).89 Similar performance was observed on the high-end IBM BlueGene/Q Mira system at Argonne National Laboratory. The high efficiency of the implementation made it possible89 to determine with high precision the CCSD binding energy of the uracil dimer and revealed errors in previous benchmarks; these CCSD predictions were supported by the later data from Brauer, Kesharwani, and Martin.90 The explicitly correlated CCSD implementation was also used to validate the precision of the PNO-based reduced-scaling explicitly correlated CCSD method: the conventional cc-pVDZ-F12 CCSD energy of the water 20-mer was evaluated in 94 min on 32 768 cores of Mira. The DF-direct CCSD implementation was compared62 to the reference GC-dDMP-B benchmark (103 correlated occupied orbitals and 1042 basis functions) of Anisimov et al.91 obtained with their improved Kobayashi–Rendell closed-shell CCSD implementation in NWChem. The MPQC implementation of DF-CCSD was conservatively estimated to be more than 25× more efficient.
Efficient DF-CCSD(T) for heterogeneous platforms.62 The closed-shell (T) energy implementation aimed at high-end hardware with limited memory (such as accelerators) was developed in MPQC. The code demonstrated perfect intranode linear scaling with respect to the number of threads and excellent strong scaling with respect to the number of nodes (parallel efficiency of ≈80% on 32 nodes relative to 1 node). The absolute CCSD(T) efficiency of MPQC was compared against the GC-dDMP-B (T) benchmark computed with NWChem by Anisimov et al.91 [at the time, it was the largest conventional CCSD(T) calculation reported]. On 64 nodes of BlueRidge, the MPQC time to solution for (T) was 47.4 h vs 1.4 h on 20 000 essentially identical nodes used by Anisimov et al.91 The implementation was extended to exploit CUDA-compatible GPUs, using the CUDA extension of the TiledArray framework. In order to achieve high efficiency, not only were tensor contractions moved to the GPU but also other pieces of tensor algebra, namely, transposes and reductions. For the water 14-mer in the cc-pVDZ basis, the central processing unit (CPU)-only execution was {3.5, 5.8} times slower than execution on {one, two} NVIDIA P100 GPUs. This is encouraging performance compared to the {4.4, 8.7} ideal speedups estimated from the peak floating point operations per second (FLOPS) rates of the hardware. The GPU-capable (T) code exhibited excellent strong scaling, with a speedup of 13.5 on 16 dual-P100 nodes of NewRiver for an cluster (cc-pVDZ basis set).
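The performance figures quoted in these two highlights can be put on a common footing with a few lines of arithmetic. The sketch below assumes the usual definition of strong-scaling parallel efficiency, E = T(1) / [N T(N)], so that the speedup on N nodes is E N, and it assumes that node-hours are the relevant cost measure for the (T) comparison; the helper names are hypothetical and the derived numbers are simple consequences of the values quoted in the text, not additional measurements.

```cpp
#include <cassert>

// Speedup implied by a parallel efficiency E on N nodes: S = E * N.
inline double speedup_from_efficiency(double efficiency, int nodes) {
  assert(nodes > 0);
  return efficiency * nodes;
}

// Aggregate cost of a run in node-hours.
inline double node_hours(int nodes, double hours) {
  assert(nodes > 0);
  return nodes * hours;
}

// Fraction of an ideal speedup that was actually observed.
inline double fraction_of_ideal(double observed, double ideal) {
  assert(ideal > 0.0);
  return observed / ideal;
}

// Values quoted in the text, and what they imply:
//   CCSD:  65% on 16 nodes -> speedup of about 10.4
//          51% on 32 nodes -> speedup of about 16.3
//   (T):   80% on 32 nodes -> speedup of about 25.6
//   (T) cost: 64 nodes x 47.4 h    = 3033.6 node-hours (MPQC)
//             20 000 nodes x 1.4 h = 28 000 node-hours (reference run)
//             ratio of about 9.2 in node-hours
//   GPU:   3.5 of an ideal 4.4 (about 80%), 5.8 of an ideal 8.7 (about 67%),
//          13.5 on 16 nodes (about 84% parallel efficiency)
```

Under these assumptions, the (T) comparison corresponds to roughly a ninefold reduction in node-hours, and the single-GPU run recovers about four fifths of the speedup suggested by the peak FLOPS ratio.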
Efforts are underway to develop efficient and scalable reduced-scaling many-body methods in MPQC. Along with the development of the production code, simulated implementations of PNO-based CCSD methods have been developed for the ground92 and excited states.67 Development of reduced-scaling methods for solids has begun with periodic Hartree–Fock with fast exchange evaluation and a crystalline orbital localizer.
B. Composability
MPQC is designed for the rapid implementation of new methods in a form capable of production-level performance on parallel architectures. Unlike the monolithic packages that have dominated our community until recently, the modular architecture of MPQC makes it possible to compose new functionality outside of the MPQC source tree. There are two such composition mechanisms that MPQC supports (both mechanisms were also supported by legacy MPQC).
Plug-in: new functionalities are plugged into the MPQC codebase by implementing predefined abstract interfaces, i.e., by deriving from abstract MPQC classes. The libraries containing the new functionalities are linked into the MPQC executable and become usable as a part of the MPQC program.
Plug-out: MPQC libraries are plugged into an external project as any other external dependency.
These mechanisms are discussed in more detail next.
1. Plug-in composition
The plug-in style of composition makes it possible to add new functionality to MPQC by incorporating new classes that implement abstract interfaces that are already defined in MPQC. The new classes will have the same status as the core classes of MPQC, i.e., the user will be able to use them in exactly the same manner as the native classes. This approach is best demonstrated by discussing a standalone MP2 energy code that plugs into MPQC (see Fig. 6), an example that is simple but that illustrates the key steps necessary to implement a new many-body electronic structure method, without cluttering the presentation with nonessential but practically important details such as precision control and memoization.
Since we are going to compute the MP2 energy in the LCAO representation (i.e., using MOs), the MP2 class is derived from the LCAOWfn base, which introduces the concepts of LCAO orbital spaces partitioned into core, active, and inactive subsets. The class is intended to only compute the total energy, so it is derived from the Provides⟨Energy⟩ class, which makes MP2 capable of evaluating the Energy property (which represents a Taylor expansion of the energy in terms of the atomic coordinates). This is a key design difference with legacy MPQC, in which the base Wavefunction class was not designed to deal with properties. Putting all the known property methods into the base wave function class introduces undesired coupling of all property classes with all wave function classes. To decouple the properties from the wave function, MPQC4 uses the Acyclic Visitor pattern,93 which makes it possible to, in effect, implement double dispatch in C++. Deriving a particular wave function class from Provides⟨Properties…⟩ then mixes in an abstract interface for evaluating Properties of that particular wave function, rather than the abstract base for all wave functions.
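The decoupling achieved by the Acyclic Visitor pattern can be sketched in a few dozen lines. The sketch reuses the names Provides, Energy, can_evaluate, and evaluate from the text, but their signatures here are assumptions made for illustration, and Evaluator, compute_with, ToyMP2, and ToyGuess are hypothetical; the actual MPQC class declarations differ. The key point is that the base Wavefunction class knows nothing about properties: a property discovers at runtime, via a cross-cast, whether a given wave function has mixed in the matching evaluator interface.

```cpp
#include <cassert>
#include <stdexcept>

// Base of all wave functions; deliberately unaware of any property type.
class Wavefunction {
 public:
  virtual ~Wavefunction() = default;
};

// Abstract evaluator interface for one property type P (hypothetical name).
template <typename P>
class Evaluator {
 public:
  virtual ~Evaluator() = default;
  virtual bool can_evaluate(const P& property) const = 0;
  virtual void evaluate(P& property) = 0;
};

// Mixes in one evaluator interface per listed property.
template <typename... Properties>
class Provides : public Evaluator<Properties>... {};

// A property that "visits" a wave function to obtain its value.
class Energy {
 public:
  explicit Energy(int derivative_order = 0) : order_(derivative_order) {}
  int order() const { return order_; }
  double value() const { return value_; }
  void set_value(double v) { value_ = v; }

  // Double dispatch: depends on the dynamic type of wfn and on this property
  // type. The cross-cast succeeds only if wfn derives from Provides<Energy>.
  void compute_with(Wavefunction& wfn) {
    auto* evaluator = dynamic_cast<Evaluator<Energy>*>(&wfn);
    if (evaluator == nullptr || !evaluator->can_evaluate(*this))
      throw std::runtime_error("wave function cannot evaluate this Energy");
    evaluator->evaluate(*this);
  }

 private:
  int order_;
  double value_ = 0.0;
};

// Toy wave function that provides the energy but not its derivatives.
class ToyMP2 : public Wavefunction, public Provides<Energy> {
 public:
  bool can_evaluate(const Energy& e) const override { return e.order() == 0; }
  void evaluate(Energy& e) override {
    assert(can_evaluate(e));
    e.set_value(-1.5);  // placeholder number, not a real MP2 energy
  }
};

// Toy wave function that provides no properties at all.
class ToyGuess : public Wavefunction {};
```

Adding a new property type in this scheme requires no change to Wavefunction or to any existing wave function class; only the classes that choose to support the property list it in their Provides base.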
The property interface includes two functions: can_evaluate and evaluate (these are overloaded for each property type). The former can be used to query whether the wave function can evaluate a particular requested property. For example, if we did not implement any support for the frozen core in the MP2 class and the user requested the MP2 energy with nonzero frozen core orbitals, then this function would return false. In our case, can_evaluate returns true if the computation of only the energy is requested, rather than the energy and its derivatives. Evaluation of the MP2 energy in the evaluate method is straightforward. The Hartree–Fock energy is evaluated first. For simplicity, we do not assume here a particular choice of orbitals; if canonical orbitals were needed, then they would need to be requested explicitly from the RHF object. The init_sdref function populates the orbital registry of the wave function context with various types of spaces (active occupied, unoccupied) so that the AO/LCAO integral factories can compute integrals using the DSL described in Sec. III D. Finally, the compute_mp2_energy function evaluates the MP1 amplitudes and the MP2 energy. The body of the solver is not shown for brevity; it includes the MP1 residual evaluation, as shown in Sec. III C, and the amplitude update.
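For orientation, the quantity that such a solver converges to can be written down in closed form in the special case of canonical Hartree–Fock orbitals, which the plug-in example itself does not assume. In that case, the closed-shell MP2 correlation energy is the sum over occupied i, j and unoccupied a, b of (ia|jb) [2 (ia|jb) − (ib|ja)] / (ε_i + ε_j − ε_a − ε_b). The following serial sketch evaluates this textbook expression for a dense integral array; the function name and the storage layout are assumptions made for illustration and bear no relation to the distributed TiledArray-based implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Closed-shell MP2 correlation energy in canonical orbitals (textbook formula).
// g holds the MO integrals (ia|jb) in row-major order with index
// ((i * nv + a) * no + j) * nv + b; eps_occ and eps_virt hold orbital energies.
// The function name and the data layout are hypothetical.
inline double closed_shell_mp2_energy(const std::vector<double>& g,
                                      const std::vector<double>& eps_occ,
                                      const std::vector<double>& eps_virt) {
  const std::size_t no = eps_occ.size();
  const std::size_t nv = eps_virt.size();
  assert(g.size() == no * nv * no * nv);

  // Accessor for (ia|jb).
  auto iajb = [&](std::size_t i, std::size_t a, std::size_t j, std::size_t b) {
    return g[((i * nv + a) * no + j) * nv + b];
  };

  double energy = 0.0;
  for (std::size_t i = 0; i < no; ++i)
    for (std::size_t a = 0; a < nv; ++a)
      for (std::size_t j = 0; j < no; ++j)
        for (std::size_t b = 0; b < nv; ++b) {
          const double denom =
              eps_occ[i] + eps_occ[j] - eps_virt[a] - eps_virt[b];
          const double direct = iajb(i, a, j, b);
          const double exchange = iajb(i, b, j, a);
          energy += direct * (2.0 * direct - exchange) / denom;
        }
  return energy;
}
```

For a single occupied and a single unoccupied orbital, the sum reduces to (ia|ia)² / [2 (ε_i − ε_a)], which is negative for any nonzero integral since ε_i < ε_a.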
Macro MPQC_EXPORT_CLASS2 registers the KeyVal constructor of the MP2 class so that it can be invoked just like those of the built-in MPQC classes in the MPQC inputs. For example, to use this MP2 class instead of the built-in RMP2 MPQC class in Fig. 3, the mp2:type keyword should be set to “MP2” instead of “RMP2.” The ForceLink forces the linking of the MP2 class code into the MPQC main function. Building of the MP2 plugin program is trivial with CMake; it is almost identical to how the plug-out example program is built, which we discuss next.
2. Plug-out composition
The plug-out style of composition makes it possible to develop new capabilities by importing MPQC modules into an arbitrary codebase. Figures 4 and 5 are examples of such an approach: both are fragments of standalone programs that import MPQC libraries.
The use of the plug-out approach can be completely trivial when the only modules of MPQC to be imported are those that do not require the initialization of runtime components; an example is the KeyVal library. In this scenario, the use of MPQC is as trivial as the use of any other external library. However, most MPQC modules do require initialization of one or more runtime components, such as the MPI library, MADNESS parallel runtime, and TiledArray framework’s runtime components. Thus, to use such modules, it is usually necessary to call the mpqc::initialize function (part of the init module) and call mpqc::finalize afterward. There are several variants of mpqc::initialize to support several initialization scenarios (MPI is initialized by MPQC or not, the default context spans the entire MPI_COMM_WORLD or not, etc.). The basic initialization is sufficient for most needs, however. Figure 7 shows the resulting plug-out harness that, when combined with Figs. 4 or 5, forms a complete program.
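The pairing of an initialization call with a finalization call lends itself to a scope guard, so that finalization also happens when the user code exits early via an exception. The sketch below is a generic illustration under stated assumptions: RuntimeGuard and run_with_runtime are hypothetical names, and the two lambdas merely stand in for calls such as mpqc::initialize and mpqc::finalize, whose actual signatures are not reproduced here.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Runs an initializer on construction and a finalizer on destruction.
// (Hypothetical helper; not part of MPQC.)
class RuntimeGuard {
 public:
  RuntimeGuard(const std::function<void()>& initialize,
               std::function<void()> finalize)
      : finalize_(std::move(finalize)) {
    assert(initialize && finalize_);
    initialize();
  }
  ~RuntimeGuard() { finalize_(); }

  RuntimeGuard(const RuntimeGuard&) = delete;
  RuntimeGuard& operator=(const RuntimeGuard&) = delete;

 private:
  std::function<void()> finalize_;
};

// Stand-in for a plug-out program body; returns the order of events.
inline std::vector<std::string> run_with_runtime(bool fail) {
  std::vector<std::string> log;
  try {
    // The lambdas are placeholders for the real runtime setup and teardown.
    RuntimeGuard guard([&log] { log.push_back("initialize"); },
                       [&log] { log.push_back("finalize"); });
    log.push_back("work");
    if (fail) throw std::runtime_error("simulated failure");
    log.push_back("done");
  } catch (const std::runtime_error&) {
    // By the time the handler runs, the guard has already finalized.
    log.push_back("caught");
  }
  return log;
}
```

In both the normal and the failing path, the finalizer runs exactly once and before control leaves the scope that owns the guard, which mirrors the requirement that runtime components be shut down after the last use of the modules that depend on them.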
To make building a program that uses MPQC easier, the MPQC CMake targets are exported so that they can be easily imported from any outside package that uses the CMake build system (which, by now, is a de-facto standard, at least for C++). Figure 8 shows an example of how all the MPQC modules can be imported into a user’s executable. The only difference in how plug-in and plug-out programs are built is that the former links to the MPQC main target, which includes the MPQC executable’s main function along with the complete set of MPQC modules.
V. CRITICAL ASSESSMENT
The discussion so far has focused on how MPQC came to be, how it is engineered, what types of computations it can perform, and how it can be used to compose new methods. To put various traits of MPQC in proper perspective, let us assess MPQC from the standpoint of two user types, novices and experts, however skewed such an assessment may be by the limited perspective of a single research group.
A. MPQC for novice users
Novices, by definition, have little to no experience in electronic structure theory. Unfortunately, most novices in our domain also have limited or altogether missing programming skills. Having to learn programming as well as the formalism of electronic structure at the same time is a daunting challenge for most novices. Thus, the critical challenge is to avoid overwhelming novices by putting too much on their plate right away. Research into how skills are acquired,94,95 as well as common sense, suggests that to flatten the learning curve, novice education should use carefully designed exercises, with each exercise broken down into small steps, each of which requires as little context as possible. To minimize the context, it also helps to learn in sandboxed environments, in which all details that are essential for experts but nonessential for novices are hidden. There are public educational materials96 built according to these ideas (undoubtedly, many groups in our field have private materials similar to these). We should also mention Psi4NumPy as an example of a sandboxing environment designed primarily for learning.97 The reality of how novices learn in our domain of course deviates from the above ideal, sometimes dramatically. Part of the reason is the amount of effort involved in developing specialized learning experiences for the novices; equally significant is the desire on the part of the supervisor and the novice to engage in research as soon as possible. Thus, often novices are almost immediately tasked with a research problem involving the use of expert-level tools; equally often, there is little to no feedback from more advanced practitioners.
MPQC is designed to be an expert tool, suitable for performing electronic structure calculations where resource constraints mandate the use of parallel computers. Thus, learning electronic structure formalism and/or programming with the help of MPQC can be an intimidating experience for most novices. There are several reasons for this:
Expert-level design: MPQC components are designed to be used in expert tasks. This often means additional complexity that is counterproductive for learning purposes. For example, wave function classes in MPQC strive to guarantee precision of the properties they are asked to evaluate (e.g., in the discussion of the plug-in composition in Sec. IV B 1, we mentioned precision control, but deemed it nonessential to the simple example). Precision estimates are very useful for being able, for example, to evaluate guaranteed-precision finite-difference derivatives of properties or for implementing gradual ramp-up of precision in iterative solvers (e.g., geometry optimization). However, estimating precision of a property is hard and relies on heuristic rules. Another example is the clustered structure of Molecule and AtomicBasis classes: unlike easy-to-comprehend flat sequences of atoms and shells used, for example, by the Libint2 high-level API that was originally designed purely for educational purposes, Molecule and AtomicBasis classes provide a heuristic clustered structure that is essential for revealing sparsity in AO-basis states and operators. This complexity makes using these classes more difficult for novices but is essential for experts. These and other expert-level features make the design of MPQC components more complicated than ideal for novices.
The C++ programming language: C++ is arguably among the most complex programming languages. It is complex because it is an old language evolved to be usable for any computing task, from low-level system programming (its low-level abstractions map almost directly on the hardware) to planet-scale system design. By supporting so many programming paradigms while rarely sacrificing backward compatibility, C++ is huge and continues to evolve at a rapid pace (there is a new standard every 3 yr, with multiple paradigm-shifting changes in store for the 2020 ISO standard). The inventor of C++ once noted that “within C++, there is a much smaller and cleaner language struggling to get out.”98 This is indeed what is often referred to as the Modern C++; it is now possible to write simpler and safer code in C++, without sacrificing performance or resource requirements. However, it is still difficult to teach novices how to get started with the modern subset of C++. Many critical features for novices are still missing in C++, most importantly concepts (which would make error messages due to invalid template instantiations readable) and modules (which would make building C++ projects easier and faster); both are slated for standardization in the 2020 C++ standard. Finally, many tooling issues, such as the complexity of building C++ programs, the lack of standard ABI, and the use of a preprocessor, among others, make using C++ difficult for beginners and experts alike.
Parallelism: all end-user components such as wave function classes are expected to function correctly for MPQC with 1 or 17 MPI ranks. Some methods in MPQC are parallelized implicitly by leveraging the TiledArray tensor framework for parallel computation. However, many tasks need to be parallelized explicitly by decomposing computation into tasks. This complicates the code and takes the user focus away from science. Obviously, for pilot implementations, the user can assume that the code will always be invoked with 1 MPI rank; however, all contributions to the MPQC proper, no matter how minor, must pass tests with multiple MPI ranks.
Of the three factors listed here, C++ is probably the most troublesome; we have not figured out a good way to flatten the learning curve for C++. Many years of deliberate practice, accompanied by expert feedback, and continuous education are the only way to master C++.
For these reasons, novice researchers in the Valeev research group typically do not learn electronic structure and programming using MPQC components; instead, learning projects start with the use of standard C++ augmented by linear algebra capability (e.g., the Eigen library), followed by the use of the Libint library for toy codes implementing basic models of electronic structure, followed by an increasing use of MPQC components if the research project warrants it.
To work around the technical challenges encountered by the novices due to the use of C++ in MPQC, we recently developed a prototype Python interface for TiledArray and MPQC. Python is a popular general-purpose multiparadigm programming language. Python is much easier for novice programmers to learn due to its simpler type system, clean syntax, interpreted nature, robust module system and tooling, dominance of a single implementation of the language, and widely available educational materials aimed at beginners. Another major plus is the huge ecosystem of community libraries, such as NumPy and SciPy, which provide objects and functions to handle a multitude of scientific computing problems, e.g., linear algebra and tensor contraction. Python development has also greatly benefited from the recent explosion in machine learning as it has become the de facto lingua franca of data science. All these strengths make it an attractive language for education, idea exploration, and small-scale system design. For these reasons, our domain, as well as scientific computing in general, has seen an increase in the use of Python in both education and research codes.
In designing the Python interface, a crucial consideration was to expose not only top-level functionality (e.g., wave function classes) but also enough low-level functionality to make the endeavor worthwhile. The former would be already useful for workflow management and for plugging MPQC into external frameworks, such as ASE;45 however, to use MPQC for education and rapid prototyping requires the latter. A key prerequisite for this work was the development of the Python API for the TiledArray tensor framework, which involved solving several interesting technical problems [such as how to robustly send (member) function pointers between MPI ranks despite the address randomization by the OS/linker]. The Python TA API currently supports distributed-memory computation with dense data (CPU-only for now) using both a DSL similar to the C++ TA DSL and an einsum interface more familiar to Python users. The API provides seamless interoperability between NumPy arrays and TA tensors through the buffer protocol. It is also possible to implement custom expressions by injecting user operations from Python to C++; with some care, custom Python ops (e.g., lambda functions) can be executed concurrently, thus allowing Python API to execute robustly on clusters of multicore nodes just like the C++ API. Of course, it is not possible to expose fully the complete set of TA capabilities to Python since the pervasive genericity of TA would ultimately require C++ code generation at runtime. Nevertheless, the limited set of use cases supported by the TA Python API should cover most of the needs of Python users of TA.
The MPQC Python API builds on the TA Python API by exposing core C++ components (Molecule, AtomicBasis, and LCAO/AO factories) as well as the end-user methods (wave functions, properties, orbitals, Hartree–Fock, Coupled Cluster, etc.). Figure 9 illustrates how one can create complex workflows without shell scripting and log file parsing. The interface allows us to create objects using pythonic as well as C++-like constructs. Figure 10 highlights the current state of the MPQC Python plugin API by demonstrating how the MP2 plug-in written in C++ (Fig. 6) can be implemented in Python. The Python plug-in is structured very similarly to the C++ counterpart, but of course, it is not invoked from MPQC’s main function; instead, it is invoked from the Python interpreter. The main highlight is the implementation of the compute_mp2_energy function in Python; it illustrates how easy it is to compute MO-basis integrals and how easy it is to compose tensor algebra using a Python code that looks very much like the TA DSL in C++. Note that the example can be executed efficiently on a distributed-memory machine just like the C++ counterpart. A key advantage of the Python code is the greatly reduced development loop turnaround: instead of compiling a heavily templated C++ code every time even a minor change is made, reloading preinstantiated C++ code every time the Python program is restarted is significantly quicker. Of course, the Python code can be easily integrated with other libraries, such as NumPy. This allows us to prototype and develop algorithms much faster than in C++, leveraging third-party Python libraries while maintaining serial and distributed performance granted by MPQC and TA.
B. MPQC for experts
The discussion of the first use case should not be interpreted as “Python is easier than C++, hence better.” The simplicity has its price. What makes Python “easy” for beginners makes it limiting for experts. For example, the dynamic type system of Python makes it more difficult to build large systems than in C++, which has a static type system (modulo some C-legacy escape clauses); as a result, most large-scale systems are developed in statically typed languages. Second, the performance overhead and resource management constraints of Python are not acceptable for low-level kernels, which makes it necessary to supplement Python with native-language components. Third, Python is not designed to deal with modern concurrent and heterogeneous hardware, thus again requiring native-language components. These, and other, limitations exclude Python from being the primary choice for expert-centric designs, such as MPQC. The ongoing experimentation with the Python API for MPQC will hopefully shed light on its usefulness as a component of an expert toolkit. Note that even limited use of Python, in the user API or for component integration, already increased our costs: there is more application code to write/maintain, additional complexity of building, testing, and deploying harnesses, and more APIs to learn.
While it is obvious that some languages are less appropriate for expert-level designs, important follow-up questions are as follows: what are the pros and cons of C++ as the current implementation choice for MPQC and its components, and are there alternatives to C++ that exist today or are on the horizon?
Some of the cons of C++ were already mentioned when discussing why C++ is difficult to use for beginners, and they apply to the experts as well: language size and complexity (keeping up with modern C++ is a full-time job), missing or insufficient support for heterogeneous and distributed-memory programming, long compile times, unreadable error messages, and lack of integrated build and package management, among others. Some of these are being addressed (e.g., see the introduction of modules and concepts in C++20), but some problems may be impossible to resolve fully (e.g., compile times). C++ tooling is also improving. Modern integrated development environments (IDEs) for C++ have vastly improved code introspection and transformation capabilities compared to their counterparts from even a decade ago. The de-facto standard C++ (meta)build system, CMake, makes it increasingly easy to build C++ projects reliably, accurately, and portably.
The choice of C++ for MPQC in the early 1990s was ambitious, considering the immaturity of the language and tooling. However, compared to the available languages, C++ had a clear upside: support for more advanced programming styles than C or Fortran without sacrificing their performance or portability. When we decided to rewrite MPQC in the early 2010s, C++ was still the only choice that provided the expert-level features and was available on the large-scale platforms that MPQC targets. Ultimately, the challenges posed by C++ are outweighed by the power of expressive APIs that it supports, such as the TA tensor DSL that allows even novice developers to write high performance parallel codes (for example, the high-order coupled-cluster codes in MPQC were developed by a postdoc who never wrote a single line of C++ before joining our group).
Would C++ be the right choice if we were starting MPQC today? Safer/cleaner alternatives to C++ such as Rust, D, or Go would deserve some consideration, but none have C++’s buy-in from the community or availability on the leading HPC platforms of the day. Thus, C++ would potentially still be the logical choice for a project of this kind, but at least, there are other realistic alternatives to consider.
VI. SUMMARY AND OUTLOOK
Over its ∼30 yr history, MPQC has undergone several major changes, but its goals have stayed the same: to enable the development of electronic structure methods for efficient execution on massively parallel computer platforms. The latest (fourth) version of the package is a complete rewrite of the legacy MPQC codebase in order to support facile composition of conventional and reduced-scaling electronic structure methods on modern increasingly heterogeneous computers. MPQC is implemented in the modern C++ programming language, using a primarily object-oriented design. The modular design of MPQC enables the reuse of code within the package as well as the composition of new functionality outside of the MPQC codebase via several approaches.
The ongoing development of MPQC is focused in several directions. The first major direction is focused on extending support for performance-portable execution on modern heterogeneous platforms. Unfortunately, the programming for such platforms is difficult for several reasons: (1) the existing programming models are relatively immature and are usually vendor-specific, (2) the programming models are asynchronous, and (3) resource management is complicated. Our experience with adding support for NVIDIA’s CUDA in TiledArray and MPQC was a first step in the ongoing efforts to support the programming models across all major vendors. These efforts largely fall under the umbrella of the Exascale Computing Project activities, in which MPQC is involved as an independent swimlane in the NWChemEx project. The work on programming models and runtimes for performance-portable execution on heterogeneous distributed-memory platforms is also supported by the NSF EPEXA project in collaboration with computational scientists at Stony Brook University and the University of Tennessee.
Another major effort is focused on minimizing the programming involved in the development of many-body physics solvers. To this end, we are developing a runtime engine to support Wick’s theorem algebra and symbolic tensor algebra coupled to the numeric tensor algebra provided by TiledArray and other backends. The goal is to maximally automate the development of both conventional and reduced-scaling many-body theories, which will allow us to cover the feature space needed for the practical application of such methods quickly. Of course, the development of reduced-scaling many-body methods, which is needed to understand how to implement such methods efficiently, is continuing in parallel with the automation work.
DATA AVAILABILITY
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
ACKNOWLEDGMENTS
The current MPQC team acknowledges the visionary design and implementation of the original MPQC code by Curt Janssen, Ed Seidl, Ida Nielsen, Michael Colvin, Joe Kenny, and Matt Leininger whose ideas and code made our work much simpler. E.F.V. would like to thank Curt Janssen and Ed Seidl for sharing details of the early MPQC history, and they are this paper’s virtual co-authors. E.F.V. would also like to thank Matt Leininger for hosting him during the many Sandia visits in 2002–2004. E.F.V. would also like to thank Curt Janssen for his invaluable mentorship since joining the MPQC team in 2001 and the lasting influence on his evolution as a computational scientist.
This work was supported by the U.S. National Science Foundation (Award Nos. 1550456 and 1800348) and by the Exascale Computing Project (No. 17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.
REFERENCES
Peeking ahead, the introduction of modules in C++ in the just-finalized 2020 standard will simplify the reuse even more by removing the need to specify properties of “modules” in the domain-specific language of the build tool; module properties will be specified in the C++ source code directly.