We present here a massively parallel implementation of the recently developed CPS(D-3) excitation energy model that is based on cluster perturbation theory. The new algorithm extends the one developed in Baudin et al. [J. Chem. Phys., 150, 134110 (2019)] to leverage multiple nodes and utilize graphical processing units for the acceleration of heavy tensor contractions. Furthermore, we show that the extended algorithm scales efficiently with increasing amounts of computational resources and that the developed code enables CPS(D-3) excitation energy calculations on large molecular systems with a low time-to-solution. More specifically, calculations on systems with over 100 atoms and 1000 basis functions are possible in a few hours of wall clock time. This establishes CPS(D-3) excitation energies as a computationally efficient alternative to those obtained from the coupled-cluster singles and doubles model.

Electronic excitation energies are relevant in several research fields, such as photochemistry, where chemical reactions are initiated by electromagnetic radiation, usually visible or ultraviolet light. Such radiation initiates a multitude of photochemical processes relevant to atmospheric chemistry, vision, the techniques behind photography and photodetection, and photosynthesis in green leaves.1,2 In the latter case, the chlorophyll in the leaves absorbs part of the sunlight and thereby provides the energy to form glucose and oxygen from carbon dioxide and water. Other photochemical reactions of interest include industrial processes, such as the production of the pesticide lindane (hexachlorocyclohexane), and the breakdown of environmentally harmful organic compounds by photo-oxidation with hydrogen peroxide or by photocatalytic oxidation using titanium oxide as a catalyst.1,2 Excitation energies are also crucial when investigating chemical reactions induced by the monochromatic radiation of a laser, where reactants can be prepared with high precision in specific excited quantum states provided that accurate excitation energies are known.1,2 Laser-induced chemistry comprises both bimolecular and unimolecular reactions (photochemistry with lasers), and in order to understand and control these reactions, it is crucial to perform accurate calculations of excitation energies.

With the advent of exascale supercomputers and the increasing availability of computational resources, ab initio quantum chemistry enters a phase where a key challenge in pushing the limits of current methodologies is to develop massively parallel codes that can efficiently utilize heterogeneous clusters with several hundreds of nodes containing both multiple central processing units (CPUs) and multiple graphical processing units (GPUs). This challenge has two main facets. First, the theoretical models must be parallel in nature such that a parallel implementation is possible. Second, the resulting codes must exploit the available parallelism of a given method and make efficient use of accelerators, such as GPUs, which can speed up, e.g., heavy tensor contractions in quantum chemistry by several orders of magnitude.3–5

For methodologies such as Hartree–Fock (HF)6–14 and Density Functional Theory (DFT),15–24 both of the above-mentioned challenges have been extensively addressed.25,26 Both have been implemented in highly parallel codes that scale very well with large amounts of computational resources, as the work can be distributed over different nodes and/or GPUs using distributed Fock builds. A recent example is a GPU-accelerated two-electron repulsion integral (ERI) code that enables HF calculations on systems as large as the green fluorescent protein with iteration times of a few hours using multiple nodes with multiple GPUs.27

Meanwhile, correlated wave function theories, such as Coupled Cluster (CC) methods, produce much more reliable results than HF and DFT and could enjoy a much wider scope of applications. Unfortunately, parallelization of conventional CC theory presents a substantial challenge. Not only is the formal scaling of commonly used CC models, such as CC2,28 CCSD, CC3,29 and CCSD(T),30 much higher than that of mean-field theories, but they are also particularly difficult to scale to large amounts of computational resources, as the solution of the non-linear amplitude equations is inherently difficult to parallelize using distributed memory schemes. Nevertheless, over the last few decades, several approaches have been explored to overcome the steep scaling of CC calculations. These include, but are not limited to, the following: (I) Methods that exploit rank sparsity in the ERIs, such as the resolution of the identity (RI) approximation (also known as density fitting)31–35 and Cholesky decomposition,36–38 as well as further factorization using the pseudospectral approximation,39–43 the chain-of-spheres approximation,44,45 or tensor hypercontraction.46–49 Another alternative is tensor compression techniques that project, e.g., the doubles or triples amplitudes into a reduced space to reduce the size of the calculations.50–54 These approximations can efficiently lower the formal scaling of CC methods as well as the memory requirements at the cost of some accuracy, but they have not yet resolved the difficulties in parallelizing conventional CC theories. (II) Methods that utilize element-wise locality in CC calculations, where, e.g., the molecular orbitals are localized in space such that the CC calculation can be divided into smaller CC calculations on fragments of the system. These fragments are only correlated with other fragments with which they actually interact based on some cutoff threshold, e.g., a distance criterion.
These methods can be divided into two families: (1) those based on a reduction in excitation orbital spaces55–69 and (2) those correlating local fragments,70–86 both of which have seen great success for energy calculations and in achieving massively parallel implementations. However, the same success has not been achieved for molecular properties, such as excitation energies, which are inherently delocalized in nature.

Another strategy for developing approximations to CC models that both lower the formal scaling and are more amenable to parallelization is to utilize perturbation theory. The most prevalent examples in this context are Møller–Plesset perturbation theory (MPPT)87 and Coupled Cluster Perturbation Theory (CCPT).88–90 However, neither has been successfully extended to encompass time-dependent molecular properties. Recently, we developed Cluster Perturbation (CP) theory,91–98 which overcomes these drawbacks of MPPT and CCPT. CP series have been developed for both ground state energies (first-order molecular properties) and excitation energies, and we have also shown the existence of CP series for response properties. The essence of CP theory is that the target excitation space is divided into a parent and an auxiliary excitation space. The parent space spans the zeroth-order solution of the perturbation series, while the contributions from the auxiliary space are added perturbatively. An example is the CPS(D) model, where the zeroth-order solution is the CCS wave function and its associated properties, and we target the CCSD wave function and its associated properties by deriving the perturbative corrections arising from the doubles excitation space and its interaction with the singles excitation space. Generally, single-reference approaches perform excellently for molecular systems that are dominated by a single Slater determinant and investigated at the equilibrium molecular structure. When more than one Slater determinant dominates, it is crucial to utilize multi-reference methods.28–30,99–102

In Ref. 93, an initial implementation of CPS(D-3) excitation energies was outlined, and it was shown that the CPS(D-3) model gives excitation energies of CCSD quality. In the current paper, we extend this implementation to exploit massively parallel heterogeneous computer hardware with accelerators, such as GPUs. This paper is structured in the following way. In Sec. II, we present the theoretical aspects of the CPS(D-3) method for excitation energies, and in Sec. III, we outline the developed CPS(D-3) excitation energy algorithm. Following that, we present a scaling analysis and showcase the ability of the algorithm to exploit a large amount of computational resources in Sec. IV. Finally, Sec. V contains some concluding remarks.

In conventional CC theory, CCSD excitation energies ωx are obtained as eigenvalues of the non-Hermitian CCSD Jacobian via the right and left eigenvalue equations
$\mathbf{A}\mathbf{R}^{x} = \omega^{x}\mathbf{R}^{x},$  (1)
$\mathbf{L}^{x}\mathbf{A} = \omega^{x}\mathbf{L}^{x},$  (2)
where Rx and Lx are the corresponding right and left eigenvectors for the excited state x with the eigenvalue or excitation energy ωx. Rx and Lx are usually chosen to be biorthonormal,
$\mathbf{L}^{x}\mathbf{R}^{y} = \delta_{xy},$  (3)
and the elements of the CCSD Jacobian are given as
$A_{\mu_{i}\nu_{j}} = \langle \mu_{i} | e^{-T} [H_{0}, \theta_{\nu_{j}}] e^{T} | \mathrm{HF} \rangle,$  (4)
where H0 is the electronic Hamiltonian of the molecular system and T is the cluster operator. The cluster operator in CCSD theory is given as
$T = T_{1} + T_{2},$  (5)
where the individual operators are given by
$T_{i} = \sum_{\mu_{i}} t_{\mu_{i}} \theta_{\mu_{i}}.$  (6)
The cluster operator contains the cluster amplitudes tμi and the corresponding many-body excitation operators θμi that are used to generate rotations in the Hartree–Fock reference orbital space
$|\mu_{i}\rangle = \theta_{\mu_{i}} |\mathrm{HF}\rangle,$  (7)
with i denoting excitation level and μi being a specific excitation at that level. The cluster amplitudes are determined such that they satisfy the CCSD amplitude equation
$\langle \mu_{i} | e^{-T} H_{0} e^{T} | \mathrm{HF} \rangle = 0,$  (8)
which in conventional CCSD excitation energy calculations has to be solved iteratively before solving the CCSD Jacobian eigenvalue problem. Both of these operations scale iteratively as O(N6), where N is the size of the system. This formal scaling, together with the memory requirements, limits the systems that can be treated using CCSD.
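To make the structure of Eqs. (1)–(3) concrete, the following minimal Python sketch (a toy 2×2 non-Hermitian matrix with invented entries, no quantum-chemistry content) constructs right and left eigenvectors of a non-Hermitian matrix and verifies the biorthonormality condition:

```python
import math

# Toy non-Hermitian 2x2 "Jacobian" (all entries invented).
A = [[2.0, 1.0],
     [0.5, 3.0]]

tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = math.sqrt(tr * tr - 4.0 * det)
eigvals = [(tr - disc) / 2.0, (tr + disc) / 2.0]

def right_eigvec(lam):
    # (A - lam*I) R = 0 for a 2x2 matrix: R = (A01, lam - A00) works.
    return [A[0][1], lam - A[0][0]]

def left_eigvec(lam):
    # A left eigenvector of A is a right eigenvector of A transposed.
    return [A[1][0], lam - A[0][0]]

R = [right_eigvec(lam) for lam in eigvals]
L = []
for x, lam in enumerate(eigvals):
    l = left_eigvec(lam)
    s = l[0] * R[x][0] + l[1] * R[x][1]   # scale so that L^x . R^x = 1
    L.append([l[0] / s, l[1] / s])

# Check the biorthonormality condition L^x . R^y = delta_xy of Eq. (3).
for x in range(2):
    for y in range(2):
        dot = L[x][0] * R[y][0] + L[x][1] * R[y][1]
        assert abs(dot - (1.0 if x == y else 0.0)) < 1e-12
```

Because the matrix is non-Hermitian, the right eigenvectors alone are not orthogonal; only the left/right pairs satisfy the delta condition.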
In Refs. 92 and 93, it was outlined how CCSD excitation energies can be approximated to high accuracy utilizing the CPS(D-3) model of CP theory. In the CPS(D) series, the CCSD excitation energies are expanded in orders of the fluctuation potential as
$\omega^{x} = \sum_{n=0}^{\infty} \omega^{x(n)},$  (9)
using the Møller–Plesset partitioning of the Hamiltonian
$H_{0} = f + \Phi,$  (10)
where f is the Fock operator and Φ is the fluctuation potential. The fluctuation potential is the perturbation operator in the CPS(D) series. The zeroth-order excitation energy is the CCS excitation energy
$\omega^{x(0)} = \omega^{x}_{\mathrm{CCS}}.$  (11)
Similarly, the CCSD Jacobian as well as the right and left eigenvectors are expanded in orders of the fluctuation potential as
$\mathbf{A} = \sum_{n=0}^{\infty} \mathbf{A}^{(n)},$  (12)
$\mathbf{R}^{x} = \sum_{n=0}^{\infty} \mathbf{R}^{x(n)},$  (13)
$\mathbf{L}^{x} = \sum_{n=0}^{\infty} \mathbf{L}^{x(n)},$  (14)
and therefore, we obtain the zeroth-order eigenvalue equations as
$\mathbf{A}^{(0)}\mathbf{R}^{x(0)} = \omega^{x(0)}\mathbf{R}^{x(0)},$  (15)
$\mathbf{L}^{x(0)}\mathbf{A}^{(0)} = \omega^{x(0)}\mathbf{L}^{x(0)},$  (16)
and the zeroth-order eigenvectors are chosen to be biorthonormal
$\mathbf{L}^{x(0)}\mathbf{R}^{y(0)} = \delta_{xy}.$  (17)
Using the generalized order concept of CP theory, the zeroth-order Jacobian is required to contain the CCS Jacobian in the singles excitation space and a zeroth-order Fock operator contribution in the doubles excitation space,
$\mathbf{A}^{(0)} = \begin{pmatrix} \mathbf{A}^{\mathrm{CCS}} & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\varepsilon}^{(0)} \end{pmatrix}, \qquad \varepsilon^{(0)}_{\mu_{2}\nu_{2}} = \delta_{\mu_{2}\nu_{2}}\,\varepsilon_{\mu_{2}},$  (18)
to ensure that CCS excitation energies are the zeroth-order eigenvalues. Furthermore, the zeroth-order eigenvectors have the CCS eigenvectors in the singles excitation space and a vanishing doubles component, meaning that
$\mathbf{R}^{x(0)} = \begin{pmatrix} \mathbf{R}^{x}_{\mathrm{CCS}} \\ \mathbf{0} \end{pmatrix}, \qquad \mathbf{L}^{x(0)} = \left(\mathbf{R}^{x(0)}\right)^{\mathrm{T}},$  (19)
as the CCS Jacobian is Hermitian. In order to determine corrections to the Jacobian, the left and right eigenvectors, and ultimately the excitation energy through orders in the fluctuation potential, it is necessary to expand the CCSD amplitudes in orders of the fluctuation potential
$t_{\mu_{i}} = \sum_{n=0}^{\infty} t_{\mu_{i}}^{(n)},$  (20)
and to solve the CCSD amplitude equations order by order in the fluctuation potential. We have carried out the derivation of the CPS(D) series for the CCSD amplitudes, the CCSD Jacobian, and the eigenvectors in Refs. 91–96, and we refer the reader to these papers for detailed derivations of these equations. We have summarized the explicit corrections for the cluster amplitudes and the right excitation vector through second order, as well as the Jacobian through third order, in Appendix A. These can be used to obtain the first-, second-, and third-order CPS(D) excitation energy corrections as
(21)
(22)
(23)
Using the notation that
(24)
(25)
and
(26)
The second-order energy correction is the well-known non-iterative doubles correction of the CIS(D) model,103 and it can be obtained with a formal scaling of O(N5). Meanwhile, the bottleneck of the CPS(D-3) calculation is the third-order correction to the excitation energy, which formally scales as O(N6) with system size. Our focus in Sec. III will therefore be limited to extending the algorithm developed in CP III for the calculation of ω(3) to utilize several nodes and GPU-accelerated tensor contractions, enabling CPS(D-3) excitation energy calculations for large molecular systems with a small time to solution.
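The order-by-order logic of such perturbative corrections can be illustrated on a toy two-level problem (all numbers invented): a Hermitian 2×2 matrix stands in for the Jacobian, with a diagonal zeroth-order part playing the role of the Fock operator and the off-diagonal coupling playing the role of the fluctuation potential.

```python
import math

# Toy partitioning A = A0 + V of a symmetric 2x2 matrix (invented numbers):
# A0 is diagonal ("Fock-like"), V is the off-diagonal perturbation.
E = [1.0, 4.0]          # zeroth-order eigenvalues (diagonal of A0)
v = 0.3                 # off-diagonal coupling in V

# Exact lowest eigenvalue of [[E0, v], [v, E1]].
tr, det = E[0] + E[1], E[0] * E[1] - v * v
exact = (tr - math.sqrt(tr * tr - 4.0 * det)) / 2.0

# Rayleigh-Schroedinger-style corrections for the lowest state:
w0 = E[0]                       # zeroth order: parent-space eigenvalue
w1 = 0.0                        # first order: diagonal of V vanishes here
w2 = v * v / (E[0] - E[1])      # second order: coupling to auxiliary space

# The partial sum through second order improves on the zeroth-order value.
assert abs(w0 + w1 + w2 - exact) < abs(w0 - exact)
```

This is only an analogy for the series structure; the actual CPS(D) corrections of Eqs. (21)–(23) involve the non-Hermitian Jacobian and the singles/doubles excitation spaces.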
The multi-node, multi-GPU accelerated CPS(D-3) algorithm developed here builds on the algorithm of Ref. 93, which is outlined in Algorithm 1 and based on the singlet biorthonormal-basis working equations given in Appendix B. Using the resolution of the identity (RI) approximation104,105 for the two-electron repulsion integrals, three-center integral fitting coefficients are constructed from an auxiliary basis set using the Coulomb metric
$B_{pq}^{P} = \sum_{Q} (pq|Q) \left[\mathbf{V}^{-1/2}\right]_{QP}, \qquad V_{PQ} = (P|Q),$  (27)
such that conventional four-index two-electron repulsion integrals can be approximated as
$(pq|rs) \approx \sum_{P} B_{pq}^{P} B_{rs}^{P}.$  (28)
The RI approximation enables the calculation of ω(3) in batches over a virtual index, where the incremental contributions to ω(3) from each batch can be determined independently and on the fly. Furthermore, this strategy effectively eliminates the storage of fourth-order tensors in main memory, as these are decomposed using the RI approximation such that the three-index fitting coefficients are the largest quantities to be stored. Even though this approach requires recalculation of, e.g., the first-order doubles amplitudes t2(1) for each batch, it allows the batch size to be scaled to fit the available memory, as the largest quantity, the fully virtual three-index fitting coefficients, can be written to disk and read into memory in batches over a virtual index. These features enable a massively parallel algorithm.
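The batched, on-the-fly accumulation described above can be sketched generically (toy per-index contributions with invented numbers; the point is only that batch-wise increments reproduce the full sum while intermediates for just one batch are held at a time):

```python
# Toy sketch of batched accumulation over a "virtual" index a:
# the total is built from independent per-batch increments, so only one
# batch of intermediates ever needs to live in memory.
n_virt = 10
contrib = [0.01 * a * a - 0.1 * a for a in range(n_virt)]  # invented values

def omega3_batched(batch_size):
    total = 0.0
    for start in range(0, n_virt, batch_size):
        batch = range(start, min(start + batch_size, n_virt))
        # "On-the-fly": intermediates for this batch would be built here
        # and discarded afterwards; only the increment survives.
        total += sum(contrib[a] for a in batch)
    return total

reference = sum(contrib)
for bs in (1, 3, 10):
    assert abs(omega3_batched(bs) - reference) < 1e-14
```

In the real algorithm the per-batch increment is the expensive part (tensor contractions over the batch of the virtual index), and the batch size is chosen to fit the available memory.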
ALGORITHM 1.

Outline of the massively parallel algorithm for calculating the third-order excitation energy correction, ω(3), using the RI approximation in batches over a virtual index.

1: Calculate transformation matrices X̄, Ȳ, X̆, Y̆
2: Calculate three-center integral fitting coefficients B_{ai}^P, B_{ij}^P, B_{ĭa}^P, B_{iă}^P
3: Calculate one-index transformed Fock matrices F̄_{ia}, F̄_{ij}, F̄_{ab}
4: Calculate fully virtual three-index integral fitting coefficients B_{ab}^P and B_{āb}^P and distribute them in global memory using tiled tensors
5: for A (batch over virtual index a) do
6:   Calculate second-order doubles intermediate X^t_{Aibj} (see Algorithm 2)
7:   Calculate (iA|̆jb) = Σ_P (B_{ĭA}^P B_{jb}^P + B_{iĂ}^P B_{jb}^P + B_{iA}^P B_{j̆b}^P + B_{iA}^P B_{jb̆}^P)
8:   Add the excitation energy contribution from the current batch:
     ω^(3) ← ω^(3) + Σ_{a∈A} Σ_{ibj} [(2X^t_{aibj} − X^t_{ajbi}) / (ε_i − ε_a + ε_j − ε_b)] (P̂_{ij}^{ab} R^{CCS}_{ai} F̄_{jb} − (ia|̆jb))
9: end for (batch over a)
10: Deallocate B_{ĭa}^P and B_{iă}^P from memory
11: Calculate three-center integral fitting coefficients B_{āi}^P, B_{aī}^P, and B_{īj}^P
12: for A (batch over virtual index a) do
13:   Calculate second-order doubles intermediate X^R_{Aibj} (see Algorithm 2)
14:   Calculate (Ai|̄bj) = Σ_P (B_{Āi}^P B_{bj}^P + B_{Aī}^P B_{bj}^P + B_{Ai}^P B_{b̄j}^P + B_{Ai}^P B_{bj̄}^P)
15:   Add the excitation energy contribution from the current batch:
     ω^(3) ← ω^(3) + Σ_{a∈A} Σ_{ibj} [(2X^R_{aibj} − X^R_{ajbi}) / (ε_i − ε_a + ε_j − ε_b) + 2R^{CCS}_{ai} t^{(2)}_{bj} − R^{CCS}_{aj} t^{(2)}_{bi}] (Ai|̄bj)
16: end for (batch over a)
ALGORITHM 2.

The algorithm used for the calculation of the first-order correction to the doubles excitation amplitudes t^{(1)}_{aibj} and of the intermediate X^t_{Aibj} for a given batch A. The algorithm is also used for the calculation of X^R_{Aibj} with only minor adjustments.

1: Calculate t^{(1)}_{Aidk} = (Ai|dk) / (ε_i − ε_A + ε_k − ε_d)
2: Get B_{cA}^P tiled distributed tensor
3: for D (batch over virtual index) do
4:   Extract t^{(1)}_{AiDk} from t^{(1)}_{Aidk}
5:   Get B_{bD}^P tiled distributed tensor
6:   (bD|kj) = Σ_P B_{bD}^P B_{kj}^P
7:   X^t_{Aibj} ← X^t_{Aibj} − Σ_{Dk} t^{(1)}_{AiDk} (bD|kj)
8:   X^t_{Aibj} ← X^t_{Aibj} − Σ_{Dk} t^{(1)}_{AkDj} (bD|ki)
9:   (cA|bD) = Σ_P B_{cA}^P B_{bD}^P
10:  Calculate t^{(1)}_{Djci}
11:  X^t_{Aibj} ← X^t_{Aibj} + (1/2) Σ_{Dc} t^{(1)}_{Djci} (cA|bD)
12: end for (batch over D)
13: for L (batch over occupied index) do
14:  Extract t^{(1)}_{AkbL} from t^{(1)}_{Akbl}
15:  (ik|jL) = Σ_P B_{ik}^P B_{jL}^P
16:  X^t_{Aibj} ← X^t_{Aibj} + (1/2) Σ_{kL} t^{(1)}_{AkbL} (ik|jL)
17: end for (batch over L)
18: Y_{Ai}^P = Σ_{ck} (2t^{(1)}_{Aick} − t^{(1)}_{Akci}) B_{ck}^P
19: X^t_{Aibj} ← X^t_{Aibj} + Σ_P Y_{Ai}^P B_{bj}^P
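Line 1 of Algorithm 2, the MP2-like construction of the first-order doubles amplitudes, can be sketched with toy orbital energies and a toy stand-in for the RI-factorized integrals (all values below are invented):

```python
# Sketch of line 1 of Algorithm 2 with toy data: first-order doubles
# amplitudes t(1)_{aibj} = (ai|bj) / (eps_i - eps_a + eps_j - eps_b).
eps_occ = [-0.6, -0.4]        # eps_i, invented occupied orbital energies
eps_vir = [0.3, 0.5, 0.9]     # eps_a, invented virtual orbital energies

def g(a, i, b, j):
    # Toy stand-in for the RI-factorized integral (ai|bj).
    return 0.1 / (1 + a + i + b + j)

t1 = {}
for a, ea in enumerate(eps_vir):
    for i, ei in enumerate(eps_occ):
        for b, eb in enumerate(eps_vir):
            for j, ej in enumerate(eps_occ):
                denom = ei - ea + ej - eb
                t1[(a, i, b, j)] = g(a, i, b, j) / denom

# The amplitudes inherit the (ai) <-> (bj) symmetry of the integrals.
for (a, i, b, j), t in t1.items():
    assert abs(t - t1[(b, j, a, i)]) < 1e-14
```

In the actual implementation this quantity is never stored in full; it is recomputed batch by batch over the first virtual index, as described above.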

Algorithm 1 is parallelized for distributed memory architectures using a master-worker model built on the message passing interface (MPI). The master rank initializes the ω(3) calculation by determining the quantities in lines 1–4 and broadcasts these to all other ranks. In line 5, the master rank distributes the calculation of the first contribution to ω(3) in batches over the virtual index a to the worker ranks, which compute the contribution to ω(3) from their assigned batches and return the incremental contributions to the master rank, which sums them to obtain the total ω(3) correction. To avoid reading the fully virtual three-index integral fitting coefficients from disk, as done in Ref. 93, these are distributed across the collective memory of all nodes in line 4 using tiled distributed tensors through the Scalable Tensor Library (ScaTeLib).106 Once all worker ranks have finished the calculation of the first contribution to ω(3), all ranks discard the quantities that are no longer needed, and the master rank determines a new set of three-center integral fitting coefficients in line 11, as these are needed for the second contribution to ω(3). In line 12, the master rank distributes the calculation of the second contribution to ω(3) in batches over the virtual index a in complete analogy to the first batch loop, yielding the full ω(3) correction.
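The master-worker reduction can be mimicked in a few lines with a thread pool as a serial stand-in for MPI (invented batch contributions; the real code distributes batches over MPI ranks, not threads):

```python
from concurrent.futures import ThreadPoolExecutor

# Thread-pool stand-in for the MPI master-worker scheme: "workers" compute
# the increment from one virtual batch each; the "master" sums the results.
batches = [range(a, a + 2) for a in range(0, 10, 2)]  # batches over virtual index

def batch_contribution(batch):
    # Placeholder for lines 6-8 of Algorithm 1 (toy arithmetic only).
    return sum(0.01 * a - 0.002 * a * a for a in batch)

with ThreadPoolExecutor(max_workers=4) as pool:
    increments = list(pool.map(batch_contribution, batches))

omega3 = sum(increments)  # master-side reduction over batch increments
serial = sum(batch_contribution(b) for b in batches)
assert abs(omega3 - serial) < 1e-14
```

The key design point is that the batches are independent, so the only communication is the broadcast of shared quantities at the start and the reduction of scalar increments at the end.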

Algorithm 1 thus enables the calculation of CPS(D-3) excitation energies for system sizes beyond the reach of a conventional CCSD calculation, as the model can be effectively parallelized via the on-the-fly calculation of contributions from distributed batches. The time to solution can therefore be decreased by supplying more computational resources up to the point where each rank effectively handles only a single virtual index, i.e., when the number of ranks equals the number of virtual orbitals. Beyond this point, further batching could be introduced inside each virtual batch, but this would reduce the compute density, and we have therefore not pursued it in the current implementation.

To further accelerate the CPS(D-3) excitation energy calculation, all expensive tensor contractions in each batch are offloaded to GPUs using the Tensor Algebra Library for Shared Memory Computers (TALSH)107 locally on each rank, while OpenMP (OMP) is used to exploit shared-memory parallelism within each rank. Using the same philosophy, we also parallelized the calculation of ⟨μ1|[Φ, T2(1)]|HF⟩, which is required in the calculation of ω(2) and t1(2), as the O(N5) scaling of this term proved to be a bottleneck when moving to larger systems.

In summary, we have outlined and developed a GPU-accelerated hybrid MPI/OMP parallel implementation of CPS(D-3) excitation energies that scales efficiently with increasing system size and amounts of computational resources, allowing calculations on very large molecular systems, as we will demonstrate in Sec. IV.

In the following, we present numerical test calculations to display the performance and scalability of the developed implementation. All calculations utilized the RI approximation for the two-electron integrals, employing the RI basis set corresponding to the (aug-)cc-pVXZ basis set used;108–110 e.g., calculations using aug-cc-pVTZ use aug-cc-pVTZ-RI. Furthermore, all calculations correlate all electrons. Timings presented in this work were obtained on one or more IBM Power System AC922 nodes (two IBM POWER9 processors, six NVIDIA Tesla V100 accelerators, and 512 GB of DDR4 memory) on Summit at the Oak Ridge Leadership Computing Facility. The timings relate only to the calculation of ω(3) and do not include the time spent in the preceding Hartree–Fock, CCS, and CIS(D) calculations, as these do not represent a bottleneck. All calculations were run using a local development version of the LSDalton program.111

To test the scalability of the developed CPS(D-3) algorithm, we perform CPS(D-3)/aug-cc-pVTZ excitation energy calculations on a series of fatty acids CH3(CH2)nCOOH and CPS(D-3)/aug-cc-pVDZ excitation energy calculations on alanine oligomer chains, with the systems ranging from 8 to 53 atoms, from 276 to 1196 basis functions, and from 380 to 1560 auxiliary basis functions, using varying node counts.

The success of the developed algorithm critically depends on leveraging the computational resources efficiently to achieve near-ideal speedup when multiple nodes are used. In Fig. 1, we therefore display the strong scaling of the CPS(D-3) algorithm in the form of wall times and speedups for different components of the algorithm as well as the total CPS(D-3) calculation of the first excitation energy of caproic acid [CH3(CH2)4COOH, 828 basis functions, 1088 auxiliary basis functions] using between 1 and 32 nodes.

FIG. 1.

Wall time and speedup for the construction of the RHS, the iterative solution of Eq. (29), the evaluation of the third-order excitation energy correction according to Algorithm 1, and the full third-order calculation on caproic acid [CH3(CH2)4COOH, 828 basis functions, 1088 auxiliary basis functions]. The calculations are performed using 1, 2, 4, 8, 16, and 32 nodes.

We find that the overall wall time of the CPS(D-3) excitation energy correction calculation drops dramatically as the node count is increased from 1 to 8, displaying an excellent speedup of almost 5. Going from 8 to 16 nodes and from 16 to 32 nodes, the serial part of the calculation (initialization of the RI integrals, overhead associated with data transfer, and distribution of jobs from the master rank) starts to take up a significant portion of the execution time for this relatively small system, resulting in modest speedups and lower parallel efficiencies of only around 30% and 20%, respectively. This is completely expected with Amdahl's law of strong scaling in mind. Considering the different parts of the calculation, it is evident that the initialization, including the construction of the right-hand side (RHS) of the linear system of equations
(29)
from which t^{(2)}_{ν1} is obtained, is very rapid, and the execution time is mainly dominated by the iterative solution of Eq. (29) and the actual determination of the third-order excitation energy correction. Eq. (29) is solved using the Conjugate Residual with Optimized Parameters (CROP) solver of Refs. 112 and 113, as implemented in LSDalton, and this algorithm shows near-ideal scaling with increasing node count. Furthermore, the solution of Eq. (29) formally scales only as O(N4) with system size, so the O(N6) scaling of the third-order excitation energy correction will dominate for larger systems. The determination of the third-order excitation energy correction itself does not scale as well with increasing node count, achieving a parallel efficiency of only around 50% for this system, as the data transfer between the master and worker ranks starts to dominate the execution time. We expect the overall parallel efficiency of the algorithm to increase significantly with larger system sizes, as the ratio between data transfer and the number of floating point operations performed will decrease.
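The quoted efficiencies can be rationalized with Amdahl's law; the sketch below uses an illustrative serial fraction (invented, not a measured value) to show how parallel efficiency decays with node count and how the speedup saturates near 1/s:

```python
# Amdahl's law: with serial fraction s, speedup(N) = 1 / (s + (1 - s) / N).
def speedup(n_nodes, serial_fraction):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_nodes)

s = 0.05  # illustrative serial fraction only, not measured from the code
speedups = {n: speedup(n, s) for n in (1, 2, 4, 8, 16, 32)}
efficiencies = {n: sp / n for n, sp in speedups.items()}

# Parallel efficiency decays monotonically with node count...
assert all(efficiencies[a] > efficiencies[b]
           for a, b in zip((1, 2, 4, 8, 16), (2, 4, 8, 16, 32)))
# ...and the speedup is capped near 1/s no matter how many nodes are used.
assert abs(speedup(10**9, s) - 1.0 / s) < 1e-6
```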
In actual applications, however, the amount of computational resources would usually be scaled to fit the size of the system. It is therefore more interesting to consider the weak scaling of the CPS(D-3) excitation energy algorithm and determine the effective scaling from this analysis. To that end, we performed calculations on the fatty acids CH3(CH2)nCOOH with n from 1 to 6 in the aug-cc-pVTZ basis using n nodes and on alanine oligomer chains with between 1 and 5 alanine units in the aug-cc-pVDZ basis using a node count equal to the number of units. In Figs. 2 and 3, we present the wall time and computational cost of these calculations, where the computational cost is defined as
$\mathrm{cost} = t_{\mathrm{wall}} \times N_{\mathrm{nodes}}.$  (30)
Note that the computational cost is plotted against the number of basis functions on a double logarithmic scale, on which a power law appears as a straight line; the effective computational scaling is then obtained as the slope of a linear regression on the data points.
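The slope extraction can be sketched as an ordinary least-squares fit on log-log data (synthetic costs below are generated with a known exponent of 6 and an arbitrary prefactor, purely for illustration):

```python
import math

# Extract an effective scaling exponent k from cost ~ C * N^k by linear
# least squares on log-log data.
def fit_exponent(sizes, costs):
    xs = [math.log(n) for n in sizes]
    ys = [math.log(c) for c in costs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

sizes = [276, 460, 644, 828, 1012, 1196]      # basis-set sizes as in the text
costs = [2.5e-16 * n ** 6 for n in sizes]     # synthetic pure-N^6 cost
assert abs(fit_exponent(sizes, costs) - 6.0) < 1e-8
```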
FIG. 2.

The wall time and computational cost of different parts of the third-order excitation energy correction calculation on fatty acids CH3(CH2)nCOOH with n going from 1 to 6 in aug-cc-pVTZ basis. The computational cost is defined as the wall time multiplied by the node count, and its relation to system size is fitted to a linear function on a double logarithmic scale to give the scaling with system size.

FIG. 3.

The wall time and computational cost of different parts of the third-order excitation energy correction calculation on alanine oligomers with between 1 and 5 alanine units in a aug-cc-pVDZ basis using a node count corresponding to the number of units. The computational cost is defined as the wall time multiplied by the node count, and its relation to system size is fitted to a linear function on a double logarithmic scale to give the scaling with system size.


From Figs. 2 and 3, it is evident that the empirically observed computational scalings of N^3.5 and N^4 for the calculation of the third-order excitation energy correction are significantly lower than the expected N^6. This most likely stems from two factors. First, the calculations involve non-negligible parts that formally scale only as N^4 and N^5, such as the initialization of the RI integrals and the construction of the RHS of Eq. (29), which show N^3.8 and N^4.2 scaling in the two cases, the iterative solution of Eq. (29), which shows N^3 and N^3.6 scaling in the two cases, and many of the smaller tensor contractions. Consequently, only parts of the third-order excitation energy correction scale as N^6, which makes the observed overall scaling appear lower. Second, algorithms such as the one developed here are often more efficient than the formal scaling prescribes, as they never reach the asymptotic limit and their efficiency increases with system size, e.g., because the efficiency of linear algebra libraries, such as cuBLAS, increases with larger tensor dimensions.
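The first factor can be demonstrated numerically: a cost profile mixing N^4 and N^6 terms (prefactors below are invented for illustration) yields an apparent exponent between 4 and 6 when fitted over a finite range of system sizes:

```python
import math

# If the measured cost is c4*N^4 + c6*N^6 with a small N^6 prefactor, a
# log-log fit over a finite window of sizes gives an exponent between 4 and 6.
def fit_exponent(sizes, costs):
    xs = [math.log(n) for n in sizes]
    ys = [math.log(c) for c in costs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

sizes = [276, 460, 644, 828, 1012, 1196]
mixed = [1.0e-8 * n ** 4 + 1.0e-14 * n ** 6 for n in sizes]  # invented prefactors
k = fit_exponent(sizes, mixed)
assert 4.0 < k < 6.0   # apparent exponent below the asymptotic 6
```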

In both Figs. 2 and 3, we thus find that our algorithm efficiently leverages the computational resources with increasing system size and that the scalability of the developed implementation enables routine calculations of CPS(D-3) excitation energies for systems with over 1000 basis functions. As mentioned previously, the current implementation only employs batching over a single virtual index, which limits the scalability with respect to computational resources: providing more MPI ranks than the number of virtual orbitals in the system leaves some ranks idle. This limitation can, however, be alleviated by introducing further batching.

To display the potential usage of the developed algorithm in large-scale excitation energy calculations, we perform CPS(D-3) excitation energy calculations on an azaoxahelicene that has been investigated both experimentally and computationally in previous studies114,115 due to its interesting optical properties. Helicenes have been researched heavily as chiral antennas in optoelectronic devices in molecular electronics and as sources of chiral emission due to their strong circularly polarized emission.116–119 Accurate excitation energies are thus crucial for scrutinizing the potential applicability of helicenes and their derivatives in such applications through in silico modeling and prediction of their optical properties. We note that triples and even quadruples excitations are in many cases necessary to obtain results in complete alignment with experiment; however, such methods are currently out of reach for systems of this size, and the CPS(D-3) model is a good option in this case, as it suffers neither from empirical approximations, such as those in, e.g., DFT, nor from the size limitations of CCSD implementations.

We perform CPS(D-3) excitation energy calculations of the 8 lowest singlet excited states on the B3LYP-D3/6-31+G(d) structure of the [9]-azaoxahelicene (MeO[9]OMe) displayed in Fig. 4, using the cc-pVDZ basis set with all electrons correlated. We note that this basis set is probably not sufficient for producing results directly comparable to experiment; however, our current goal is simply to showcase the performance of the developed CPS(D-3) implementation in an HPC setting and to demonstrate that calculations such as these can be carried out routinely. In the CPS(D-3) calculations, we utilize 128 nodes with 6 MPI ranks per node, at which point the batching over virtual indices yields one batch per rank. The calculations thus utilize 5292 cores, 756 GPUs, and 387 TB of memory in total.

FIG. 4.

The structure of the studied [9]-azaoxahelicene: MeO[9]OMe.


The resulting CCS, CIS(D), and CPS(D-3) excitation energies are given in Table I along with relevant information from the calculations. It is evident that the accumulated corrections to the CCS excitation energies are quite substantial, ranging from around 0.3 to 0.9 eV, and that all of them lower the total excitation energy. Furthermore, we observe that the second-order correction in CIS(D) is much larger than the third-order correction added in going from CIS(D) to CPS(D-3), which is in line with previous CP studies on excitation energies in Refs. 92 and 93. It is also seen that CCS does not order the excitation energies in the same way as CIS(D) and CPS(D-3); e.g., the second CCS root becomes the lowest root in CIS(D) and CPS(D-3). These tendencies illustrate that CCS is not adequate for determining accurate excitation energies, as the effects of doubles excitations are crucial for the excited-state properties.
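The trends discussed above can be checked with a short Python script. The values below are the Table I entries with the thin-space digit grouping read as plain decimals (an assumption about the typesetting); delta2 and delta3 denote the second- and third-order shifts.

```python
# Corrections to the CCS excitation energies implied by Table I (eV,
# CCS ordering). delta2 is the CIS(D) shift relative to CCS; delta3 is
# the additional CPS(D-3) shift relative to CIS(D).
ccs   = [4.064682, 4.378592, 4.804915, 4.913226, 5.105541, 5.178502, 5.498913, 5.871168]
cisd  = [3.758556, 3.637146, 4.190844, 4.084771, 4.302162, 4.360824, 4.738371, 5.312514]
cpsd3 = [3.7976647, 3.7315420, 4.2477655, 4.1440265, 4.3277230, 4.3375678, 4.8297782, 5.3837323]

for i, (e0, e2, e3) in enumerate(zip(ccs, cisd, cpsd3), start=1):
    delta2 = e2 - e0   # second-order correction, CIS(D) - CCS
    delta3 = e3 - e2   # third-order correction, CPS(D-3) - CIS(D)
    print(f"root {i}: delta2 = {delta2:+.3f} eV, delta3 = {delta3:+.3f} eV")
```

Running the loop confirms that every accumulated correction is negative and that the second-order shift dominates the third-order one for all eight roots.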

TABLE I.

The 8 lowest singlet CCS, CIS(D), and CPS(D-3) excitation energies of MeO[9]OMe sorted according to the CCS ordering. Furthermore, the number of contracted, primitive, and auxiliary basis functions is given along with the average wall time per root for the CPS(D-3) calculations.

Root    CCS (eV)     CIS(D) (eV)    CPS(D-3) (eV)
1       4.064682     3.758556       3.7976647
2       4.378592     3.637146       3.7315420
3       4.804915     4.190844       4.2477655
4       4.913226     4.084771       4.1440265
5       5.105541     4.302162       4.3277230
6       5.178502     4.360824       4.3375678
7       5.498913     4.738371       4.8297782
8       5.871168     5.312514       5.3837323

Number of contracted basis functions      915
Number of primitive basis functions       1601
Number of auxiliary basis functions       3402
Average CPS(D-3) wall time per root       147.0 min
Number of nodes                           128

With regard to timings, the average wall time required to determine the CPS(D-3) correction to a specific root is 147 min using 128 nodes. This illustrates that, with the advent of large-scale HPC facilities, CPS(D-3) excitation energy calculations can be carried out routinely for larger molecules with a low time to solution. The low wall times demonstrate that the readily parallelizable CPS(D) framework and our implementation efficiently leverage large amounts of computational resources and enable large-scale CPS(D-3) excitation energy calculations. By introducing more batching, we could run the calculations on an even larger pool of computational resources, and a more specialized GPU code would provide higher efficiency and enable even larger calculations. Our current implementation nonetheless demonstrates the potential for developing massively parallel CPS(D) codes that enable routine calculations on large molecular systems that are not possible using conventional CCSD.

Using the readily parallelizable framework of the CPS(D) methodology, we have developed a large-scale implementation of CPS(D-3) excitation energies. The introduction of hybrid MPI/OMP parallelization and GPU acceleration in the previously outlined CPS(D-3) excitation energy implementation of CP III enables the efficient exploitation of modern high performance computing infrastructure. By batching the calculation of the third-order excitation energy correction ω(3) over a virtual molecular orbital index, the calculation is readily divided into a set of incremental excitation energy correction evaluations. The fact that each incremental contribution to ω(3) can be evaluated individually provides an ideal domain decomposition that makes the CPS(D-3) model excellently suited for parallelization over large-scale modern heterogeneous computing architectures.
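The decomposition described above can be sketched schematically: ω(3) is accumulated as a sum of independent per-batch increments, one batch of virtual MO indices per MPI rank. The sketch below is not the production code; `toy_increment` is a hypothetical stand-in for the heavy tensor contractions that the real implementation offloads to GPUs.

```python
# Schematic sketch of the batched evaluation of the third-order
# correction: each batch of virtual MO indices contributes one
# independent increment, which is the property that makes the scheme
# trivially parallelizable over ranks.
def omega3_batched(n_virtual, increment, batch_size):
    """Sum incremental contributions batch by batch; on a cluster,
    each call to `increment` would run on its own MPI rank/GPU."""
    total = 0.0
    for start in range(0, n_virtual, batch_size):
        batch = range(start, min(start + batch_size, n_virtual))
        total += increment(batch)
    return total

# Toy stand-in for the per-batch tensor-contraction work (hypothetical;
# the real increment involves two-electron integrals and trial vectors).
def toy_increment(batch):
    return sum(1.0 / (1 + a) for a in batch)

# The result is independent of how the virtual space is batched, so
# more ranks (smaller batches) can be used without changing omega(3).
print(omega3_batched(100, toy_increment, 10))
print(omega3_batched(100, toy_increment, 17))
```

The invariance of the sum under re-batching is what allows the one-batch-per-rank layout used in the calculations above.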

Our hybrid MPI/OMP parallelized and GPU accelerated implementation readily scales to a very large number of compute nodes and achieves high parallel efficiency provided that the size of the system under study is large enough. As evidenced by our numerical illustrations, CPS(D-3) excitation energies can be determined for systems well beyond 1000 basis functions in only a few hours. Furthermore, the empirically observed scaling is only O(N4) for the systems studied here as opposed to the formal O(N6) scaling. Coupling these results with the comparison of CPS(D-3) excitation energies to CCSD and CC3 excitation energies in Ref. 93, it is evident that our current work makes excitation energies of CCSD quality readily available for system sizes of several hundred atoms using several thousand basis functions. This establishes CP theory as a viable alternative, e.g., to DFT methods, which are currently the go-to choice for quantum chemical calculations on large molecular systems.
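A back-of-the-envelope illustration of what the observed scaling means in practice: the exponent comes from the text, while the basis-set sizes below are hypothetical.

```python
# Relative wall-time factor when growing the number of basis functions
# from n1 to n2 under a power-law scaling t ~ N**exponent.
def time_ratio(n1, n2, exponent=4):
    return (n2 / n1) ** exponent

# Doubling the basis costs 16x under the empirically observed O(N^4)
# scaling, but would cost 64x under the formal O(N^6) scaling.
print(time_ratio(1000, 2000))       # 16.0
print(time_ratio(1000, 2000, 6))    # 64.0
```

The factor-of-four gap per doubling is why the observed O(N4) behavior matters for reaching systems with several thousand basis functions.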

In our current investigation, we have only considered excitation energies; however, CP theory also encompasses the calculation of ground state energies, first-order properties, and higher-order properties through response functions. All of these can be formulated in a completely analogous way with batching over one or several molecular orbital indices such that incremental perturbation corrections can be evaluated individually on the fly. This allows massively parallel implementations, such as the one presented here. It is thus our aim to establish similar implementations of the CPS(D) series for these properties in the future to allow large-scale CPS(D) calculations of various molecular properties. These efforts should make CPS(D) calculations routine for systems containing 100–200 atoms and thus make results of CCSD quality readily available.

On a different note, it would also be interesting to apply tensor decomposition and tensor compression to CP theory, as has been done with great success for Møller–Plesset perturbation theory and standard CC theory in the past few years.46–49 These efforts have shown not only that the formal scaling of Møller–Plesset perturbation theory and CC methods can be reduced but also that the doubles amplitudes can be compressed such that only a small fraction of them needs to be considered, with only minor implications for the results.50–54 In future studies, we aim to apply these techniques in CP theory to achieve reduced scaling and reduced storage requirements and, ultimately, enable calculations on much larger systems.
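The compression idea referenced above can be illustrated with a minimal sketch: a matrix unfolding of a doubles-amplitude-like tensor is approximated by a truncated SVD that keeps only the dominant singular vectors. The tensor here is synthetic (built to have rapidly decaying singular values); this is not the decomposition used in Refs. 46–54, only the generic principle behind it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "amplitude" matrix with rapidly decaying singular values,
# mimicking the low-rank structure that compression schemes exploit.
n, rank = 200, 10
t2 = (rng.standard_normal((n, rank)) * 2.0 ** -np.arange(rank)) @ rng.standard_normal((rank, n))

# Truncated SVD: keep only a small fraction of the singular vectors.
u, s, vt = np.linalg.svd(t2, full_matrices=False)
keep = 10
t2_approx = (u[:, :keep] * s[:keep]) @ vt[:keep]

# Storage drops from n*n values to roughly 2*n*keep, here about 10%,
# while the reconstruction error stays small.
rel_err = np.linalg.norm(t2 - t2_approx) / np.linalg.norm(t2)
print(f"kept {keep}/{n} singular vectors, relative error {rel_err:.2e}")
```

For realistic amplitudes the singular values decay less sharply than in this toy example, so the achievable compression is system dependent.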

This research used resources from the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract No. DE-AC05-00OR22725. A.E.H.B. and K.V.M. acknowledge the Danish Council for Independent Research (Grant No. DFF-0136-00081B) and the European Union’s Horizon 2020 Framework Program under Grant Agreement No. 951801 for financial support.

Notice: This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a non-exclusive, paid up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for U.S. Government purposes. The DOE will provide public access to these results in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

The authors have no conflicts to disclose.

Andreas Erbs Hillers-Bendtsen: Data curation (equal); Formal analysis (equal); Investigation (equal); Validation (equal). Dmytro Bykov: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Supervision (equal). Ashleigh Barnes: Investigation (equal). Dmitry Liakh: Investigation (equal). Hector H. Corzo: Investigation (equal); Validation (equal). Jeppe Olsen: Formal analysis (equal); Methodology (equal). Poul Jørgensen: Formal analysis (equal); Investigation (equal). Kurt V. Mikkelsen: Conceptualization (equal); Investigation (equal); Project administration (equal); Supervision (equal).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Explicit expressions, using the extended CCS Jacobian partitioning, for the amplitudes through third order and the energy corrections through fourth order:
(A1)
(A2)
(A3)
(A4)
(A5)
(A6)
(A7)
(A8)
(A9)
(A10)
(A11)
(A12)
Following the outline of Ref. 93, the excitation energy corrections can be formulated in a singlet biorthonormal basis to obtain working equations for the CPS(D-3) excitation energy model as
(B1)
(B2)
where
(B3)
(B4a)
(B4b)
(B5)
(B6)
(B7)
(B8)
(B9)
(B10)
(B11)
with the second-order intermediates X^t_{aibj} and X^R_{aibj} given in Tables V and VI of Ref. 93.
1. G. Granucci and M. Persico, Photochemistry—A Modern Theoretical Perspective, 1st ed. (Springer, Berlin, 2019).
2. V. Balzani, P. Ceroni, and A. Juris, Photochemistry and Photophysics Concepts Research Applications, 1st ed. (Wiley-VCH Verlag GmbH, Berlin, 2014).
3. I. S. Ufimtsev and T. J. Martínez, J. Chem. Theory Comput. 4, 222 (2008).
4. I. S. Ufimtsev and T. J. Martinez, J. Chem. Theory Comput. 5, 1004 (2009).
5. I. S. Ufimtsev and T. J. Martinez, J. Chem. Theory Comput. 5, 2619 (2009).
6. J. Almlöf, K. Faegri, Jr., and K. Korsell, J. Comput. Chem. 3, 385 (1982).
7. C. Ochsenfeld, C. A. White, and M. Head-Gordon, J. Chem. Phys. 109, 1663 (1998).
8. M. C. Strain, G. E. Scuseria, and M. J. Frisch, Science 271, 51 (1996).
9. T. L. Windus, M. W. Schmidt, and M. S. Gordon, Chem. Phys. Lett. 216, 375 (1993).
11. R. Ahlrichs, M. Bär, M. Häser, H. Horn, and C. Kölmel, Chem. Phys. Lett. 162, 165 (1989).
12. H. P. Lüthi, J. H. Ammeter, J. Almlöf, and K. Faegri, J. Chem. Phys. 77, 2002 (1982).
13. W. Klopper, J. Chem. Phys. 102, 6168 (1995).
14. S. Sæbø and J. Almlöf, Chem. Phys. Lett. 154, 83 (1989).
15. J. M. Soler, E. Artacho, J. D. Gale, A. García, J. Junquera, P. Ordejón, and D. Sánchez-Portal, J. Phys.: Condens. Matter 14, 2745 (2002).
16.
17. X.-P. Li, R. W. Nunes, and D. Vanderbilt, Phys. Rev. B 47, 10891 (1993).
18. D. K. Jordan and D. A. Mazziotti, J. Chem. Phys. 122, 084114 (2005).
19. P. Giannozzi, S. Baroni, N. Bonini, M. Calandra, R. Car, C. Cavazzoni, D. Ceresoli, G. L. Chiarotti, M. Cococcioni, I. Dabo, A. Dal Corso, S. de Gironcoli, S. Fabris, G. Fratesi, R. Gebauer, U. Gerstmann, C. Gougoussis, A. Kokalj, M. Lazzeri, L. Martin-Samos, N. Marzari, F. Mauri, R. Mazzarello, S. Paolini, A. Pasquarello, L. Paulatto, C. Sbraccia, S. Scandolo, G. Sclauzero, A. P. Seitsonen, A. Smogunov, P. Umari, and R. M. Wentzcovitch, J. Phys.: Condens. Matter 21, 395502 (2009).
20. G. Kresse and J. Hafner, Phys. Rev. B 47, 558 (1993).
21. M. J. Gillan, D. R. Bowler, A. S. Torralba, and T. Miyazaki, Comput. Phys. Commun. 177, 14 (2007), proceedings of the Conference on Computational Physics 2006.
22. J. VandeVondele, M. Krack, F. Mohamed, M. Parrinello, T. Chassaing, and J. Hutter, Comput. Phys. Commun. 167, 103 (2005).
23. X. Qin, H. Shang, H. Xiang, Z. Li, and J. Yang, Int. J. Quantum Chem. 115, 647 (2015).
24. T. Ozaki and H. Kino, Phys. Rev. B 72, 045121 (2005).
25. G. M. J. Barca, J. L. Galvez-Vallejo, D. L. Poole, A. P. Rendell, and M. S. Gordon, J. Chem. Theory Comput. 16, 7232 (2020).
26. E. Chow, X. Liu, S. Misra, M. Dukhan, M. Smelyanskiy, J. R. Hammond, Y. Du, X.-K. Liao, and P. Dubey, Int. J. High Perform. Comput. Appl. 30, 85 (2016).
27. K. G. Johnson, S. Mirchandaney, E. Hoag, A. Heirich, A. Aiken, and T. J. Martínez, J. Chem. Theory Comput. 18, 6522 (2022).
28. O. Christiansen, H. Koch, and P. Jørgensen, Chem. Phys. Lett. 243, 409 (1995).
29. H. Koch, O. Christiansen, P. Jørgensen, A. M. Sanchez de Merás, and T. Helgaker, J. Chem. Phys. 106, 1808 (1997).
30. K. Raghavachari, G. W. Trucks, J. A. Pople, and M. Head-Gordon, Chem. Phys. Lett. 157, 479 (1989).
31. J. L. Whitten, J. Chem. Phys. 58, 4496 (1973).
32. B. I. Dunlap, J. W. Connolly, and J. R. Sabin, Int. J. Quantum Chem. 12(S11), 81–87 (1979).
33. B. I. Dunlap, J. W. D. Connolly, and J. R. Sabin, J. Chem. Phys. 71, 3396 (1979).
34. M. Feyereisen, G. Fitzgerald, and A. Komornicki, Chem. Phys. Lett. 208, 359 (1993).
35. O. Vahtras, J. Almlöf, and M. W. Feyereisen, Chem. Phys. Lett. 213, 514 (1993).
36. N. H. F. Beebe and J. Linderberg, Int. J. Quantum Chem. 12, 683 (1977).
37. I. Røeggen and E. Wisløff-Nilssen, Chem. Phys. Lett. 132, 154 (1986).
38. H. Koch, A. Sánchez de Merás, and T. B. Pedersen, J. Chem. Phys. 118, 9481 (2003).
39. R. A. Friesner, J. Chem. Phys. 85, 1462 (1986).
40. R. A. Friesner, J. Chem. Phys. 86, 3522 (1987).
41. J. M. Langlois, R. P. Muller, T. R. Coley, W. A. Goddard, III, M. N. Ringnalda, Y. Won, and R. A. Friesner, J. Chem. Phys. 92, 7488 (1990).
42. M. N. Ringnalda, M. Belhadj, and R. A. Friesner, J. Chem. Phys. 93, 3397 (1990).
43.
44. F. Neese, F. Wennmohs, A. Hansen, and U. Becker, Chem. Phys. 356, 98 (2009).
45. R. Izsák, F. Neese, and W. Klopper, J. Chem. Phys. 139, 094111 (2013).
46. E. G. Hohenstein, R. M. Parrish, and T. J. Martínez, J. Chem. Phys. 137, 044103 (2012).
47. R. M. Parrish, E. G. Hohenstein, T. J. Martínez, and C. D. Sherrill, J. Chem. Phys. 137, 224106 (2012).
48. E. G. Hohenstein, R. M. Parrish, C. D. Sherrill, and T. J. Martínez, J. Chem. Phys. 137, 221101 (2012).
49. R. M. Parrish, E. G. Hohenstein, N. F. Schunck, C. D. Sherrill, and T. J. Martínez, Phys. Rev. Lett. 111, 132505 (2013).
50. R. M. Parrish, Y. Zhao, E. G. Hohenstein, and T. J. Martínez, J. Chem. Phys. 150, 164118 (2019).
51. E. G. Hohenstein, Y. Zhao, R. M. Parrish, and T. J. Martínez, J. Chem. Phys. 151, 164121 (2019).
52. E. G. Hohenstein, B. S. Fales, R. M. Parrish, and T. J. Martínez, J. Chem. Phys. 156, 054102 (2022).
53. M. Lesiuk, J. Chem. Phys. 156, 064103 (2022).
54. M. Lesiuk, J. Chem. Theory Comput. 18, 6537 (2022).
55. C. Edmiston and M. Krauss, J. Chem. Phys. 42, 1119 (1965).
56. W. Meyer, Int. J. Quantum Chem. 5, 341 (1971).
57. W. Meyer, J. Chem. Phys. 58, 1017 (1973).
58. W. Meyer and P. Rosmus, J. Chem. Phys. 63, 2356 (1975).
59. H.-J. Werner and W. Meyer, Mol. Phys. 31, 855 (1976).
60. R. Ahlrichs, F. Driessler, H. Lischka, V. Staemmler, and W. Kutzelnigg, J. Chem. Phys. 62, 1235 (1975).
61. F. Neese, F. Wennmohs, and A. Hansen, J. Chem. Phys. 130, 114108 (2009).
62. F. Neese, A. Hansen, and D. G. Liakos, J. Chem. Phys. 131, 064103 (2009).
63. H.-J. Werner, J. Chem. Phys. 145, 201101 (2016).
64. C. Riplinger, B. Sandhoefer, A. Hansen, and F. Neese, J. Chem. Phys. 139, 134101 (2013).
65. C. Riplinger and F. Neese, J. Chem. Phys. 138, 034106 (2013).
66. M. Schwilk, Q. Ma, C. Köppl, and H.-J. Werner, J. Chem. Theory Comput. 13, 3650 (2017).
67. Q. Ma, M. Schwilk, C. Köppl, and H.-J. Werner, J. Chem. Theory Comput. 13, 4871 (2017).
68. Q. Ma and H.-J. Werner, J. Chem. Theory Comput. 14, 198 (2018).
69. M. Sparta and F. Neese, Chem. Soc. Rev. 43, 5032 (2014).
70. M. Ziółkowski, B. Jansík, T. Kjærgaard, and P. Jørgensen, J. Chem. Phys. 133, 014107 (2010).
71. K. Kristensen, M. Ziółkowski, B. Jansík, T. Kjærgaard, and P. Jørgensen, J. Chem. Theory Comput. 7, 1677 (2011).
72. K. Kristensen, I.-M. Høyvik, B. Jansik, P. Jørgensen, T. Kjærgaard, S. Reine, and J. Jakowski, Phys. Chem. Chem. Phys. 14, 15706 (2012).
73. I.-M. Høyvik, K. Kristensen, B. Jansik, and P. Jørgensen, J. Chem. Phys. 136, 014105 (2012).
74. J. J. Eriksen, P. Baudin, P. Ettenhuber, K. Kristensen, T. Kjærgaard, and P. Jørgensen, J. Chem. Theory Comput. 11, 2984 (2015).
75. T. Kjærgaard, P. Baudin, D. Bykov, K. Kristensen, and P. Jørgensen, Wiley Interdiscip. Rev.: Comput. Mol. Sci. 7, e1319 (2017).
76. A. Hansen, D. G. Liakos, and F. Neese, J. Chem. Phys. 135, 214102 (2011).
77. I.-M. Høyvik and P. Jørgensen, Chem. Rev. 116, 3306 (2016).
78. P. R. Nagy and M. Kállay, J. Chem. Phys. 146, 214106 (2017).
79. P. R. Nagy, G. Samu, and M. Kállay, J. Chem. Theory Comput. 14, 4193 (2018).
80. K. Kitaura, E. Ikeo, T. Asada, T. Nakano, and M. Uebayasi, Chem. Phys. Lett. 313, 701 (1999).
81. K. Kitaura, S.-I. Sugiki, T. Nakano, Y. Komeiji, and M. Uebayasi, Chem. Phys. Lett. 336, 163 (2001).
82. W. Li and P. Piecuch, J. Phys. Chem. A 114, 8644 (2010).
83. W. Li and P. Piecuch, J. Phys. Chem. A 114, 6721 (2010).
84. J. Friedrich, D. P. Tew, W. Klopper, and M. Dolg, J. Chem. Phys. 132, 164114 (2010).
85. J. Friedrich and K. Walczak, J. Chem. Theory Comput. 9, 408 (2013).
86. W. Li and S. Li, J. Chem. Phys. 121, 6649 (2004).
87. C. Møller and M. S. Plesset, Phys. Rev. 46, 618 (1934).
88. J. J. Eriksen, K. Kristensen, T. Kjærgaard, P. Jørgensen, and J. Gauss, J. Chem. Phys. 140, 064108 (2014).
89. J. J. Eriksen, P. Jørgensen, and J. Gauss, J. Chem. Phys. 142, 014102 (2015).
90. K. Kristensen, J. J. Eriksen, D. A. Matthews, J. Olsen, and P. Jørgensen, J. Chem. Phys. 144, 064103 (2016).
91. F. Pawłowski, J. Olsen, and P. Jørgensen, J. Chem. Phys. 150, 134108 (2019).
92. F. Pawłowski, J. Olsen, and P. Jørgensen, J. Chem. Phys. 150, 134109 (2019).
93. P. Baudin, F. Pawłowski, D. Bykov, D. Liakh, K. Kristensen, J. Olsen, and P. Jørgensen, J. Chem. Phys. 150, 134110 (2019).
94. F. Pawłowski, J. Olsen, and P. Jørgensen, J. Chem. Phys. 150, 134111 (2019).
95. F. Pawłowski, J. Olsen, and P. Jørgensen, J. Chem. Phys. 150, 134112 (2019).
96. N. M. Høyer, F. Ø. Kjeldal, A. E. Hillers-Bendtsen, K. V. Mikkelsen, J. Olsen, and P. Jørgensen, J. Chem. Phys. 157, 024106 (2022).
97. J. Olsen, A. E. Hillers-Bendtsen, F. Ø. Kjeldal, N. M. Høyer, K. V. Mikkelsen, and P. Jørgensen, J. Chem. Phys. 157, 024107 (2022).
98. A. E. Hillers-Bendtsen, N. M. Høyer, F. Ø. Kjeldal, K. V. Mikkelsen, J. Olsen, and P. Jørgensen, J. Chem. Phys. 157, 024108 (2022).
99. T. Helgaker, P. Jørgensen, and J. Olsen, Molecular Electronic-Structure Theory (John Wiley & Sons, 2014).
100. R. J. Bartlett, Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2, 126 (2012).
101. S. Das, D. Mukherjee, and M. Kállay, J. Chem. Phys. 132, 074103 (2010).
102. M. Véril, A. Scemama, M. Caffarel, F. Lipparini, M. Boggio-Pasqua, D. Jacquemin, and P. F. Loos, Wiley Interdiscip. Rev.: Comput. Mol. Sci. 11, e1517 (2021).
103. M. Head-Gordon, R. J. Rico, M. Oumi, and T. J. Lee, Chem. Phys. Lett. 219, 21 (1994).
104. I. Panas, J. Almlöf, and M. W. Feyereisen, Int. J. Quantum Chem. 40, 797–807 (1993).
105. C. Hättig and F. Weigend, J. Chem. Phys. 113, 5154 (2000).
106. P. Ettenhuber, "ScaTeLib—A scalable tensor library," https://gitlab.com/pett/ScaTeLib.
107. D. I. Lyakh, "TAL-SH: Tensor algebra library for shared memory computers," https://github.com/DmitryLyakh/TAL_SH.
108. T. H. Dunning, Jr., J. Chem. Phys. 90, 1007 (1989).
109. R. A. Kendall, T. H. Dunning, Jr., and R. J. Harrison, J. Chem. Phys. 96, 6796 (1992).
110. F. Weigend, A. Köhn, and C. Hättig, J. Chem. Phys. 116, 3175 (2002).
111. K. Aidas, C. Angeli, K. L. Bak, V. Bakken, R. Bast, L. Boman, O. Christiansen, R. Cimiraglia, S. Coriani, P. Dahle et al., Wiley Interdiscip. Rev.: Comput. Mol. Sci. 4, 269 (2014).
112. M. Ziółkowski, V. Weijo, P. Jørgensen, and J. Olsen, J. Chem. Phys. 128, 204105 (2008).
113. P. Ettenhuber and P. Jørgensen, J. Chem. Theory Comput. 11, 1518 (2015).
114. S. K. Pedersen, K. Eriksen, and M. Pittelkow, Angew. Chem., Int. Ed. 58, 18419 (2019).
115. A. E. Hillers-Bendtsen, Y. Todarwal, M. Pittelkow, P. Norman, and K. V. Mikkelsen, J. Phys. Chem. A 126, 6467 (2022).
116. Y. Yang, R. C. Da Costa, M. J. Fuchter, and A. J. Campbell, Nat. Photonics 7, 634 (2013).
117. Y. Yang, R. C. da Costa, D. M. Smilgies, A. J. Campbell, and M. J. Fuchter, Adv. Mater. 25, 2624 (2013).
118. B. Mahato and A. N. Panda, J. Phys. Chem. A 126, 1412 (2022).
119. K. Mori, T. Murase, and M. Fujita, Angew. Chem., Int. Ed. 54, 6847 (2015).