A method for the computation of nuclear magnetic resonance (NMR) shieldings with second-order Møller–Plesset perturbation theory (MP2) is presented which allows to efficiently compute the entire set of shieldings for a given molecular structure. The equations are derived using Laplace-transformed atomic orbital second-order Møller–Plesset perturbation theory as a starting point. The Z-vector approach is employed for minimizing the number of coupled-perturbed self-consistent-field equations that need to be solved. In addition, the method uses the resolution-of-the-identity approximation with an attenuated Coulomb metric and Cholesky decomposition of pseudo-density matrices. The sparsity in the three-center integrals is exploited with sparse linear algebra approaches, leading to reduced computational cost and memory demands. Test calculations show that the deviations from NMR shifts obtained with canonical MP2 are small if appropriate thresholds are used. The performance of the method is illustrated in calculations on DNA strands and on glycine chains with up to 283 atoms and 2864 basis functions.

## I. INTRODUCTION

Nuclear magnetic resonance (NMR) is a widely used spectroscopic method in the field of chemistry. NMR spectra are highly sensitive to the molecular geometry and contain a wealth of structural information. However, especially for large molecules, their interpretation can be difficult, and it can be challenging to unambiguously assign a molecular structure to a given spectrum. In such situations, the comparison with theoretically computed spectra can be very helpful. For this reason, much work has been done on the development of quantum-chemical methods for the simulation of NMR shieldings (for reviews, see, e.g., Refs. 1–3).

Methods for computing NMR shieldings with Hartree–Fock (HF) theory^{4–8} or density functional theory (DFT)^{9–11} can often provide useful shifts at moderate computational cost. For these theories, linear and sublinear scaling implementations have been developed which allow to apply them to large molecules with 1000 atoms or more.^{12–14} If higher accuracy is desired, wave-function based correlation methods can be used. This includes second-order Møller–Plesset perturbation theory (MP2),^{15,16} the multiconfigurational self-consistent field method (MCSCF),^{17,18} or approaches based on coupled-cluster theory.^{19–21} Among the mentioned correlated methods, MP2 is particularly attractive because it has lower formal scaling and computational cost than high-level coupled cluster methods but has been shown to yield typically NMR shifts of significantly higher quality than Hartree–Fock theory or DFT.^{16,22} Therefore, a significant amount of work has been done on MP2-based methods for computing NMR shieldings. In 1992, Gauss presented the first method for computing NMR shifts with MP2^{15,16} and a Z-vector approach.^{23,24} In later years, a more efficient integral-direct implementation by Kollwitz and Gauss^{25} and implementations using local-correlation approximations^{26–28} were developed. Maurer and Ochsenfeld^{29} demonstrated that a linear and sublinear scaling computation of NMR shieldings is possible with atomic-orbital MP2 (AO-MP2)^{30–35} and integral screening approaches. Apart from pure MP2 theory, closely related methods, such as double-hybrid density functional theory^{36,37} or spin-component-scaled^{38,39} and scaled-opposite-spin MP2,^{39,40} showed good accuracy in benchmark studies and proved to be promising for efficient calculations on large molecules considering that the algorithmic developments for MP2 can easily be transferred to these methods.

In this work, we present a new method for efficiently computing NMR shieldings at the MP2 level of theory. Like the related approach from the study by Maurer and Ochsenfeld,^{29} the method presented in this work is based on Laplace-transformed AO-MP2 but differs in several aspects. First of all, our new method employs an *all*-*nuclei* formulation, which allows us to efficiently compute the entire set of NMR shieldings for all nuclei in a given molecule. In contrast, Maurer and Ochsenfeld^{29} used a *selected*-*nuclei* formulation, which was first introduced by Beer *et al.*^{14} at the SCF level of theory. The selected-nuclei formulation allows to exploit locality and accelerate the computation of the shieldings of a small subset of all nuclei. However, if all shieldings need to be computed, this formulation is inferior due to the large number of coupled-perturbed self-consistent field (CPSCF) equations that need to be solved. The working equations differ significantly between the all-nuclei and selected nuclei ansatz because certain terms involving derivatives of gauge-including atomic orbitals (GIAOs)^{4,41,42} are present in one formulation and absent in the other one. GIAOs are used in our method in order to ensure gauge-origin independence. Compared to the method reported by Maurer and Ochsenfeld,^{29} our new implementation avoids four-center electron repulsion integrals (ERIs) with the help of the resolution-of-the-identity (RI)^{43,44} approximation. Like in our previous work on the efficient computation of MP2 energies,^{45} an attenuated Coulomb metric^{46–48} is used in order to further increase the available sparsity in the three-center integrals. In this context, we would like to mention an efficient MP2 shielding method using a Cholesky decomposition (CD) of ERIs that was very recently introduced by Burger *et al.*^{49} As RI, the CD approach allows to avoid four-center integrals. Compared to RI, it has certain advantages, such as rigorous control of the numerical error of the decomposition and independence of pre-defined auxiliary basis sets.^{50,51} However, the Cholesky factors have lower sparsity than three-center RI integrals with an attenuated Coulomb metric for large systems.

Here we use a pivoted Cholesky decomposition^{52–54} of pseudo-density matrices which allows us to utilize the reduced rank of these matrices (especially for the occupied pseudo-densities) and further lowers computational cost and memory demands. The sparsity of the three-center integrals is exploited using sparse linear algebra approaches. The transformations of the three-center integrals in the atomic orbital basis with the Cholesky molecular orbitals (Cholesky-MOs) and with pseudo-density and perturbed pseudo-density matrices are accelerated using block-sparse matrix multiplications. For the subsequent steps involving three-center integrals, we use the natural blocking format,^{45,47,55} which allows us to reduce the computational effort for contractions of the three-center integrals and the I/O overhead. We demonstrate in test calculations that the developed method provides accurate NMR shieldings if the same thresholds are used that also proved to be appropriate for achieving chemically accurate energies in our recent work on MP2 correlation energies.^{45} In calculations on glycine chains and DNA strands, we analyze the efficiency of the method, the scaling, and the requirements on memory and disk space. We also compare calculations with standard MP2 to shielding calculations with scaled-opposite spin MP2 (SOS-MP2)^{39,40} and show that the SOS-approximation leads to further substantial gains in efficiency.

## II. THEORY

In Sec. II A, a brief review of the AO-MP2 method for energies is provided, which provides the starting point for the derivation. Section II B contains a discussion of AO-MP2 gradients with emphasis on the gradient with respect to the nuclear magnetic moment. In Sec. II C, the mixed second derivative of the energy with respect to nuclear magnetic moment and magnetic field is derived, which allows to compute NMR shieldings. In Sec. II D, a nested Z-vector approach is described, which allows to circumvent the explicit computation of the nuclear magnetic moment derivative of the density matrix and the second derivative of the density matrix and thus minimizes the number of CPSCF equations that need to be solved. Then, in Sec. II E, we discuss strategies for accelerating the time-dominating steps of the computation.

### A. Review of atomic-orbital MP2

In general, the elements of the NMR shielding tensor *σ*^{A} of a nucleus *A* can be computed by taking a mixed second derivative of the electronic energy *E*. One derivative needs to be taken with respect to one of the components of the magnetic field *B*_{s} with $s\u2208x,y,z$, while the other derivative is taken with respect to a component of the nuclear magnetic moment vector $mrA$,

Different quantum-chemical methods can be used for computing *E*; the method presented in this work is based on atomic-orbital MP2 (AO-MP2).^{30,32,34,35} Starting from the closed-shell MP2 energy expression in terms of canonical, doubly occupied Hartree–Fock orbitals,

AO-MP2 can be derived by expressing the orbital energy denominator from Eq. (2) using a Laplace transformation, which can be approximated using numerical integration^{30} as follows:

where *t*_{α} and *ω*_{α} are the exponent and weight of Laplace point *α*, respectively. After inserting the Laplace-transformed denominator into Eq. (2), insertion of the basis set expansion of the MOs, and subsequent summation over all MO-indices, the energy expression of AO-MP2 is obtained as follows:

where

and

### B. Gradient with respect to the nuclear magnetic moment

In Ref. 56, Schweizer *et al.* derived the gradient of AO-MP2. One possible expression for the gradient of AO-MP2 with respect to a general perturbation *ξ* reads as follows:

where

Due to the pseudo-densities appearing in Eqs. (9) and (10), the matrices $R\u0332$ and $R\u0304$ are also dependent on the Laplace point. For the derivation of the working equations of our new MP2-NMR method, we will take the gradient with respect to the nuclear magnetic moment **m**^{A} as a starting point,

Here and in the following, **m**^{A} shall be abbreviated as **m** in order to simplify the notation. Note that for this particular perturbation, the last term from Eq. (8) is zero because the two-electron integrals do not depend on **m**. Equation (11) requires perturbed pseudo-density matrices. Explicit expressions for the unperturbed pseudo-density matrices in terms of MO coefficients and orbital energies are shown in Eqs. (6) and (7); these matrices can also be computed only from quantities in the AO basis^{33,56,57} as follows:

where $P=\u2211iC\mu i*C\nu i$ is the HF density matrix and $Pvirt=\u2211aC\mu a*C\nu a$ is the virtual density matrix. After inserting the expressions from Eqs. (12) and (13) into Eq. (11) and differentiating using the product rule, one obtains

By applying cyclic permutations under the trace, Eq. (14) becomes

In general, the perturbed virtual density matrix can be computed as follows:^{56}

For the nuclear magnetic moment perturbation, the derivative of **S** is zero, and Eq. (16) therefore becomes $Pvirtm=\u2212Pm$. The terms involving the perturbed density matrices can be efficiently treated with a Z-vector approach, which will be described in detail in Sec. II D. Here, we will focus on the terms containing perturbed matrix exponentials, which are of the general form $Tr[BeA\xi ]$. As shown in Appendix A, such a term can be rearranged as follows:

where

**Y** can be computed efficiently by a recursive scheme, which is also shown in Appendix A. In order to rearrange the terms containing perturbed matrix exponentials in Eq. (15), we define two intermediates $Y\u0332$ and $Y\u0304$. $Y\u0304$ is computed by evaluating Eq. (18) with $A=t\alpha PF$ and $B=PR\u0304$. $Y\u0332$ is calculated using the same formula and setting $A=\u2212t\alpha PvirtF$ and $B=PvirtR\u0332$. Using $Y\u0332$ and $Y\u0304$, the intermediates $Y\u03321$, $Y\u03322$, $Y\u03041$, and $Y\u03042$ defined in Ref. 56 can be computed by a small number of additional matrix multiplications as follows:

Instead of evaluating four different recursion formulas for $Y\u03321$, $Y\u03322$, $Y\u03041$, and $Y\u03042$ as in Ref. 56, only two recursions for $Y\u0332$ and $Y\u0304$ need to be evaluated if the new recursion formulas presented in this work are used. This reduces the number of necessary matrix multiplications by roughly a factor of two. Note that also all “**Y**”-matrices from Eqs. (19)–(22) implicitly depend on the Laplace point.

Using the intermediates defined so far, we are able to write the formula for the gradient with respect to **m** in the following way:

$GF(\alpha )$ is computed analogously to the two-electron part of the Fock matrix with $F(\alpha )$ taking the role of the density matrix,

Noticing that only $F(\alpha )$ and $P(\alpha )$ depend on the Laplace point in Eq. (23), we define

Using $F$ and $P$, Eq. (23) becomes

### C. Second derivative with respect to the magnetic field

As shown in Eq. (1), NMR shieldings can be computed as the mixed second derivative of the energy with respect to the nuclear magnetic moment and the magnetic field. The expression from Eq. (29) thus needs to be differentiated with respect to the magnetic field **B** as follows:

Equation (30) contains the derivative of $F$ and $P$, which can be computed as follows:

Equations (31) and (32) require the magnetic field derivative of the “**Y**”-matrices from Eqs. (19)–(22),

Here, $Y\u0332B$ and $Y\u0304B$ can be computed recursively. The corresponding recursion formulas are derived in Appendix B by differentiating the recursion formulas for $Y\u0332$ and $Y\u0304$. For all recursion formulas, asymptotic linear scaling can be achieved by employing block-sparse matrices. The matrix $GBF$ is computed in a similar way to $GF$ in Eq. (26) with the difference that the magnetic field derivatives of the ERIs are used instead of the standard unperturbed ERIs,

For an efficient calculation of $GBF$ and $G[FB]$ in Eq. (32) and of $GF$ in Eq. (26), any approach for efficient Fock matrix construction in Hartree–Fock theory can be used, such as the Continuous Fast Multipole Method (CFMM)^{58,59} or the RI-J method^{60} for the Coulomb matrix and LinK screening^{61,62} or semi-numerical exchange approaches^{63} for the exchange matrix.

Equations (31) and (32) also require the magnetic field derivative of $R\u0332$ and $R\u0304$, which can be calculated as follows:

Both Eqs. (38) and (39) contain two types of terms: terms involving magnetic field derivatives of ERIs and terms in which the unperturbed ERIs are transformed both with perturbed and unperturbed pseudo-densities. The efficient computation of these terms is discussed in Sec. II E. The notation with (**B**) as in $\mu \u2032\nu \u0304|\lambda \u0332\sigma \u0304(B)$ means that only the integrals are differentiated, not the pseudo-density or density matrices.

### D. Nested Z-vector approach for avoiding **P**^{m} and **P**^{Bm}

While Eq. (30) would allow us to compute the NMR shieldings of all nuclei in a molecule, these equations still contain **P**^{m} and **P**^{Bm}. The explicit computation of these matrices should be avoided in order to reduce the number of CPSCF equations that need to be solved. This can be achieved using a nested Z-vector approach similar to the one presented by Maurer and Ochsenfeld.^{29} In this section, we will present the theory behind this approach in a general way, which can be applied to any formulation of CPSCF. In our implementation, we employ the density matrix based CPSCF (D-CPSCF) method introduced by Ochsenfeld and Head-Gordon;^{64} detailed working equations for this formulation are derived in Appendix D.

In general, a CPSCF equation for **P**^{m} has the following structure:

where **b**^{m} is the right-hand side of the CPSCF equation and **A** is the Hartree–Fock Hessian matrix. Formally, this equation can be solved by multiplying with the inverse of the Hessian from the left as follows:

Usually, an explicit inversion of **A** is circumvented in favor of an iterative solution. In order to compute all matrices **P**^{m} for a given molecule, 3 × *N*_{at} CPSCF equations would need to be solved where *N*_{at} denotes the number of atoms. In a term of the form Tr[**XP**^{m}], however, this can avoided by applying a Z-vector approach,^{23,24}

where **Z**_{X} is obtained by solving a CPSCF-like equation with **X** as the right-hand side as follows:

The advantage here is that only a single CPSCF equation needs to be solved. The outlined strategy could be directly applied to the term $Tr[PBPm]$; however, as we will discuss in the following, a contribution from the term $Tr[PPBm]$ is first added to $PB$ in our implementation.

The term $Tr[PPBm]$ contains the mixed second derivative of the density matrix. An equation for explicitly computing **P**^{Bm} can be derived from the CPSCF equation **AP**^{B} = **b**^{B} by differentiating it with respect to **m** as follows:

**P**^{Bm} can thus be obtained by solving a CPSCF equation with (**b**^{Bm} − **A**^{m}**P**^{B}) as a right-hand side as follows:

Next, we insert Eq. (45) into the term $TrPPBm$ as follows:

$ZP$ is computed by solving a CPSCF equation with $P$ as the right-hand side as follows:

All terms in $TrZPbBm\u2212Am[PB]$ that depend on **P**^{m} need to be rearranged in the form Tr[…**P**^{m}] such that another Z-vector approach can be applied. A detailed derivation is provided in Appendix D. It is shown that the sum of the result from Eq. (46) and the term $Tr[PBPm]$ can be rearranged in the following form:

where $OFm$, $OYm$, and **O** are defined in Appendix D. The explicit computation of **P**^{m} can be avoided by using a Z-vector **Z**_{O} as follows:

which is obtained by solving a CPSCF equation with **O** as the right-hand side as follows:

With this, we arrive at the final form of the working equation for computing NMR shieldings as follows:

Figure 1 summarizes the described algorithm for computing NMR shieldings and shows in which order the steps of the calculation are carried out in our implementation. With the described nested Z-vector approach, seven CPSCF equations need to be solved in total irrespective of the size of the molecule (one equation for $ZP$ and three equations for the magnetic field components of **P**^{B} and **Z**_{O}, respectively).

### E. Efficient computation of $R\u0332$, $R\u0304$, $R\u0332B$, and $R\u0304B$

The computation of the matrices $R\u0332$ and $R\u0304$ and their magnetic field derivatives (including the computation of the required three-center integrals) is the most time-consuming step of the shielding calculation with our new method. In this section, we describe how these steps can be accelerated using several approximations.

In order to exploit their rank deficiency, we subject the pseudo-density matrices a pivoted Cholesky decomposition,^{52–54}

This decomposition has been applied before in several methods for MP2^{45,65,66} and direct RPA,^{67} and it was shown to preserve sparsity. It is especially useful for the occupied pseudo-density as the number of obtained Cholesky vectors is less than or equal to the number of occupied orbitals and therefore much smaller than the number of basis functions (assuming reasonably accurate basis sets). From the Cholesky factors of Eqs. (52) and (53), local pseudo-MOs $\varphi i\u0332$ and $\varphi a\u0304$ can be computed as follows:

The pivoted Cholesky decomposition is not applicable to the magnetic field perturbed pseudo-densities because they are not positive semi-definite. For this reason, we leave the perturbed pseudo-densities undecomposed. After inserting the Cholesky-decomposed pseudo-densities, Eqs. (9), (10), (38), and (39) become

We furthermore use the RI approximation in order to avoid four-center integrals. As a metric for the RI, we employ an attenuated Coulomb metric^{46–48} as in our recent work on MP2 energies.^{45} With this metric, four-center ERIs can be approximated as follows:

where

and

The attenuated Coulomb metric depends on the parameter *ω*, which determines accuracy and sparsity in the three-center integrals. For *ω* equal to zero, the attenuated Coulomb metric reduces to the standard Coulomb metric, which is highly accurate but has long-ranged coupling between the bra and ket of the three-center integrals. On the other hand, in the limit *ω* → ∞, the rather inaccurate overlap metric is obtained, which leads to a high degree of sparsity in the three-center integrals. It has been shown for MP2 and direct RPA that *ω* = 0.1 provides a good compromise and gives significant sparsity in the integrals in combination with only small errors compared to the Coulomb metric.^{45,67} Therefore, we also use *ω* = 0.1 for all timings shown in this work.

Using the RI approximation allows us to eliminate all transformed four-center integrals from Eqs. (56)–(59) by substituting them with three- and two-center RI-integrals. In consequence, several different types of transformed three-center integrals are needed. This includes integrals that are transformed with Cholesky-MO coefficients or with perturbed pseudo-densities and transformed three-center GIAO integrals, such as $i\u0332a\u0304|P(B)$. In order to compute all of these transformed integrals, the three-center integrals in the atomic orbital basis are computed once and written to disk and then read into memory for each Laplace point. The AO three-center integrals for any auxiliary basis function are copied to a block-sparse matrix; likewise, the Cholesky-MOs and pseudo-density matrices are kept in a block-sparse matrix format. Then, the necessary integral transformations can be carried out efficiently using block-sparse matrix multiplications.

Next, the matrices $R\u0332$ and $R\u0304$, as well as their B-field derivatives, are computed from the three-center integrals. Figure 2 displays the algorithm used for efficiently computing the Coulomb-type contributions to the matrices $R\u0332$ and $R\u0304$. For this, the transformed integrals are copied into a natural blocking^{47,55} data format, which has also been described in Ref. 45 and is used for speeding up the subsequent contractions. As an example, let us discuss the following contraction, which is one of the formally $O(N4)$-scaling steps in the algorithm from Fig. 2,

The integrals used in this contraction are arranged such that there is one matrix for each $i\u0332$ with initial dimensions of *n*_{virt} × *n*_{aux} (where *n*_{virt} is the number of virtual Cholesky MOs and *n*_{aux} is the number of auxiliary basis functions). Before carrying out any contraction, rows and columns in these matrices get deleted if they only contain elements that are below the chosen natural blocking threshold. By this, the dimensions of these matrices are reduced and the integral contractions are accelerated.

For a given $i\u0332$, the indices of the significant rows and columns are collected in sparse lists ${a\u0304}i\u0332P$ and ${P}i\u0332a\u0304$. The list ${a\u0304}i\u0332P$, e.g., contains all virtual pseudo-MOs $\varphi a\u0304$ that are significant for the orbital $\varphi i\u0332$. The superscript index *P* shall denote the third index of the three-center integrals, from which the sparse list is derived (in this case, ). An additional superscript index **B** is used to denote lists derived from GIAO integrals. The list ${a\u0304}i\u0332P,B$, e.g., is derived from the integrals. Note also that two sparse lists are identical if the contained indices refer to the same type of orbital, e.g., ${a\u0304}i\u0332P={b\u0304}j\u0332Q$. Such sparse lists, which have also been employed in Ref. 45, are used extensively for exploiting sparsity in the algorithms from Figs. 2–5.

For the contraction in Eq. (64), the scaling can be reduced from formally $O(N4)$ to asymptotically linear because for large systems, only a constant number of virtual pseudo-MOs and auxiliary functions are expected to be significant for a given $\varphi i\u0332$.

The matrix **A** from Eq. (64) is then multiplied from both sides with $C\u0303$; this step scales cubically but has a very small prefactor. The resulting matrix **B** in Fig. 2 and the intermediates **D** and **E** are all dense matrices. As these matrices are dense, the scaling of the remaining steps in the algorithm shown in Fig. 2 cannot be reduced to asymptotically linear but only to asymptotically quadratic; in these steps, natural blocking is employed again in order to exploit the sparsity in the three-center integrals.

For MP2 shielding calculations, not only the computational cost but also the memory and disk space requirements can be prohibitive. Compared to the implementation of Maurer and Ochsenfeld,^{29} the memory requirements are substantially reduced by the use of the RI approximation, which allows us to avoid four-center integrals. The employed attenuated Coulomb metric increases the degree of sparsity in the three-center integrals. As in our implementation, the three-center integrals are always kept either in the natural blocking format or a block-sparse data format, the sparsity can be efficiently exploited, and the memory requirements be reduced significantly. In order to further increase the size of systems that can be treated with a given amount of memory, the three-center integrals are stored on disk and only read into memory when needed during the integral transformations and contractions, which are all done in batches. The size of the batches is chosen in accordance with the available memory. Therefore, the available disk space determines the maximum system size that can be treated. In Sec. IV B, we show data for the required disk space in illustrative calculations on glycine chains and DNA strands.

Figures 3 and 4 show the algorithm used for computing the Coulomb-type contributions to the $R\u0332B$ and $R\u0304B$ matrices. It is more involved than the corresponding algorithm for the contributions to the unperturbed $R\u0332$ and $R\u0304$ matrices from Fig. 2 but uses similar intermediates and computation steps. Formally, the algorithm leads to an $O(N4)$ scaling. Again, natural blocking is used for exploiting the sparsity in the three-center integrals. The asymptotic scaling of all steps except for the matrix multiplication $C\u0303ABC\u0303$ can be reduced to either linear or quadratic in this way.

Finally, the algorithm for computing the exchange contributions to $R\u0332$ and $R\u0304$ and their magnetic field derivatives is shown in Fig. 5. In this algorithm, four-center integrals are built and then contracted as opposed to the algorithms used for the Coulomb contributions. As a first step, three-center integrals are multiplied with the matrix $C\u0303$ from Eq. (60) as follows:

where *w* and *x* can denote either Cholesky-MO-indices or transformed or untransformed AO indices. This step scales asymptotically quadratic even if the short-range coupling of the indices in the three-center integrals is exploited because $C\u0303$ is a dense matrix. Next, the four-center integrals are built by contracting the intermediate $B\u0303$ with three-center integrals as follows:

For any *wy*-pair, the computational effort is $O(1)$ because only $O(1)$ orbitals with indices *x* and *z* are significant due to exponential coupling between *w* and *x* and between *y* and *z*. In addition, the summation over *Q* can be restricted to $O(1)$ elements by exploiting the short-range coupling between *Q* and *y*. Therefore, the asymptotic scaling of this step is quadratic. Likewise, also the remaining steps of the algorithm in Fig. 5 scale asymptotically quadratic.

## III. COMPUTATIONAL DETAILS

The described method was implemented in a development version of the FermiONs++ program.^{68–70} Unless stated differently, the def2-SVP basis set^{71} was used in combination with the corresponding auxiliary basis set^{72} in all calculations. Two different sets of parameters are used in the calculations shown in Sec. IV A, which we denote as “Loose” and “Tight.” For the “Loose” settings, the attenuation parameter *ω* is set to 0.1, and 5 Laplace points are used in MP2 and 10 Laplace points in DL-CPSCF^{73} iterations, which are used for computing **P**^{B} and solving the Z-vector equations. Furthermore, 10^{−6} is used as the threshold for deleting rows and columns in the natural blocking format. These values have also been used in our recent work on efficient computation of MP2 energies with the *ω*-RI-CDD-MP2 method, where we demonstrated that chemically accurate energies can be obtained.^{45} For the “Tight” settings, the natural blocking threshold and *ω* are set to zero (meaning that effectively a Coulomb metric is used); the number of Laplace points for both the MP2 shielding part and the DL-CPSCF is increased to 13 in both cases. All timings in Sec. IV B were carried out using the “Loose” settings.

In all calculations, shell-pairs with an overlap of less than 10^{−12} were neglected. Integral screening with a threshold of 10^{−10} is employed during the SCF. The Laplace points are determined using a minimax algorithm as described in Refs. 74 and 75; the number of integration points is reduced automatically in our implementation if the fitting interval is small and no improved accuracy can be obtained with more integration points. The pseudo-density matrices are orthogonalized prior to the pivoted Cholesky decomposition as described in Ref. 67; afterward, the orthogonalization is reverted. No frozen-core approximation is employed. For comparison, canonical MP2 shielding calculations were carried out with the Turbomole program.^{25,76,77} All timings were performed using 64 threads on a node with AMD EPYC 7302 processors, 256 GB RAM, and a solid-state drive (SSD) with a capacity of 1.7 TB.

## IV. RESULTS

### A. Accuracy of the introduced approximations

In order to analyze the influence of the employed approximations on the accuracy, NMR shieldings were computed for all complexes/dimers from the S22 test set^{78} and for all structures from the benchmark set used by Flaig *et al.*^{22} In these calculations, both the “Loose” and “Tight” settings were employed. As a reference, NMR shielding calculations with canonical MP2 performed with the Turbomole program^{25,76,77} are used. The statistics on the two benchmark sets are shown in Tables I and II. The “Loose” settings, which will be employed also for the timings in Sec. IV B, give satisfactory accuracy for the NMR shieldings, as the mean absolute deviations amount to only 0.024 ppm for the S22 test set and to 0.036 ppm for the benchmark set from the study by Flaig *et al.*^{22} No clear improvement compared to the “Loose” settings is observed when using the “Tight” settings. This suggests that the dominant source of error compared to canonical MP2 is the RI approximation; the errors due to the other approximations appear to be negligible for these test sets. Indeed, for both test sets, the already small deviations to Turbomole are reduced further in calculations with “Tight” settings and a cc-pVQZ-RI auxiliary basis set^{79,80} instead of a def2-SVP-RI basis set. We thus conclude that the RI approximation is the largest source of error, which does not preclude the computation of shieldings, in very good agreement with canonical MP2 shieldings.

. | Loose . | Tight . | Tight^{*}
. |
---|---|---|---|

Mean error | 0.016 | 0.024 | 0.001 |

Mean absolute error | 0.024 | 0.026 | 0.011 |

Root mean squared error | 0.044 | 0.047 | 0.021 |

Maximum error | 0.229 | 0.271 | 0.151 |

. | Loose . | Tight . | Tight^{*}
. |
---|---|---|---|

Mean error | 0.016 | 0.024 | 0.001 |

Mean absolute error | 0.024 | 0.026 | 0.011 |

Root mean squared error | 0.044 | 0.047 | 0.021 |

Maximum error | 0.229 | 0.271 | 0.151 |

. | Loose . | Tight . | Tight^{*}
. |
---|---|---|---|

Mean error | 0.028 | 0.033 | −0.002 |

Mean absolute error | 0.036 | 0.035 | 0.004 |

Root mean squared error | 0.098 | 0.104 | 0.017 |

Maximum error | 0.662 | 0.604 | 0.150 |

. | Loose . | Tight . | Tight^{*}
. |
---|---|---|---|

Mean error | 0.028 | 0.033 | −0.002 |

Mean absolute error | 0.036 | 0.035 | 0.004 |

Root mean squared error | 0.098 | 0.104 | 0.017 |

Maximum error | 0.662 | 0.604 | 0.150 |

However, one needs to consider that the molecules in the S22 test set and the test set from the study by Flaig *et al.*^{22} are relatively small and that the errors for several of the used approximations, such as the local RI-metric or the sparse linear algebra, might be larger for more extended systems. Therefore, we also carried out calculations on all monomers in the L7 benchmark set^{81} and display the results in Table III. For the purpose of computational efficiency, we use the SOS-approximation in these calculations; this means that the exchange contributions are neglected, while the Coulomb contributions are scaled with 1.3 as suggested by Jung *et al.*^{40} (note that alternatively, for application studies, the basis-set specific scaling parameters from the study by Maurer and Ochsenfeld^{39} might also be useful). In addition, for the L7 test set, the differences between the “Loose” and “Tight” settings are small with a mean absolute deviation of 0.028 ppm. This suggests that the “Loose” settings are appropriate for computing reliable NMR shieldings also for larger systems; therefore, we use these settings also for the timings shown in Sec. IV B.

### B. Scaling behavior and efficiency

In Table IV, timings with the new method in shielding calculations on glycine chains are shown. The scaling decreases for larger chain lengths and amounts to 2.74 between the two largest systems, glycine-6 and glycine-10. With this, the observed scaling is well below the formal $O(N5)$ scaling of the MO-based method. This agrees well with the expectations, as the asymptotic scaling of all $O(N4)$ and $O(N5)$ scaling steps is reduced to linear or quadratic if sparsity is exploited (see Sec. II).

nglyc . | nbas . | Wall time (s) . |
---|---|---|

1 | 95 | 415 |

2 | 166 | 2 051 |

4 | 308 | 14 063 |

6 | 450 | 34 861 |

10 | 734 | 133 282 |

nglyc . | nbas . | Wall time (s) . |
---|---|---|

1 | 95 | 415 |

2 | 166 | 2 051 |

4 | 308 | 14 063 |

6 | 450 | 34 861 |

10 | 734 | 133 282 |

Figure 6 also displays timings on glycine chains; in contrast to the calculations from Table IV, the SOS-approximation is used in all calculations. The wall times needed for the SOS-MP2 shielding calculations are much lower than those for the full MP2 shieldings (e.g., by a factor of 32.9 for glycine-10). This shows that the exchange contributions are the major bottleneck in the MP2 shielding calculations from Fig. 4. Some acceleration for the exchange contributions might be possible by using integral screening, e.g., with Schwarz estimates,^{82} in conjunction with the natural blocking, as done in our recent work on MP2 energies,^{45} but we have not explored this so far. For SOS-MP2, the scaling is close to quadratic for the larger chain lengths and thus lower than the formal scaling of SOS-MP2 $(O(N4))$. Due to the significantly higher computational efficiency obtained with the SOS-approximation, it is likely preferable to use SOS-MP2 for studying very large systems. For this, we recommend to use the basis-set specific scaling factors from the study by Maurer and Ochsenfeld,^{39} which were shown to provide similar or even better accuracy than canonical MP2.

As discussed, memory and/or disk space can also be a bottleneck in MP2 shielding calculations. Therefore, we show data for the used disk space during calculations on glycine chains in Fig. 7. For the largest system (glycine-40), roughly 437 GB of disk space were used. The scaling continuously decreases with the increasing chain length and amounts to 1.3 between glycine-30 and glycine-40. This is significantly lower than the formal cubic scaling for the memory demands of the three-center integrals and is enabled by the sparsity provided by the attenuated Coulomb metric and by the employed sparse data formats (block sparse matrices and natural blocking), which allow us to neglect many of the insignificant integrals.

. | AT_{1}
. | AT_{2}
. |
---|---|---|

nbas | 625 | 1332 |

Wall time (h) | 1.0 | 14.0 |

Scaling | ⋯ | 3.44 |

Disk space (GB) | 59.5 | 531.4 |

. | AT_{1}
. | AT_{2}
. |
---|---|---|

nbas | 625 | 1332 |

Wall time (h) | 1.0 | 14.0 |

Scaling | ⋯ | 3.44 |

Disk space (GB) | 59.5 | 531.4 |

In addition to the glycine chains, also timings on short DNA strands with one and two adenine–thymine base pairs were carried out (see Table V). These molecules are more representative of potential applications on large biomolecular systems. The calculation on the largest DNA strand, AT_{2}, with 128 atoms and 1332 basis functions took 11 h, which shows that the new method allows to treat systems in this size range with moderate computational effort. The scaling of the compute time between AT_{1} and AT_{2} amounts to 3.4. This is below the formal $O(N4)$ scaling of Laplace-transformed SOS-MP2, which indicates that there is already usable sparsity for systems in this size range. As expected, due to the less extended shape of the AT base pairs, the observed scaling is larger than for glycine chains with a similar number of basis functions. In Table VI, timings for several of the calculation steps are shown. Most of the time during the SOS-MP2 shielding calculations is used for the transformations of the three-center integrals followed by the computation of the perturbed $R\u0332B$ and $R\u0304B$ matrices. Significantly less compute time is needed for the unperturbed $R\u0332$ and $R\u0304$ matrices, the recursion formulas, and the nested Z-vector approach. In Table V, also the maximum amount of disk space used during the shielding calculations on the DNA strands is shown. Significantly more disk space than that for glycine chains with a similar number of basis functions is needed, which is caused by the lower sparsity. For AT_{2}, 531.4 GB disk space is needed for storing three-center integrals. Calculations on larger DNA strands were not possible with the available disk space on the employed nodes. In order to improve the applicability to larger systems, it is therefore important to circumvent the disk space bottleneck. A potential solution would be an integral-direct approach, by which we plan to address the disk space bottleneck in a future publication.

. | . | Wall time (s) . | . |
---|---|---|---|

Integral transformation | 15 702 | ||

$R\u0332$ and $R\u0304$ | 1 205 | ||

$R\u0332B$ and $R\u0304B$ | 10 782 | ||

$Y\u0332B$ and $Y\u0304B$ | 2 795 | ||

Nested Z-vector | 4 470 | ||

Total | 39 619 |

. | . | Wall time (s) . | . |
---|---|---|---|

Integral transformation | 15 702 | ||

$R\u0332$ and $R\u0304$ | 1 205 | ||

$R\u0332B$ and $R\u0304B$ | 10 782 | ||

$Y\u0332B$ and $Y\u0304B$ | 2 795 | ||

Nested Z-vector | 4 470 | ||

Total | 39 619 |

## V. CONCLUSION

We presented an efficient method for computing NMR shieldings at the MP2 level of theory, which is based on Laplace-transformed AO-MP2. It employs many of the approximations that were also used in our recently published *ω*-RI-CDD-MP2 method^{45} for correlation energies, including RI with an attenuated Coulomb metric and Cholesky decomposition of pseudo-density matrices. Furthermore, block sparse linear algebra and natural blocking are used in order to speed up the computations and reduce the memory needed for the three-center integrals. A nested Z-vector allows us to efficiently compute the entire set of NMR shieldings for all nuclei (as opposed to the selected-nuclei formulation of Maurer and Ochsenfeld^{29}) and requires the solution of only seven CPSCF equations irrespective of the molecule size. Benchmark calculations indicated that the same thresholds and settings that were shown previously to give accurate correlation energies with the *ω*-RI-CDD-MP2 method also allow us to compute accurate NMR shieldings. Timings on glycine chains show close to quadratic scaling for larger systems. Particularly high efficiency is obtained in combination with an SOS-approximation. In addition, the scaling of the needed disk space decreases with molecule size and becomes sub-quadratic for larger chain lengths. Calculations on DNA strands with up to two base pairs illustrate the potential of the method for computing highly accurate NMR shieldings of biomolecular systems. The new method thus enables the computation of NMR shieldings of large molecules with significantly reduced computational cost and memory requirements.

## SUPPLEMENTARY MATERIAL

See the supplementary material for all computed NMR shieldings from Sec. IV A and the employed xyz-structures for DNA and glycine chains.

## ACKNOWLEDGMENTS

The authors acknowledge the financial support of the “Deutsche Forschungsgemeinschaft” (DFG) under Grant No. SFB 1309-32587107 and the cluster of excellence (Grant No. EXC2111-390814868) “Munich Center for Quantum Science and Technology” (MCQST). M.G. and S.V. acknowledge the “Studienstiftung des Deutschen Volkes” for graduate fellowships. C.O. acknowledges additional financial support as a Max-Planck-Fellow at MPI-FKF Stuttgart.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

The data that support the findings of this study are available within the article and its supplementary material.

### APPENDIX A: RECURSION FORMULAS FOR **Y**

In this section, the recursion formulas for a matrix **Y** are derived such that $Tr[YA\xi ]=TrBeA\xi $ holds. First, we insert the Taylor series expansion of the matrix exponential

and subsequently differentiate each term using the product rule

In each term, cyclic permutations under the trace are used to move **A**^{ξ} to the far right,

Upon comparing with Eq. (17), the term in round brackets can then be identified with **Y**,

**Y** can be efficiently computed using recursion as shown in the following equations:

In order to ensure a fast convergence and avoid numerical difficulties, we employ a modified version of the “scaling and squaring” approach,^{83,84} which is commonly used for computing matrix exponentials. Instead of using **A** and **B** in Eqs. (A5)–(A8), two matrices **A**′ and **B**′ are employed as follows:

where *n* is an integer number chosen such that |**A**′| < 0.5, and

From the result **Y**′, one can then compute the matrix **Y** as follows:

### APPENDIX B: DIFFERENTIATED RECURSION FORMULAS FOR OBTAINING **Y**^{ξ}

For computing MP2 shieldings, also derivatives of the **Y** matrices from Appendix A are needed. These perturbed **Y** matrices can also be computed using recursions, which are obtained by differentiating the recursion from Eqs. (A5)–(A8) as follows:

with

$Ak\u22121\xi $ in Eq. (B2) can also be computed recursively as follows:

Faster convergence can be obtained using a modified scaling and squaring approach. For this, **A**′ = **A**/*n* and **A**^{′ξ} = **A**^{ξ}/*n* should be used instead of **A** and **A**^{ξ} in the recursion from Eqs. (B2) and (B4). Furthermore, **B** needs to be replaced with **B**′ from Eq. (A10). The result **Y**^{′ξ}, which is computed by carrying out the recursion from Eq. (B2) with **A**′, **A**^{′ξ}, **B**′, and **B**^{′ξ} instead of their respective unprimed counterparts, finally needs to be multiplied with $1n$ in order to obtain **Y**^{ξ}.

### APPENDIX C: EFFICIENT COMPUTATION OF MATRIX EXPONENTIALS AND PERTURBED MATRIX EXPONENTIALS

The Taylor series expansion of a matrix exponential *e*^{A} is given by

The matrix exponential can be evaluated efficiently using recursion as follows:

The recursion from Eq. (C2) can only be expected to converge quickly if the norm of **A** is small. If this is not the case, “scaling and squaring”^{83,84} can be applied. For this, an integer number *n* is chosen such that $A/n<0.5$. Then, the matrix exponential *e*^{A/n} is computed using the recursion from Eq. (C2). The exponential of **A** is obtained by taking *e*^{A/n} to the power *n* as follows:

For the presented MP2-NMR method, also perturbed matrix exponentials of the form $(eA)\xi $ need to be computed. A formula for the recursive computation of the perturbed matrix exponential can be derived by differentiating Eq. (C2) as follows:

If scaling and squaring is used, computing $eA\xi $ requires the recursive computation of $eA/n\xi $ and *e*^{A/n} as follows:

An efficient evaluation of this formula is possible via another recursion,

### APPENDIX D: NESTED Z-VECTOR APPROACH WITH D-CPSCF

There are different formulations of CPSCF, which lead to different equations for **A** and **b**^{B} in Eq. (40). Here, we use the D-CPSCF formulation from Ref. 64, which results in the following expressions for **A** and **b**^{B}:

where

and

with

The difference between **F**^{B} and **F**^{(B)} is that the latter does by definition not include contributions containing the perturbed density **P**^{B}. For computing the second derivative of the density matrix **P**^{Bm}, the nuclear magnetic moment derivatives $AmPB$ and **b**^{Bm} are also needed as shown in Eq. (45),

where

and

where

As it is commonly done in implementations of NMR shieldings, we do not use complex-valued matrices explicitly, but instead we treat purely imaginary matrices using skew-symmetric, real-valued matrices. In this case, one needs to be careful to use the correct sign for each term; whenever a term contains two imaginary matrices, like the term **F**^{m}**PS**^{B} with **F**^{m} and **S**^{B}, an additional sign change needs to be applied because both imaginary matrices carry a factor *i* and *i*^{2} = −1.

Instead of iteratively solving for the full **P**^{Bm} as suggested by Eq. (45), we only solve for the occupied-virtual (ov) and virtual-occupied (vo) subspace projections. For the occupied–occupied (oo) and virtual–virtual (vv) subspace projections, explicit expressions can be derived by differentiating the idempotency condition **PSP** = **P** twice and projecting onto the corresponding subspaces as follows:^{29}

with

The computation of the ov- and vo-projections of **P**^{Bm} in the first term on the right-hand side of Eq. (D9) can be avoided using a Z-vector approach,

Here, **Z**_{ov} and **Z**_{vo} are the ov- and vo-projections of the Z-vector $ZP=A\u22121P$.

After eliminating **P**^{Bm} from the equations, **P**^{m} remains to be eliminated. For this, we will show how terms involving **P**^{m} can be rearranged in the form Tr[…**P**^{m}], which then allows us to apply another Z-vector approach. The terms involving $PooBm$ + $PvvBm$ can be brought into this form by applying cyclic permutations under the trace,

where

Next, we focus on the expression $Tr[Zov+ZvobBm\u2212AmPB]$. First, we insert Eqs. (D5) and (D7). After applying cyclic permutations, we arrive at

where

and

where $Z\u0304\u2261Zov+Zvo$.

Equation (D14) contains the Fock matrix derivative **F**^{m}, which also needs to be circumvented in the all-nuclei formulation. For this, we use the following identities:

where **W** is some general matrix and $GW$ is

Similarly, the terms involving $GXm$ and **Y**^{m} in Eq. (D14) can be rearranged by using Eq. (D19) and then applying cyclic permutations as follows:

Using the results from Eqs. (D18)–(D23) allows us to rewrite the expression $Tr[Zov+Zvo(bBm\u2212Am[PB])]$ in the following form:

where

Combining $O\u0303$, $O[PooBm+PvvBm]$, and $PB$ from the term $Tr[PBPm]$ allows us to compactly express all remaining terms involving **P**^{m} in the form $TrOPm$,

This term now has the proper form for applying a Z-vector approach, which allows us to circumvent **P**^{m},

where

In our implementation, the Z-vector equation for **Z**_{O} is solved iteratively using the DL-CPSCF approach from the study by Beer and Ochsenfeld.^{73}