Conical intersections control excited state reactivity, and thus, elucidating and predicting their geometric and energetic characteristics are crucial for understanding photochemistry. Locating these intersections requires accurate and efficient electronic structure methods. Unfortunately, the most accurate methods (e.g., multireference perturbation theories such as XMS-CASPT2) are computationally challenging for large molecules. The state-interaction state-averaged restricted ensemble referenced Kohn–Sham (SI-SA-REKS) method is a computationally efficient alternative. The application of SI-SA-REKS to photochemistry was previously hampered by a lack of analytical nuclear gradients and nonadiabatic coupling matrix elements. We have recently derived analytical energy derivatives for the SI-SA-REKS method and implemented the method effectively on graphical processing units. We demonstrate that our implementation gives the correct conical intersection topography and energetics for several examples. Furthermore, our implementation of SI-SA-REKS is computationally efficient, with observed sub-quadratic scaling as a function of molecular size. This demonstrates the promise of SI-SA-REKS for excited state dynamics of large molecular systems.

Conical intersections1 (CIs) are molecular geometries where two or more potential energy surfaces intersect and the nonadiabatic coupling between these states is non-vanishing. It has been well established that CIs are dominant players in nonadiabatic transitions, which transfer energy from electronic to vibrational degrees of freedom.2–5 Hence, locating and characterizing CIs are crucial for understanding reaction mechanisms in photochemistry. To accomplish this task, an electronic structure method should meet several requirements. Specifically, it must (1) efficiently evaluate analytical nuclear gradients and nonadiabatic coupling (NAC) vectors,1 (2) include both static and dynamic electron correlation,6–8 and (3) provide an unbiased description of ground and excited states.7,9–11 Given these restrictions, the most accurate methods for describing CIs are the multistate complete active space second-order perturbation theory (MS-CASPT2)12 and its “extended” variant (XMS-CASPT2).13,14 However, these methods are computationally demanding. This has limited their application to molecules with at most tens of atoms. Recent approaches using tensor hypercontraction15 and graphical processing units (GPUs) promise to extend this to hundreds of atoms16,17 but are still under development.

REKS (spin-restricted ensemble-referenced Kohn–Sham method) and its state-averaged (SA-REKS) and state-interaction (SI-SA-REKS) variants18–21 are ensemble density functional approaches that meet all the requirements listed above, but with the computational cost of mean-field methods. We recently reported22 a formalism for analytical energy derivatives of individual states in the SA-REKS and SI-SA-REKS (also denoted as SSR) methods. In this work, we detail and benchmark our graphical processing unit (GPU) implementation of analytical energy derivatives for individual states and NAC vectors. We exploit sparsity in the atomic orbital basis to reduce the computational scaling with respect to molecular size. The formal scaling of the gradient implementation presented here is O(N3), where N is the number of basis functions. However, we present timings that demonstrate an observed scaling of approximately O(N1.76), where the decreased effort is largely due to integral screening. We are able calculate the analytical gradients of individual state energies of molecules with hundreds of atoms and more than 2500 orbitals. Moreover, we perform minimal energy conical intersection (MECI) searches on the trans-penta-2,4-dieniminium cation (C5H8N+, trans-PSB3)23 with SSR and (X)MS-CASPT2. Careful inspection around the MECI shows that SSR gives the correct topography and energetics of conical intersections. A comparison of the MECI structures also indicates that SSR MECIs mimic geometric and energetic features found using MS-CASPT2 but missing in state-averaged complete active space self-consistent field (SA-CASSCF) methods.24–27 

This paper is organized as follows: Sec. II recapitulates the essential formulas for the implementation of analytical nuclear gradients of individual states of SA-REKS and SSR. Section III details the algorithm used in our GPU implementation. Section IV provides the computational details of the performance tests and application examples. Section V presents the computational performance of the implementation and compares SSR, XMS-CASPT2, and SA-CASSCF for local minima and MECIs of the PSB3 molecule.

Throughout this work, we employ the following notation: Ψ and Φ refer to eigenstates of the Hamiltonian. AOs are indexed with μ, ν, κ, γ, and τ. MOs are indexed with p, q, i, and j. The active subset of MOs is indexed with r and s, referring to the lower and upper active orbitals of REKS(2,2). The coefficients defining the MOs in the AO basis are denoted as Cμi. Energies are differentiated with respect to an external perturbation (e.g., displacement of a nuclear coordinate or application of an external field) denoted as λ.

We first briefly review the essential equations for the analytical gradient calculation of individual states in SA-REKS(2,2) and SI-SA-REKS(2,2).13,18,19,32,33 Both methods focus on strongly correlated systems with two fractionally occupied frontier orbitals accommodating two electrons. In SA-REKS(2,2), we focus on two individual states, the ground state, which is reminiscent of the perfectly spin-paired two-configurational wavefunction of generalized valence bond theory,28–30 

(1)

and the lowest excited open-shell singlet (OSS) state obtained by single excitation from Eq. (1),

(2)

The energies of these two states are expressed as weighted sums over the energies of several microstates L (Fig. 1), EL=E[ρLα,ρLβ], depending on the pure-state (single determinant) spin densities ρLσ,σ=α,β, as follows:20,21

(3)
(4)

Here, r and s label the fractionally occupied frontier orbitals and nr and ns are the respective fractional occupation numbers (FONs), where the occupation numbers are constrained such that nr + ns = 2. The occupation patterns of the individual microstates are defined in Fig. 1. The functions

(5)

with δ = 0.4, are used to interpolate between the weakly correlated and strongly correlated limiting cases.19–21 Note that we will often use the single argument form indicated in the last line of Eq. (5), incorporating the constraint on the sum of nr and ns.

FIG. 1.

Orbital occupation of the individual microstates in REKS(2,2).

FIG. 1.

Orbital occupation of the individual microstates in REKS(2,2).

Close modal

Combining the two states leads to the SA-REKS energy functional (the subscript SA is added to the individual state energies in the SA-REKS functional),

(6)

which is minimized with respect to the orbitals and the FONs of the fractionally occupied orbitals in the REKS ground singlet state (GSS). An equiensemble is usually set in Eq. (6), i.e.,. wGSS = wOSS = 1/2, for a balanced treatment of the ground and excited states.

The SI-SA-REKS(2,2) method is an extension of SA-REKS(2,2) that allows the REKS and OSS states to interact. The ground (E1SSR) and excited (E2SSR) state energies in the SI-SA-REKS method (abbreviated as SSR henceforth for brevity) are given by

(7)

where aij are the elements of the eigenvectors of the 2 × 2 secular problem,

(8)

with the interstate coupling element ΔSA defined as

(9)

where Wrs is the off-diagonal Lagrange multiplier between the fractionally occupied orbitals in the SA-REKS one-electron equations,

(10)

Here, the Lagrange multipliers, ϵpq, are defined as

(11)

where the one-electron Fock operators F^q are defined as

(12)

where fq = nq/2 is the average occupation number of the qth spin orbital and the one-electron Fock operators F^Lσ of the individual microstates are given by

(13)

In Eqs. (12) and (13), σ = α, β is the electron spin, ĥ is the one-electron core Hamiltonian, Vxc,Lσ(r)=δExc,L/δρLσ(r) is the exchange–correlation (XC) potential of the respective microstate for the spin channel σ, and nq,Lσ are the (integer) occupation numbers of the orbitals φq(r) in the Lth microstate, as defined in Fig. 1.

Several variations of the Lagrange multipliers defined in Eq. (11), including the form for the averaged state, will be used in the equation

(14)

for an individual state X = REKS/OSS,

(15)

and for a microstate L,

(16)

Since the molecular orbitals are not variationally optimized with respect to the individual state energies of SA-REKS or SSR, analytic gradients require the solution of orbital response from the coupled-perturbed (CP) REKS equations, which boil down to solving the Z-vector equation,

(17)

The matrix A is given as

(18)

where

(19)
(20)
(21)

Here, EL are the energies of the respective microstates and x = nr/2. The Coulombic/exchange–correlation functional contribution in Eq. (18) is given as follows:

(22)

where the symmetrized Hartree–XC kernel f̃Hxc,Lσσ is defined as

(23)
(24)

In Eq. (24), the four terms are Coulomb, exact exchange, DFT exchange, and DFT correlation and gHF and gDFT are the parameters (or functions) defining the fraction of exact and DFT exchange used in a hybrid functional (see Text S1 of the supplementary material).

The right-hand side of Eq. (17), X, takes a different form for SA-REKS and SSR. For SA-REKS, it is defined as

(25)

for the REKS and OSS states, respectively, and the corresponding solutions of the Z-vector equation are denoted as ZREKS and ZOSS. For SSR, X is given by

(26)

where

(27)

where the Coulombic/exchange–correlation term, (rs|f̃Hxct(pq)|pq), is defined similarly to Eq. (22) for t = r, s,

(28)

and the scalar Rsr is given as

(29)

The corresponding solutions of the SSR Z-vector equation are then denoted as ZSSRm, where m = 1, 2 for the ground and excited states, respectively.

Given the appropriate Z vector, the analytical gradient of an individual state X (X = REKS/OSS) in SA-REKS can be calculated with the following equation:

(30)

where

(31)

and the matrix TLσ,λ is the derivative p|F^Lσ|q/λ transformed to the AO basis set representation,

(32)

The primed differentiation symbol denotes that only the AO basis integrals are to be differentiated and not the MO coefficients. The superscript λ denotes differentiation with respect to λ, Vxc,Lσ(r)=δExc,L/δρLσ(r) is the XC potential of the Lth microstate, and

(33)

are the elements of the density matrix for the Lth microstate. Here, gHF and gDFT have the same definition, as in Eq. (24) (more information available in the supplementary material, Text S1). The matrices Q(1)X and Q(2)X are defined as

(34)
(35)

where

(36)

The energies of the individual SSR states are given by Eq. (8) in terms of the energies of the diabatic states (ESAREKS and ESAOSS) and the coupling element ΔSA. Therefore, the analytical gradient of the individual states of SSR is given as

(37)

Using the results for SA-REKS gradients, this can be expressed as

(38)

In Eq. (38), ZpqSSRm are the elements of the vector ZSSRm, which is the solution of the Z-vector equation given in Eq. (17) using the right-hand side X given by Eq. (26). The microstate weight coefficient for the mth SSR eigenstate, CLSSRm, is defined as

(39)

and the matrices RLσSSRm, Q(1)SSR1, and Q(2)SSR1 are given by

(40)
(41)
(42)

where RL,νμσΔ=nrnr,Lσnsns,LσCνsCrμ and RLσ(m), Q(1)(m), and Q(2)(m) are obtained from Eqs. (31), (34), and (35), respectively, by replacing ZX with ZSSRm. The matrices Q(1)Δ and Q(2)Δ are given by

(43)
(44)

The two vectors that define the branching plane of the conical intersection (CI) are the gradient difference vector (GDV),

(45)

and the derivative coupling vector (DCV),

(46)

For SSR, these two vectors can be calculated as

(47)

where E1SSR, E2SSR and ESAREKS, ESAOSS are the gradients of the adiabatic and diabatic states, respectively. Here, we need to explicitly calculate the gradient of the coupling element ΔSA,

(48)

where ZΔ is the solution of Eq. (17) with −XΔ from Eq. (27) on the right-hand side. The quantities RLσcoup, Q(1)coup, and Q(2)coup are obtained by replacing XZ with ZΔ in Eqs. (31), (34), and (35), respectively.

In practice, we do not explicitly calculate all the gradients E1SSR, E2SSR, ESAREKS, ESAOSS, and ∇ΔSA to obtain the GDV and DCV. Instead, only E1SSR and ∇ΔSA are explicitly formed. Then, E2SSR=2ESA-REKSE1SSR. This is advantageous because ESA-REKS can be obtained without solving any CP equations (see Text S2 of the supplementary material). ESAREKS and ESAOSS can then be solved from Eq. (37), and thus, the DCV can be obtained from Eq. (47).

The success of GPUs in accelerating electronic structure methods is largely attributed to highly efficient GPU algorithms for calculating two-electron integrals.31–36 The details of our GPU algorithms for the J and K matrices (Coulomb and exchange operators, respectively) and the DFT quadrature have been previously published.37–43 These algorithms were first implemented for ground state SCF methods such as HF and DFT, which are naturally formulated with these integrals. GPU acceleration was later extended to single reference excited-state methods, such as configuration interaction singles (CIS) and time-dependent density functional theory (TDDFT).44 More recently, the complete active space self-consistent field (CASSCF) method was reformulated in terms of J and K matrices,27 which greatly increased the applicability of CASSCF to large molecular systems. Here, we show that these same primitive building blocks are sufficient for REKS, CP-REKS, and SSR gradient evaluation.

The Coulomb- and exchange-like matrices elements are given by

(49)
(50)

where D are usually some type of density or transition density matrices. Highly effective sorting45–47 and screening48–56 algorithms are used in building these matrices to exploit the sparsity of both electron repulsion integrals and the density matrix. Primitive Gaussian basis function pairs are sorted by their Schwarz bounds,57 and negligible pairs smaller than a prescreening threshold are removed. Surviving pairs are transferred to the GPUs, where the ERIs are built and contracted with the density matrices in a direct SCF approach,58 without storage of intermediate quantities. Because many widely available GPUs have much more efficient single precision arithmetic (compared to double precision), mixed precision33,39 and dynamic precision59 approaches have been adopted to balance performance and accuracy. ERIs with density-weighted Schwarz bounds less than a threshold, δspdp, are calculated with single precision and are then accumulated into the J or K matrix with double precision.

For density functional calculations, we also need to evaluate the exchange–correlation energy Exc and its functional derivatives in various orders. The single point energy evaluation of REKS, SA-REKS, and SSR involves building Exc and the matrix elements of its first order functional derivative, vxc, also known as the exchange-correlation potential, in the AO basis,60 

(51)

In practice, these integrals are evaluated with numerical quadrature.37,61 The most computationally demanding parts of the quadrature evaluation, including generating quadrature weights and assembling the integral from the values on grid points, are intrinsically data parallel and well-suited to evaluation on GPUs.43,62 Other operations, including the evaluation of the exchange–correlation functional at each grid point, are done on the CPU.

With the basic primitive operations defined in Eqs. (49)(51), the (SI)-(SA)-REKS energy evaluation can be implemented efficiently through a self-consistent field (SCF) procedure similar to normal Hartree–Fock or DFT calculations. Many commonly used SCF algorithms can be easily adapted for REKS. Specifically, we implemented the REKS SCF with both the original Roothaan repeated diagonalization algorithm63 and Pulay’s direct inversion in the iterative subspace (DIIS),64 as shown in Algorithms 1 and 2. Mixed33,39 and dynamic59 precision schemes for accelerating J and K builds on GPUs during SCF iterations are also adapted here for REKS. There are two features that make the REKS SCF different from the normal SCF: (1) composing an effective Fock matrix FREKS from the microstate Fock matrices FLσ, as shown in Algorithm 3, and (2) self-consistently updating the FON, which we perform using the Newton–Raphson method. Hence, each REKS SCF iteration contains two spin-restricted Fock-like builds for microstates 1 and 2 and two spin-unrestricted Fock-like builds for microstates 3 and 5, given the relationship between microstates 3 and 4 (or 5 and 6). In total, one REKS SCF iteration includes six J builds, six K builds, and six vxc builds.

ALGORITHM 1

REKS SCF energy calculation using the Roothaan scheme.

Obtain the initial guess of MO coefficients Cμi 
k = 0 
do whileeSCF> threshold and k < max_iteration and k ≠ 0 
Build density matrices 
Pμν00=p=1NcoreCμpCνp;Pμν01=Pμν00+CμsCνsPμν10=Pμν00+CμrCνr;Pμν11=Pμν00+CμrCνr+CμsCνs 
Assign the microstate densities PLα,PLβ 
(P1α,P1β)=(P10,P10);(P2α,P2β)=(P01,P01)(P3α,P3β)=(P4β,P4α)=(P10,P01);(P5α,P5β)=(P6β,P6α)=(P11,P00) 
Build Fock matrix (AO) for each distinct microstate on GPUs, L = 1, 2, 3, 5      Eqs. (49) and (50) 
FLα=Hcore+J(PLα+PLβ)K(PLα)+vxcαFLβ=Hcore+J(PLα+PLβ)K(PLβ)+vxcβF4α=F3β;F4β=F3α;F6α=F5β;F6β=F5α 
Compute microstate energies: EL=12μνσ(Hμνcore+FL,μνσ)PL,μνσ 
Update occupation number nr, ns (x = nr/2) with the Newton–Raphson method to satisfy equations 
EREKSxx=xmin=EREKSx+2EREKSx2(xminx)=0EREKSxx=xmin=Err¯Ess¯+fint(x)x+2fint(x)x2xminxErs2+Er¯s¯2Er¯s2Ers¯2=0 
Update CL: C1=nr2;C1=ns2;C3=C4=C5=C6=12fint(nr2)       Eq. (3) 
Compute EREKS=L=16CLREKSEL       Eq. (3) 
Assemble FREKS from FLα,FLβ in MO space with Algorithm3  
eSCF=maxij{FijREKS} 
Diagonalize FREKS: FREKSC̃=C̃ε 
Update MO coefficients: 
C(k+1)=C(k)C̃ 
k = k + 1 
Obtain the initial guess of MO coefficients Cμi 
k = 0 
do whileeSCF> threshold and k < max_iteration and k ≠ 0 
Build density matrices 
Pμν00=p=1NcoreCμpCνp;Pμν01=Pμν00+CμsCνsPμν10=Pμν00+CμrCνr;Pμν11=Pμν00+CμrCνr+CμsCνs 
Assign the microstate densities PLα,PLβ 
(P1α,P1β)=(P10,P10);(P2α,P2β)=(P01,P01)(P3α,P3β)=(P4β,P4α)=(P10,P01);(P5α,P5β)=(P6β,P6α)=(P11,P00) 
Build Fock matrix (AO) for each distinct microstate on GPUs, L = 1, 2, 3, 5      Eqs. (49) and (50) 
FLα=Hcore+J(PLα+PLβ)K(PLα)+vxcαFLβ=Hcore+J(PLα+PLβ)K(PLβ)+vxcβF4α=F3β;F4β=F3α;F6α=F5β;F6β=F5α 
Compute microstate energies: EL=12μνσ(Hμνcore+FL,μνσ)PL,μνσ 
Update occupation number nr, ns (x = nr/2) with the Newton–Raphson method to satisfy equations 
EREKSxx=xmin=EREKSx+2EREKSx2(xminx)=0EREKSxx=xmin=Err¯Ess¯+fint(x)x+2fint(x)x2xminxErs2+Er¯s¯2Er¯s2Ers¯2=0 
Update CL: C1=nr2;C1=ns2;C3=C4=C5=C6=12fint(nr2)       Eq. (3) 
Compute EREKS=L=16CLREKSEL       Eq. (3) 
Assemble FREKS from FLα,FLβ in MO space with Algorithm3  
eSCF=maxij{FijREKS} 
Diagonalize FREKS: FREKSC̃=C̃ε 
Update MO coefficients: 
C(k+1)=C(k)C̃ 
k = k + 1 
ALGORITHM 2

REKS SCF energy calculation with DIIS.

Obtain the initial guess of MO coefficients Cμi 
k = 0 
loop while (eDIIS< threshold ork>max_DIIS_iteration) and k ≠ 0 
Build density matrices       Algorithm1  
Assign the values of microstate densities PLα,PLβ       Algorithm1  
Build Fock matrices (AO) for microstates       Algorithm1  
Compute microstate energies       Algorithm1  
Update occupation number nr, ns       Algorithm1  
Update CL       Algorithm1  
Compute EREKS       Algorithm1  
Assemble FREKS in MO space Algorithm3  
Compute the DIIS error matrix: 
D=diag{11Ncore,fr,fs,00Nvirt} 
ek=XSC(FREKSDDFREKS)CSX 
Bik=tr(eiTek), eDIIS = ||ek|| 
Transform FREKS to AO space as F̃(k)=SCFREKSCS and store *notes below 
Solve the DIIS linear equation: 
0111B11B1k1Bk1Bkkλξ1ξk=100 
Construct the interpolated Fock matrix: F̃int=ikξiF̃(i) 
Transform back to MO space: Fint=CF̃intC*notes below 
Diagonalize the interpolated Fock matrix: FintC̃=C̃ε 
Update MO coefficients: C(k+1)=C(k)C̃ 
k = k + 1 
* Further details about the transformation of FREKS to AO are available in Text S4 of the supplementary material
Obtain the initial guess of MO coefficients Cμi 
k = 0 
loop while (eDIIS< threshold ork>max_DIIS_iteration) and k ≠ 0 
Build density matrices       Algorithm1  
Assign the values of microstate densities PLα,PLβ       Algorithm1  
Build Fock matrices (AO) for microstates       Algorithm1  
Compute microstate energies       Algorithm1  
Update occupation number nr, ns       Algorithm1  
Update CL       Algorithm1  
Compute EREKS       Algorithm1  
Assemble FREKS in MO space Algorithm3  
Compute the DIIS error matrix: 
D=diag{11Ncore,fr,fs,00Nvirt} 
ek=XSC(FREKSDDFREKS)CSX 
Bik=tr(eiTek), eDIIS = ||ek|| 
Transform FREKS to AO space as F̃(k)=SCFREKSCS and store *notes below 
Solve the DIIS linear equation: 
0111B11B1k1Bk1Bkkλξ1ξk=100 
Construct the interpolated Fock matrix: F̃int=ikξiF̃(i) 
Transform back to MO space: Fint=CF̃intC*notes below 
Diagonalize the interpolated Fock matrix: FintC̃=C̃ε 
Update MO coefficients: C(k+1)=C(k)C̃ 
k = k + 1 
* Further details about the transformation of FREKS to AO are available in Text S4 of the supplementary material
ALGORITHM 3

Assembling FREKS from FLα,FLβ in the MO space.

Compute weighting factor for each block: 
Diagonal blocks: 
wL,cσ=0.5CL;wL,rσ=0.5CLnL,rσ;wL,sσ=0.5CLnL,sσ 
Off-diagonal blocks: 
wL,pqσ=0.5CL(nL,pσnL,qσ) 
Assemble the effective Fock operator FREKS from FLα,FLβ
FpqREKS=LLmaxσ=α,βwL,pqσFL,pqσ 
Compute weighting factor for each block: 
Diagonal blocks: 
wL,cσ=0.5CL;wL,rσ=0.5CLnL,rσ;wL,sσ=0.5CLnL,sσ 
Off-diagonal blocks: 
wL,pqσ=0.5CL(nL,pσnL,qσ) 
Assemble the effective Fock operator FREKS from FLα,FLβ
FpqREKS=LLmaxσ=α,βwL,pqσFL,pqσ 

The second functional derivatives of the exchange–correlation functional in the MO basis are given by

(52)

This four-index tensor is needed in TDDFT65 (or TDDFT/TDA66), CPKS (coupled-perturbed Kohn–Sham),67 and our CPREKS equations. Expanded analytical expressions of Eq. (52) in terms of ρσ, γσ, 2εxcρσρσ, 2εxcγσσγσσ, and 2εxcγσσρσ are needed for implementation and have been provided previously.68 In our GPU implementation,44 we compute this tensor in the contracted form in the AO basis,

(53)

where the second functional derivatives at quadrature grid points are evaluated for a reference state with density Pα, Pβ. These derivatives are evaluated only once and saved in memory. The density matrices Dα, Dβ are usually updated through iterations of certain iterative solvers such as Davidson69 for TDDFT or conjugate gradient (CG), for CPKS and CPREKS. In each iteration, the saved functional derivatives are contracted with Dα/Dβ and the basis function values and then summed over the quadrature grid to form matrix elements. For CPREKS, we denote this integral for each microstate L as

(54)

for simplicity (see also Algorithms 5 and 6).

The CPREKS equation [Eq. (17)] is a large scale, symmetric positive definite linear system, where the matrix A is of dimension MxM, where M = O(noccnvirt) and nocc/nvirt are the numbers of occupied/virtual orbitals, respectively. Following standard practice, we use the iterative preconditioned conjugate gradient70 (PCG) method to avoid the O(nocc3nvirt3) cost of solving Eq. (17) with direct inversion. In each iteration, we evaluate the inner product of A and trial vector Y, which eventually converges to Z. The whole procedure is shown in Algorithm 4. The most crucial part in PCG is the evaluation of the AY product, which can be efficiently implemented with the following expression:

(55)
(56)
(57)

where matrix A is partitioned into A1e and A2e, involving only the one-electron and two-electron integrals, respectively. In Eq. (57), the auxiliary matrix ΛLσ is defined as

(58)

where auxiliary matrix R in Eq. (58) has the same form as XR defined in Eq. (31) but with ZX replaced by trial vector Y. The pseudocode for calculating AY is shown in Algorithm 6. When MO integrals are needed, these are obtained by transformation of mean-field operators in the AO basis (e.g., J and K matrices for specified input density matrices). The cost for evaluating the above integrals can be significantly decreased by leveraging relationships between the spin densities of different microstates. First, the evaluation of microstates L = 4, 6 can be omitted because they have symmetric spin density with L = 3 and 5, respectively. Second, the cost of evaluating integrals for closed-shell microstates (L = 1, 2) is halved since the auxiliary density matrix, RLσ, is symmetric in spin, and Kμν(D) only needs to be evaluated once for these microstates. The cost for ML,μνσ,(Da,Dβ) [Eq. (54)] can also be reduced. On the one hand, at each quadrature grid point, because PLα=PLβ for L = 1, 2, some of the second functional derivatives can be reused (e.g., 2excραρα=2excρβρβ); on the other hand, the second functional derivatives only need to be contracted with RLσ and are only evaluated (and stored for future use) in the first iteration of CPREKS. With these simplifications, the computation of one AY product for CPREKS takes roughly 3x/6x the time for an AY build in CP-UKS (CP-unrestricted Kohn–Sham) and CP-RKS (CP-restricted Kohn–Sham), respectively.

ALGORITHM 4

Solution of the CPREKS equation.

Build right-hand side: 
Build XGSS, XOSS       Eq. (25) 
if SI-SA-REKS: 
Build XΔ using Algorithm5       Eq. (27) 
Build XSSRm       Eq. (26) 
Initialize the preconditioned conjugate gradient (PCG): 
D1=diag{1/dij};dij=Aij,ijle       Eq. (18) 
r(0)=XAZ(0);p(1)=z(0)=D1r(0);k=1 
do whiler(k)22 < Threshold 
ifk > 1: 
p(k)=z(k1)+(r(k))Tz(k)(r(k1))Tz(k1)p(k1) 
Construct Ap(k) with Algorithm6  
μ=rz(k)(p(k))TAp(k) 
Z(k)=Z(k1)+μp(k) 
r(k)=r(k1)μAp(k) 
z(k) = D−1r(k) 
k = k + 1 
Build right-hand side: 
Build XGSS, XOSS       Eq. (25) 
if SI-SA-REKS: 
Build XΔ using Algorithm5       Eq. (27) 
Build XSSRm       Eq. (26) 
Initialize the preconditioned conjugate gradient (PCG): 
D1=diag{1/dij};dij=Aij,ijle       Eq. (18) 
r(0)=XAZ(0);p(1)=z(0)=D1r(0);k=1 
do whiler(k)22 < Threshold 
ifk > 1: 
p(k)=z(k1)+(r(k))Tz(k)(r(k1))Tz(k1)p(k1) 
Construct Ap(k) with Algorithm6  
μ=rz(k)(p(k))TAp(k) 
Z(k)=Z(k1)+μp(k) 
r(k)=r(k1)μAp(k) 
z(k) = D−1r(k) 
k = k + 1 

Many other two-electron integrals involved in the evaluation of the SSR gradient can be treated with very similar algorithms as the AY product (Algorithm 6). The evaluation of Q(2) [Eq. (35)] is only different from Algorithm 6 in the final AO to MO transformation, so Q(2) can be readily formed at the last step of CG iteration of CPREKS without extra integral evaluations. Algorithm 5 for evaluating XΔ [Eq. (27)] is just a simplified version of Algorithm 6, with fewer ERI evaluations, and the evaluation of Q(2)Δ [Eq. (44)] only differs from XΔ in the AO to MO transformation.

ALGORITHM 5

Building XΔ.

Build one-electron terms 
XΔpq1e=δqs2nrϵprSArnsϵprSAs+δqr2nrϵpsSArnsϵpsSAs+G(1)RsrΩpq 
Pμν(rs)=CμrCνs 
Construct half transformed ERIs: 
Jμν(P(rs))=(μν|rs), Kμν(P(rs))=(μr|νs)       Eq. (49 ) 
K̃(P(rs))=12(K(P(rs))+(K(P(rs)))T)       Eq. (50) 
for each distinct microstate, L = 1, 2, 3, 5 
Build auxiliary density matrices 
PL,μνΔ,σ=nsns,Lσnrnr,LσCrμCνs 
Rescale the ERIs: 
Jμν(PLΔ)=σnsns,Lσnrnr,LσJμν(P(rs))K̃μν(PLΔ,σ)=nsns,Lσnrnr,LσK̃μν(P(rs)) 
Construct the DFT integrals ML,μνσ,(PLΔ,α,PLΔ,β)       Eq. (54) 
Let integral IL,μνσ=Jμν(PΔ)K̃μν(PΔ,σ)+ML,μνσ,(PLΔ,α,PLΔ,β)       Eq. (24) 
Transform the integral into MO space and add to Xpq 
βL = 1 for L = 1, 2; βL = 2 for L = 3, 5; 
XΔpq+=βLCLσnp,Lσnq,LσμνCpμCνqIL,μνσ       Eq. (27) 
Build one-electron terms 
XΔpq1e=δqs2nrϵprSArnsϵprSAs+δqr2nrϵpsSArnsϵpsSAs+G(1)RsrΩpq 
Pμν(rs)=CμrCνs 
Construct half transformed ERIs: 
Jμν(P(rs))=(μν|rs), Kμν(P(rs))=(μr|νs)       Eq. (49 ) 
K̃(P(rs))=12(K(P(rs))+(K(P(rs)))T)       Eq. (50) 
for each distinct microstate, L = 1, 2, 3, 5 
Build auxiliary density matrices 
PL,μνΔ,σ=nsns,Lσnrnr,LσCrμCνs 
Rescale the ERIs: 
Jμν(PLΔ)=σnsns,Lσnrnr,LσJμν(P(rs))K̃μν(PLΔ,σ)=nsns,Lσnrnr,LσK̃μν(P(rs)) 
Construct the DFT integrals ML,μνσ,(PLΔ,α,PLΔ,β)       Eq. (54) 
Let integral IL,μνσ=Jμν(PΔ)K̃μν(PΔ,σ)+ML,μνσ,(PLΔ,α,PLΔ,β)       Eq. (24) 
Transform the integral into MO space and add to Xpq 
βL = 1 for L = 1, 2; βL = 2 for L = 3, 5; 
XΔpq+=βLCLσnp,Lσnq,LσμνCpμCνqIL,μνσ       Eq. (27) 
ALGORITHM 6

Building the matrix-vector product AY.

Build one-electron terms: A1eYij         Eq. (56) 
for distinct microstate, L = 1, 2, 3, 5 
Build the auxiliary density matrix:         Eq. (31) 
RL,μνσ=i<jYij(ni,Lσnj,Lσ)CiμCνjRL,μν=RL,μνα+RL,μνβ 
Construct ERIs: 
Jμν(RL), Kμν(PLσ)         Eqs. (49) and (50) 
K̃μν(RLσ)=12(Kμν(RLσ)+(Kμν(RLσ))T) 
Construct the DFT integrals ML,μνσ,(RLα,RLβ)         Eq. (54) 
Let integral ΛL,μνσ=Jμν(RL)K̃μν(RLσ)+ML,μνσ,(RLσ,RLσ)         Eq. (58) 
Transform the integral into MO space and add to A2eYpq 
βL = 1 for L = 1, 2; βL = 2 for L = 3, 5; 
A2eYpq+=βLCLσnp,Lσnq,LσμνCpμCνqΛL,μνσ         Eq. (57) 
Build one-electron terms: A1eYij         Eq. (56) 
for distinct microstate, L = 1, 2, 3, 5 
Build the auxiliary density matrix:         Eq. (31) 
RL,μνσ=i<jYij(ni,Lσnj,Lσ)CiμCνjRL,μν=RL,μνα+RL,μνβ 
Construct ERIs: 
Jμν(RL), Kμν(PLσ)         Eqs. (49) and (50) 
K̃μν(RLσ)=12(Kμν(RLσ)+(Kμν(RLσ))T) 
Construct the DFT integrals ML,μνσ,(RLα,RLβ)         Eq. (54) 
Let integral ΛL,μνσ=Jμν(RL)K̃μν(RLσ)+ML,μνσ,(RLσ,RLσ)         Eq. (58) 
Transform the integral into MO space and add to A2eYpq 
βL = 1 for L = 1, 2; βL = 2 for L = 3, 5; 
A2eYpq+=βLCLσnp,Lσnq,LσμνCpμCνqΛL,μνσ         Eq. (57) 

For the analytical gradient evaluation of the energy of an individual SSR state, the challenging part is the derivative two-electron integral evaluation. These involve the generalized derivative J and K matrices,

(59)
(60)

and the derivative of the exchange–correlation potential vxc,

(61)

This last integral can be efficiently implemented, provided that functional second derivatives and gradients of DFT grid weights are available. The GPU implementation of such integrals has already been realized in previous work on TDDFT.44 It is noteworthy that the TDDFT gradient evaluation involves the third order functional derivatives of the XC functional,67 which are not needed in SA-REKS or SSR. Algorithm 7 provides the pseudocode for calculating the gradient of an individual SSR state.

ALGORITHM 7

Calculating the SSR gradient of SI-SA-REKS.

βL = 1 for L = 1, 2; βL = 2 for L = 3, 5; 
Calculate with the standard RKS gradient subroutine for microstate L = 1,2: 
ELλunrelaxed=ELλ12i,jall(εijLi+εijLj)Sjiλ 
Calculate ELλunrelaxed with the standard UKS gradient subroutine for microstate L = 3, 5 
Solve the CPREKS equation and get Z vector ZSSRm with Algorithm4       Eq. (17) 
Assemble the unrelaxed microstate gradient: 
EλSSRm=L=1,2,3,5βL(CLSSRm+wGSSG(1)ILp<qZpqSSRmΩpq)ELλunrelaxed       Eqs. (30), (39), and (19)(21) 
Calculate one-body relaxation contribution to the gradient: 
One-electron gradient: EλSSRm+=μνRμνSSRmhμνλ, where RμνSSRm=L=16σRμνσ,LSSRm       Eq. (40) 
Overlap gradient: 
Construct Qpq(1), Qpq(1)Δ       Eq. (34) 
Construct Q(2)
Construct ΛL,μνσ in Algorithm6 for vector ZSSRm 
Transform the integral into MO space: 
Qpq(2)=12L=1,2,3,5βLCLσnp,Lσ+nq,LσμνCpμCνqΛL,μνσ       Eq. (35) 
Construct Q(2)Δ
Take IL,μνσ in Algorithm5  
Transform the integral into MO space: 
Qpq(2)Δ=12L=1,2,3,5βLCLσnp,Lσ+nq,LσμνCpμCνqIL,μνσ 
EλSSRm+=tr(Q(1)SSRm+Q(2)SSRm)Sλ       Eq. (30) 
Calculate two-body relaxation contribution to the gradient: 
Construct the J gradient: EλSSRm+=βLCLJ(PL,RLSSRm)λ,L=1,2,3,5       Eq. (59) 
Construct the K gradient:       Eq. (60) 
EλSSRm+=2βLCLK(PLα,RLσSSRm)λ,L=1,2 
EλSSRm+=βLCLK(PLα,RLσSSRm)λ+K(PLβ,RLβSSRm)λ,L=3,5 
Construct the DFT gradient:       Eq. (61) 
EλSSRm+=2βLCLvxc(PLα,RLσSSRm)λ,L=1,2 
EλSSRm+=βLCLvxc(PLα,RLσSSRm)λ+vxc(PLβ,RLβSSRm)λ,L=3,5 
βL = 1 for L = 1, 2; βL = 2 for L = 3, 5; 
Calculate with the standard RKS gradient subroutine for microstate L = 1,2: 
ELλunrelaxed=ELλ12i,jall(εijLi+εijLj)Sjiλ 
Calculate ELλunrelaxed with the standard UKS gradient subroutine for microstate L = 3, 5 
Solve the CPREKS equation and get Z vector ZSSRm with Algorithm4       Eq. (17) 
Assemble the unrelaxed microstate gradient: 
EλSSRm=L=1,2,3,5βL(CLSSRm+wGSSG(1)ILp<qZpqSSRmΩpq)ELλunrelaxed       Eqs. (30), (39), and (19)(21) 
Calculate one-body relaxation contribution to the gradient: 
One-electron gradient: EλSSRm+=μνRμνSSRmhμνλ, where RμνSSRm=L=16σRμνσ,LSSRm       Eq. (40) 
Overlap gradient: 
Construct Qpq(1), Qpq(1)Δ       Eq. (34) 
Construct Q(2)
Construct ΛL,μνσ in Algorithm6 for vector ZSSRm 
Transform the integral into MO space: 
Qpq(2)=12L=1,2,3,5βLCLσnp,Lσ+nq,LσμνCpμCνqΛL,μνσ       Eq. (35) 
Construct Q(2)Δ
Take IL,μνσ in Algorithm5  
Transform the integral into MO space: 
Qpq(2)Δ=12L=1,2,3,5βLCLσnp,Lσ+nq,LσμνCpμCνqIL,μνσ 
EλSSRm+=tr(Q(1)SSRm+Q(2)SSRm)Sλ       Eq. (30) 
Calculate two-body relaxation contribution to the gradient: 
Construct the J gradient: EλSSRm+=βLCLJ(PL,RLSSRm)λ,L=1,2,3,5       Eq. (59) 
Construct the K gradient:       Eq. (60) 
EλSSRm+=2βLCLK(PLα,RLσSSRm)λ,L=1,2 
EλSSRm+=βLCLK(PLα,RLσSSRm)λ+K(PLβ,RLβSSRm)λ,L=3,5 
Construct the DFT gradient:       Eq. (61) 
EλSSRm+=2βLCLvxc(PLα,RLσSSRm)λ,L=1,2 
EλSSRm+=βLCLvxc(PLα,RLσSSRm)λ+vxc(PLβ,RLβSSRm)λ,L=3,5 

Our SSR CP equation and the gradient algorithm have been implemented within the TeraChem33,39–41 electronic structure package. Timings for SSR have been obtained using Intel Xeon X5570 CPUs clocked at 2.93 GHz and NVIDIA GeForce GTX TITAN GPUs. For comparison, restricted time-dependent density functional theory (RTDDFT) calculations have been conducted with TeraChem using the same GPUs and CPUs, and XMS-CASPT212 calculations have been conducted with the CPU-based electronic structure package BAGEL.71 Timings for BAGEL have been obtained using a single core of the faster Intel Xeon E5-2643 CPU clocked at 3.40 GHz. In order to test the efficiency of the CPREKS equation and SSR individual state gradients, we benchmark our implementation for a series of trans-PSB3 molecules microsolvated by up to 186 water molecules (see Fig. 2 and Fig. S1 and Tables S1–S12 of the supplementary material). Molecular geometries were obtained using the Amber 1272 simulation package by taking snapshots from classical molecular dynamics, using the general AMBER force field (GAFF)73,74 for the solute molecule and the TIP3P75 force field for surrounding water molecules. The ωPBEh76 exchange–correlation functional with 6-31G77 and 6-31G*78 basis sets was used for the performance tests. The (2e,2o) active space is used for both SSR and XMS-CASPT2. For XMS-CASPT2 calculations, BAGEL automatically sets a group of core orbitals to be frozen (see Table S13 of the supplementary material), and density fitting79 with the svp-jkfit auxiliary basis set80 is applied.

FIG. 2.

Structures for the benchmark systems, trans-PSB3 with increasing levels of microsolvation by water molecules. For each system, the number of water molecules and the number of orbitals with 6-31G and 6-31G* basis sets are listed in parentheses. Water molecules are included in the clusters according to a distance cutoff increasing from 2.5 Å to 9 Å. Carbon, nitrogen, oxygen, and hydrogen are shown in cyan, blue, red, and white, respectively.

FIG. 2.

Structures for the benchmark systems, trans-PSB3 with increasing levels of microsolvation by water molecules. For each system, the number of water molecules and the number of orbitals with 6-31G and 6-31G* basis sets are listed in parentheses. Water molecules are included in the clusters according to a distance cutoff increasing from 2.5 Å to 9 Å. Carbon, nitrogen, oxygen, and hydrogen are shown in cyan, blue, red, and white, respectively.

Close modal

We first assess the accuracy of SSR in locating and describing conical intersections by comparing the PBS3 central bond torsion minimum energy conical intersection1 (MECICEN) optimized using SSR, MS-CASPT2, and XMS-CASPT2.13 In all three methods, the orbitals are determined by state-averaging over the lowest two singlet states. We used a (2e,2o) active space for SSR and a (6e,6o) active space for (X)MS-CASPT2. All calculations use the cc-pVDZ81 basis set, with the cc-pvdz-jkfit auxiliary basis set82 for density fitting79 in (X)MS-CASPT2 calculations. The SSR conical intersection searches were performed using the Lagrange–Newton algorithm83 as implemented in the DL-FIND optimization package.84 Three different exchange–correlation functionals were used for SSR: ωPBEh,76 BH&HLYP,85 and Hartree–Fock exchange. The MS-CASPT2 and XMS-CASPT2 calculations were performed with BAGEL.86 The (X)MS-CASPT2 MECI geometries were taken from previous work.87 For BAGEL calculations, the same active space, basis set, and auxiliary basis set are used as in previous work,87 with an SS-SR contraction scheme and a level shift of 0.5 Eh. To compare the topography of the MECIs obtained from different methods, we investigated the branching plane spanned by the g/h vector.1 The g/h vectors are orthogonalized with the procedure proposed by Yarkony.88 The topography descriptors1 Δgh, dgh, and (sx, sy) are then calculated based on the orthogonal g¯/h¯ vectors. The g¯/h¯ vectors can be further normalized as the “unscaled intersection adapted coordinates,”89x=g¯/g¯, y=h¯/h¯. To directly compare the cone plots of the branching planes generated by different methods, the x′/y′ vectors are rotated within the x′–y′ plane to maximize their overlap with the reference xref/yref vectors from XMS-CASPT2. The plots of the gap around the MECIs and the cone plots of the branching plane are generated with these rotated x¯/y¯ vectors. A detailed description of this rotation procedure is available in Text S3 of the supplementary material.

We further investigated the geometries and energetics of different PSB3 minima and MECIs. We started optimizations for these minima and MECIs from geometries previously obtained with SA-3-MS-CASPT2(6,6)/6-31G*.90 Optimizations were performed using SSR(2,2)/cc-pVDZ and three different exchange–correlation functionals: ωPBEh, BH&HLYP, and Hartree–Fock exchange. Cartesian coordinates for all optimized structures are provided in Tables S14–S31 of the supplementary material .

In this section, we first compare the computational performance of the analytical gradient evaluation for SSR, XMS-CASPT2(2), and time-dependent density functional theory. Then, we decompose the computational cost of the SSR analytical gradient evaluation and inspect the impact of applying two acceleration strategies: mixed precision and ultra-sparse DFT quadrature. Finally, we assess the quality of SSR predicted conical intersections and other stationary structures of trans-PSB3 by comparison to XMS-CASPT2.

The SSR method could be a computationally efficient alternative to high-level multi-reference methods to include both dynamic and static electron correlation in a balanced way. To test this, we compared the timings of SSR/ωPBEh vs XMS-CAS(2,2)PT2 and restricted closed-shell time-dependent density functional theory (RTDDFT), as shown in Fig. 3(a). The timings include the total runtime of the corresponding energy and gradient evaluation for all methods. Since analytical energy gradients are usually evaluated in the context of MD simulation or geometry optimization, a very good initial guess for the MO coefficients is usually available by extrapolation of a few previous steps.91,92 To mimic this behavior, we took the converged SA-CASSCF MOs, converged SSR MOs, and converged ground state Kohn–Sham MOs as the initial guess for XMS-CAS(2,2)PT2, SSR/ωPBEh, and RTDDFT, respectively.

FIG. 3.

(a) Timings (in s) for the individual state gradient evaluation for XMS-CAS(2,2)PT2, SSR/ωPBEh, and RTDDFT/ωPBEh for trans-PSB3 and a series of microsolvated trans-PSB3 molecules using the 6-31G basis set. To focus on the gradient timing and also mimic ab initio molecular dynamics simulations (where a good initial guess for MOs is available from the previous time step), we start with converged MOs for each method. Timings include energy and gradient evaluation. (b) Ratio of the timings for SSR and RTDDFT for the calculations in (a). The total runtime (blue dots) is divided into three major components: time for obtaining energy, for solving the CP equation, and for gradient evaluation. We also compared the averaged time per iteration for REKS SCF vs TDDFT RKS SCF, REKS SCF vs RTDDFT Davidson iterations, and REKS CPREKS vs RTDDFT CPKS. All REKS and TDDFT calculations are conducted with TeraChem using a single core of Intel Xeon X5570 CPU clocked at 2.93 GHz and one NVIDIA GeForce GTX TITAN GPU. XMS-CASPT2 calculations are conducted with BAGEL on a single core of the faster Intel Xeon E5-2643 CPU clocked at 3.40 GHz.

FIG. 3.

(a) Timings (in s) for the individual state gradient evaluation for XMS-CAS(2,2)PT2, SSR/ωPBEh, and RTDDFT/ωPBEh for trans-PSB3 and a series of microsolvated trans-PSB3 molecules using the 6-31G basis set. To focus on the gradient timing and also mimic ab initio molecular dynamics simulations (where a good initial guess for MOs is available from the previous time step), we start with converged MOs for each method. Timings include energy and gradient evaluation. (b) Ratio of the timings for SSR and RTDDFT for the calculations in (a). The total runtime (blue dots) is divided into three major components: time for obtaining energy, for solving the CP equation, and for gradient evaluation. We also compared the averaged time per iteration for REKS SCF vs TDDFT RKS SCF, REKS SCF vs RTDDFT Davidson iterations, and REKS CPREKS vs RTDDFT CPKS. All REKS and TDDFT calculations are conducted with TeraChem using a single core of Intel Xeon X5570 CPU clocked at 2.93 GHz and one NVIDIA GeForce GTX TITAN GPU. XMS-CASPT2 calculations are conducted with BAGEL on a single core of the faster Intel Xeon E5-2643 CPU clocked at 3.40 GHz.

Close modal

The formal scaling of CASPT2 calculations with a constant number of occupation orbitals is O(N6) with a traditional implementation93 and O(N4) with the most efficient tensor-hypercontracted algorithm currently available.17 In Fig. 3(a), we see that XMS-CAS(2,2)PT2 has an observed empirical scaling for this test set of O(N3.83). The reduced scaling is partly due to the use of density fitting79 and frozen core approximations in BAGEL. For trans-PSB3 + 30 waters (microsolvation sphere r = 3.75 Å, 104 atoms, 384 electrons, 460 orbitals), XMS-CAS(2,2)PT2 already takes about 9 h for a single gradient evaluation. XMS-CAS(2,2)PT2 calculations for the larger test cases were not possible on our machines due to large memory requirements [in addition to the long computation time, which can be inferred from Fig. 3(a)]. Our SSR implementation scales formally as O(N4), but integral and density screening are expected to decrease this substantially. We observe an empirical scaling of O(N1.76) for this test set, which is very close to the observed O(N1.69) scaling of RTDDFT in TeraChem. From trans-PSB3 (14 atoms, 44 electrons, and 70 orbitals) to trans-PSB3 + 30 waters, the speedup of SSR vs XMS-CAS(2,2)PT2 grows rapidly from 3X to 56X. Assuming that we are doing MD simulation or geometry optimization with the tolerable speed of about 4 min/step (360 steps/day), the largest system that can be handled by XMS-CAS(2,2)PT2 is just trans-PSB3 (14 atoms), whereas for SSR, it is trans-PSB3 + 18 waters (68 atoms). The calculations in Fig. 3(a) used the 6-31G basis set because when larger basis sets (e.g., 6-31G*) are used, XMS-CASPT2 cannot even deal with trans-PSB3 + 24 waters (86 atoms), resulting in too few data points to fit a reliable empirical scaling. The same comparison obtained with 6-31G* is shown in Fig. S2 of the supplementary material.

To obtain a more comprehensive comparison between CPREKS and RTDDFT, we decomposed the timings for both methods, as shown in Fig. 3(b). The total run can be partitioned into three major parts: obtaining the energy, calculating orbital relaxation by solving coupled-perturbed equations, and evaluating gradients. For SSR, the energy is obtained from the REKS SCF procedure, while the TDDFT energy requires both the ground state KS SCF and Davidson iterations for the excitation energy. For the total calculation and each task, we plotted the ratio of SSR time to RTDDFT time as a function of system size. The ratio of all components tends to fluctuate around a constant, indicating that the computational cost of SSR relative to TDDFT does not deteriorate as the system size grows. As previously analyzed, each REKS SCF iteration involves two RKS (restricted Kohn–Sham) Fock builds and two UKS (unrestricted Kohn–Sham) builds, which is equivalent to about six RKS iterations. Each CPREKS iteration involves two RKS- and two UKS-like Fock-like builds, which is equivalent to about six CPKS iterations. These match the benchmarking results (black triangles and green triangles). Similar trends are observed for the gradient evaluations. The times for obtaining energies are similar because REKS requires only the REKS SCF, while TDDFT requires both the KS SCF and Davidson iterations. The ratio for the total time of solving the CP-equations is much larger than 6. This is because the CPREKS equations are harder to solve than the CPKS equation, requiring more iterations to converge to the same convergence threshold (a norm of residue 10−6 a.u.; see Table S32 of the supplementary material). In summary, the runtime of SSR is 4-6X that of RTDDFT, regardless of system size (given the same number of iterations are used to solve the CP equations).

The primary computational parts of the SSR gradient evaluation can be divided as follows: (1) solving the CPREKS equations and evaluating (2) the gradient of the Coulomb J matrix, (3) the gradient of the exchange K matrix, (4) the gradient of the XC potential, and (5) the overlap gradient. Figure 4 (left panel) shows the GPU + CPU time taken by each of these contributions. CPREKS is the most expensive part and takes 60%–75% of the total gradient evaluation time. Decomposition of the CPREKS timing is discussed below. The K gradient has the worst empirical scaling of O(N2.00). This is similar to what was previously reported in the CASSCF gradient implementation,27 and the reason has also been discussed. The XC gradient has the best scaling of O(N1.60), slightly better than the J gradient [O(N1.70)]. However, it has a significantly larger prefactor than J and K gradients. This is because the evaluation of functional quantities at each quadrature point is done on the CPU rather than on the GPU. The overlap gradient term refers to both the overlap derivatives and the terms contracted with it. The latter [Q(2)X and Q(2)Δ matrices; see Eqs. (35) and (44)] is actually the computationally dominant part. As discussed in Sec. III, the evaluation of these quantities is very similar to the AY product for CPREKS, so the scalings are also similar.

FIG. 4.

Decomposition of the SSR runtime. Computations were performed at the SSR/ωPBEh/6-31G level of theory. The radius of solvent water cluster starts at 2 Å and increases in increments of 1 Å up to 9 Å. Timings were obtained with TeraChem using 1 NVIDIA GeForce GTX TITAN GPU. Left panel: Timings (in s) for gradient evaluation for trans-PSB3 and a series of microsolvated trans-PSB3 molecules. Timings for the whole process of solving the CPREKS equation, the terms described in Eq. (59) (J gradient), Eq. (60) (K gradient), and Eq. (61) (XC gradient), the term related to the gradient of the overlap matrix (S gradient), and the total time for solving CPREKS and gradient evaluation. Right panel: Timings (in s) for one iteration of CPREKS evaluation. Timings for the terms described in Eq. (49) (J), Eq. (50) (K), and Eq. (52) (fxc) and the remaining time spent on linear algebra (LA).

FIG. 4.

Decomposition of the SSR runtime. Computations were performed at the SSR/ωPBEh/6-31G level of theory. The radius of solvent water cluster starts at 2 Å and increases in increments of 1 Å up to 9 Å. Timings were obtained with TeraChem using 1 NVIDIA GeForce GTX TITAN GPU. Left panel: Timings (in s) for gradient evaluation for trans-PSB3 and a series of microsolvated trans-PSB3 molecules. Timings for the whole process of solving the CPREKS equation, the terms described in Eq. (59) (J gradient), Eq. (60) (K gradient), and Eq. (61) (XC gradient), the term related to the gradient of the overlap matrix (S gradient), and the total time for solving CPREKS and gradient evaluation. Right panel: Timings (in s) for one iteration of CPREKS evaluation. Timings for the terms described in Eq. (49) (J), Eq. (50) (K), and Eq. (52) (fxc) and the remaining time spent on linear algebra (LA).

Close modal

Each CPREKS iteration can be decomposed into four computationally important parts: Coulomb matrix J construction, exchange matrix K construction, XC second functional derivative fxc evaluation, and Linear algebra (LA), as shown in Fig. 4 (right panel). LA scales the worst because the dominant part is to construct the six auxiliary density matrices RLσ (L = 1, 2, σ = α; L = 3, 5, σ = α, β). Each construction involves two matrix–matrix products, requiring O(N3) operations. However, linear algebra has a much smaller prefactor compared with integral evaluation, so its cost only becomes significant when the system size is very large. For our benchmark systems, LA is only the second most expensive part even for the largest molecule with ∼2500 basis functions. For small molecules, the most expensive part is fxc because of the large prefactor caused by the CPU evaluation of functional values at quadrature grid points. For large molecules, however, K is the bottleneck because of the scaling.

In order to leverage the fast single precision arithmetic on GPUs, we used mixed precision (MP) for the calculation of all two-electron integrals related to both CPREKS and SSR gradient evaluations. Our REKS SCF is coded with dynamic precision59 using a single/double precision (SP/DP) threshold δspdp, which is dynamically adjusted according to the DIIS error. Upon convergence, the threshold δspdp is fixed and used throughout the CPREKS and gradient evaluation. We found that this mixed precision treatment improves the efficiency without hampering the accuracy and stability of SSR gradient calculation. Figure 5 compares the convergence behavior of CPREKS using mixed and double precision. A relatively strict convergence threshold of ||r||2 = 10−6 is used. In all cases, convergence behavior is identical for mixed and double precision integration until very small residual values well below the convergence threshold. The accuracy of the gradient evaluation is shown in Fig. 6. The maximum unsigned difference of all gradients entries calculated by MP and DP is plotted as a function of system size. This error stays well below 10−4 Hartree/Bohr for all systems, sufficient for geometry optimization and molecular dynamics. The speedup achieved by mixed precision is also shown in the same graph. The overall speedup for gradient evaluation is about 2.5X. This result is similar to previous reports for ground state HF/KS calculations.59 The XC gradient has the least MP speedup because a significant portion of the XC gradient evaluation is done on the CPU, which always uses double precision in our implementation. In contrast, the use of mixed precision leads to speedups of more than 3X for J and K gradients, which are almost entirely evaluated on the GPU.

FIG. 5.

Plot of mixed and double precision (MP and DP) convergence behavior for the SI-SSR/ωPBEh/6-31G CP equation of three of the benchmark systems. The convergence threshold of 10−6 is indicated with a straight black line. In all cases, convergence behavior is identical for mixed and double precision integration until very small residual values well below the convergence threshold. Timings were obtained with TeraChem using a single core of Intel Xeon X5570 CPU clocked at 2.93 GHz and one NVIDIA GeForce GTX TITAN GPU.

FIG. 5.

Plot of mixed and double precision (MP and DP) convergence behavior for the SI-SSR/ωPBEh/6-31G CP equation of three of the benchmark systems. The convergence threshold of 10−6 is indicated with a straight black line. In all cases, convergence behavior is identical for mixed and double precision integration until very small residual values well below the convergence threshold. Timings were obtained with TeraChem using a single core of Intel Xeon X5570 CPU clocked at 2.93 GHz and one NVIDIA GeForce GTX TITAN GPU.

Close modal
FIG. 6.

Plot of the mixed to double precision (MP to DP) speedups and the error for the SI-SSR/ωPBEh/6-31G second excited state energy gradient calculation of trans-PSB3 and a series of solvated trans-PSB3 molecules. Speedups for the whole process of solving the CPREKS equation, the terms described in Eq. (59) (J gradient), Eq. (60) (K gradient), Eq. (61) (XC gradient), S gradient, and the total time for solving CPREKS and gradient evaluation. Error refers to the maximum unsigned difference of all gradient entries calculated by MP and DP. The radius of solvent water cluster starts at 2 Å and increases in increments of 1 Å up to 9 Å. Timings were obtained with TeraChem using a single core of Intel Xeon X5570 CPU clocked at 2.93 GHz and one NVIDIA GeForce GTX TITAN GPU.

FIG. 6.

Plot of the mixed to double precision (MP to DP) speedups and the error for the SI-SSR/ωPBEh/6-31G second excited state energy gradient calculation of trans-PSB3 and a series of solvated trans-PSB3 molecules. Speedups for the whole process of solving the CPREKS equation, the terms described in Eq. (59) (J gradient), Eq. (60) (K gradient), Eq. (61) (XC gradient), S gradient, and the total time for solving CPREKS and gradient evaluation. Error refers to the maximum unsigned difference of all gradient entries calculated by MP and DP. The radius of solvent water cluster starts at 2 Å and increases in increments of 1 Å up to 9 Å. Timings were obtained with TeraChem using a single core of Intel Xeon X5570 CPU clocked at 2.93 GHz and one NVIDIA GeForce GTX TITAN GPU.

Close modal

It is worth noting that even though δspdp is fixed for gradient evaluation, successive iterations of CPREKS CG take less computation time than the first. As shown in Fig. 7, for all three systems displayed, the J and K build times decrease drastically as the CG iteration proceeds. This behavior is different from early TDDFT implementations, where all Davidson iterations take roughly the same amount of time. As CPREKS iterations proceed, the trial vector Y becomes sparser. J and K builds exploit the sparsity of Y by screening out small integrals. For XC integrals [Eq. (53)], however, the screening is always based only on the reference density (Pα, Pβ) rather than the density to be contracted (Dα, Dβ). Therefore, the screening is irrelevant to the sparsity of the Y matrix, and the computation time remains constant for the XC gradient. Recent work has reformulated TDDFT iterations to benefit from a similar sparsity by using unnormalized correction vectors in the Davidson iterations.94,95

FIG. 7.

Plot of relative time taken for each CG step of solving the CPREKS equation for the SSR/ωPBEh/6-31G CP equation of three of the benchmark systems. For each timing term, the relative time is calculated as the ratio of the time taken by the current step and that of the first CG step. Timings were obtained with TeraChem using a single core of Intel Xeon X5570 CPU clocked at 2.93 GHz and one NVIDIA GeForce GTX TITAN GPU.

FIG. 7.

Plot of relative time taken for each CG step of solving the CPREKS equation for the SSR/ωPBEh/6-31G CP equation of three of the benchmark systems. For each timing term, the relative time is calculated as the ratio of the time taken by the current step and that of the first CG step. Timings were obtained with TeraChem using a single core of Intel Xeon X5570 CPU clocked at 2.93 GHz and one NVIDIA GeForce GTX TITAN GPU.

Close modal

Given a molecule described with a certain number of basis functions, the cost of XC integrals is proportional to the density of the grid. Thus, it is desirable if coarse grids can already generate a moderately accurate result. Previous studies of TDDFT implementation on GPUs demonstrated that using an ultra-sparse grid for the TDDFT portion of computation can result in considerable time saving with little or no loss of accuracy.44,96,97 Here, we found the same phenomenon for SSR (Table I). For both molecules, the second SSR state gradients from the sparsest grid and the densest grid only differ in the order of 10−4 hartree/bohr, while the speedups are over 20X.

TABLE I.

SSR/ωPBEh gradient timings and max magnitude entry of the second SSR state gradient using increasing dense quadrature grids.

GridPointsPts/atombTime(s)aE (Hartree/Bohr)
PSB3 (14 atoms) 
14 062 1 004 10.37 0.047 414 
39 366 2 811 16.88 0.047 719 
87 942 6 281 27.10 0.047 634 
161 543 11 538 40.78 0.047 622 
414 161 29 582 93.18 0.047 639 
1 060 891 75 777 222.74 0.047 636 
PSB3 + 34 waters (116 atoms) 
110 381 951 426.1 −0.075 695 
306 783 2 644 686.0 −0.075 036 
692 046 5 965 1 211.9 −0.075 159 
1 268 577 10 936 1 971.0 −0.075 156 
3 258 711 28 092 3 829.6 −0.075 140 
8 335 026 71 853 10 461.7 −0.075 144 
GridPointsPts/atombTime(s)aE (Hartree/Bohr)
PSB3 (14 atoms) 
14 062 1 004 10.37 0.047 414 
39 366 2 811 16.88 0.047 719 
87 942 6 281 27.10 0.047 634 
161 543 11 538 40.78 0.047 622 
414 161 29 582 93.18 0.047 639 
1 060 891 75 777 222.74 0.047 636 
PSB3 + 34 waters (116 atoms) 
110 381 951 426.1 −0.075 695 
306 783 2 644 686.0 −0.075 036 
692 046 5 965 1 211.9 −0.075 159 
1 268 577 10 936 1 971.0 −0.075 156 
3 258 711 28 092 3 829.6 −0.075 140 
8 335 026 71 853 10 461.7 −0.075 144 
a

SSR/ωPBEh gradient timings include the time for solving CPREKS equations and for gradient calculation.

b

Number of points/atom refers to the pruned grid for TeraChem.

To test the quality of SSR gradients and NAC vectors, we studied the potential energy surfaces of cationic PSB3 near the MECIs. The branching plane is defined with the commonly used vectors, the gradient difference, g, and the interstate coupling vector, h.1 These vectors are closely related to the GDV and DCV defined above: the g vector is the same as the GDV, while hIJ = (EJEI)HIJ. The differences in the shapes of the potential energy surfaces calculated with different methods are schematically shown in Figs. 810. We first present the energy gaps along the loop (radius 0.002 Å) within the branching plane spanned by x¯/y¯ (an orthonormalized form of g/h) in Fig. 8. The details about the transformation from g/h to x¯/y¯ are available in Sec. IV and Text S3 of the supplementary material. Similar plots have been previously presented for other methods.87,98 However, we observed some new phenomena here. The same method is used for both optimizing the MECI and plotting the loop for every method in Fig. 7. This is different from an earlier study98 where the SSR loop was computed around the MECI optimized with another method since direct optimization of MECIs with SSR was not available. When computed fully consistently, the SSR loop curves show little dependence on the XC functional used. All SSR curves are very smooth with an oscillation amplitude of around 0.002 kcal/mol, which is significantly smaller than MS-CASPT2 (0.06 kcal/mol) but slightly larger than the XMS-CASPT2 results (0.0004 kcal/mol).

FIG. 8.

S0 and S1 energy gaps around the 0.002 Å radius loop centered at the MECIs, with MS-CASPT2, XMS-CASPT2, SSR/BH&HLYP, SSR/ωPBEh, and SSR/HF. The loop is on the branching plane. The x¯/y¯ vectors are orthonormalized forms of g/h.

FIG. 8.

S0 and S1 energy gaps around the 0.002 Å radius loop centered at the MECIs, with MS-CASPT2, XMS-CASPT2, SSR/BH&HLYP, SSR/ωPBEh, and SSR/HF. The loop is on the branching plane. The x¯/y¯ vectors are orthonormalized forms of g/h.

Close modal
FIG. 9.

The MECI structures of PSB3 and the corresponding x¯ and y¯ vectors (orthonormalized g/h), calculated with (1) SSR/BH&HLYP, (2) SSR/ωPBEh, (3) SSR/HF, (4) SA-2-CAS(6,6)PT2, and (5) XMS-CAS(6,6)PT2. The vectors g and h are plotted with blue and red arrows, respectively.

FIG. 9.

The MECI structures of PSB3 and the corresponding x¯ and y¯ vectors (orthonormalized g/h), calculated with (1) SSR/BH&HLYP, (2) SSR/ωPBEh, (3) SSR/HF, (4) SA-2-CAS(6,6)PT2, and (5) XMS-CAS(6,6)PT2. The vectors g and h are plotted with blue and red arrows, respectively.

Close modal
FIG. 10.

Potential energy surfaces near the MECIs of PSB3. S0 and S1 potential energies on the branching plane with (1) SSR/BH&HLYP, (2) SSR/ωPBEh, (3) SSR/HF, (4) SA-2-CAS(6,6)PT2, SS-SR, and (5) XMS-CAS(6,6)PT2, SS-SR. The axes are orthonormalized forms of g/h vectors.

FIG. 10.

Potential energy surfaces near the MECIs of PSB3. S0 and S1 potential energies on the branching plane with (1) SSR/BH&HLYP, (2) SSR/ωPBEh, (3) SSR/HF, (4) SA-2-CAS(6,6)PT2, SS-SR, and (5) XMS-CAS(6,6)PT2, SS-SR. The axes are orthonormalized forms of g/h vectors.

Close modal

We then compared the topology of the MECIs obtained with different methods characterized by g¯/h¯ vectors88 and the orthogonal version of g/h vectors. We used the topography parameters as suggested by Yarkony,89 including the descriptors for the sharpness and asymmetry of the intersection,

(62)
(63)

where g, h are the length of the orthogonal branching plane vectors g¯/h¯; the tilt parameters

(64)
(65)

where

(66)

and their length-scaled forms

(67)

The results are summarized in Table II. Based on the tilt parameters (sx, sy), XMS-CASPT2 predicts the MECI to be most peaked (a perfectly peaked intersection would have vanishing sx and sy). Both (X)MS-CASPT2 methods are more peaked than all three REKS methods tested. Previous studies7,90 showed that MS-CASPT2 predicts the PSB3 MECI to be more peaked than CASSCF. Based on Δgh values, the (X)MS-CASPT2 MECIs are more asymmetric than all the REKS methods. The orthonormal x¯/y¯ vectors for different methods are compared in Fig. 9.

TABLE II.

Comparison of intersection topography for MECIcen of PSB3 calculated with different methods.

Δghdghsxsysx/gsy/g
SSR/BH&HLYP −0.50 0.07 0.06 −0.12 1.64 −1.97 
SSR/ωPBEh 0.72 0.03 −0.01 0.11 −0.17 9.76 
SSR/HF 0.88 0.05 −0.04 −0.09 −0.80 −7.57 
XMS-CASPT2 1.00 0.04 0.01 −0.01 0.35 −58.37 
MS-CASPT2 1.00 0.05 0.05 0.00 0.98 −11.84 
Δghdghsxsysx/gsy/g
SSR/BH&HLYP −0.50 0.07 0.06 −0.12 1.64 −1.97 
SSR/ωPBEh 0.72 0.03 −0.01 0.11 −0.17 9.76 
SSR/HF 0.88 0.05 −0.04 −0.09 −0.80 −7.57 
XMS-CASPT2 1.00 0.04 0.01 −0.01 0.35 −58.37 
MS-CASPT2 1.00 0.05 0.05 0.00 0.98 −11.84 

Finally, we present the cone plots of the MECIs obtained with different methods in Fig. 10. The SSR surfaces are all smooth with the correct double cone topography for a MECI. The PESs of all methods are scanned around their respective MECIs. As reported in previous work of Shiozaki et al., MS-CASPT2 surfaces calculated with BAGEL have visible large fluctuations, which are greatly alleviated in XMS-CASPT2. The branching planes computed by REKS methods are all very smooth with the correct double cone topography. Among the three XC functionals, the SSR/HF PES looks most similar to that of XMS-CASPT2 (SS-SR, BAGEL). This may be because of the formal similarities of SI-SA-REKS/HF and CASSCF.

Equilibrium structures of trans-PSB3 in the ground and first singlet excited states were optimized using SSR/ωPBEh, SSR/BH&HLYP, and SSR/HF with the cc-pVDZ basis set. The optimized structures are depicted in Tables S14–S31 of the supplementary material (including Cartesian coordinates), and the relative energies are summarized in Table III together with SA3-MS-CASTP2(6,6) and SA3-CASSCF(6,6) reference results from previous work.90 

TABLE III.

Relative energies (kcal/mol) for minima and conical intersections of PSB3 computed with SA3-MS-CASPT2(6,6)/6-31G*, SA3-CASSCF(6,6)/6-31G, SSR/HF/cc-pVDZ, SSR/ωPBEh(2,2)/cc-pVDZ, and SSR/BH&HLYP/cc-pVDZ. SA3-MS-CASPT2(6,6)/6-31G* and SA3-MS-CASSCF(6,6)/6-31G results are from previous work.90 In each case, the zero of energy corresponds to the S0trans-PSB3 minimum.

StructureSA3-MS-CASSCFSA3-MS-CASPT2SSR/ωPBEhSSR/BH&HLYPSSR/HF
trans-PSB3 0/113.2a 0/98.3a 0/95.4a 0/97.7a 0/114.2a 
S1min_CenL 95.2 85.1 64.2 66.0 92.0 
MECICen 66.1 60.3 66.7 68.7 58.1 
MECICN 79.1 87.8   90.6 
MECITer 108.9 82.4   77.9 
cis-PSB3 3.0/113.0a 3.0/97.2a 3.0/96.2a 3.3/98.9a 4.1/113.8a 
StructureSA3-MS-CASSCFSA3-MS-CASPT2SSR/ωPBEhSSR/BH&HLYPSSR/HF
trans-PSB3 0/113.2a 0/98.3a 0/95.4a 0/97.7a 0/114.2a 
S1min_CenL 95.2 85.1 64.2 66.0 92.0 
MECICen 66.1 60.3 66.7 68.7 58.1 
MECICN 79.1 87.8   90.6 
MECITer 108.9 82.4   77.9 
cis-PSB3 3.0/113.0a 3.0/97.2a 3.0/96.2a 3.3/98.9a 4.1/113.8a 
a

Vertical excitation energy of the first excited state relative to S0-trans.

We started all optimizations from previous MS-CASPT2-optimized structures and found the corresponding stationary structures for each method. The S1 minimum (S1min) structures found with all SSR methods collapse to a structure with a single twist around the central C=C bond, which is similar to S1min_cen of CASSCF, rather than the S1min_cenL structure of MS-CASPT2. Nevertheless, we will show that SSR resembles MS-CASPT2 more than CASSCF for MECI calculations.

The three PSB3 S0/S1 MECIs of PSB3, MECIcen, MECICN, and MECITer, correspond to structures that twist mainly around the central C=C bond, the C=N bond, and the terminal C=C bond, respectively (structures available in Tables S14–S31 of the supplementary material). As shown in Table III, all three SSR methods agree on the relative stability of these three structures, E(MECIcen) < E (MECITer) < E(MECICN). This also agrees with the result of MS-CASPT2, whereas CASSCF predicts that the least energetically favorable structure is MECITer. More comparisons of the geometries further indicate that SSR goes beyond CASSCF and generates results that resemble MS-CASPT2. For the most stable MECIcen, the length of the twisted central C=C is similar among all SSR methods and MS-CASPT2 (1.447 Å–1.462 Å), whereas CASSCF MECIcen has a significantly longer bond length (1.485 Å). This structural difference is even more prominent in bond length alternation (BLA), as shown in Fig. 11. Here, we adopted the definition of the BLA coordinate from previous work,90 

(68)

where the atom labels are defined in Fig. 2. From trans-PSB3 to MECIcen, the BLA is decreased in all cases. For MS-CASPT2, the change in the BLA, ΔBLA = −0.038 Å. All SSR methods slightly underestimate the decrease, with ΔBLA = −0.024 Å, −0.014 Å, and −0.009 Å using the HF, ωPBEh, and BH&HLYP functionals, respectively. In contrast, CASSCF significantly overestimates this decrease with ΔBLA = −0.127 Å. Similar trends in BLA changes are also seen in MECICN.

FIG. 11.

Comparison of MECI structures calculated with MS-CASPT2, SSR/BH&HLYP, SSR/HF, SSR/ωPBEh, and CASSCF. (a) Change in the BLA90 from S0-trans to various MECI structures. (b) Comparison of the torsion dihedrals around the central-right C—C single bond (ϕcenR) and the C=N bond (ϕCN) for MECICN geometries optimized with different methods. Circles indicate the location of the geometry in the ϕcenR–ϕCN plane. S0-trans (black circle) is shown as a reference. The inset shows the superimposition of different structures, with the same coloring mode as the circles to indicate different methods: CASSCF (gray), MS-CASPT2 (red), SSR/HF (blue), SSR/BH&HLYP (orange), and SSR/ωPBEh (green).

FIG. 11.

Comparison of MECI structures calculated with MS-CASPT2, SSR/BH&HLYP, SSR/HF, SSR/ωPBEh, and CASSCF. (a) Change in the BLA90 from S0-trans to various MECI structures. (b) Comparison of the torsion dihedrals around the central-right C—C single bond (ϕcenR) and the C=N bond (ϕCN) for MECICN geometries optimized with different methods. Circles indicate the location of the geometry in the ϕcenR–ϕCN plane. S0-trans (black circle) is shown as a reference. The inset shows the superimposition of different structures, with the same coloring mode as the circles to indicate different methods: CASSCF (gray), MS-CASPT2 (red), SSR/HF (blue), SSR/BH&HLYP (orange), and SSR/ωPBEh (green).

Close modal

Inspection of the dihedral angles also shows that SSR methods produce MECI structures more similar to MS-CASPT2 than CASSCF. The CASSCF structure only has a 90° twisting around C=N, and the –NH2 group is perfectly flat with sp2 hybridization. In contrast, the MECICN structures produced by all SSR methods and MS-CASPT2 feature two twists around the right C—C single bond and the C—N bond, respectively (Fig. 11). The –NH2 group also shows an improper torsion of about 128°–153°.

It is worth noting that for ground state minima, SSR results are sometimes closer to CASSCF. For instance, for S0-trans and S0-cis, SSR-HF and CASSCF have very similar structures and energetics, which is expected as no XC functional other than exact exchange is used in SSR-HF. Tables of the bond length and dihedrals of structures calculated by all methods are available in the supplementary material, Tables S33 and S34.

In this work, we demonstrated that by formulating the analytical gradients of individual state energies of SSR entirely in terms of traces of matrix products, exploiting the sparsity in the atomic orbital basis for two-electron integrals, utilizing mixed precision techniques to benefit from the fast single precision arithmetic, and performing DFT quadrature on ultra-sparse grids, analytic energy derivatives for the SSR method10,20,21,99 can be efficiently implemented on GPUs.

The formalism was tested by calculating analytic energy gradients of the S0 and S1 states of molecular clusters with a trans-PSB3 molecule embedded in water clusters of increasing size. The benchmark calculations are used to demonstrate the feasibility of the developed formalism. We achieve the same scaling as the single reference TDDFT method, but SSR is able to correctly describe the double cone topography of conical intersections. The O(N1.76) scaling of SSR significantly outperforms the observed scaling of MS-CASPT2. We showed that our implementation can efficiently compute SSR gradients for systems with more than 5000 orbitals. Analysis of PSB3 MECIs shows that SSR produces geometric and energetic profiles similar to MS-CASPT2, especially when MS-CASPT2 and SA-CASSCF disagree.

As the SSR method affords mean-field computational cost while providing a MS-CASPT2 quality description of conical intersections, it can be applied to study excited state dynamics of large molecular systems. We have already successfully applied SSR nonadiabatic dynamics to investigate the photoisomerization mechanism of the retinal protonated Schiff base (RPSB) and its effects on the activation of channelrhodopsin 2.100 We also used ab initio multiple spawning with SSR to simulate the nonadiabatic dynamics of the ultrafast RPSB photoisomerization in bacteriorhodopsin and fully characterized the I fluorescent state.101 In the future, development of SSR with larger active spaces will enable excited state dynamics studies of even more molecular and biomolecular systems.

See the supplementary material for the procedure for orthonormalization of g, h vectors; relation between E1SSR, E2SSR, and ESA-REKS; the details about the transformation of FREKS to the AO basis; geometries of optimized PSB3 MECIs; geometries of optimized PSB3 S1 minima; geometries of optimized PSB3 S0 minima; geometries of solvated PSB3 used for performance testing; structures of the benchmark systems for XMS-CASPT2; SSR gradient performance test with 6-31G* basis; the number of basis functions for benchmark systems; geometries of molecules for XMS-CASPT2 performance test; and bond lengths and dihedral angles for optimized PSB3 structures.

This work was supported by the Office of Naval Research (Grant Nos. N00014-18-1-2659 and N00014-18-1-2624).

The data that support the findings of this study are available within the article and its supplementary material.

1.
D. R.
Yarkony
, “
Nonadiabatic quantum chemistry: Past, present, and future
,”
Chem. Rev.
112
,
481
(
2011
).
2.
H.
Nakamura
,
Nonadiabatic Transition: Concepts, Basic Theories and Applications
(
World Scientific
,
2002
).
3.
B. G.
Levine
and
T. J.
Martínez
, “
Isomerization through conical intersections
,”
Annu. Rev. Phys. Chem.
58
,
613
(
2007
).
4.
M. A.
Robb
,
F.
Bernardi
, and
M.
Olivucci
, “
Conical intersections as a mechanistic feature of organic photochemistry
,”
Pure Appl. Chem.
67
,
783
(
1995
).
5.
M.
Garavelli
,
F.
Bernardi
,
M.
Olivucci
,
T.
Vreven
,
S.
Klein
,
P.
Celani
, and
M. A.
Robb
, “
Potential-energy surfaces for ultrafast photochemistry static and dynamic aspects
,”
Faraday Discuss.
110
,
51
(
1998
).
6.
T.
Mori
,
K.
Nakano
, and
S.
Kato
, “
Conical intersections of free energy surfaces in solution: Effect of electron correlation on a protonated Schiff base in methanol solution
,”
J. Chem. Phys.
133
,
064107
(
2010
).
7.
S.
Gozem
,
M.
Huntress
,
I.
Schapiro
,
R.
Lindh
,
A. A.
Granovsky
,
C.
Angeli
, and
M.
Olivucci
, “
Dynamic electron correlation effects on the ground state potential energy surface of a retinal chromophore model
,”
J. Chem. Theory Comput.
8
,
4069
(
2012
).
8.
S.
Gozem
,
F.
Melaccio
,
R.
Lindh
,
A. I.
Krylov
,
A. A.
Granovsky
,
C.
Angeli
, and
M.
Olivucci
, “
Mapping the excited state potential energy surface of a retinal chromophore model with multireference and equation-of-motion coupled-cluster methods
,”
J. Chem. Theory Comput.
9
,
4495
(
2013
).
9.
B. G.
Levine
,
C.
Ko
,
J.
Quenneville
, and
T. J.
MartÍnez
, “
Conical intersections and double excitations in time-dependent density functional theory
,”
Mol. Phys.
104
,
1039
(
2006
).
10.
M.
Huix-Rotllant
,
M.
Filatov
,
S.
Gozem
,
I.
Schapiro
,
M.
Olivucci
, and
N.
Ferré
, “
Assessment of density functional theory for describing the correlation effects on the ground and excited state potential energy surfaces of a retinal chromophore model
,”
J. Chem. Theory Comput.
9
,
3917
(
2013
).
11.
S. L.
Li
,
A. V.
Marenich
,
X.
Xu
, and
D. G.
Truhlar
, “
Configuration interaction-corrected Tamm–Dancoff approximation: A time-dependent density functional method with the correct dimensionality of conical intersections
,”
J. Phys. Chem. Lett.
5
,
322
(
2014
).
12.
J.
Finley
,
P.-Å.
Malmqvist
,
B. O.
Roos
, and
L.
Serrano-Andrés
, “
The multi-state CASPT2 method
,”
Chem. Phys. Lett.
288
,
299
(
1998
).
13.
T.
Shiozaki
,
W.
Győrffy
,
P.
Celani
, and
H.-J.
Werner
, “
Communication: Extended multi-state complete active space second-order perturbation theory: Energy and nuclear gradients
,”
J. Chem. Phys.
135
,
081106
(
2011
).
14.
A. A.
Granovsky
, “
Extended multi-configuration quasi-degenerate perturbation theory: The new approach to multi-state multi-reference perturbation theory
,”
J. Chem. Phys.
134
,
214113
(
2011
).
15.
E. G.
Hohenstein
,
R. M.
Parrish
, and
T. J.
Martínez
, “
Tensor hypercontraction density fitting. I. Quartic scaling second-order Moller-Plesset perturbation theory
,”
J. Chem. Phys.
137
,
044103
(
2012
).
16.
C.
Song
and
T. J.
Martínez
, “
Reduced scaling extended multi-state CASPT2 (XMS-CASPT2) using supporting subspaces and tensor hyper-contraction
,”
J. Chem. Phys.
152
,
234113
(
2020
).
17.
C.
Song
and
T. J.
Martínez
, “
Reduced scaling CASPT2 using supporting subspaces and tensor hyper-contraction
,”
J. Chem. Phys.
149
,
044108
(
2018
).
18.
M.
Filatov
and
S.
Shaik
, “
A spin-restricted ensemble-referenced Kohn-Sham method and its application to diradicaloid situations
,”
Chem. Phys. Lett.
304
,
429
(
1999
).
19.
I. d. P. R.
Moreira
,
R.
Costa
,
M.
Filatov
, and
F.
Illas
, “
Restricted ensemble-referenced Kohn-Sham versus broken symmetry approaches in density functional theory: Magnetic coupling in Cu binuclear complexes
,”
J. Chem. Theory Comput.
3
,
764
(
2007
).
20.
M.
Filatov
, “
Spin-restricted ensemble-referenced Kohn-Sham method: Basic principles and application to strongly correlated ground and excited states of molecules
,”
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
5
,
146
(
2015
).
21.
M.
Filatov
, “
Ensemble DFT approach to excited states of strongly correlated molecular systems
,” in
Density-Functional Methods for Excited States
, edited by
N.
Ferrè
,
M.
Filatov
, and
M.
Huix-Rotllant
(
Springer
,
Heidelberg
,
2016
), Vol. 368, p.
97
.
22.
M.
Filatov
,
F.
Liu
, and
T. J.
Martínez
, “
Analytical derivatives of the individual state energies in ensemble density functional theory method. I. General formalism
,”
J. Chem. Phys.
147
,
034113
(
2017
).
23.
M.
Garavelli
,
P.
Celani
,
F.
Bernardi
,
M. A.
Robb
, and
M.
Olivucci
, “
The C5H6NH2+ protonated Shiff base: An ab initio minimal model for retinal photoisomerization
,”
J. Am. Chem. Soc.
119
,
6891
(
1997
).
24.
B. H.
Lengsfield
 III
,
P.
Saxe
, and
D. R.
Yarkony
, “
On the evaluation of nonadiabatic coupling matrix elements using SA-MCSCF/CI wave functions and analytic gradient methods. I
,”
J. Chem. Phys.
81
,
4549
(
1984
).
25.
B.
Lengsfield
 III
and
D. R.
Yarkony
, “
Nonadiabatic interactions between potential energy surfaces: Theory and applications
,”
Adv. Chem. Phys.
82
,
1
(
1992
).
26.
I.
Fdez
,
M. G.
Galván Delcey
,
T. B.
Pedersen
,
F.
Aquilante
, and
R.
Lindh
, “
Analytical state-average complete-active-space self-consistent field nonadiabatic coupling vectors: Implementation with density-fitted two-electron integrals and application to conical intersections
,”
J. Chem. Theory Comput.
12
,
3636
(
2016
).
27.
J. W.
Snyder
,
E. G.
Hohenstein
,
N.
Luehr
, and
T. J.
Martínez
, “
An atomic orbital-based formulation of analytical gradients and nonadiabatic coupling vector elements for the state-averaged complete active space self-consistent field method on graphical processing units
,”
J. Chem. Phys.
143
,
154107
(
2015
).
28.
M.
Filatov
,
T. J.
Martínez
, and
K. S.
Kim
, “
Using the GVB Ansatz to develop ensemble DFT method for describing multiple strongly correlated electron pairs
,”
Phys. Chem. Chem. Phys.
18
,
21040
(
2016
).
29.
F. W.
Bobrowicz
and
W. A.
Goddard
 III
, “
The self-consistent field equations for generalized valence bond and open-shell Hartree-Fock wave functions
,” in
Methods of Electronic Structure Theory
, edited by
H. F.
Schaefer
 III
(
Plenum
,
New York
,
1977
).
30.
W. A.
Goddard
 III
and
L. B.
Harding
, “
The description of chemical bonding from ab initio calculations
,”
Annu. Rev. Phys. Chem.
29
,
363
(
1978
).
31.
S.
Seritan
,
C.
Bannwarth
,
B. S.
Fales
,
E. G.
Hohenstein
,
S. I. L.
Kokkila-Schumacher
,
N.
Luehr
,
J. W.
Snyder
,
C.
Song
,
A. V.
Titov
,
I. S.
Ufimtsev
, and
T. J.
Martínez
, “
TeraChem: Accelerating electronic structure and ab initio molecular dynamics with graphical processing units
,”
J. Chem. Phys.
152
,
224110
(
2020
).
32.
S.
Seritan
,
C.
Bannwarth
,
B. S.
Fales
,
E. G.
Hohenstein
,
C. M.
Isborn
,
S. I. L.
Kokkila-Schumacher
,
X.
Li
,
F.
Liu
,
N.
Luehr
,
J. W.
Snyder
, Jr.
,
C.
Song
,
A. V.
Titov
,
I. S.
Ufimtsev
,
L.-P.
Wang
, and
T. J.
Martinez
, “
TeraChem: A graphical processing unit-accelerated electronic structure package for large-scale ab initio molecular dynamics
,”
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
226
,
e1494
(
2020
).
33.
K.
Yasuda
, “
Two-electron integral evaluation on the graphics processor unit
,”
J. Comput. Chem.
29
,
334
(
2008
).
34.
A.
Asadchev
,
V.
Allada
,
J.
Felder
,
B. M.
Bode
,
M. S.
Gordon
, and
T. L.
Windus
, “
Uncontracted Rys quadrature implementation of up to G functions on graphical processing units
,”
J. Chem. Theory Comput.
6
,
696
(
2010
).
35.
J.
Kussmann
and
C.
Ochsenfeld
, “
Hybrid CPU/GPU integral engine for strong scaling ab initio methods
,”
J. Chem. Theory Comput.
13
,
3153
(
2017
).
36.
J.
Kalinowski
,
F.
Wennmohs
, and
F.
Neese
, “
Arbitrary angular momentum electron repulsion integrals with graphical processing units: Application to the resolution-of-the-identity Hartree-Fock method
,”
J. Chem. Theory Comput.
13
,
3160
(
2017
).
37.
A. D.
Becke
, “
A multicenter numerical integration scheme for polyatomic molecules
,”
J. Chem. Phys.
88
,
2547
(
1988
).
38.
I. S.
Ufimtsev
and
T. J.
Martínez
, “
Quantum chemistry on graphical processing units. 1. Strategies for two-electron integral evaluation
,”
J. Chem. Theory Comput.
4
,
222
(
2008
).
39.
I. S.
Ufimtsev
and
T. J.
Martinez
, “
Quantum chemistry on graphical processing units. 3. Analytical energy gradients, geometry optimization, and first principles molecular dynamics
,”
J. Chem. Theory Comput.
5
,
2619
(
2009
).
40.
I. S.
Ufimtsev
and
T. J.
Martinez
, “
Quantum chemistry on graphical processing units. 2. Direct self-consistent-field implementation
,”
J. Chem. Theory Comput.
5
,
1004
(
2009
).
41.
I. S.
Ufimtsev
and
T. J.
Martínez
, “
Graphical processing units for quantum chemistry
,”
Comput. Sci. Eng.
10
,
26
(
2008
).
42.
A. V.
Titov
,
I. S.
Ufimtsev
,
N.
Luehr
, and
T. J.
Martinez
, “
Generating efficient quantum chemistry codes for novel architectures
,”
J. Chem. Theory Comput.
9
,
213
(
2013
).
43.
N.
Luehr
,
I.
Ufimtsev
, and
T.
Martínez
, “
Dynamical quadrature grids: Applications in density functional calculations
,” in
GPU Computing Gems Emerald Edition
, edited by
W.-M.
Hwu
(
Morgan Kaufmann
,
Boston
,
2011
), p.
35
.
44.
C. M.
Isborn
,
N.
Luehr
,
I. S.
Ufimtsev
, and
T. J.
Martínez
, “
Excited-state electronic structure with configuration interaction singles and Tamm–Dancoff time-dependent density functional theory on graphical processing units
,”
J. Chem. Theory Comput.
7
,
1814
(
2011
).
45.
M.
Challacombe
,
E.
Schwegler
, and
J.
Almlöf
, “
Modern developments in Hartree-Fock theory: Fast methods for computing the Coulomb matrix
,” in
Computational Chemistry: Reviews of Current Trends
, edited by
J.
Lesczcynski
(
World Scientific
,
Singapore
,
1996
), Vol. 1, p.
53
.
46.
E.
Schwegler
and
M.
Challacombe
, “
Linear scaling computation of the Hartree–Fock exchange matrix
,”
J. Chem. Phys.
105
,
2726
(
1996
).
47.
E.
Schwegler
,
M.
Challacombe
, and
M.
Head-Gordon
, “
Linear scaling computation of the Fock matrix. II. Rigorous bounds on exchange integrals and incremental Fock build
,”
J. Chem. Phys.
106
,
9708
(
1997
).
48.
C.
Ochsenfeld
,
J.
Kussmann
, and
D. S.
Lambrecht
, “
Linear-scaling methods in quantum chemistry
,”
Rev. Comput. Chem.
23
,
1
(
2007
).
49.
B.
Doser
,
D. S.
Lambrecht
,
J.
Kussmann
, and
C.
Ochsenfeld
, “
Linear-scaling atomic orbital-based second-order Møller–Plesset perturbation theory by rigorous integral screening criteria
,”
J. Chem. Phys.
130
,
064107
(
2009
).
50.
B.
Doser
,
J.
Zienau
,
L.
Clin
,
D. S.
Lambrecht
, and
C.
Ochsenfeld
, “
A linear-scaling MP2 method for large molecules by rigorous integral-screening criteria
,”
Z. Phys. Chem.
224
,
397
(
2010
).
51.
J.
Kussmann
and
C.
Ochsenfeld
, “
Pre-selective screening for matrix elements in linear-scaling exact exchange calculations
,”
J. Chem. Phys.
138
,
134114
(
2013
).
52.
S. A.
Maurer
,
D. S.
Lambrecht
,
J.
Kussmann
, and
C.
Ochsenfeld
, “
Efficient distance-including integral screening in linear-scaling Møller-Plesset perturbation theory
,”
J. Chem. Phys.
138
,
014101
(
2013
).
53.
J.
Kussmann
and
C.
Ochsenfeld
, “
Preselective screening for linear-scaling exact exchange-gradient calculations for graphics processing units and general strong-scaling massively parallel calculations
,”
J. Chem. Theory Comput.
11
,
918
(
2015
).
54.
M.
Beuerle
,
J.
Kussmann
, and
C.
Ochsenfeld
, “
Screening methods for linear-scaling short-range hybrid calculations on CPU and GPU architectures
,”
J. Chem. Phys.
146
,
144108
(
2017
).
55.
T. H.
Thompson
and
C.
Ochsenfeld
, “
Distance-including rigorous upper bounds and tight estimates for two-electron integrals over long- and short-range operators
,”
J. Chem. Phys.
147
,
144101
(
2017
).
56.
T. H.
Thompson
and
C.
Ochsenfeld
, “
Integral partition bounds for fast and effective screening of general one-, two-, and many-electron integrals
,”
J. Chem. Phys.
150
,
044101
(
2019
).
57.
J. L.
Whitten
, “
Coulombic potential energy integrals and approximations
,”
J. Chem. Phys.
58
,
4496
(
1973
).
58.
J.
Almlöf
,
K.
Faegri
, and
K.
Korsell
, “
Principles for a direct SCF approach to LCAO–MO ab initio calculations
,”
J. Comput. Chem.
3
,
385
(
1982
).
59.
N.
Luehr
,
I. S.
Ufimtsev
, and
T. J.
Martínez
, “
Dynamic precision for electron repulsion integral evaluation on graphical processing units (GPUs)
,”
J. Chem. Theory Comput.
7
,
949
(
2011
).
60.
B. G.
Johnson
,
P. M. W.
Gill
, and
J. A.
Pople
, “
The performance of a family of density functional methods
,”
J. Chem. Phys.
98
,
5612
(
1993
).
61.
W.
Koch
and
M. C.
Holthausen
,
A Chemist’s Guide to Density Functional Theory
(
John Wiley & Sons
,
2015
).
62.
K.
Yasuda
, “
Accelerating density functional calculations with graphics processing unit
,”
J. Chem. Theory Comput.
4
,
1230
(
2008
).
63.
C. C. J.
Roothaan
, “
New developments in molecular orbital theory
,”
Rev. Mod. Phys.
23
,
69
(
1951
).
64.
P.
Pulay
, “
Convergence acceleration of iterative sequences. The case of SCF iteration
,”
Chem. Phys. Lett.
73
,
393
(
1980
).
65.
A.
Dreuw
and
M.
Head-Gordon
, “
Single-reference ab initio methods for the calculation of excited states of large molecules
,”
Chem. Rev.
105
,
4009
(
2005
).
66.
S.
Hirata
and
M.
Head-Gordon
, “
Time-dependent density functional theory within the Tamm–Dancoff approximation
,”
Chem. Phys. Lett.
314
,
291
(
1999
).
67.
F.
Furche
and
R.
Ahlrichs
, “
Adiabatic time-dependent density functional methods for excited state properties
,”
J. Chem. Phys.
117
,
7433
(
2002
).
68.
S.
Hirata
and
M.
Head-Gordon
, “
Time-dependent density functional theory for radicals: An improved description of excited states with substantial double excitation character
,”
Chem. Phys. Lett.
302
,
375
(
1999
).
69.
E. R.
Davidson
, “
Iterative calculation of a few of lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices
,”
J. Comput. Phys.
17
,
87
(
1975
).
70.
G. H.
Golub
and
C. F.
Van Loan
,
Matrix Computations
, 4th ed. (
Johns Hopkins University Press
,
Baltimore
,
2013
).
71.
H.-J.
Werner
,
P. J.
Knowles
,
G.
Knizia
,
F. R.
Manby
, and
M.
Schütz
, “
Molpro: A general-purpose quantum chemistry program package
,”
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
2
,
242
(
2012
).
72.
D.
Case
,
T.
Darden
,
T.
Cheatham
,
C.
Simmerling
,
J.
Wang
,
R.
Duke
,
R.
Luo
,
R.
Walker
,
W.
Zhang
, and
K.
Merz
, Amber 12,
University of California
,
2012
.
73.
J.
Wang
,
W.
Wang
,
P. A.
Kollman
, and
D. A.
Case
, “
Automatic atom type and bond type perception in molecular mechanical calculations
,”
J. Mol. Graphics Modell.
25
,
247
(
2006
).
74.
J.
Wang
,
R. M.
Wolf
,
J. W.
Caldwell
,
P. A.
Kollman
, and
D. A.
Case
, “
Development and testing of a general amber force field
,”
J. Comput. Chem.
25
,
1157
(
2004
).
75.
W. L.
Jorgensen
,
J.
Chandrasekhar
,
J. D.
Madura
,
R. W.
Impey
, and
M. L.
Klein
, “
Comparison of simple potential functions for simulating liquid water
,”
J. Chem. Phys.
79
,
926
(
1983
).
76.
M. A.
Rohrdanz
,
K. M.
Martins
, and
J. M.
Herbert
, “
A long-range-corrected density functional that performs well for both ground-state properties and time-dependent density functional theory excitation energies, including charge-transfer excited states
,”
J. Chem. Phys.
130
,
054112
(
2009
).
77.
R.
Ditchfield
,
W. J.
Hehre
, and
J. A.
Pople
, “
Self-consistent molecular-orbital methods. IX. Extended Gaussian-type basis for molecular-orbital studies of organic molecules
,”
J. Chem. Phys.
54
,
724
(
1971
).
78.
P. C.
Hariharan
and
J. A.
Pople
, “
Influence of polarization functions on molecular-orbital hydrogenation energies
,”
Theor. Chim. Acta
28
,
213
(
1973
).
79.
W.
Gyorffy
,
T.
Shiozaki
,
G.
Knizia
, and
H. J.
Werner
, “
Analytical energy gradients for second-order multireference perturbation theory using density fitting
,”
J. Chem. Phys.
138
,
104104
(
2013
).
80.
R.
Gulde
,
P.
Pollak
, and
F.
Weigend
, “
Error-balanced segmented contracted basis sets of double-ζ to quadruple-ζ valence quality for the lanthanides
,”
J. Chem. Theory Comput.
8
,
4062
(
2012
).
81.
T. H.
Dunning
, Jr.
, “
Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen
,”
J. Chem. Phys.
90
,
1007
(
1989
).
82.
F.
Weigend
, “
A fully direct RI-HF algorithm: Implementation, optimised auxiliary basis sets, demonstration of accuracy and efficiency
,”
Phys. Chem. Chem. Phys.
4
,
4285
(
2002
).
83.
T. W.
Keal
,
A.
Koslowski
, and
W.
Thiel
, “
Comparison of algorithms for conical intersection optimisation using semiempirical methods
,”
Theor. Chem. Acc.
118
,
837
(
2007
).
84.
J.
Kastner
,
J. M.
Carr
,
T. W.
Keal
,
W.
Thiel
,
A.
Wander
, and
P.
Sherwood
, “
DL-FIND: An open-source geometry optimizer for atomistic simulations
,”
J. Phys. Chem. A
113
,
11856
(
2009
).
85.
A. D.
Becke
, “
A new mixing of Hartree–Fock and local density-functional theories
,”
J. Chem. Phys.
98
,
1372
(
1993
).
86.
T.
Shiozaki
, “
BAGEL: Brilliantly advanced general electronic-structure library
,”
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
8
,
e1331
(
2018
).
87.
J. W.
Park
and
T.
Shiozaki
, “
Analytical derivative coupling for multistate CASPT2 theory
,”
J. Chem. Theory Comput.
13
,
2561
(
2017
).
88.
D. R.
Yarkony
, “
On the adiabatic to diabatic states transformation near intersections of conical intersections
,”
J. Chem. Phys.
112
,
2111
(
2000
).
89.
D. R.
Yarkony
, “
Nuclear dynamics near conical intersections in the adiabatic representation: I. The effects of local topography on interstate transitions
,”
J. Chem. Phys.
114
,
2601
(
2001
).
90.
L.
Liu
,
J.
Liu
, and
T. J.
Martinez
, “
Dynamical correlation effects on photoisomerization: Ab initio multiple spawning dynamics with MS-CASPT2 for a model trans-protonated Schiff base
,”
J. Phys. Chem. B
120
,
1940
(
2016
).
91.
J. M.
Herbert
and
M.
Head-Gordon
, “
Accelerated, energy-conserving Born–Oppenheimer molecular dynamics via Fock matrix extrapolation
,”
Phys. Chem. Chem. Phys.
7
,
3269
(
2005
).
92.
P.
Pulay
and
G.
Fogarasi
, “
Fock matrix dynamics
,”
Chem. Phys. Lett.
386
,
272
(
2004
).
93.
F.
Menezes
,
D.
Kats
, and
H.-J.
Werner
, “
Local complete active space second-order perturbation theory using pair natural orbitals (PNO-CASPT2)
,”
J. Chem. Phys.
145
,
124115
(
2016
).
94.
R. M.
Parrish
,
E. G.
Hohenstein
, and
T. J.
Martínez
, “
‘Balancing’ the block Davidson-Liu algorithm
,”
J. Chem. Theory Comput.
12
,
3003
(
2016
).
95.
F.
Furche
,
B. T.
Krull
,
B. D.
Nguyen
, and
J.
Kwon
, “
Accelerating molecular property calculations with nonorthonormal Krylov space methods
,”
J. Chem. Phys.
144
,
174105
(
2016
).
96.
C.
Ko
,
D. K.
Malick
,
D. A.
Braden
,
R. A.
Friesner
, and
T. J.
Martínez
, “
Pseudospectral time-dependent density functional theory
,”
J. Chem. Phys.
128
,
104103
(
2008
).
97.
C. M.
Isborn
,
A. W.
Götz
,
M. A.
Clark
,
R. C.
Walker
, and
T. J.
Martínez
, “
Electronic absorption spectra from MM and ab initio QM/MM molecular dynamics: Environmental effects on the absorption spectrum of photoactive yellow protein
,”
J. Chem. Theory Comput.
8
,
5092
(
2012
).
98.
S.
Gozem
,
F.
Melaccio
,
A.
Valentini
,
M.
Filatov
,
M.
Huix-Rotllant
,
N.
Ferré
,
L. M.
Frutos
,
C.
Angeli
,
A. I.
Krylov
, and
A. A.
Granovsky
, “
Shape of multireference, equation-of-motion coupled-cluster, and density functional theory potential energy surfaces at a conical intersection
,”
J. Chem. Theory Comput.
10
,
3074
(
2014
).
99.
M.
Filatov
, “
Assessment of density functional methods for obtaining geometries at conical intersections in organic molecules
,”
J. Chem. Theory Comput.
9
,
4526
(
2013
).
100.
R.
Liang
,
F.
Liu
, and
T. J.
Martínez
, “
Nonadiabatic photodynamics of retinal protonated Schiff base in channelrhodopsin 2
,”
J. Phys. Chem. Lett.
10
,
2862
(
2019
).
101.
J. K.
Yu
,
R.
Liang
,
F.
Liu
, and
T. J.
Martínez
, “
First-Principles characterization of the elusive I fluorescent state and the structural evolution of retinal protonated Schiff base in bacteriorhodopsin
,”
J. Am. Chem. Soc.
141
,
18193
(
2019
).

Supplementary Material