We present a highly efficient method for the extraction of optical properties of very large molecules via the Bethe–Salpeter equation. The crutch of this approach is the calculation of the action of the effective Coulombic interaction, W, through a stochastic time-dependent Hartree propagation, which uses only ten stochastic orbitals rather than propagating the full sea of occupied states. This leads to a scaling that is at most cubic in system size with trivial parallelization of the calculation. We apply this new method to calculate the spectra and electronic density of the dominant excitons of a carbon-nanohoop bound fullerene system with 520 electrons using less than 4000 core hours.
I. INTRODUCTION
Accurate calculation of optical spectra for large systems is essential for future novel optical and electronic devices, in fields ranging from photovoltaics,1 photocatalysis,2 and organic semiconductors3,4 to even the mechanisms of UV damage on DNA.5 The ubiquitous method of the field is time-dependent density functional theory (TD-DFT), but, unfortunately, this method lacks the accuracy needed for predictive power in most electronically complex systems. The Bethe–Salpeter Equation (BSE) formalism is an increasingly popular alternative for calculating electronic spectra.6 The success of the BSE is due to the proper inclusion of an effective long range exchange kernel, which solves the failures of TD-DFT in accurately describing charge transfer excitations and avoiding crossings.7,8
Current conventional methods for solving the BSE are substantially more computationally demanding than most implementations of TD-DFT due to the explicit calculation of a large number of occupied and virtual electronic states and the evaluation of a large number of screened exchange integrals between valence and conduction states, yielding a typical scaling of , for no valence orbitals.9–11 TD-DFT with local exchange functionals has a naive scaling of . However, progress in the field has reduced the scaling with techniques, which only requires the occupied orbitals that implicitly interact with all conduction states, what we later term a “mixed representation.”12–22 While very important for TD-DFT, the speedup gained from the use of an approximate iterative method vs direct diagonalization is only worthwhile in BSE if solving for the absorption is the algorithmic bottleneck, as is also the case for efficient BSE algorithms that approximate the dielectric response.23
To go beyond TD-DFT to BSE requires constructing the effective Coulombic interaction, W, the most computationally expensive step. To overcome this issue, we adopt our previous stable iterative methods14,15 and pull from our previous studies in both TD-DFT/BSE14,15,24–26 and stochastic GW27–31 to present an efficient stochastic generation of W within an iterative BSE technique. Our combined approach uses stochastic time-dependent propagation to obtain the action of W on each required term in linear scaling.27 In our work, the scaling is achieved with equispaced grids and delocalized functions, which make it easy to achieve full spatial convergence when smooth pseudopotentials are used. Overall, this results in an efficient method with at most cubic scaling with respect to system size. (We note that when, instead, localized basis sets are used for W and for the solution of the BSE equation, a similarly cubic scaling behavior can be achieved when balancing the requirements of basis-set convergence and localization.21,22) The method and its application to a large organic semiconductor are detailed below.
II. METHOD
Every method for solving the BSE has two numerical parts: construction of the “kernels,” i.e., the action of the effective interaction W on a given transition, and then diagonalizing or iterating the resulting Bethe–Salpeter Hamiltonian-like operator for the excitons. The full algorithm is covered here for completeness.
The starting point is a closed shell molecular system with 2Nocc electrons. We are interested in excitons composed of a mixture of no occupied (valence) orbitals, ϕi, ϕj, …, times nc conduction (virtual) ones written as ϕa, ϕb, …. Typically, no ≪ Nocc states are considered. The orbitals are eigenstates of a zero-order Hamiltonian H0, with occupied-state eigenvalues ɛi, ɛj, … and virtual-state eigenvalues ɛa, ɛb, …. Formally, the zero-order Hamiltonian eigenvalues should come from a very accurate method, specifically self-consistent GW. In addition, we will use the well-established Tamm–Dancoff approximation.
For singlet excitations, the excitation energies of the system are the eigenvalues of the (nonc × nonc) Tamm–Dancoff matrix A that couples excitons, i.e., occupied-virtual pairs,
with exchange elements,
while W ≡ W(ω = 0) refers to the static effective Columbic interaction approximation, and its matrix elements are
The superscript in ɛGW denotes, as mentioned, that high quality GW single particle energies should be used. In practice, for systems that are not too small, starting at medium sized systems with a few dozen electrons, it is sufficient to use the DFT eigenstates plus a GW derived correction, called a scissors approximation. We calculate the HOMO and LUMO GW energies by the linear-scaling stochastic-GW (sGW) method27–31 and use the scissors approximation: and , where and analogously for δc. Furthermore, for higher accuracy, we use the self-consistent ΔGW0 approach where the sGW HOMO and LUMO corrections are a posteriori shifted self-consistently; for large systems, this approach was found to be an excellent approximation to eigenvalue-only self-consistent GW0 and to the experiment.30
A. Mixed representation iterative solution
The simplest derivation of an iterative method for the BSE spectrum starts with the linear-response time-dependent Hartree–Fock (TD-HF) equation.15,32,33 For an initially real occupied state perturbed along the x axis, ψj(r, t = 0) = e−iαxϕj, one performs a linear response expansion in α, , where fj(r, t = 0) = xϕj(r). The formally non-linear TD-HF equation for ψj then converts, for small α, to a linear equation for fj. In the Tamm–Dancoff approximation (where fj is not coupled to ), this evolution equation reads
where
and
Here, we introduced several terms. Δ ≡ δc − δo is the ΔGW0 scissors shift. The exciton Coulomb potential is δv(r, t) = ∫|r − r′|−1δn(r′, t)dr′, where the exciton density is δn(r, t) = 2∑iϕi(r)fi(r, t) and the sum extends over the occupied states. The exciton exchange δX is defined analogously, again under the Tamm–Dancoff approximation,
Finally, Q is a projection operator that ensures that the excited functions fj have no overlap with the occupied space, i.e., Q = I − P, with .
The BSE equation results then when the static effective interaction W replaces the Coulombic interaction in the exchange operator, yielding eventually (hiding the time-dependence of f)
where the action of W on the occupied–occupied term is Wij(r) ≡ ∫W(r, r′; ω = 0)ϕi(r′)ϕj(r′)dr′.
The linear form of the time-dependent equation readily implies that the frequency-dependent spectrum can be obtained from the dipole–dipole correlation function, where up to a constant,
where , and the (smoothed) delta function is readily expressed using a Chebyshev expansion in A,
where |fn⟩ are obtained by iteratively applying a scaled A, starting from |f0⟩, while gn(ω) are numerical coefficients.34–36 The spectrum evaluation, therefore, reduces to calculation of the residues, Rn ≡ ⟨f0|fn⟩. Numerically, one just requires the application of A on an arbitrary exciton vector fj(r), i.e., f → Af.
While here we use Eq. (6), based on the mixed hole-grid representation fj(r), we note that in many cases one would want to use an explicit electron–hole basis. For example, for systems such as large quantum dots, where Nocc is very large and we are interested in a smaller number of conduction states relative to the total number of occupied electron states, nc < Nocc, it is numerically better to replace Q by a projection to the nc conduction states, . Then, the fundamental iterated object is the electron–hole basis coefficients, fia ≡ ⟨ϕa|fi⟩. In the electron–hole basis, the initial state is simply the x-dipoles elements, . Furthermore, it is easy to see that the iterative application of A on f, as given in Eqs. (3) and (6), becomes in the electron–hole basis: (Af)ia = ∑j,bA(i, a; j, b)fjb and A is here exactly the BSE matrix from Eq. (1). Practically, the action by A would be done then as
i.e., given the set of coefficients fjb, one would calculate the mixed representation vectors fj(r), from which δv and δX follow, and then dot product per the equation above.
Note that using the electron–hole basis coefficients reduces the spectral range of A, which will be controlled now by the highest conduction state included, instead of the (much larger) highest eigenvalue of H0, thereby reducing the number of required Chebyshev terms. In addition, the use of the electron–hole basis within this method allows us to use a specific state-selected GW shift. Thus, with such a basis, the Hamiltonian is not used explicitly so, if desired ɛa and ɛi could come directly from a GW calculation rather than a global scissors shift, as shown in Eq. (1). Similarly, the same formalism carries over trivially to localized orthogonal basis sets, where ɛa and ɛi would be replaced by the Hamiltonian matrices within the electron and hole spaces, respectively. The expressions for non-orthogonal basis sets can be similarly derived.
B. Stochastic evaluation of the action of W
The main numerical task in our formalism is the preparation of all no(no + 1)/2 functions Wij(r). W is made from a static Coulomb part and a polarization component, W = v + vχv ≡ v + Wpol (where v(r, r′) = |r − r′|−1), so , where qij ≡∫|r − r′|−1ϕi(r′)ϕj(r′)dr′.
As is well-known, the action of Wpol can be obtained by time-dependent Hartree (TD-H) calculations.37–39 Specifically, for each pair of occupied functions 1 ≤ i, j ≤ no, one calculates the source “potential” qij(r) due to the ϕiϕj density-like source term. Then, the full set of all occupied states is perturbed, where α ≈ 10−6 − 10−4 is a small perturbation strength, just as in the linear-response TD-HF derivation above. Note that, to avoid a plethora of indices, we do not denote the dependence of ψs on i, j.
The perturbed states are then numerically propagated with a TD-H Hamiltonian,
where uij(r, t) is the potential due to the time-dependent density perturbation,
where nα(r, t) = 2∑s|ψs(r, t)|2 is the density due to the propagated perturbed orbitals. This potential, used to propagate the time-dependent Hamiltonian, is then scaled to give the result of acting with the time-dependent effective potential, W(t), i.e.,
Finally, the desired action of the static polarization is obtained by damped integration of the action of the time-dependent polarization, , i.e.,
where we introduced a Gaussian damping function where the width γ is a numerical convergence parameter.
The one caveat in this overall approach is that the propagation of the full set of occupied orbitals is very expensive for large systems. The need to propagate a huge number of valence and conduction orbitals is what makes the generation of W the most expensive step in a BSE calculation, even with highly optimized codes such as in Refs. 9 and 10. We, therefore, use here our stochastic-TD-H approach26,27,40 that leads to a stochastic W, outlined below. (Note that this is exactly the same approach we use in our stochastic GW method,27 with a small improvement detailed later.) Briefly, in stochastic-TD-H (or stochastic-TD-DFT in the general case), we replace the full set of occupied orbitals by a few random–sign combinations of all occupied states,
where ℓ = 1, …, L, and for large systems, a very small number of states is sufficient, L ≪ Nocc. The L stochastic occupied states are then treated in the TD-H procedure as if they were the full set of Nocc molecular orbitals, i.e., they are perturbed and propagated with H0 + uij(r, t), where the time-dependent density used in constructing u is now obtained from n(r, t) = 2∑ℓ≤L|ηℓ(r, t)|2. Note that two sets need to be propagated, the perturbed and unperturbed |ηα=0(t)⟩ stochastic orbitals.
At long times, this simple stochastic TD-H approach would eventually become unstable due to “contamination” by occupied states. This means that the excited component, , has in it occupied-states’ amplitudes. For regular TD-H propagation of all states, this is not a problem since, in the overall density, the “contamination” of a propagated state ψj(r, t) by an occupied ϕi(r) is exactly canceled by the “contamination” of the opposite pair.19 However, in our stochastic occupied orbitals, there is no such cancellation. Luckily, the instability gets tamed as the system size gets bigger, but we did find that it affects the results here if untreated for medium system sizes.
To prevent the instability, we simply “clean” the stochastic orbitals periodically, so after every M’th time step, we write
with t = 0, Mdt, 2Mdt, …. This does not increase the scaling of the method since the required cleaning frequency decreases (i.e., a larger M is possible) with increasing system size. Also note that, after each cleaning step, we renormalize each orbital so it keeps its initial norm. In our calculations detailed below, we found M = 10 suffices for a total time of t = 300.
Finally, a well-known technical aspect is that the finite grid box leads to a fixed vacuum potential at the edge of the box that affects the zero-point energy of the system when calculating W, thus requiring a constant shift to the system. Formally, this term relates to the zero-momentum limit k → 0 of the interaction, so it is labeled as W(k → 0). Existing codes address differently the numerical quadrature issues associated with accurately converging the calculation of this zero-point, either accounting for the average potential with a constant shift (done in Refs. 9, 42, and 43) or Monte-Carlo sampling the zero-point shift (as done in Ref. 10).
We use a variation of the constant-shift procedures (Refs. 42 and 43) to find the required shift. We repeat the iterative calculation for a medium size (i.e., a lesser number of valence states, no, than the one eventually used) at several different box sizes, and for each run, find the average Kohn–Sham potential on the box boundary, vbndry. Then, we approximate , and the parameter , playing the role of an inverse dielectric constant, is fit so that the spectra from the different box-sizes runs overlap.
Summarizing the resulting algorithm: we use stochastic TD-H to prepare Wij(r) for no(no + 1)/2 occupied-state pairs. Separately we use sGW to calculate the self-consistent GW scissors shift Δ and subtract from it the W(k → 0) term. Then, for each polarization, we start with the dipole exciton state. The Tamm–Dancoff operator [Eq. (6)] is successively applied and the Chebyshev residues Rn are used to calculate the absorption frequencies. As usual, if one wants to characterize the different peaks, then the Chebyshev expansion of the frequency-resolved exciton state [Eq. (8)] can be used (potentially with filter-diagonalization44,45 for resolving different sub-peaks).
The two parts of the method, preparing Wij and applying the A operator, both scale as for ng grid points, more gentle than most current methods. Formally, this is cubic scaling with system size but, in practice, the scaling is better since no often rises only gently with Nocc. In addition, the number of grid points would be reduced in future studies as we have shown that very sparse grids suffice when using orthogonal projected augmented waves (OPAW) instead of pseudo-potentials.46
III. RESULTS
We demonstrate the algorithm on a characteristic large organic semiconducting system, a carbon nanohoop–fullerene complex. In the past decade, cycloparaphenylenes (CPPs), also known as carbon nanohoops, have emerged as highly structurally tunable emitters, with rich size-dependent opto-electronic properties and host–guest chemistry.47 While substantial DFT modeling has been completed on CPP + fullerene complexes,41,48–50 extraction of optical properties at this level of theory is difficult due to the characteristic charge transfer states in CPPs. Furthermore, it has already been established that the BSE formalism is very accurate in predicting the properties of other fullerene–polymer complexes.51 Here, we present detailed results for the smallest such CPP + fullerene “pea-pod,”52,53 [10]-CPP + C60 (Fig. 1).
To simulate [10]-CPP + C60, we use a generous box of (Nx, Ny, Nz) = (100, 100, 84) with a grid spacing of 0.5 bohr using norm-conserving pseudo-potentials (NCPPs) with stochastic DFT at the LDA level.26,54,55 The effective inverse dielectric constant was found to be 0.30, which gives, at this grid size, a shift of −W(k → 0) = −0.29 eV.
The DFT gap is 1.02 eV. This gap is corrected with stochastic GW by an amount of 1.24 eV, and after applying the self-consistent ΔGW0, this gap correction rises to 1.33 eV (i.e., a fundamental ΔGW0 gap of 2.35 eV). Combined with −W(k → 0), the overall scissors shift used is Δ = 1.04 eV.
The calculations of the action of W [Eqs. (10)–(13)] were done with a broadening of γ = 0.1 Hartree. A time-step dt = 0.1 a.u. was used for a split-operator propagation, and the cleaning was done every M = 10 steps. The runs were done for n0 = 100 valence states, requiring n0(n0 + 1)/2 = 5005 actions of W. For each realization of the action of W, only L = 10 stochastic time-dependent orbitals were used. We verified that the error due to the finite L was small, observing a shift <0.02 eV shift in the most sensitive lowest exciton upon doubling the number of stochastic orbitals.
The BSE iterative Chebyshev procedure was then done using the no = 100 valence states. The Chebyshev expansion of δ(A − ω) is evaluated with a Gaussian broadening of 0.08 eV; this does not affect the spectrum significantly as it is naturally broadened due to a large number of excitons. The runs took a total of 4000 core hours, i.e., 40 wall hours on a 100-core AMD Milan cluster.
In Fig. 2, we show the calculated absorption spectrum, both total and separated to in-plane (x and y), and perpendicular (z) polarizations. While there are no gas phase spectra of [10]-CPP + C60 due to the fullerene slipping out of its “pea-pod,” for the lowest exciton energy, we get a reasonable agreement with just [10]-CPP,56,57 and a stabilizing [10]-CPP + C60-[2]-Rotaxane complex.58 For just 10-[CPP] in the gas phase, the lowest strong transition sits 0.4 eV higher than that shown in Fig. 2. Qualitatively, with the addition of the fullerene in the middle in our simulations, the overall dielectric constant would increase, thereby lowering the energy of the first exciton state. This shift is consistent with the shift to lower energies seen in the stabilized [10]-CPP + C60-[2]-Rotaxane complex in solution.
Figure 3 shows the exciton density of four prominent excitons as labeled in Fig. 2, and, for the sake of analysis, we also extract the exciton density in the electron–hole state basis. Specifically, we calculate |fj(ω)⟩ at the four exciton peak frequencies and then calculate the overlap onto a set of unoccupied wave functions, giving us fia for as many a as we desire. For the lowest peak, most of the exciton density is concentrated at near-gap i, a states. However, for the largest spectrum peak at 6.83 eV, labeled (iv) in Fig. 2, one ought to go beyond the figure and use (100, 250) states to capture the same amount of density, a basis size that is very substantial. This demonstrates the strengths of the stochastic resolution of the action of W in our present approach, as the full unoccupied space is sampled rather than just a subset of conduction states.
IV. FORWARD PERSPECTIVE
Our results show that even larger systems are feasible. This is evident by the fact that the runs were not optimized. For example, while we used n0 = 100, a smaller number n0 = 70 would have sufficed, reducing the cost by a factor of two. Similarly, the box size was very large, and a grid with almost half the points (or even less with OPAW46) would have sufficed. With optimized parameters, one can, therefore, easily reach systems with 1000–2000 electrons.
An interesting feature of the method is that, for larger systems, it becomes less and less sensitive to stochastic errors. Those errors appear primarily in the sGW calculation of the scissors shift, but this calculation scales sub-linearly with system size. Already here the sGW calculation took less than 20% of the total calculation and had an error of 0.05 eV, so for larger systems, an even higher accuracy would be obtained with a small fraction of the total cost.
An interesting question is how to go to huge systems, with many thousands of electrons. For this, we note that, while this implementation of the BSE scales formally cubically, it holds promise to give eventually quadratic scaling. To achieve this, one would need to implement localized occupied basis sets (see Refs. 21, 22, 59, and 60) for reducing the number of i, j pairs for which Wij(r) needs to be calculated, and perhaps even stochastically sample these ϕi(r)ϕj(r) pairs. These previous methods with optimal basis sets will serve as direct guides for further reducing the scaling of this method.
Furthermore, as briefly mentioned above, we hope to expand beyond the Tamm–Dancoff approximations in future works, in which the antiresonant perturbation must also be propagated, and interact with each other via W.14,15 In addition, dynamical corrections to W could be implemented at the plasmon-pole level of theory37 or beyond at select frequencies.61
ACKNOWLEDGMENTS
N.C.B. acknowledges the National Science Foundation Graduate Research Fellowship Program (Grant No. DGE-2034835). J.R.C. acknowledges the ACS Petroleum Research Fund (Grant No. 62717-DNI6). N.C.B., M.N., and D.N. acknowledge support by the NSF (Grant No. CHE-1763176). Computational resources were supplied through XSEDE allocation TG-CHE170058.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.