In this article, the authors present a technique using variational Monte Carlo to solve for excited states of electronic systems. This technique is based on enforcing orthogonality to lower energy states, which results in a simple variational principle for the excited states. Energy optimization is then used to solve for the excited states. This technique is applied to the well-characterized benzene molecule, in which ∼10 000 parameters are optimized for the first 12 excited states. Agreement within ∼0.2 eV is obtained with higher scaling coupled cluster methods; small disagreements with experiment are likely due to vibrational effects.

## I. INTRODUCTION

The ground and first few excited states determine the behavior of most systems in materials, condensed matter, and chemistry. A hallmark of correlated electron physics is the presence of many disparate excited states near the ground state, which may differ in complex ways that depend on exactly how electronic correlation is treated, as has been shown in the Hubbard model.^{1} Methods to access these low energy states in strongly correlated systems are fundamental to understanding them. There is, thus, a need for scalable methods that can treat strongly correlated systems and non-perturbatively access excited states, which would generate new insights into long-standing challenging systems such as the low energy behavior of high-T_c cuprates/pnictides and twisted bilayer graphene.

Quantum Monte Carlo techniques such as variational and diffusion Monte Carlo (VMC, DMC)^{2} are based on many-body, non-perturbative wave functions. They offer low scaling, typically of order $O(N_e^{3-4})$, where *N*_{e} is the number of electrons, and can offer impressive accuracy for that low scaling.^{3} However, these techniques are mostly developed for ground state calculations. By far the most common technique to approximate excited states is to fix a single Slater determinant or linear combination of Slater determinants for the state of interest, while using a Jastrow^{4} factor and diffusion Monte Carlo to address some of the electron correlations. This is the technique outlined in Ref. 2, and similar techniques are used in auxiliary field quantum Monte Carlo.^{5} This approach works surprisingly well for many materials.^{6–12} However, it depends on the quality of the wave function that generated the fixed trial function; if correlation included by the Jastrow factor would change the optimal Slater part of the wave function, then the *ansatz* is suboptimal, as demonstrated clearly in Ref. 13.

Recently, and not so recently, there have been extensions to the variational and diffusion Monte Carlo (VMC, DMC) methods to address excited states systematically. These techniques seek to improve the accuracy of excited states over the simple method explained above by optimizing the antisymmetric part of the wave function. Bernu and Ceperley proposed an algorithm based on diffusion Monte Carlo,^{14} which has not been applied to many practical cases, since it has a sign problem that leads to exponential scaling in the system size. Blunt *et al.*^{15} proposed an algorithm using full CI (configuration interaction), which is also exponentially scaling. On the low-scaling side, Filippi and co-workers^{16–18} implemented a method similar to the state averaged complete active space self-consistent field (CASSCF) in VMC and demonstrated the technique on impressively large wave function expansions. However, state averaging is not optimal when the optimal ground and excited orbitals are very different,^{19} as often occurs in strongly correlated systems. Neuscamman and co-workers^{19–22} have proposed a low-scaling method that instead uses alternate objective functions to optimize excited states. While this technique does not suffer from the state averaging problem, it so far has only been applied to very few excited states and can experience difficulties converging to the correct state.^{23} Finally, Choo *et al.*^{24} used an orthogonalization step after each optimization step to access excited states on a lattice, but the algorithm has not been demonstrated on first-principles models.

In this article, we implement and demonstrate a simple penalty method based on orthogonalization to lower energy states to compute excited states using variational Monte Carlo. Similar techniques are commonly used in density matrix renormalization group calculations^{25} but, to our knowledge, have not been applied in the variational Monte Carlo context. This technique obtains excited states one by one by enforcing orthogonality to lower energy states and can optimize general wave function parameters, including orbital parameters, as shown here. The scaling of the technique is $O(N_{ex} N_e^M) + cO(N_{ex}^2 N_e^M)$, where *N*_{e} is the number of electrons, *N*_{ex} is the number of excited states computed, *c* is a small constant, and *M* depends on the wave function (*M* = 3 for a Slater–Jastrow wave function). The orthogonalization-based technique allows access to multiple excited states and does not require state averaging. The technique is implemented in the pyqmc package, available online.^{26} The method is applied to benzene with an ∼10 000 parameter wave function, showing high accuracy compared to experiment and coupled cluster calculations on 12 excited states. The state-specific orbital optimization made possible by the new technique improves the fixed node DMC excited state energies.

## II. PENALTY-BASED OPTIMIZATION USING VARIATIONAL MONTE CARLO

The method solves for each energy eigenstate one at a time by following the procedure outlined here:

1. The stochastic reconfiguration method^{27} is used to find the VMC approximation to the ground state, |Ψ_{0}⟩.
2. The first excited state |Ψ_{1}⟩ is found by optimizing the objective function [Eq. (7)] with $\vec{S}^* = [0]$ and *λ* set larger than the expected *E*_{1} − *E*_{0}. The algorithm is not very sensitive to the value of *λ*, so we typically use *λ* of order 1 hartree, substantially higher than excitation energies in the systems considered here.
3. The second excited state is found by optimizing the objective function [Eq. (7)] with $\vec{S}^* = [0, 0]$ and anchor states |Ψ_{0}⟩ and |Ψ_{1}⟩.
4. Further excited states are found in the same way by orthogonalizing to the states found in the previous steps.
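The state-by-state loop above can be mimicked on a small matrix Hamiltonian, replacing VMC energy optimization with a generic numerical minimizer. This is an illustrative sketch only: the random 4×4 symmetric matrix, the penalty strength, and the use of scipy's BFGS are arbitrary choices, not the implementation used in this work.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
H = (A + A.T) / 2   # toy symmetric "Hamiltonian"
lam = 10.0          # penalty strength, larger than any relevant energy gap

def objective(p, anchors):
    """Energy plus penalty for overlap with previously found states."""
    psi = p / np.linalg.norm(p)
    energy = psi @ H @ psi
    penalty = lam * sum((psi @ a) ** 2 for a in anchors)
    return energy + penalty

anchors, energies = [], []
for state in range(3):  # ground state, then two excited states
    # several random restarts to avoid stopping at a saddle point
    best = min((minimize(objective, rng.normal(size=4), args=(anchors,))
                for _ in range(5)), key=lambda r: r.fun)
    psi = best.x / np.linalg.norm(best.x)
    anchors.append(psi)             # later states are orthogonalized to this one
    energies.append(psi @ H @ psi)

print(np.allclose(energies, np.linalg.eigvalsh(H)[:3], atol=1e-4))  # True
```

In the actual method, the minimization runs over trial wave function parameters with stochastic estimates of the energy, overlaps, and their derivatives; here exact linear algebra stands in for those estimates.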

### A. Objective function

As shown in Fig. 1, it is straightforward to show that if |Φ_{0}⟩ is the ground state of the Hamiltonian, then so long as *λ* > *E*_{1} − *E*_{0}, the function

$$\arg\min_{\Psi}\Big( E[N_\Psi \Psi] + \lambda \big|\langle N_\Psi \Psi | \Phi_0 \rangle\big|^2 \Big)$$

is equal to the first excited state |Φ_{1}⟩, where *E*[Ψ] is the expectation value of the energy and $N_\Psi = 1/\sqrt{\langle \Psi | \Psi \rangle}$.

For completeness, we show this here. Consider the objective functional

$$O[\Psi] = E[N_\Psi \Psi] + \lambda \big|\langle N_\Psi \Psi | \Phi_0 \rangle\big|^2 = \frac{\langle \Psi | H | \Psi \rangle}{\langle \Psi | \Psi \rangle} + \lambda \frac{|\langle \Psi | \Phi_0 \rangle|^2}{\langle \Psi | \Psi \rangle}.$$

Then, the functional derivative of each term is given by

$$\frac{\delta}{\delta \Psi^*} \frac{\langle \Psi | H | \Psi \rangle}{\langle \Psi | \Psi \rangle} = \frac{(H - E[\Psi])\,|\boldsymbol{\Psi}\rangle}{\langle \Psi | \Psi \rangle}$$

and

$$\frac{\delta}{\delta \Psi^*}\, \lambda \frac{|\langle \Psi | \Phi_0 \rangle|^2}{\langle \Psi | \Psi \rangle} = \frac{\lambda}{\langle \Psi | \Psi \rangle} \left( \langle \Phi_0 | \Psi \rangle\, |\boldsymbol{\Phi_0}\rangle - \frac{|\langle \Psi | \Phi_0 \rangle|^2}{\langle \Psi | \Psi \rangle}\, |\boldsymbol{\Psi}\rangle \right).$$

We have boldfaced the unpaired kets for clarity. Combining the two and setting $\frac{\delta O}{\delta \Psi^*}$ to zero, we obtain, for normalized |Ψ⟩,

$$(H - E[\Psi])\,|\Psi\rangle + \lambda \big( \langle \Phi_0 | \Psi \rangle\, |\Phi_0\rangle - |\langle \Psi | \Phi_0 \rangle|^2\, |\Psi\rangle \big) = 0.$$

The *H* − *E*[Ψ] term ensures that this equation can only be satisfied by an energy eigenstate |Φ_{i}⟩. For *i* ≠ 0, |Ψ⟩ = |Φ_{i}⟩ is a solution because ⟨Φ_{i}|Φ_{0}⟩ = 0. |Φ_{0}⟩ is a solution since ⟨Φ_{0}|Φ_{0}⟩ = 1, which allows the *λ* terms to cancel. The value of the functional is *O*[Φ_{i}] = *E*_{i} + *λδ*_{i0}. Therefore, if *λ* > *E*_{1} − *E*_{0}, the global minimum is at |Φ_{1}⟩. Because of the structure demonstrated in Fig. 1, there are no local minima in the complete Hilbert space. Similarly, the *N*th excited state is given by

$$\arg\min_{\Psi}\left( E[N_\Psi \Psi] + \sum_{i=0}^{N-1} \lambda_i \big|\langle N_\Psi \Psi | \Phi_i \rangle\big|^2 \right)$$

as long as *λ*_{i} > *E*_{N} − *E*_{i}.
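The key facts of this derivation, that *O*[Φ_{i}] = *E*_{i} + *λδ*_{i0} and that the global minimum therefore sits at |Φ_{1}⟩ whenever *λ* > *E*_{1} − *E*_{0}, can be checked numerically on an arbitrary Hermitian matrix; the 5×5 random matrix and the value of *λ* below are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
H = (A + A.T) / 2
E, Phi = np.linalg.eigh(H)   # columns of Phi are eigenstates, E is sorted
lam = 10.0                   # comfortably larger than E_1 - E_0

def O(psi):
    """Penalty functional E[psi] + lam * |<psi|Phi_0>|^2 for normalized psi."""
    psi = psi / np.linalg.norm(psi)
    return psi @ H @ psi + lam * (psi @ Phi[:, 0]) ** 2

values = [O(Phi[:, i]) for i in range(5)]
expected = [E[i] + lam * (i == 0) for i in range(5)]
print(np.allclose(values, expected))  # True: O[Phi_i] = E_i + lam * delta_i0
print(int(np.argmin(values)))         # 1: the minimum over eigenstates is Phi_1
```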

We write the algorithm in terms of *anchor states* |Ψ_{i}⟩, where *i* = 0, …, *N* − 1. These states are fixed during the optimization, and only the parameters of a single wave function |Ψ⟩ are optimized. While ideally the anchor states would be energy eigenstates, in general, they are the best available approximations to them. We also find it useful to consider the objective function

$$O[\Psi] = E[\Psi] + \vec{\lambda} \cdot \big( \vec{S} - \vec{S}^* \big)^2, \qquad (7)$$

where

$$S_i = \frac{\langle \Psi | \Psi_i \rangle \langle \Psi_i | \Psi \rangle}{\langle \Psi | \Psi \rangle \langle \Psi_i | \Psi_i \rangle}$$

and $\vec{S}^*$ is a set of *target* overlaps. For example, to obtain the *N*th excited state, one would use *N* anchor states set to the best approximations to the *N* lowest energy eigenstates and set $\vec{S}^*$ equal to a zero vector of length *N*.
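The target-overlap objective can be sketched on a toy model, assuming for illustration a single anchor state and a scalar target *S*^{*} = 0.5, which steers the minimization toward a roughly equal superposition of the two lowest eigenstates rather than a pure eigenstate. The 3×3 matrix and penalty strength are arbitrary choices.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
H = (A + A.T) / 2
E, Phi = np.linalg.eigh(H)
anchor = Phi[:, 0]          # best approximation to the ground state
lam, S_target = 50.0, 0.5   # target 50% squared overlap with the anchor

def objective(p):
    psi = p / np.linalg.norm(p)
    S = (psi @ anchor) ** 2           # normalized squared overlap
    return psi @ H @ psi + lam * (S - S_target) ** 2

# several random restarts, keep the lowest objective value found
best = min((minimize(objective, rng.normal(size=3)) for _ in range(5)),
           key=lambda r: r.fun)
psi = best.x / np.linalg.norm(best.x)
print(abs((psi @ anchor) ** 2 - 0.5) < 0.05)  # True: near the requested overlap
```

With a finite penalty the converged overlap sits slightly off the target, displaced by roughly (*E*_{1} − *E*_{0})/(2*λ*); a larger *λ* enforces the target more tightly.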

### B. Computation of the objective function and its derivatives using variational Monte Carlo

In this section, we explain how to evaluate the needed quantities using standard variational Monte Carlo techniques.^{2} In this implementation, we sample a different distribution for each anchor state,

$$\rho_i(\mathbf{R}) = |\Psi(\mathbf{R})|^2 + |\Psi_i(\mathbf{R})|^2. \qquad (9)$$

Then, the unnormalized overlap is estimated in Monte Carlo as

$$\langle \Psi_i | \Psi \rangle_i = \left\langle \frac{\Psi_i^*(\mathbf{R})\, \Psi(\mathbf{R})}{\rho_i(\mathbf{R})} \right\rangle_{\mathbf{R} \sim \rho_i}, \qquad (10)$$

where **R** is the many-electron coordinate and **R** ∼ *ρ*_{i} means that **R** is sampled from the normalized distribution $\rho_i(\mathbf{R}) / \int \rho_i(\mathbf{R})\, d\mathbf{R}$. Here, the subscript *i* indicates that the quantity was estimated using samples of *ρ*_{i}.

The relative normalization of the wave function |Ψ⟩ is, thus,

$$N_i = \left\langle \frac{|\Psi(\mathbf{R})|^2}{\rho_i(\mathbf{R})} \right\rangle_{\mathbf{R} \sim \rho_i},$$

which is evaluated the same way as Eq. (10). For a parameter *p* of |Ψ⟩,

$$\partial_p N_i = \left\langle \frac{2\,\mathrm{Re}\big[\Psi^*(\mathbf{R})\, \partial_p \Psi(\mathbf{R})\big]}{\rho_i(\mathbf{R})} \right\rangle_{\mathbf{R} \sim \rho_i}.$$

The normalized overlap is given by

$$S_i = \frac{A_i}{\langle \Psi | \Psi \rangle_i\, \langle \Psi_i | \Psi_i \rangle_i},$$

where $A_i = \langle \Psi | \Psi_i \rangle_i\, \langle \Psi_i | \Psi \rangle_i$. The derivative of the unnormalized overlap is

$$\partial_p \langle \Psi_i | \Psi \rangle_i = \left\langle \frac{\Psi_i^*(\mathbf{R})\, \partial_p \Psi(\mathbf{R})}{\rho_i(\mathbf{R})} \right\rangle_{\mathbf{R} \sim \rho_i}.$$

The derivative of the normalized overlap is computed using the above-mentioned components as follows:

$$\partial_p S_i = \frac{2\,\mathrm{Re}\big[\langle \Psi | \Psi_i \rangle_i\, \partial_p \langle \Psi_i | \Psi \rangle_i\big]}{\langle \Psi | \Psi \rangle_i\, \langle \Psi_i | \Psi_i \rangle_i} - S_i\, \frac{\partial_p \langle \Psi | \Psi \rangle_i}{\langle \Psi | \Psi \rangle_i}.$$

Thus, all derivatives can be computed using only the wave function parameter derivatives.
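These estimators can be sketched in one dimension using the sampling density *ρ*_{i} = |Ψ|² + |Ψ_{i}|², with two Gaussian stand-ins whose normalized overlap is known analytically (exp(−1/2) for unit-width Gaussians whose centers are separated by 1). The Metropolis step size and sample counts are arbitrary choices for this sketch.

```python
import numpy as np

psi = lambda x: np.exp(-x**2 / 2)            # stand-in for Psi
psi_i = lambda x: np.exp(-(x - 1.0)**2 / 2)  # stand-in for the anchor Psi_i
rho = lambda x: psi(x)**2 + psi_i(x)**2      # sampling density

rng = np.random.default_rng(3)
x, samples = 0.0, []
for step in range(200_000):          # simple Metropolis sampling of rho
    xp = x + rng.normal(scale=1.0)
    if rng.random() < rho(xp) / rho(x):
        x = xp
    if step >= 1000:                 # discard equilibration steps
        samples.append(x)
samples = np.array(samples)

# ratio estimators; the unknown normalization of rho cancels in S
overlap = np.mean(psi_i(samples) * psi(samples) / rho(samples))
norm = np.mean(psi(samples)**2 / rho(samples))
norm_i = np.mean(psi_i(samples)**2 / rho(samples))
S = overlap**2 / (norm * norm_i)
print(abs(S - np.exp(-0.5)) < 0.02)  # True within Monte Carlo error
```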

We also regularize all derivatives using the stochastic reconfiguration^{27} step to compute

$$R_{pq} = \left\langle \frac{\partial_p \Psi^*(\mathbf{R})}{\Psi^*(\mathbf{R})} \frac{\partial_q \Psi(\mathbf{R})}{\Psi(\mathbf{R})} \right\rangle - \left\langle \frac{\partial_p \Psi^*(\mathbf{R})}{\Psi^*(\mathbf{R})} \right\rangle \left\langle \frac{\partial_q \Psi(\mathbf{R})}{\Psi(\mathbf{R})} \right\rangle$$

for parameter indices *p* and *q*.
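The matrix *R*_{pq}, the covariance of the wave function log-derivatives, can be sketched for a toy one-dimensional wave function Ψ(x) = exp(−a x²/2 + b x) at (a, b) = (1, 0), for which samples of |Ψ|² are normal with variance 1/2 and *R* is known in closed form; the wave function and sample count are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(scale=np.sqrt(0.5), size=500_000)  # direct samples of |Psi|^2

# log-derivatives d ln Psi / d p for parameters p = (a, b)
d_log = np.stack([-x**2 / 2, x])                  # shape (n_params, n_samples)

# R_pq = <d_p ln Psi * d_q ln Psi> - <d_p ln Psi><d_q ln Psi>
R = np.cov(d_log)
print(np.allclose(R, [[0.125, 0.0], [0.0, 0.5]], atol=0.01))  # True
```

In practice a small diagonal shift is commonly added to *R* before applying *R*^{−1} to the gradient, keeping the step stable when *R* is nearly singular.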

### C. Practical details

The algorithm is outlined in Table I. We provide comments on steps that have some nontrivial considerations. In step **2**, the parameters are initialized. We typically initialize the parameters using an approximate excited state, typically either from an orbital promotion in a single determinant or from a small quantum chemistry calculation. We have checked that it is possible to optimize starting from the ground state, but such a strategy is unnecessarily expensive, in particular, because the objective function of Eq. (7) is a saddle point at the ground state.

**Table I.** Outline of the penalty-based optimization algorithm.

1. Choose target overlaps $\vec{S}^*$
2. Initialize $\vec{p}_0$
3. for i in range(nsteps), do
    - (a) Compute *N*_{i}, $\vec{S}$, ∇_{p}*N*_{0}, $\nabla_p \vec{S}$, *E*, and ∇_{p}*E*
    - (b) if abs(*N*_{0} − 0.5) > threshold, normalize Ψ_{N}
    - (c) Objective function is $O = E + \vec{\lambda} \cdot (\vec{S} - \vec{S}^*)^2$
    - (d) Construct gradient $\nabla_p O = \nabla_p E + \nabla_p\big(\vec{\lambda} \cdot (\vec{S} - \vec{S}^*)^2\big)$
    - (e) $\nabla_p N \leftarrow R^{-1} \nabla_p N$
    - (f) $\nabla_p O \leftarrow R^{-1} \nabla_p O$
    - (g) $\nabla_p O \leftarrow \nabla_p O - \frac{(\nabla_p O)\cdot(\nabla_p N)}{|\nabla_p N|^2}\, \nabla_p N$
    - (h) $\vec{p}(\tau) \leftarrow \vec{p}_{i-1} - \tau \nabla_p O$
    - (i) $\vec{p}_i \leftarrow \operatorname{arg\,min}_{\tau} O(\vec{p}(\tau))$


In step **3(a)**, all the quantities in Sec. II B are computed. A Monte Carlo sampling of *ρ*_{i} is done for each anchor wave function. The algorithm, thus, scales mildly with the number of anchor wave functions. The energy and its derivatives are averaged among all samples, so the most costly component of the calculation does not increase much with the number of anchor wave functions.

It is important for the normalization of the wave functions to be similar; otherwise, the density in Eq. (9) is unbalanced. One could adjust weights in Eq. (9), but we find it more convenient to normalize all wave functions. We ensure that all anchor states have the same normalization and use the first anchor state as a reference. Before performing an optimization move, we first check whether *N*_{0} is too far from 1/2. The threshold is typically about 0.1. If it is too far, the parameters are rescaled to normalize the wave function and VMC is repeated with the renormalized wave function parameters. This is performed in step **3(b)** in Table I.
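A minimal sketch of this renormalization, assuming a wave function that is linear in its determinant coefficients so that scaling those coefficients by *c* scales ⟨Ψ|Ψ⟩ by *c*²; the numerical norms below are made up for illustration.

```python
import numpy as np

norm_anchor = 1.0   # <Psi_0|Psi_0> of the reference anchor state
norm_psi = 4.0      # current <Psi|Psi>, drifted during optimization

# relative normalization N_0 = <Psi|Psi> / (<Psi|Psi> + <Psi_0|Psi_0>)
N0 = norm_psi / (norm_psi + norm_anchor)
assert abs(N0 - 0.5) > 0.1           # too far from 1/2: rescale

c = np.sqrt(norm_anchor / norm_psi)  # scale factor for the linear coefficients
norm_rescaled = c**2 * norm_psi
N0_new = norm_rescaled / (norm_rescaled + norm_anchor)
print(N0_new)  # 0.5
```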

In steps **3(e)** and **3(f)**, we regularize the gradients of the normalization and the objective function. It is necessary to regularize both prior to the projection that will come in the next step.

To prevent moves that change the normalization, we project out the derivative of the normalization from the objective function. Otherwise, the moves diverge from the equal normalization manifold, and it becomes difficult to evaluate the overlaps accurately. This is performed in step **3(g)** in Table I.^{28}

We find that line minimization, performed in step **3(i)** in Table I, improves the performance of the algorithm significantly. We use correlated sampling to compute the objective function for various values of *τ* and fit to a quadratic. We also reject moves that change the relative normalization by more than 0.3.
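The line minimization can be sketched as evaluating the objective at a handful of step sizes *τ*, fitting a quadratic, and stepping to its vertex; the one-dimensional objective below is an arbitrary stand-in for the correlated-sampling estimates.

```python
import numpy as np

def O(tau):
    """Stand-in for the correlated-sampling objective along the line."""
    return (tau - 0.7)**2 + 2.0

taus = np.linspace(0.0, 1.5, 5)
# fit O(tau) ~ a*tau^2 + b*tau + c and take the vertex of the parabola
a, b, c = np.polyfit(taus, [O(t) for t in taus], 2)
tau_min = -b / (2 * a)
print(round(tau_min, 6))  # 0.7
```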

## III. DEMONSTRATION OF EXCITED STATE OPTIMIZATION USING VMC

To demonstrate the technique, we apply it to two cases: H_{2} at varying bond lengths to check for correctness vs an exact solution and the excited states of benzene to demonstrate that it is capable of optimizing about 10 000 parameters on a system with 30 electrons in the calculation.

### A. Application to H_{2}

For application to H_{2}, the trial wave function was a simple complete active space CI (CASCI) wave function with two electrons in two orbitals taken from the restricted Hartree–Fock calculation. We take the three lowest states from this calculation, labeled Ψ_{CASCI,i}, where *i* runs from 0 to 2. This wave function was modified using a two-body Jastrow factor as parameterized in previous work^{29} to obtain

$$|\Psi_{\text{CASCI-J},i}\rangle = e^{J(\vec{\alpha})}\, |\Psi_{\text{CASCI},i}\rangle.$$

We then optimized the determinant, orbital, and Jastrow parameters using a modified version of stochastic reconfiguration implemented in pyqmc to obtain Ψ_{CASCI-J,0}.

We then considered two Jastrow-based approximations to the excited states. The first, which we denote "Fixed," is commonly used in the literature. It is given by

$$|\Psi_{\text{Fixed},i}\rangle = e^{J(\vec{\alpha}_0)}\, |\Psi_{\text{CASCI},i}\rangle, \qquad (18)$$

where $\vec{\alpha}_0$ are the Jastrow parameters optimized for the ground state. In this fixed wave function, no parameters are optimized at all; that is, the Jastrow factor is the same as the ground state's, and the determinant and orbital coefficients are kept fixed. The second, which we denote "Optimized," begins with Eq. (18) and uses the penalty method to optimize determinant, orbital, and Jastrow parameters while ensuring orthogonality to lower states. In the supplementary material, a Snakemake workflow^{30} is provided that performs the calculations shown here in pyqmc and pyscf.^{31} We used *λ* = 2 hartree for these calculations, which were well converged.

For reference values, we used full configuration interaction (full CI) to compute exact energies in a finite basis. We found that the energies were fairly well converged at the triple-*ζ* level; convergence occurs in a smaller basis than for most systems because H_{2} is very simple.

In Fig. 2, we demonstrate the targeting capability of this technique. Each point is a wave function generated in the Dunning cc-pVDZ^{32} basis as described above, with *S*^{*} set to various points on the Bloch sphere connecting the first three excited states. As expected, the superpositions of low energy wave functions fall on a plane, as sketched in Fig. 1, a critical check that the calculation is creating the desired wave functions.

Comparisons between the CASCI, QMC, and full CI results are presented in Fig. 3. With a rather compact wave function, the optimized CASCI-J wave function obtains close agreement with the exact calculation, improving significantly on the starting fixed CASCI-J wave function. In the case of the first excited state, simply adding a Jastrow factor to an existing CASCI wave function does not improve the energies at all; optimization is required to obtain accurate results.

### B. Application to benzene excited states

As a demonstration on a larger system, we computed the full *π* space spectrum of benzene using our excited state optimization method. This set of 13 states contains a rich physical structure, with two different spin channels, single and double electron excitations, and states with ionic bonding in contrast to the covalently bonded ground state.^{33} Some of these states, such as the ^{1}E_{1u} state, have been considered strongly correlated by previous authors.^{33} As such, these excitations are a standard benchmark set for excited state methods and have been used to validate and compare electronic structure techniques for some time.^{34,35}

To represent the wave functions, we used a multi-Slater–Jastrow wave function parameterization of the form

$$|\Psi\rangle = e^{J(\vec{\alpha})} \sum_i c_i\, |D_i(\vec{\beta})\rangle. \qquad (19)$$

The Jastrow factor $J(\vec{\alpha})$ is a two-body Jastrow^{29} factor. The determinants |*D*_{i}⟩ in the multi-Slater expansion were selected from a minimal CASCI calculation over the six *π* electrons and six *π* orbitals in benzene. The single-particle orbitals used in the CASCI were computed using density functional theory (DFT) with the B3LYP functional, BFD triple-*ζ* basis, and BFD pseudopotential.^{36} The CASCI and DFT calculations were carried out using pyscf.^{31} The parameters in the wave function are as follows: 108 Jastrow parameters $\vec{\alpha}$, 400 determinant coefficients $\vec{c}$, and 9288 orbital parameters $\vec{\beta}$.

We used the parameterization of Eq. (19) to compute the benzene spectra using three methods of increasing cost. The first, denoted "fixed," is a standard QMC excited state technique: the coefficients $\vec{\alpha}, \vec{c}$ are first optimized on the ground state CASCI root with frozen orbital coefficients $\vec{\beta}$; the optimized Jastrow factor is then multiplied with the higher energy CASCI roots; and finally, these trial wave functions are used in VMC to compute excited state energies. This method does not allow for state-specific optimization of any of the parameters $\vec{\alpha}$, $\vec{c}$, or $\vec{\beta}$.

To understand the effects of orbital optimization in this system, we consider two parameter sets using the penalty technique. In the first, we fix the orbital parameters $\vec{\beta}$ to the DFT ground state orbital coefficients but allow the other parameters $\vec{\alpha}, \vec{c}$ to be optimized; we denote these wave functions as VMC $\vec{\alpha}, \vec{c}$. Finally, we optimize all coefficients in Eq. (19), denoting those wave functions as $\vec{\alpha}, \vec{c}, \vec{\beta}$. All QMC calculations were carried out in pyqmc.^{26}

The results of our excited state computations at the VMC level are shown in Fig. 4(a). We see a consistent 0.2 eV decrease in total energy across all 12 excited states going from the fixed parameter QMC method to the $\vec{\alpha}, \vec{c}$ method using our new optimization technique. We find that optimizing $\vec{\alpha}, \vec{c}, \vec{\beta}$ yields up to a 0.5 eV reduction in total energy relative to the fixed technique across nearly all of the excited states, a gain of around 0.3 eV due to orbital optimization. However, the differences between the excited states and the ground state are very similar between the frozen and optimized orbital calculations, which demonstrates the extent to which standard excited state techniques depend heavily on error cancellation.

We computed the time step extrapolated DMC energies for the computed excited state wave functions, which are shown in Fig. 4(b). The additional benefit of optimizing the orbitals in DMC is seen primarily for just three states, states 1, 6, and 11, with reductions in the energy of 0.14(3), 0.18(3), and 0.22(3) eV, respectively. The latter two of these states are the ^{1}*B*_{1u} and ^{1}*E*_{1u} states, which have strong ionic bonding character among the *π* orbitals, unlike the ground state, which has covalent bonding character.^{33} This difference in bonding character is captured by the orbital optimization, leading to larger reductions in total energy for these states, while the other states show no reduction in total energy. Because only some excited states benefit from orbital optimization in DMC, the energy differences between states are affected, and ultimately improved.

Before comparing to experiment, we note that most theoretical results report the vertical excitation energy, which omits nuclear relaxation and vibrational effects. The transition from the ground vibrational level of the ground state to the ground vibrational level of the excited state is called the 0–0 excitation energy (*E*_{00}) and is not necessarily the intensity maximum. The vertical excitation energy can be related to the 0–0 excitation energy as follows:

$$E_{00} = E_j(\vec{X}_0) - E_0(\vec{X}_0) + \big[ E_j(\vec{X}_j) - E_j(\vec{X}_0) \big] + \frac{1}{2} \sum_i \big( \omega_{j,i} - \omega_{0,i} \big),$$

where *E*_{j} is the Born–Oppenheimer energy of state *j*, $\vec{X}_j$ is the set of nuclear coordinates at the minimum energy of state *j*, and *ω*_{j,i} is the frequency of vibrational mode *i* for electronic state *j*. The first two terms give the fixed nuclei vertical excitation energy, the quantity which is directly computed in our work and in the cited theoretical calculations, where the total energy of the *j*th excited state is computed at the ground state equilibrium nuclear positions $\vec{X}_0$. The bracketed term is the adiabatic correction, which accounts for relaxation of the nuclear positions in the *j*th excited state to the configuration $\vec{X}_j$. The last term is the zero-point vibrational energy (ZPVE) correction and accounts for changes in the vibrational degrees of freedom between the ground and excited states. The experimental vertical excitation is obtained by subtracting the adiabatic and ZPVE corrections from the 0–0 excitation energy.
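As a worked example, the experimental vertical excitation for the ^{1}B_{2u} state follows from the Table II entries by subtracting the (negative) adiabatic and ZPVE corrections from *E*_{00}:

```python
# 1B2u numbers from Table II, all in eV
e00 = 4.72         # 0-0 excitation energy
adiabatic = -0.19  # adiabatic correction
zpve = -0.18       # zero-point vibrational energy correction

# E_vert = E_00 - adiabatic - ZPVE
e_vert = e00 - adiabatic - zpve
print(round(e_vert, 2))  # 5.09
```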

In Table II, we compare the vertical excitation energy computed from experiment, the fully optimized DMC results of this work, and literature results from large-basis coupled cluster (CC3),^{35} complete active space perturbation theory (CASPT2),^{33} and time-dependent density functional theory using the PBE0 functional (TDDFT-PBE0).^{41} We also report computed ZPVE and 0–0 corrections using TDDFT with a 6-31g basis and PBE functional for the singlet states; we encountered difficulties converging the triplet states in TDDFT. We believe the TDDFT estimates to be reasonable, given the agreement with computed and measured values in the literature: −0.15 eV^{39} and −0.17 eV^{42} adiabatic and ZPVE corrections for the ^{1}*B*_{2u} state and −0.56 eV^{43} and −0.17 eV^{44} for the ^{3}*B*_{1u} state. For the available vertical excitations, both the DMC and CC3 results are within 0.2 eV of the experimental values, while the CASPT2 and TDDFT methods have larger differences. The corrections due to adiabatic and ZPVE effects are large enough that this conclusion might be reversed without these effects included.

**Table II.** Excitation energies of benzene (eV). "Maximum" and *E*_{00} are spectroscopic values;^{34,37–40} the adiabatic and ZPVE corrections convert *E*_{00} into the experimental vertical excitation ("Expt.").

| State | Maximum | E_{00} | Adiabatic | ZPVE | Expt. | DMC $\vec{\alpha},\vec{c},\vec{\beta}$ | CC3^{35} | CASPT2^{33} | TDDFT-PBE0^{41} |
|---|---|---|---|---|---|---|---|---|---|
| ^{1}B_{2u} | 4.90 | 4.72 | −0.19 | −0.18 | 5.09 | 5.15(3) | 5.08 | 4.7 | 5.39 |
| ^{1}B_{1u} | 6.20 | 6.03 | −0.19 | −0.33 | 6.55 | 6.62(4) | 6.54 | 6.1 | 6.05 |
| ^{1}E_{1u} | 6.94 | 6.87 | −0.24 | −0.32 | 7.43 | 7.72(4) | 7.13 | 7.06 | 7.21 |
| ^{1}E_{1u} | 6.94 | 6.87 | −0.24 | −0.32 | 7.43 | 7.63(3) | 7.13 | 7.06 | 7.21 |
| ^{1}E_{2g} | 7.80(20) | 7.81 | −0.21 | −0.45 | 8.47 | 8.38(3) | 8.41 | 7.77 | 7.52 |
| ^{1}E_{2g} | 7.80(20) | 7.81 | −0.21 | −0.45 | 8.47 | 8.34(3) | 8.41 | 7.77 | 7.52 |
| ^{3}B_{1u} | 3.94 | 3.65 | −0.55 | −0.19 | 4.39 | 4.15(3) | 4.15 | 3.94 | 3.82 |
| ^{3}E_{1u} | 4.76 | 4.63 | | | | 4.89(3) | 4.86 | 4.5 | 4.7 |
| ^{3}E_{1u} | 4.76 | 4.63 | | | | 4.96(4) | 4.86 | 4.5 | 4.7 |
| ^{3}B_{2u} | 5.60 | 5.58 | | | | 6.08(4) | 5.88 | 5.44 | 5.05 |
| ^{3}E_{2g} | 7.49(25) | 7.49(25) | | | | 7.74(4) | 7.51 | 7.03 | 7.18 |
| ^{3}E_{2g} | 7.49(25) | 7.49(25) | | | | 7.60(4) | 7.51 | 7.03 | 7.18 |


In Table III, we list the mean and standard deviation of the difference between CC3 and other methods. CC3 has been found to be very accurate for electronic excitations, including correlated doubles-like excitations for benzene.^{34,45} The consistency between accurate, explicitly correlated methods such as CC3 and our new QMC technique, which makes different approximations, is encouraging, as is the fact that the QMC results approach the CC3 results as the number of optimized parameters is increased.

**Table III.** Mean and rms deviation from the CC3 excitation energies, Δ*E*_{j} = *E*_{j}(m) − *E*_{j}(CC3).

| Method | Parameters | Mean (eV) | rms (eV) |
|---|---|---|---|
| CASPT2 | | −0.38 | 0.42 |
| TDDFT-PBE0 | | −0.33 | 0.50 |
| VMC | None | 0.34 | 0.46 |
| VMC | $\vec{\alpha},\vec{c}$ | 0.12 | 0.35 |
| VMC | $\vec{\alpha},\vec{c},\vec{\beta}$ | 0.19 | 0.35 |
| DMC | None | 0.24 | 0.35 |
| DMC | $\vec{\alpha},\vec{c}$ | 0.18 | 0.31 |
| DMC | $\vec{\alpha},\vec{c},\vec{\beta}$ | 0.15 | 0.24 |


The cost of each excited state calculation is roughly a factor of 2–3 higher than the ground state calculation, as shown in Fig. 5. As a point of reference, on a modern 40-core processor, the ground state optimization took 1.25 h with fixed orbitals and 23 h with orbital optimization. One should keep in mind that, as is typical in Monte Carlo, this cost depends strongly on the desired uncertainty in the result. We chose tightly converged parameters, which results in a relatively high computational cost. The total cost relative to the ground state for all 13 states is a factor of 25 for the $\vec{\alpha},\vec{c}$ calculation and 30 for $\vec{\alpha},\vec{c},\vec{\beta}$. Importantly, the relative cost does not increase rapidly with the number of states since, as mentioned in the Introduction, the overlaps are not very expensive to evaluate.

## IV. CONCLUSION

We presented a scalable algorithm to compute approximate excited states of many-body systems using variational Monte Carlo. Our method is somewhat less complicated to implement than the linear method of Filippi^{16–18} since the derivatives of the local energy are not required. Furthermore, our technique is capable of optimizing complex parameters such as orbital coefficients in a state-specific manner and likely would be able to optimize parameters from wave functions such as backflow^{46,47} and neural network forms,^{48–51} as well as pairing functions^{52} since the complexity is not much higher than in normal wave function optimization. In comparison to the method of Neuscamman,^{19–22} this technique does not require a tuned parameter *ω*. One positive aspect of this is that the penalty technique can access several distinct but degenerate excited states separately. Degenerate ground states would also be detected by the penalty technique.

The penalty technique is capable of optimizing wave functions with any overlap with a reference state. This capability may be useful in some circumstances, particularly for strongly correlated systems and magnetic systems, in which energy eigenstates may not be easily representable by simple wave functions, but non-orthogonal basis states can be used to make relevant experimental predictions.^{9} Such wave functions are also appropriate for use in density matrix downfolding.^{53}

The study on the benzene molecule revealed a few interesting physical insights. First, orbital optimization improves the nodal surface for excited states, particularly those with significant ionic character as compared to the ground state. Second, the classical association of the vertical excitation with the maximum intensity in spectroscopy leads to errors of up to 0.4 eV in the excitation energies of benzene. A fully quantum treatment of the Franck–Condon principle, as shown here, brings the experimental estimates in closer alignment with coupled cluster and quantum Monte Carlo results.

## SUPPLEMENTARY MATERIAL

See the supplementary material for all data presented in the figures in comma-separated value format, as well as scripts and a workflow for the H_{2} calculation for reproducibility.

## ACKNOWLEDGMENTS

We would like to thank David Ceperley for helpful comments regarding the theorem of MacDonald^{54} and William Wheeler for a careful reading of the code and bugfixes.

This material is partially based on the work supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, Computational Materials Sciences Program, under Award No. DE-SC-0020177. B.B. was supported by Flatiron Institute. Flatiron Institute is a division of the Simons Foundation. L.K.W. was supported by the Simons Collaboration on the Many-Electron Problem. This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (Award Nos. OCI-0725070 and ACI-1238993), the State of Illinois, and as of December 2019, the National Geospatial-Intelligence Agency. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.

## DATA AVAILABILITY

The data that support the findings of this study are available within the article and its supplementary material.

### APPENDIX: PROPERTIES OF *E*(|⟨Ψ|Φ_{0}⟩|^{2}) AND THE APPROXIMATE VERSION $E(|\langle \Psi | \tilde{\Phi}_0 \rangle|^2)$

Here, we present a few simple properties of the energy functional for reference.

- *E*(|⟨Ψ|Φ_{i}⟩|^{2}) ≥ *E*_{0} − (*E*_{1} − *E*_{0})|⟨Ψ|Φ_{i}⟩|^{2}. This is the bounding line.
- For any complete linear subspace, the minimization will form a line, and there are variational upper bounds.^{54}
- Broken symmetry wave function parameterizations (such as unrestricted Slater–Jastrow wave functions) do not usually comprise a complete linear space, so they may not form a line.

## REFERENCES

^{28}Since we are using multi-Slater–Jastrow wave functions in this work, it was more convenient to control the normalization of the wave function in this way. Another option, not tried here, is to parameterize the wave function in a norm-conserving manner, which could increase the efficiency of the algorithm. On the other hand, for emerging wave functions, it may be more convenient to allow the normalization to vary.