Projection-based embedding offers a simple framework for embedding correlated wavefunction methods in density functional theory. Partitioning between the correlated wavefunction and density functional subsystems is performed in the space of localized molecular orbitals. However, during a large geometry change—such as a chemical reaction—the nature of these localized molecular orbitals, as well as their partitioning into the two subsystems, can change dramatically. This can lead to unphysical cusps and even discontinuities in the potential energy surface. In this work, we present an even-handed framework for localized orbital partitioning that ensures consistent subsystems across a set of molecular geometries. We illustrate this problem and the even-handed solution with a simple example of an S_{N}2 reaction. Applications to a nitrogen umbrella flip in a cobalt-based CO_{2} reduction catalyst and to the binding of CO to Cu clusters are presented. In both cases, we find that even-handed partitioning enables chemically accurate embedding with modestly sized embedded regions for systems in which previous partitioning strategies are problematic.

## I. INTRODUCTION

Embedding methods for electronic structure offer the possibility of accurate description of chemical reactions at greatly reduced computational cost by treating different parts of a chemical system with different levels of theory. Many versions of embedding exist, including implicit solvent,^{1–3} QM/MM^{4–6} and ONIOM,^{7} subsystem density functional theory (DFT) and frozen density embedding,^{8–24} active space methods,^{25,26} local correlation treatments,^{27,28} Green’s function embedding,^{29–31} and density matrix embedding theory.^{32,33} Central to all of these methods is the question of how the system is to be partitioned into a number of subsystems to be treated at different levels of theory.

Prescriptions for partitioning the system usually consider a single geometry at a time. However, this approach may fail when considering a chemical reaction in which the nature of the embedded subsystem changes across the reaction coordinate. For example, this problem can arise in QM/MM when MM atoms wander into the QM region.^{6} In active space methods, this issue can manifest as the intruder state problem.^{34} Recently, in the context of active space methods, progress has been made in the automated selection of active spaces based on a user-specified set of relevant atomic orbitals (AO)^{35} or based on a correlated wavefunction ansatz less expensive than the full active space method.^{36–39} These methods have demonstrated a robust ability to select consistent active spaces across reaction coordinates.

In this work, we present a method for handling the subsystem inconsistency problem in the context of wavefunction-in-DFT embedding, where a subset of localized occupied molecular orbitals (LMOs) is selected for embedded wavefunction treatment. We begin with a previously established charge-based criterion for the automated selection of these embedded orbitals. We next demonstrate how—even for a simple S_{N}2 reaction—this procedure can result in a set of embedded LMOs which is inconsistent with respect to the reactant and product. Drawing inspiration from the domain merging method for local correlation,^{40} we propose an “even-handed” LMO selection procedure. Our method seeks to form a consensus set of LMOs which contains for every geometry every orbital that is important for any geometry. This results in an automatic procedure which uses information available at the DFT level and requires no user input beyond the set of atoms to be embedded. Although we present this method in the context of projection-based embedding specifically,^{20,22} the methodology is sufficiently general to be applied in other embedding contexts.

## II. THEORY

### A. Projection-based embedding

We begin by reviewing the projection-based embedding method, which is a rigorous framework for embedding a correlated wavefunction theory in a mean field (MF) theory such as Hartree Fock (HF) theory or DFT, with interactions between the embedded and embedding densities treated at the MF level. The system is partitioned into two subsystems: subsystem *A* is generally treated at the correlated wavefunction level and subsystem *B* is treated at the MF level. Projection-based embedding belongs to a class of methods which ensure orthogonality between the MF description of subsystems *A* and *B*.^{20,22} This class includes frozen-core approximations,^{41} the region method for local correlation,^{42} Henderson’s coupled cluster in the DFT embedding method,^{43} Khait and Hoffman’s modified Kohn-Sham equations with Constrained Electron Density (KSCED),^{44} the Huzinaga projection operator method,^{45} and more recently the multi-level Hartree Fock^{46} and orthogonality constrained basis set expansion (OCBSE) methods.^{47}

As inputs to a projection-based calculation, we specify a low-level MF method, a high-level correlated wavefunction method, and a set of atoms to be embedded, $ X A $. The algorithm begins by performing a MF calculation on the full system, resulting in a set of occupied molecular orbitals, which are then rotated to form a set of LMOs, $ \psi $. The LMOs are partitioned into two sets, $ \psi A $ and $ \psi B $, corresponding to subsystems *A* and *B*, respectively. This partitioning is usually performed by choosing $ \psi A $ to be the set of LMOs with sufficient population on $ X A $ (detailed in Sec. II B). The wave function calculation is then performed with the number of electrons necessary to occupy the $ \psi A $ and a modified core Hamiltonian^{20,22} written in the atomic orbital (AO) basis as

where **h** is the core Hamiltonian of the full system, *γ*^{A} and *γ*^{B} are the one-particle reduced density matrices (1RDMs) corresponding to $ \psi A $ and $ \psi B $, respectively, and **g** includes all mean-field two-electron interactions. The last term enforces orthogonality between the subsystems using a level shift, *μ*, and a projector onto subsystem *B*,

where **S** is the AO overlap matrix. In the limit *μ* → ∞, subsystems *A* and *B* are exactly orthogonal; in practice, this procedure is numerically robust for *μ* between 10^{4} and 10^{7} hartree, and *μ* = 10^{6} hartree is usually sufficient to converge the embedding to a type-in-type error of less than a microhartree.^{20,22}

### B. Charge method for LMO selection

In this subsection, we present the current “charge” algorithm that has previously proven effective for partitioning the LMOs into $ \psi A $ and $ \psi B $.^{20,48} We then construct a simple example where this method breaks down, motivating the “even-handed” solution presented in Sec. II C.

The chemically intuitive notion of embedded atoms must be translated into a specific set of LMOs, which do not fully reside on specific atoms. The charge method includes in $ \psi A $ all occupied LMOs with significant population on the $ X A $. This selection is performed using a charge threshold, *q*,

where $ Q A \psi i $ is the charge of LMO *ψ*_{i} assigned to the atoms in $ X A $ and *q* is typically chosen to be 0.4. For example, charges may be computed using the gross Mulliken population^{49}

where *κ* indexes all AOs, λ on *X*_{j} indexes AOs λ centered on atom *X*_{j}, **S** is the AO overlap matrix, and $ \gamma \psi i $ is the 1RDM corresponding to *ψ*_{i}. [Alternative population schemes, such as intrinsic bond orbital (IBO),^{50} can be applied similarly.] For a single-point energy calculation, this algorithm generally selects a reasonable $ \psi A $. However, because it considers only a single geometry at a time, problems can arise during large molecular geometry changes.

One problem is that the number of orbitals in $ \psi A $ selected by the charge method can differ at different molecular geometries. Because of the difference in electron chemical potentials between the high-level method of subsystem *A* and the low-level method of subsystem *B*, a change in the number of LMOs in $ \psi A $ versus $ \psi B $ manifests as a discontinuity in the potential energy. This problem is easily remedied by choosing $ \psi A $ to be the *N* orbitals corresponding to the *N* largest values of *Q*_{A} at each geometry; *N* is chosen to be the largest size of $ \psi A $ selected by Eq. (3) for any geometry along the reaction coordinate. In the following, all results presented for the charge method include this simple fix.

A second—and less trivial—problem occurs when LMO populations on the embedded atoms change qualitatively between geometries. Consider, for example, the S_{N}2 reaction of I^{−} with bromomethane with $ X A $ chosen to be the carbon atom. Figure 1 shows the reaction profile for coupled cluster singles and doubles with perturbative triples, CCSD(T),^{51} embedded in the B3LYP density functional,^{52–55} using LMOs selected by the charge method. The result of this selection can be seen in the embedding potential energy surface for this reaction. Near either the reactant or the product, the CCSD(T)-in-B3LYP energy profile follows the CCSD(T) result. However, as the transition state is crossed, an unphysically high barrier appears in the energy profile.

Figure 2 shows the embedded density, *ρ*_{A}, that results from the charge selection method (top panel, highlighted in yellow). At geometries near the reactant, *ρ*_{A} resembles the density of bromomethane; near the product, *ρ*_{A} resembles iodomethane. In the transition region, *ρ*_{A} switches abruptly between these two qualitatively different densities. Examining the orbitals $ \psi A $ that comprise *ρ*_{A} at each geometry (bottom panel, yellow), we see that the reactant, transition state, and product differ on whether the LMOs associated with the C–Br and C–I *σ* bonds are included in $ \psi A $, with reactant-like geometries including the C–Br bond LMO and excluding the C–I bond LMO, and vice versa for product-like geometries. In the transition region, the abrupt change in the orbital character from C–Br to C–I results in an incorrectly high reaction barrier. Although this problem is presented in the context of projection-based embedding, we note that it is common to any embedding method which relies on atomic populations to partition orbitals.

### C. Even-handed LMO selection

This problem can be solved by ensuring that $ \psi A $ spans a consistent volume of Hilbert space as a function of geometry and thus provides a consistent description of *γ*_{A} throughout a reaction profile. The solution requires specification of (i) a strategy for determining what the $ \psi A $ should be and (ii) a method for quantitatively comparing $ \psi A $ between geometries.

To address the first point, we propose an “even-handed” procedure. Starting with a reaction coordinate composed of an ordered set of geometries, the charge selection procedure is performed at every geometry. We then form the union of these charge-selected orbitals. To do this, we sweep from the reactant to product and back. For each geometry *k* along this sweep, the corresponding $ \psi A k $ is augmented to include any orbitals that both match a member of the $ \psi A k \u2212 1 $ corresponding to the previous geometry (on the basis of the method for quantitatively comparing orbitals between geometries, described later) and that are not already included in $ \psi A k \u2212 1 $.

Figure 2 illustrates the application of this procedure to the aforementioned example S_{N}2 reaction. For visual simplicity, we consider three geometries: the reactant *R*, the transition state *T*, and the product *P*. Four charge-selected LMOs are common to $ \psi A R $, $ \psi A T $, and $ \psi A P $. Sweeping along the reaction coordinate from *R* to *P*, we begin by comparing $ \psi A R $ to $ \psi A T $. $ \psi A R $ contains a C–Br σ bond LMO that $ \psi A T $ lacks; the matching LMO in $ \psi T $ is thus added to $ \psi A T $, resulting in a five orbital set (labeled in the figure as step 1). Comparing this updated $ \psi A T $ to $ \psi A P $, we see that $ \psi A P $ lacks a C–Br σ bond LMO and contains an additional C–I σ bond LMO absent in $ \psi T $; the matching C–Br σ bond LMO is thus added to $ \psi A P $, resulting in a six orbital set (labeled step 2). We now sweep backwards from *P* to *R*. $ \psi A P $ is compared $ \psi A T $, and the latter is found to lack the C–I σ bond LMO, which is added to form a new $ \psi A T $ (labeled step 3). Finally, $ \psi A T $ is compared $ \psi A R $ and the C–I σ bond LMO is added to the latter (labeled step 4). The result is that for all three geometries, $ \psi A $ contains the four methyl LMOs, the C–Br σ bond LMO, and the C–I σ bond LMO. Examining the *ρ*_{A} resulting from even-handed LMO selection (Fig. 2, top panel, purple), we see that it remains consistent from the reactant to the transition state to the product, especially in comparison to the *ρ*_{A} obtained from the charge selection method (yellow). Figure 1 shows that the even-handed scheme corrects the error in the CCSD(T)-in-B3LYP energy profile generated using the charge selection method, yielding a CCSD(T)-in-B3LYP energy profile in quantitative agreement with a CCSD(T) calculation on the full system. In Sec. III, we will examine the relative performance of even-handed selection versus charge selection with regard to a given increase in the size of $ \psi A $.

To complete the even-handed LMO selection algorithm, we must define a way to quantitatively compare LMOs $ \psi A k $ and $ \psi A k + 1 $ associated with neighboring geometries along the reaction coordinate. The LMOs for geometry *k* do not directly correspond to those of *k* + 1; they have different basis functions (due to motion of the atom-centered gaussians) and different LMO coefficients. The latter problem is addressed with a maximum overlap formalism, similar to that used in non-Aufbau MF calculations.^{56} Specifically, for geometry *k* + 1, we form a new subset of $ \psi k + 1 $ which contains the *M* orbitals that best overlap with the span of the *M* orbitals so far included in subsystem *A* for geometry *k*, $ \psi A k $. For each LMO $ \psi i \u2208 \psi k + 1 $, we assign an overlap with the span of $ \psi A k $,

where *N*_{AO} is the number of basis functions, *N*_{occ} is the number of occupied orbitals, **L**^{k} is the LMO coefficient matrix corresponding to geometry *k*, and **S**^{k,k+1} is an overlap matrix between the AO bases corresponding to the two geometries.

Because the two geometries do not share the same AO basis, the precise definition of **S**^{k,k+1} is not immediately obvious. We might choose the spatial overlap of the AOs; however, due to the exponential tails of gaussian basis functions, geometries must be very closely spaced to ensure numerical stability. Instead, we compare the orbitals between two different geometries as though the basis had not moved. In order to correct for changes in the AO overlap, we introduce Löwdin orthogonalization, resulting in

where **S**^{k} and **S**^{k+1} are the AO overlap matrices for geometries *k* and *k* + 1. As we sweep back and forth across the reaction coordinate, at each geometry *k* + 1 we form a new set of LMOs $ \psi A k + 1 $ which contains the *M* LMOs in $ \psi k + 1 $ corresponding to the *M* largest *o*_{i} as well as any charge-selected orbitals missed by this criterion. The even-handed selection algorithm is summarized in Algorithm 1.

1: Input: LMOs $ \psi k $ for each geometry k, k = 1, …, N. |

2: For each geometry, select $ \psi A k $ using the charge method [Eq. (3)]. |

3: for k = 1, …, N − 1 do |

4: Compute $ o i k + 1 $ using Eq. (5). |

5: Let M be the number of LMOs in $ \psi A k $. |

6: Construct $ \psi EH k + 1 $ as the M LMOs in $ \psi k + 1 $ |

corresponding to the M largest $ o i k + 1 $. |

7: Set $ \psi A k + 1 $ = $ \psi A k + 1 \u222a \psi EH k + 1 $. |

8: end for |

9: Repeat the above “for loop” with the order of the geometries reversed. |

10: Repeat the forward and reverse “for loops” until the |

number of selected LMOs converges, typically after one cycle. |

1: Input: LMOs $ \psi k $ for each geometry k, k = 1, …, N. |

2: For each geometry, select $ \psi A k $ using the charge method [Eq. (3)]. |

3: for k = 1, …, N − 1 do |

4: Compute $ o i k + 1 $ using Eq. (5). |

5: Let M be the number of LMOs in $ \psi A k $. |

6: Construct $ \psi EH k + 1 $ as the M LMOs in $ \psi k + 1 $ |

corresponding to the M largest $ o i k + 1 $. |

7: Set $ \psi A k + 1 $ = $ \psi A k + 1 \u222a \psi EH k + 1 $. |

8: end for |

9: Repeat the above “for loop” with the order of the geometries reversed. |

10: Repeat the forward and reverse “for loops” until the |

number of selected LMOs converges, typically after one cycle. |

A potential concern is that the overlap of Eq. (6) could become inaccurate if neighboring geometries are spatially distant. A useful diagnostic of whether two neighboring geometries are too distant can be formulated by comparing the *M*th largest *o*_{i} to the (*M* + 1)th largest *o*_{i} of Eq. (5) after the even-handed procedure is complete. This corresponds to the difference in overlap between the least overlapping LMO included in $ \psi A $ and most overlapping LMO excluded from $ \psi A $. If this difference becomes small, additional geometry points may be added along the reaction coordinate. These interpolating geometries require only MF calculations and thus have negligible impact on the overall cost of the calculation. For the systems studied in this work, this problem does not arise; for the energy profiles presented, we choose the density of interpolating geometries to illustrate the smoothness of energy profiles rather than to minimize the number of interpolating geometries.

It is worth emphasizing that the even-handed LMO selection procedure only uses information from MF calculations and only involves computation of orbital overlaps; even-handed LMO selection thus has a negligible effect on the overall cost of a projection-based wavefunction-in-DFT embedding calculation. It is also worth noting that even-handed LMO selection introduces no additional parameters beyond that appearing in the original charge method for LMO selection and the choice of the reaction pathway.

### D. Even-handed AO truncation

Up to this point, we have provided a framework for reducing the number of occupied orbitals in the embedded calculation while retaining all virtual orbitals. However, correlated wavefunction calculations usually scale more strongly with the number of virtual orbitals than occupied. Noting that the number of virtual orbitals increases with the size of the AO basis set, a useful approach is to remove the AOs least necessary to represent *γ*^{A}.^{48,57} A charge-based heuristic, similar in spirit to Eq. (4), is used to select which AOs are necessary to represent *γ*^{A} and therefore retained. And similar to charge selection above, this heuristic can lead to different basis sets for different geometries. To correct this error, for all geometries, we retain the union of all AOs heuristically retained at each geometry. That is, at every geometry, we use the same truncated basis set, $ \lambda $,

where $ \lambda k $ is the set of AOs retained at geometry *k*. In the systems presented in this work which contain up to 1772 basis functions, this procedure results in retention of no more than 20 additional basis functions compared to standard AO truncation.^{48}

## III. RESULTS AND DISCUSSION

### A. Intramolecular hydrogen bonding in a CO_{2} reduction complex

As a first example of even-handed partitioning, we consider a cobalt aminopyridine complex that has been shown to catalyze the two-electron two-proton reduction of carbon dioxide to carbon monoxide.^{58,59} Experimental and DFT studies on analogous Co, Fe, and Ni complexes suggest that CO_{2} is stabilized by intramolecular hydrogen bonds from pendant protons and that these protons may transfer to CO_{2} during the reduction process.^{60–62} However, crystallography on this particular aminopyridine complex suggests that, prior to CO_{2} binding, the pendant protons point away from the binding pocket, necessitating a conformational rearrangement in order for intramolecular hydrogen bonds to form. The energetics of this rearrangement provide mechanistic insight into the catalytic role of hydrogen-bonding and intramolecular proton transfer in this system. However, the energetics of this rearrangement presents challenges for DFT; although widely used for Co-based systems,^{63–72} DFT can be inaccurate for reaction barriers, particularly those involving multireference character.^{73–77} Furthermore, the geometry and strength of hydrogen bonds can be strongly functional dependent.^{78,79}

Here, we employ projection-based wavefunction-in-DFT embedding to better capture these physical effects with a correlated wavefunction. This approach has previously been used to study a similar Co-centered hydrogen evolution catalyst.^{80} We specifically consider the conformational change associated with forming one intramolecular hydrogen bond. This rearrangement requires umbrella flipping of a pendant nitrogen center and rotation of the CO_{2} ligand about the dihedral axis of the Co–C bond. A guess minimum energy path is computed at the B3LYP/6-31+G^{*}^{81} level with a Lebedev (75 302) exchange-correlation grid^{82} using the freezing string method^{83} as implemented in the QChem 4.1 software package^{84} with 15 nodes and 3 gradient descent steps. The system is a doublet and treated with an unrestricted reference. The path is refined by relaxing each image independently, holding fixed the hydrogen bond length and the Co–C bond dihedral.

The resulting B3LYP energy profile is shown in Fig. 3(b), with stationary-point geometries illustrated [Fig. 3(a)]. These profiles are computed with a restricted open-shell reference for comparison with embedding calculations (below); as a result, the B3LYP profile does not completely reach the product minimum. For comparison, a second profile is shown corresponding to Perdew–Burke–Ernzerhof (PBE)^{85} energy calculations performed at the same geometries. Interestingly, the barrier predicted by PBE is larger than that predicted by B3LYP, running contrary to the general trend that hybrid functionals predict larger barriers than generalized gradient approximation (GGA) functionals,^{86} due to the latter over-stabilizing the relatively delocalized electronic structure of the transition state;^{87} consistently, HF predicts a still lower barrier of 3.5 kcal/mol. In this system, the reactant and product feature electronic structures with a greater degree of delocalization than in that of the transition state. Thus, the self-interaction error present in GGA functionals may be over-stabilizing the reactant and product relative to the transition state, resulting in a barrier that is too high. The exact exchange present in B3LYP partially corrects this error, reducing the barrier.

With this reaction coordinate in hand, the even-handed procedure is used to select $ \psi A $ corresponding to the choice of $ X A $ shown as opaque atoms in Fig. 3(a). Embedding calculations are performed in the molpro 2018.0 software package^{88,89} with the def2-TZVP basis set, an exchange-correlation grid threshold of 10^{−10}, and with density fitting for both Coulomb and exchange integrals^{90} evaluated with the def2-TZVPP/JKFIT basis.^{91} A restricted open-shell reference is employed for all embedding calculations. The intrinsic bond orbital (IBO) procedure^{50} is employed for orbital localization.

We first embed PBE in B3LYP to test the smoothness of the embedded reaction coordinate and to test the convergence of the embedded energy with respect to the size of subsystem *A*; a correlated wavefunction calculation on this large system would prove computationally infeasible and so PBE is used as a proxy. Figure 3(b) presents the results for PBE-in-B3LYP embedding for various LMO selection procedures. Selection of LMOs using the charge method (Sec. II B) results in subsystem *A* containing 55 occupied LMOs (of 121 total) and a discontinuous PBE-in-B3LYP energy profile (yellow circles). Even-handed LMO selection (Sec. II C) yields a continuous PBE-in-B3LYP energy profile in quantitative agreement with the whole-system PBE energy profile (purple circles). Even-handed selection adds only 3 LMOs to $ \psi A $ beyond those selected by the charge method, for a total of 58. As a further comparison, the charge method is performed with a modified threshold in Eq. (3) chosen such that every geometry has 58 selected LMOs in $ \psi A $. As for the charge method with the default threshold (0.4 electrons), this modified treatment of the charge threshold also results in a discontinuous energy profile (yellow open circles). Taken together, these results demonstrate that (i) even-handed LMO selection successfully improves upon the charge method to yield a continuous energy profile; (ii) the results of the even-handed LMO selection procedure cannot be replicated by simply performing the charge method with a different threshold; and (iii) the resulting embedded energy profile is in qualitative agreement with PBE calculations performed on the whole system, suggesting that the embedding calculation is converged with respect to the choice of $ X A $ and that the even-handed $ \psi A $ is an appropriate set of LMOs for embedding a correlated wavefunction method (see below).

We next embed correlated wavefunction methods using the even-handed set of LMOs determined above. To reduce the size of the virtual space, even-handed AO truncation is employed as discussed in Sec. II D with a charge threshold of 0.001 electrons, resulting in the retention of 713 out of 1102 AO basis functions in this compact complex. Comparison of MP2 with MP2-in-B3LYP calculations confirms that this AO truncation alters the reaction energy by less than 0.3 kcal/mol. Initial CCSD-in-B3LYP calculations performed at each geometry along the reaction coordinate yield T1 diagnostics^{92} of at least 0.07; this value exceeds both the standard threshold of 0.02 and the threshold of 0.05 proposed for first-row transition metals in Ref. 93, suggesting the presence of strong multireference character that CCSD cannot reliably describe.

To treat this potential multireference character, embedded multireference configuration interaction singles and doubles (MRCI)^{94,95} in DFT calculations with a complete active space self-consistent field (CASSCF)^{96,97} reference are performed. CASSCF calculations are performed without density fitting. The active space is chosen to comprise nine electrons in nine orbitals. Two factors suggest this active space is sufficiently large. First, the unrestricted natural orbital complete active space method of Bofill and Pulay^{98} produces a guess active space that is at most three electrons in three orbitals at any geometry. Second, the CAS(9,9) canonical orbital populations are at least 1.98 for the lowest-energy active orbital and at most 0.03 for the highest-energy active orbital across all geometries. Multireference character is confirmed by substantial deviations from integer occupations within these active canonical orbitals. All reference configurations with the norm larger than 0.01 are included in the MRCI dynamical correlation calculation. This parameter is sufficient to converge the resulting energy profile and results in a maximum of 52 reference configurations. Relaxed reference Davidson inextensivity corrections are applied.^{99}

Figure 3(c) shows the results of embedding MRCI in both B3LYP and PBE. (An even-handed $ \psi A $ was determined for the case of PBE in the same manner as for the case of B3LYP.) The resulting reaction profiles are continuous and smooth like those of the test PBE-in-B3LYP calculations. Under the MRCI-in-DFT embedding, the hydrogen bond is weaker and has a longer bond length (corresponding to a smaller optimal value of the reaction coordinate) compared to the underlying DFT calculations. The embedded profiles also show a further decrease in the barrier height. Comparing embedding in B3LYP versus in PBE, both result in the same qualitative shape of the reaction profile. Moving along the reaction coordinate, the MRCI-in-B3LYP and MRCI-in-PBE energy profiles diverge, resulting in a disagreement of 2 kcal/mol at the geometry corresponding to a reaction coordinate value of 1. This disagreement is smaller than the 4 kcal/mol disagreement between B3LYP and PBE at the same geometry, but is still significant. This residual difference is due to different descriptions by the two functionals of the geometric changes in subsystem *B*. Across this reaction coordinate, significant strain forms in the ligand backbone which lies largely in subsystem *B*. Thus, we see that embedding largely removes the dependence of the description of subsystem *A* on the density functional employed for subsystem *B*.

In this example, charge selection results in a qualitatively incorrect discontinuous energy profile, while even-handed selection results in a quantitatively accurate and continuous profile. This improvement comes at the cost of merely 3 additional LMOs included in $ \psi A $.

### B. Binding of CO to Cu clusters

As a second demonstration of even-handed selection, we consider the binding of carbon monoxide to copper clusters. This class of systems has proven difficult for DFT, with different functionals predicting a wide range of binding energies as well as different preferred binding sites. For the example of CO binding to a Cu(111) surface, PBE, which is often accurate for metals, incorrectly predicts a hollow site preference for CO binding.^{100} Meanwhile B3LYP, which is often accurate for molecules, predicts the correct on-top binding site preference, but with a binding energy of −2.2 kcal/mol,^{101} compared to experimental measurements which range between −10.4 and −12.0 kcal/mol.^{102–106} Previous work employing frozen-density embedding with approximate non-additive kinetic energy potentials has shown that embedded MRCI-in-LDA calculations can reproduce the measured CO–Cu(111) binding-site preference and binding energy.^{100}

We first consider the binding of CO to a Cu_{38} cluster, constructed as a hemispheric cut of a Cu(111) surface centered around a top site, corresponding to two surface neighbor rings around this site. A reaction path for the binding of CO is constructed by varying the C–Cu bond length in steps of 0.3 Å and relaxing the C–O bond length at the PBE/def2-TZVP level of theory, holding the Cu atoms fixed in their bulk lattice positions. The Cu atoms are treated with the 10 electron Stuttgart/Cologne effective core potential ECP10MDF.^{107} As in Sec. III A, calculations are performed in the molpro 2018.0 software package with an exchange-correlation grid threshold of 10^{−10} and with density fitting for Coulomb and exchange integrals evaluated with the def2-TZVPP/JKFIT basis. The binding energy is found to be −17.3 kcal/mol with a C–Cu bond length of 1.90 Å. At the same geometry, the binding energy from B3LYP is −9.4 kcal/mol.

Figure 4 examines the convergence of the B3LYP-in-PBE binding energy as a function of subsystem *A*. Four choices of subsystem *A* are considered, corresponding to embedding CO and 1, 4, 7, and 10 copper atoms [Fig. 4(c)]. Even-handed AO truncation is employed with a charge threshold of 0.001 electrons and is found to introduce negligible errors. Figure 4(a) shows the convergence of the embedded binding energy using charge (yellow) versus even-handed LMO selection (purple). Charge selection results in an inconsistent set of LMOs for subsystem *A*, $ \psi A $, between the bound and unbound CO geometries, leading to large errors even for the larger sizes of subsystem *A* considered. By contrast, even-handed selection results in rapid convergence to the full B3LYP result. Even for the smallest choice of subsystem *A* tested (*n*_{Cu} = 1), even-handed selection reproduces the full B3LYP result within 1.5 kcal/mol, despite treating only three atoms (8 occupied LMOs) at the B3LYP level. Notably, the charge and even-handed methods select the same number of LMOs for each choice of $ X A $. Equation (3) selects the most LMOs at bound geometries, and LMOs must be added to the $ \psi A $ corresponding to unbound geometries in order to keep the number of occupied LMOs constant throughout the binding coordinate. Due to different polarization of the Cu atoms in the bound versus unbound geometry, the charge method selects qualitatively different LMOs than in the $ \psi A $ corresponding to the bound geometry. Figure 4(b) shows CO–Cu_{38} binding curves computed with PBE, B3LYP, and B3LYP-in-PBE embedding. These B3LYP-in-PBE energy profiles are smooth and converge rapidly to the reference B3LYP result. Furthermore, these results include ligand-induced relaxation of the electronic density on the metal cluster that was neglected in previous frozen-density embedding calculations.^{100} Taken together, Figs. 4(a) and 4(b) demonstrate that even-handed LMO selection enables rapid convergence with respect to the size of subsystem *A* and yields smooth energy profiles.

B3LYP-in-PBE embedding provides some computational savings compared to a B3LYP calculation on the whole system. Averaged over the twelve geometries considered, B3LYP-in-PBE embedding results in speedups^{108} versus B3LYP ranging from 5.8× for *n*_{Cu} = 1 to 3.4× for *n*_{Cu} = 10. (PBE is on average 7.2× faster than B3LYP for this system in the implementation used here.) This speedup results both from a reduction in the number of AOs for which exact exchange is evaluated and from a reduction of the number of SCF cycles necessary to converge the embedded B3LYP calculation.^{109}

The metallic nature of this system (PBE HOMO-LUMO gap of 0.2 eV) results in poor orbital localization, particularly for those MOs near the Fermi level with primarily 4s character.^{110} Consequently, AO truncation, which depends on the spatial locality of *γ*^{A}, results in a large fraction of retained AO basis functions. For *n*_{Cu} = 1, the smallest subsystem *A* considered, 789 AOs are retained (out of 1772 total); 1413 AOs are retained for the case of *n*_{Cu} = 10. As a result, wavefunction-in-DFT embedding with this approach is computationally infeasible without further method development regarding virtual space reduction or occupied MO localization; for the purposes of the current work, we instead focus on a smaller cluster with a larger bandgap.

A Cu_{10} cluster is constructed similarly to the Cu_{38} cluster by a hemispheric cut of a Cu(111) surface centered around a top site, corresponding to one nearest-neighbor shell around this site. The bound CO–Cu_{10} geometry is optimized at the PBE/def2-SVP level of theory with the Cu atoms held fixed in their bulk lattice positions. The smaller size of this cluster results in an increase of the PBE HOMO-LUMO gap to 0.7 eV.

CCSD-in-PBE and CCSD-in-HF embedding calculations are performed in a def2-SVP basis. Three choices of subsystem *A* are constructed in the same manner as in the Cu_{38} cluster (see Fig. 5). $ \psi A $ is chosen with even-handed LMO selection. (Although for these three choices of $ X A $, the charge method selects the same LMOs.) Figure 5 presents the binding energy error associated with CCSD-in-PBE versus CCSD-in-HF embedding as a function of the choice of subsystem *A*. Errors are computed relative to reference full-system CCSD calculation, available due to the smaller size of this cluster. The T1 diagnostic is found to be less that 0.02 at all geometries. CCSD-in-PBE embedding converges rapidly to the full CCSD result, with an error of 1 kcal/mol at the smallest subsystem *A* size considered. By contrast, CCSD-in-HF embedding does not reach the same level of accuracy until subsystem *A* contains all atoms in the system except the three subsurface Cu atoms. As in Fig. 3, projection-based embedding removes dependence on the MF method used for subsystem *B*; however, a larger subsystem *A* is required when the MF poorly describes the electronic structure of subsystem *B* (as in HF for this Cu cluster).

Compared to the charge selection method, which is fragile to changes in polarization upon CO binding, we see that even-handed LMO selection gives a consistent $ \psi A $ with respect to all geometries along the binding coordinate (Fig. 4). Using even-handed LMO selection leads to rapid convergence of the energy calculation with respect to the size of subsystem A.

## IV. CONCLUSION

Projection-based embedding provides a framework for rigorous wavefunction-in-DFT embedding. In this paper, we demonstrate that the standard charge-based criterion for selecting the integer number of occupied LMOs that comprise the embedded subsystem can lead to discontinuous energy profiles and slow convergence of the calculated energy with respect to the size of subsystem *A*. We have introduced an even-handed selection procedure that ensures a consistent set of embedded occupied LMOs throughout a reaction coordinate. This algorithm only uses mean-field quantities that have already been computed as part of the projection-based embedding wavefunction-in-DFT procedure; thus, it adds negligible cost.

This method has been applied in several situations including for the cases of an organometallic catalyst and the binding of a molecule to a metal cluster, considering both DFT-in-DFT embedding and wavefunction-in-DFT embedding. In all cases, the even-handed method has been shown to be superior to the original charge method for selecting LMOs, resulting in smooth potential embedded energy surfaces and more rapid convergence of the embedded energy with respect to the size of subsystem *A*. Furthermore, embedding calculations performed using even-handed selection are largely insensitive to the underlying mean-field theory used to treat subsystem *B*, although it is worth noting that a sufficiently poor choice of mean-field theory, such as HF for a metal cluster, can lead to slower convergence with respect to the size of subsystem *A* than when starting from a better mean-field theory (Fig. 5).

Projection-based embedding has recently been extended to include periodic boundary conditions,^{109,111} and the ideas of even-handed selection can be naturally applied in such a framework; however, caution should be taken in the case of wavefunction-in-DFT embedding for metals due to slow convergence of the AO truncation in small-bandgap systems for which poorly localized occupied LMOs contribute substantially to the correlation energy. Regardless, we show that even-handed selection results in highly accurate B3LYP-in-PBE embedding for Cu binding to a Cu_{38} cluster despite substantial changes in polarization of the metal (Fig. 4), and accurate CCSD-in-PBE embedding was demonstrated for CO binding to the smaller Cu_{10} cluster (Fig. 5).

The even-handed selection method introduced here provides a consistent set of occupied LMOs for an embedding calculation across a reaction coordinate. Because its input comes entirely from mean-field calculations at each reaction coordinate geometry, the method has negligible cost compared to the original charge method for LMO selection. In cases where even-handed and charge selection yield the same embedded occupied LMOs, the even-handed procedure provides a measure of confidence that the embedded density is consistent across geometries. When the two procedures differ, even-handed selection has in all cases resulted in qualitatively superior results at the cost of a modest number of additional embedded occupied LMOs. We therefore recommend the use of the even-handed LMO selection procedure whenever considering a specified reaction pathway.

## ACKNOWLEDGMENTS

We thank Feizhi Ding and Sebastian Lee for helpful discussions. M.W. thanks the Resnick Sustainability Institute for a postdoctoral fellowship. T.F.M. acknowledges support in part from the NSF under Award No. CHE-1611581; additionally, this material is based upon work performed by the Joint Center for Artificial Photosynthesis, a DOE Energy Innovation Hub, supported through the Office of Science of the U.S. Department of Energy under Award No. DE-SC0004993. F.R.M. is grateful for funding from EPSRC (Grant No. EP/M013111/1).

## REFERENCES

*ab initio*programs, 2018, see http://www.molpro.net.

B3LYP reference timing calculations are started from both the Superposition of Atomic Densities (SAD) guess [J. H. Van Lenthe *et al.*, J. Comput. Chem. **27**, 926 (2006)] and from the converged PBE orbitals. The faster of these two timings is used for speedup calculations. PBE and B3LYP-in-PBE timings are computed starting from the SAD guess.

The Pipek-Mezey [J. Pipek and P. G. Mezey, J. Chem. Phys. **90**, 4916 (1989)] and Boys [S. F. Boys, Rev. Mod. Phys. **32**, 296 (1960)] localization schemes result in a greater number of poorly localized LMOs, as does changing the IBO localization exponent from 4 to 2.