This work examines the viability of matrix completion methods as cost-effective alternatives to full nuclear Hessians for calculating quantum and variational effects in chemical reactions. The harmonic variety-based matrix completion (HVMC) algorithm, developed in a previous study [S. J. Quiton *et al.*, J. Chem. Phys. **153**, 054122 (2020)], exploits the low-rank character of the polynomial expansion of potential energy to recover vibrational frequencies (square roots of eigenvalues of nuclear Hessians) constituting the reaction path using a small sample of its entities. These frequencies are essential for calculating rate coefficients using variational transition state theory with multidimensional tunneling (VTST-MT). HVMC performance is examined for four S_{N}2 reactions and five hydrogen transfer reactions, with each H-transfer reaction consisting of at least one vibrational mode strongly coupled to the reaction coordinate. HVMC is robust and captures zero-point energies, vibrational free energies, zero-curvature tunneling, and adiabatic ground state and free energy barriers as well as their positions on the reaction coordinate. For medium to large reactions involving H-transfer, with the sole exception of the most complex Ir catalysis system, less than 35% of total eigenvalue information is necessary for accurate recovery of key VTST-MT observables.

## I. INTRODUCTION

In this work, we assess the viability of a recently developed algorithm, namely, harmonic variety-based matrix completion (HVMC), as a computationally efficient alternative to multiple Hessian calculations otherwise necessary to compute quantum and variational effects in chemical reactions.^{1} Our larger objective is to enable inexpensive reaction rate predictions for catalytic reactions with direct dynamics approaches. Most computational kinetics studies employ conventional transition state theory (TST).^{2–4} TST is based on the principle that the potential energy maximum (transition state, a first-order saddle point) in a reaction step defines the dividing surface between reactants and products. It is employed alongside electronic structure calculations and the independent-mode, harmonic oscillator approximation to determine rate coefficients of elementary chemical reactions. However, conventional TST presents an oversimplified picture of the reaction path as bond-breaking/making does not occur at the harmonic limit and the no-recrossing assumption is not valid at higher temperatures. Therefore, it routinely breaks down at high temperatures when the reaction is barrierless or when modes are strongly coupled to the reaction coordinate.^{5–8} Tunneling, a quantum mechanical phenomenon that is significant at low temperatures particularly when light atoms, such as hydrogen, participate in the reaction step,^{9,10} is also not captured under the framework of conventional TST.

Variational transition state theory (VTST), also known as canonical variational theory (CVT), pursues the free energy maximum in a reaction step or, equivalently, finds the dividing surface that minimizes recrossing of reactive trajectories.^{10–12} For loose transition states in barrierless association reactions, variable-reaction-coordinate VTST (VRC-VTST) has been developed.^{13–15} Fernandez-Ramos and co-workers established frameworks for quantifying multidimensional tunneling (MT) effects on VTST (VTST-MT) reaction rates by means of a multiplicative transmission coefficient.^{16} MT corrections include vibrational modes via the adiabatic-ground state potential and, hence, contain quantized energies. They are also based on the fact that the tunneling path is shorter than the minimum energy path (MEP) of the reaction (also known as “corner-cutting”). Small-curvature tunneling (SCT) and large-curvature tunneling (LCT) take into account of the corner-cutting, while zero-curvature tunneling (ZCT) neglects mode-coupling and corner-cutting and, therefore, underestimates tunneling.^{9,16,17} Several programs have been developed to perform VTST-MT rate coefficient calculations. The FORTRAN-based POLYRATE program for polyatomic molecules supports both TST and VTST for reaction rate calculations and ZCT, SCT, and LCT corrections.^{18} More recently, a Python-based program called Pilgrim has been developed, which calculates thermal rate coefficients with VTST and supports tunneling calculations via ZCT and SCT approximations.^{19}

Despite the availability of well-established theories and software, conventional TST continues to be widely used, particularly in thermal catalysis studies. The practical use of VTST is limited for several reasons, including the ambiguity associated with the treatment of very small vibrational frequencies. The harmonic oscillator is a poor model for these low-frequency or large amplitude modes.^{20} The multi-structural approximation with the torsional anharmonicity (MS-T) method, available as a Fortran based package called *MSTor*,^{21,22} determines torsional correction factors to the harmonic approximation for such modes by using partition functions to calculate thermodynamic properties.^{23} VTST also requires multiple nuclear Hessians to determine vibrational modes constituting the entire MEP as opposed to conventional TST, which only requires Hessians at two stationary points (initial state and transition state). Truhlar and co-workers developed interpolated VTST (IVTST), which aims to minimize this effort and enables efficient tunneling and rate coefficient calculations by fitting splines to energies, gradients, moments of inertia, and eigenvalues of Hessians constituting the MEP.^{24–29} Several studies, ranging from gas phase radical chemistry to reactions of transition metal complexes, have employed IVTST to compute rate coefficients with tunneling corrections.^{30–33} However, it must be noted that some of these studies still required several Hessian calculations. An alternative strategy known as multiconfigurational molecular mechanics (MCMM) has also been proposed by Lin and co-workers, which reduces Hessian costs for non-stationary points by only evaluating some Hessian elements with electronic structure methods and the remaining via interpolation.^{34} The differing computational treatment, however, involves user intervention to classify constituent atoms into core, vicinal, geminal, and distant groups depending on the extent of participation in bond-breaking or formation events.

Our goal is to leverage advances in signal processing methods, particularly matrix completion (MC),^{35} to render VTST-MT more tractable. MC methods exploit an underlying property of the system, such as rank or sparsity, to complete/recover missing information. Methods that exploit sparsity, known as compressed sensing (CS), have found widespread applications in imaging,^{36} analytical chemistry,^{37–39} thermodynamics,^{40} and quantum chemistry.^{1,41–44} In the latter domain, CS has been employed to compute lattice couplings,^{41} determine large Hessians,^{42} construct anharmonic potential energy surfaces,^{43} and identify scaling relationships in heterogeneous catalysis.^{44} To the best of our knowledge, MC methods that leverage properties other than sparsity have yet to find applications in chemistry. The sole exception is the previous work by our groups toward rendering VTST-related calculations more efficient.^{1} We constructed a matrix of nuclear Hessian eigenvalues (squares of vibrational frequencies) constituting the MEP based on the reaction path Hamiltonian (RPH) expansion of the potential energy term in projected normal mode coordinates.^{45,46} We demonstrated that variety-based matrix completion (VMC)^{47} exploits the low-rank character of this polynomial problem and recovers randomly missing eigenvalues. The algorithm, termed harmonic variety-based matrix completion, recovers the full matrix using only a small subset of its elements with accuracies that yield reliable zero-point and vibrational free energies (temperature up to 1000 K) for several S_{N}2 reactions. Sampling requirements (or cost) decrease with increasing system size, showing that HVMC may scale favorably with size. While this is encouraging, S_{N}2 reactions typically consist of reaction coordinates that are not strongly coupled to the remaining vibrational degrees of freedom. In other words, the robustness of HVMC across a broad spectrum of reactions, where more than one vibrational mode varies strongly in the course of the reaction step, must be examined.

This work addresses the question of HVMC robustness by expanding the scope of algorithm testing in both reaction classes and observables. Five representative reactions spanning gas phase, inorganic, and enzyme chemistry are chosen, each involving the transfer of at least one hydrogen atom/hydride/proton. The S_{N}2 systems are also expanded to include two larger reactions (56 and 66 atoms). Zero-point energies, vibrational free energies, ZCT transmission coefficients, and values and positions of maxima are calculated to assess HVMC performance. The minimum sampling density, or fraction of overall eigenvalue information of the projected Hessians necessary to recover key quantities, is small and generally decreases with increasing system size, indicating favorable scaling of the algorithm. Very high densities are only observed for either very small systems (≤10 atoms) or in very rare instances where the reaction coordinate consists of multiple bond-breaking/making events. HVMC predicts free energy maxima more reliably for reactions in which the peak of the free energy profile is sharp. Higher scatter in HVMC predictions of free energies in the vicinity of transition states makes predictions of maxima for flatter free energy profiles more challenging. This also indicates that an alternative to the current random sampling method is necessary, with strategies that are more compatible with routine quantum chemistry calculations outlined at the end of this study. Robust and scalable HVMC performance, therefore, shows that MC methods offer promising and efficient means for applying VTST-MT approaches to study complex catalytic reactions.

## II. METHODS

To calculate variational rate and/or multidimensional tunneling coefficients, knowledge of the reaction path—initial, final, and transition structures (TS), as well as the minimum energy path connecting the three structures—is essential. In addition, vibrational analyses for points constituting the MEP are required, which become prohibitive for large systems or when analytical second derivatives are unavailable. Our goal is to lower the computational effort associated with such vibrational analyses. This section outlines the procedure employed to examine the ability of matrix completion methods, specifically HVMC, to recover (calculate) all vibrational frequencies constituting the MEP by using only a small, randomly available fraction of the total information. Reaction paths with complex variations in vibrational frequencies are chosen to probe the robustness of HVMC.

### A. HVMC implementation and performance evaluation

The construction of a polynomial variety problem for an MEP using the reaction path Hamiltonian (RPH)^{45,46} is described in detail in our previous work.^{1} In the RPH formalism, the potential energy for a small displacement [*V*(*s*, **q**)] from the MEP [*V*_{0}(*s*)] can be expressed as

The reaction coordinate is parameterized by *s*, which is zero at the TS, negative in the reactant region (reactant → *s*_{1}), and positive in the product region (product → *s*_{M}). ${\omega i2(s)}$ are eigenvalues of Hessians constituting the MEP, from which translations, rotations, and the reaction coordinate are projected out. The eigenvalues are also the squares of transverse vibrational frequencies. ${qi2}$ constitute 3*N* − 7 (or *F* − 1 where *F* quantifies vibrational degrees of freedom) projected normal mode displacements from the MEP, where *N* is the number of atoms in the system. A lifted polynomial equation of the form **x**^{T}**C**′ = 0 can be constructed by expressing Eq. (1) for all points on the MEP, where **x** is the vector of projected normal mode displacements. This representation constitutes a low-rank problem as only a fraction of the overall degrees of freedom of a system changes significantly in the course of a reaction step. By neglecting the top row of **C**′, which consists of [*V*(*s*, **q**) − *V*_{0}(*s*)] terms that are assumed to be small and approximately uniform for very small displacements, we obtain **C**,

The number of columns of **C** is equal to the number of points, *m*, on the MEP. A **C** matrix with missing entries can be completed using a matrix completion method with rank minimization as its objective function. Our previous work demonstrates that HVMC achieves this goal and accurately predicts zero-point energies and vibrational free energies for model S_{N}2 reactions.^{1} The HVMC algorithm employed in both the current and previous work is nearly identical to the VMC implementation developed by Ongie and co-workers.^{47} The algorithm is summarized in Sec. 1 of the supplementary material.

This work determines whether HVMC can recover eigenvalues of the projected Hessians accurately for reactions in which at least one transverse mode is strongly coupled to the reaction coordinate. We note key implementation details and distinctions from our previous work. First, as described in Sec. 2 of the supplementary material, the eigenvalues in each column of **C** are in ascending order rather than being based on eigenvector overlap between adjacent columns (diabatic ordering)^{48} employed in our previous work. This is significant because otherwise eigenvectors are required to achieve diabatic ordering of the elements in each column—necessitating the same computational effort that we are attempting to reduce or avoid entirely. The second difference lies in the construction of incomplete **C** matrices as HVMC input. HVMC performance is evaluated by constructing a complete **C** matrix and masking elements at random to achieve the desired sampling density, *ρ*, defined as the number of elements sampled per column. While our previous work utilizes purely random sampling to achieve the desired value of *ρ*, here we ensure that at least one element is sampled (at random) for every row of **C**.^{35} Therefore, the minimum possible value of *ρ* (*ρ*_{min}) for any system corresponds to the reciprocal of the number of columns or number of points constituting the MEP.

Similar to our previous study, 100 random trials are performed for each system for a given value of *ρ*. HVMC performance is assessed based on the statistical outcomes of these trials. The minimum required sampling density for every system is determined based on multiple criteria—Frobenius norm error of the recovered matrix (*Err*) and prediction errors associated with zero-point energy (ZPE), vibrational free energy [*G*_{vib}(*T*, *s*)] at high temperature (1000 K), and transmission coefficient based on zero-curvature tunneling at low temperature (300 K). The minimum sampling required, *ρ*_{min}, at which at least 75 out of 100 random HVMC trials recover these quantities within the error margins described below, is reported and used as the metric to assess HVMC scaling and robustness.

The HVMC recovery error via the Frobenius norm is calculated as

with subscripts *HVMC* and *Target* representing model prediction and target values obtained from density functional theory (DFT), respectively. *G*_{vib}(*T*, *s*) along the reaction coordinate (*s*) is given by^{9,48}

The threshold for accuracy of ZPE and *G*_{vib}(*T*, *s*) is chosen as the maximum deviation in either quantity being less than 2.39 kcal/mol, identical to our prior work.^{1} The treatment of small modes that are not true vibrational degrees of freedom differs from previous work. We had previously neglected frequencies lower than a fixed, pre-determined threshold value in free energy calculations owing to the well-known issue of divergence of the entropy term.^{1} However, this approach causes an inconsistency in total degrees of freedom. To remedy this issue in our current work, we replace any frequencies below 100 cm^{−1} with 100 cm^{−1} and preserve the overall number of degrees of freedom. Matrix errors and minimum sampling densities, therefore, differ between the two studies for the S_{N}2 reactions. We note that the number of eigenvalues below this threshold depends on the system but does not change significantly between HVMC trials and, therefore, does not impact our assessment of HVMC accuracies in predicting *G*_{vib}.

Our previous work focusing on HVMC development utilized ZPE and *G*_{vib} to assess performance but not multidimensional tunneling effects. In this work, quantum effects are captured using zero-curvature tunneling. ZCT does not rely on the eigenvector information as it neglects curvature and mode coupling,^{16} which makes it an appropriate choice for our current objectives although it underestimates tunneling contributions compared to SCT or LCT. The adiabatic ground-state potential is calculated by adding harmonically approximated vibrational ZPEs to the electronic energy at every point on the MEP,

The zero-curvature tunneling transmission coefficient is given by the ratio of Boltzmann averages of the semiclassical adiabatic ground-state (SAG) probability to the classical probability,^{16}

*β* is 1/(*k*_{B}*T*), reciprocal of the product of the temperature and Boltzmann constant. The probability function is energy-dependent and given in terms of limits determined by critical points on the adiabatic ground-state potential,^{16}

Here, *E*_{0} is the reactant (product) energy for exothermic (endothermic) reaction and the maximum of the energy curve is referred to as *V*^{AG}. We use DFT-calculated values of *E*_{0} and *V*^{AG} for the calculation of transmission coefficients from eigenvalues predicted by HVMC since *κ*^{ZCT} is sensitive to these two end-points. Generally, Hessian calculations of the reactant, product, and transition state are done before MEP computations, and thus, this assumption does not limit the applicability of HVMC. The action integral *θ*(*E*) based on the reduced mass *μ* (equal to unity for ZCT in atomic units) is calculated between the left (*s*<) and right (*s*>) turning points (s-values corresponding to given energy on the $VaG$ curve)^{16} and any additional points (*s*_{1} and *s*_{2}), if present,

We follow the procedure described in the Pilgrim software to calculate ZCT using both the target and HVMC-recovered matrices.^{19} Splines are used to obtain a smooth potential function, and integration is carried out using Gaussian quadrature. We validate our Python program for transmission coefficient calculations by comparing them against those obtained with Pilgrim. The difference is below 1% for all test systems and temperatures available in the Pilgrim test set. HVMC-based ZCT values are compared with target (DFT) coefficients at 300 K. The accuracy is selected such that the predicted rate is within an order of magnitude of the target rate. Since the transmission coefficient is a multiplicative factor, HVMC accuracy is deemed acceptable if the predicted *κ*^{ZCT} is within one order of magnitude of the target value.

### B. Reactions

Using the metrics described above, we assess HVMC performance across a broad spectrum of molecular reactions spanning gas phase, enzymatic, and catalytic chemistry. With the exception of S_{N}2 reactions, the eigenvalue spectra consist of one or more modes that are strongly coupled to the reaction coordinate. Therefore, they serve as ideal candidates to test the robustness of HVMC. The following reaction classes and reactions are examined with the corresponding transition structures (TSs) illustrated in Fig. 1 and Hessian eigenvalue spectra constituting the MEP in Fig. 2:

S

_{N}2: Four symmetric reactions of alkyl/aryl chlorides with system sizes ranging from 6 to 66 atoms. As the reactions are symmetric, HVMC is employed to recover eigenvalues corresponding to the half-reaction. Complete reaction paths are employed to calculate transmission coefficients.*Hydrogen abstraction*: Three reactions are chosen—CF_{3}CH_{3}+ OH → CF_{3}CH_{2}+ H_{2}O,^{50}C_{6}H_{6}+ CH_{3}→ C_{6}H_{5}+ CH_{4},^{51}and C_{6}H_{6}+ C_{2}H_{5}→ C_{6}H_{5}+ C_{2}H_{6},^{51}consisting of 10, 16, and 19 atoms, respectively. As tunneling plays a significant role in determining rate coefficients of hydrogen atom transfer reactions, these reactions (and those below) serve to also test HVMC accuracies in predicting quantum effects.*Metal–ligand cooperative homogeneous*Ir*catalysis*: In this reaction, isopropanol is oxidized to acetone catalyzed by an Ir complex via concerted hydride transfer to the metal center and proton transfer via a solvent-assisted proton shuttle to the ligand.^{52}This reaction represents the most complex example we could find where the reaction coordinate is comprised of at least three stretching modes (one hydride transfer and two proton transfers). It is possible therefore that this reaction represents a high-rank problem that is close to the upper limit of complexity observable in a single-step reaction. The system consists of 46 atoms.*Enzymatic deprotonation*: This is the second largest system in our study (64 atoms) and largest among non-S_{N}2 reactions. It describes the oxidation of methylamine by the Methylamine Dehydrogenase (MADH) enzyme.^{53}

The *ω*B97X/def2-SVP level of theory^{54,55} is employed in all cases, with the exception of the iridium catalyst system, and all calculations are carried out using the Q-Chem *ab initio* quantum chemistry software (version 5.2 and higher).^{56} For the Ir catalysis system, the dispersion-corrected (Grimme’s D3 with the Becke–Johnson damping, D3-BJ)^{57,58} B3LYP functional^{59,60} is used. The Stuttgart small-core effective core potential is used to describe the Ir atom,^{61,62} and the remaining elements are represented using double-*ζ* 6-31G* basis set. The level of theory is comparable to that utilized in the original work.^{52}

MEPs are calculated in mass-weighted coordinates, also known as intrinsic reaction coordinates (IRCs), using the gradient-based predictor-corrector algorithm in Q-Chem.^{63–65} We note that employing mass-weighted Cartesian coordinates in vibrational frequency calculations can lead to unphysical modes. Such issues can be eliminated using curvilinear coordinates.^{66–68} Shapes of the eigenvalue spectrum and adiabatic ground-state potential curves are likely to be affected by the choice of coordinates. However, independent of the coordinate selection, our HVMC method is applicable, and examining performance for **C** constructed in internal coordinates constitutes future work. The maximum IRC stepsize for each system is chosen to ensure a smooth MEP and is equal to 0.1 a.u. for most reactions, except MADH and hydrogen abstractions (0.05 a.u., except C_{6}H_{6} + C_{2}H_{5}, for which 0.025 a.u. is used).

Section 3 of the supplementary material describes the procedure used to construct complete **C** matrices from Hessian calculations for points on the MEP. The procedure ensures that **C** matrices for all reactions have the identical number of columns or points on the MEP (24) and only differ in the number of rows (3*N* − 7). As seen in Fig. 2, these systems represent diverse variations in vibrational frequencies in the course of a reaction step from monotonic in S_{N}2 to at least four modes exhibiting significant changes when traversing the MEP in the Ir metal–ligand cooperative catalytic reaction. We demonstrate in this work that zero-point energies, vibrational free energies, and zero-curvature tunneling coefficients are accurately captured with HVMC when only a small (random) subset of vibrational frequency information is available for every system.

This work examines HVMC accuracies relative to DFT in predicting terms constituting the VTST-MT rate coefficient expression rather than to compute and contrast rate coefficients themselves with accurate experimental or computational benchmarks. This is because uncertainties arise from both the chosen level of theory as well as our approximate treatment of small modes, which are not easy to isolate and quantify. All data, the HVMC program, and supporting Python routines to calculate quantum and variational parameters are publicly available on GitHub (https://github.com/RateTheory/HVMC).^{69}

## III. RESULTS AND DISCUSSION

### A. HVMC accuracy and robustness

Scaling and robustness of HVMC are evaluated based on the minimum sampling density, *ρ*_{min}, necessary to recover ZPE and *G*_{vib} (1000 K) within “chemical accuracy” (maximum deviation ≤ 2.39 kcal/mol or 10 kJ/mol) and ZCT transmission coefficient (300 K) to within an order of magnitude in at least 75 out of 100 trials. Mean errors in *G*_{vib} (1000 K) and ZPE are within chemical accuracy for all 100 trials for each of the S_{N}2 and non-S_{N}2 systems examined. Table I describes mean Frobenius norm errors associated with HVMC recovery as well as the percentage of trials that satisfy our criteria for maximum deviations in key quantum and variational observables. The results are divided into S_{N}2 and non-S_{N}2 systems, as the former is employed to demonstrate HVMC scaling and the latter to examine the robustness. *R*_{est} is the estimated rank of the full matrix, determined using an information criterion presented in Sec. 4 of the supplementary material. *S*_{N}2 reactions are all low-rank, while the remaining systems exhibit higher (and varying) complexity, confirming what is visually observable in Fig. 2. The Ir catalysis reaction exhibits the highest rank among all systems, thereby allowing us to probe how low-rank the problem needs to be for HVMC to be a reliable, efficient alternative to full MEP Hessian calculations.

. | . | . | . | |Max.deviation| ≤ 2.39 kcal/mol . | . | |
---|---|---|---|---|---|---|

Reaction . | ρ_{min}
. | R_{est}
. | μ_{Err,mat} ± σ
. | G_{vib} (1000 K)
. | ZPE . | $0.1\u2264\kappa HVMCZCT\kappa DFTZCT\u226410$ . |

S_{N}2(1) | 0.2 | 3 | 2.32 ± 0.81 | 84 | 100 | 100 |

S_{N}2(2) | 0.05 | 2 | 1.45 ± 0.2 | 100 | 100 | 100 |

S_{N}2(3) | 0.045 | 2 | 0.76 ± 0.13 | 100 | 100 | 100 |

S_{N}2(4) | 0.045 | 2 | 0.66 ± 0.13 | 100 | 100 | 100 |

CF_{3}CH_{3} + OH | 0.6 | 8 | 5.56 ± 1.56 | 86 | 100 | 100 |

CH_{3} + C_{6}H_{6} | 0.35 | 6 | 5.57 ± 1.51 | 80 | 100 | 100 |

C_{2}H_{5} + C_{6}H_{6} | 0.3 | 6 | 6.10 ± 1.34 | 84 | 100 | 100 |

Ir catalysis | 0.5 | 10 | 2.52 ± 0.74 | 81 | 90 | 100 |

MADH enzyme | 0.1 | 7 | 4.30 ± 0.89 | 75 | 98 | 100 |

. | . | . | . | |Max.deviation| ≤ 2.39 kcal/mol . | . | |
---|---|---|---|---|---|---|

Reaction . | ρ_{min}
. | R_{est}
. | μ_{Err,mat} ± σ
. | G_{vib} (1000 K)
. | ZPE . | $0.1\u2264\kappa HVMCZCT\kappa DFTZCT\u226410$ . |

S_{N}2(1) | 0.2 | 3 | 2.32 ± 0.81 | 84 | 100 | 100 |

S_{N}2(2) | 0.05 | 2 | 1.45 ± 0.2 | 100 | 100 | 100 |

S_{N}2(3) | 0.045 | 2 | 0.76 ± 0.13 | 100 | 100 | 100 |

S_{N}2(4) | 0.045 | 2 | 0.66 ± 0.13 | 100 | 100 | 100 |

CF_{3}CH_{3} + OH | 0.6 | 8 | 5.56 ± 1.56 | 86 | 100 | 100 |

CH_{3} + C_{6}H_{6} | 0.35 | 6 | 5.57 ± 1.51 | 80 | 100 | 100 |

C_{2}H_{5} + C_{6}H_{6} | 0.3 | 6 | 6.10 ± 1.34 | 84 | 100 | 100 |

Ir catalysis | 0.5 | 10 | 2.52 ± 0.74 | 81 | 90 | 100 |

MADH enzyme | 0.1 | 7 | 4.30 ± 0.89 | 75 | 98 | 100 |

For the smallest S_{N}2 reaction, HVMC recovers *G*_{vib} accurately in more than 80% of all trials if 20% of the total eigenvalue information is available, with at least one data-point available for every eigenvalue mode (row of **C**). The corresponding mean norm error (*μ*_{Err,mat}) is 2.32%. In line with our previous work,^{1} HVMC scales favorably with system size demonstrated by a sharp decrease in both *ρ*_{min} and *μ*_{Err,mat} from *S*_{N}2(**1**) to *S*_{N}2(**4**). *ρ*_{min} hits the minimum allowable limit (4.5%) for *S*_{N}2(**3**) and *S*_{N}2(**4**), at which at least one element per row is sampled (≈1/24). Maximum deviations of ZPE and *G*_{vib} for all trials are within chemical accuracy for the larger systems. ZCT transmission coefficients (*κ*^{ZCT}) calculated from HVMC-recovered eigenvalues are within an order of magnitude of DFT-based coefficients. Plots describing HVMC accuracies in predicting *G*_{vib}(1000 K), ZPE, and *κ*^{ZCT} are presented in Sec. 6.1 of the supplementary material.

While HVMC is both accurate and requires lower sampling for larger S_{N}2 reactions, eigenvalues [Figs. 2(a)–2(d)] vary monotonically along the MEP, indicating that alternative, straightforward approaches, such as curve-fitting or spline-interpolation, may also yield similar results. Therefore, diverse spectra [Figs. 2(e)–2(i)] are employed to test the robustness of HVMC. Representative eigenvalue spectra predicted by HVMC are shown in Figs. S7–S11 of the supplementary material. Table I (bottom) shows that *G*_{vib}, ZPE, and *κ*^{ZCT} are accurately captured with limited sampling despite larger *μ*_{Err,mat} compared to S_{N}2 reactions. The smallest system, CF_{3}CH_{3} + OH (10 atoms), has the highest *ρ*_{min} of 60%, which is higher than S_{N}2 reactions of comparable size. This shows that non-monotonic variations in eigenvalues necessitate greater sampling. Both CH_{3} + C_{6}H_{6} and C_{2}H_{5} + C_{6}H_{6} eigenvalue spectra, similar to CF_{3}CH_{3} + OH, possess at least one mode each that varies more strongly with *s* than others. Reduced sampling requirements for these two systems compared to CF_{3}CH_{3} + OH, similar to S_{N}2 systems, indicate that a smaller fraction of matrix information is required as systems grow larger. However, even though the error tolerances in observables are met in the case of C_{2}H_{5} + C_{6}H_{6}, the trough of the strongly varying mode in Fig. 2(g) cannot be completely captured (Fig. S9 of the supplementary material). While this means that higher sampling may be necessary to recover key features of the spectrum, establishing general selection criteria for sampling density *a priori* across various classes of reactions and system sizes is a difficult task. Instead, we examine the implications of our criteria on HVMC predictions of free energy maxima in Sec. III B.

As shown in Fig. 2(h), the Ir-based metal–ligand cooperative catalysis system consists of the highest number of eigenvalues that change significantly over the course of the reaction among all systems examined (highest *R*_{est}). Consequently, a *ρ*_{min} of about 50% is essential to capture quantum and variational effects accurately. HVMC-recovered spectra for specific trials (Fig. S10 of the supplementary material) demonstrate the method’s ability to capture complex spectra with sufficient sampling. The MADH enzyme model is the largest system examined and is associated with the lowest *ρ*_{min} among non-S_{N}2 reactions. Similar to the C_{2}H_{5} + C_{6}H_{6} system, the strongest varying mode in MADH as well as the smaller dips on the reactant side [Fig. 2(i)] are not adequately captured in HVMC trials (Fig. S11 of the supplementary material) despite *ρ*_{min} satisfying all our accuracy thresholds.

### B. Potential and free energy profiles, barriers, and ZCT coefficients

Table I demonstrates that HVMC is robust as it is able to capture non-monotonic variations in modes for most reactions. It is also necessary to examine the impact of approximately recovered eigenvalues on the resulting adiabatic ground-state potential energy and vibrational free energy curves. For the non-S_{N}2 reactions, Figs. 3 and 4 depict HVMC (predicted) vs DFT (target) adiabatic ground-state potential $(VaG)$ and vibrational free energy differences (Δ*G*_{vib}) at elevated temperatures (1000 K), respectively. With the sole exception of Ir catalysis [Fig. 3(d)], $VaG$’s obtained from HVMC [Figs. 3(a)–3(c) and 3(e)] are well within desired chemical accuracy (gray shaded region). The majority of outliers in Ir catalysis are in the region of the TS. The deviations of HVMC-predicted $VaG$ values from DFT for the CF_{3}CH_{3} + OH system are also pronounced near the TS. This is because modes vary stronger in the vicinity of *s* = 0 compared to regions of the MEP closer to the reactant or product. Increased scatter of $VaG$ in the vicinity of the TS indicates that a different sampling strategy that is less random, or more structured, may be necessary (Sec. III C). $VaG$ errors are more distributed for the MADH enzyme compared to the other systems due to the poorer overall prediction of eigenvalue spectra described earlier (Fig. S11 of the supplementary material). The scatter in $VaG$ can be lowered by increased sampling, shown in Fig. S12 of the supplementary material.

The ground-state potential [Eq. (5)] is given by the sum of square roots of the elements constituting a column of the **C** matrix, whereas free energies are more complex functions of matrix elements [Eq. (4)]. Similar to our previous study, we utilize high-temperature Δ*G*_{vib} (1000 K) as the goal is to assess HVMC performance for applications in thermal catalysis.^{1} HVMC predictions of Δ*G*_{vib} variations along the MEP in Fig. 4 exhibit greater scatter compared to $VaG$ and algorithm performance is less straightforward to interpret. For all the systems, at least some HVMC trials lead to Δ*G*_{vib} values outside the prescribed error range. Although the extent appears to be smaller for the Ir catalysis system in Fig. 4(e), this is the result of a wider free energy range (y-axis) spanned by the reaction step and not the absence of scatter. Scatter is reduced at lower temperatures (Fig. S13 of the supplementary material). *ρ*_{min} is therefore determined not only by the complexity of the system but also by the chosen range of reaction temperatures.

Next, we examine HVMC’s ability to accurately predict the location and magnitude of the activation barrier or the difference between the maximum of adiabatic ground state/free energy and the reactant. It must be noted that the algorithm is designed for recovering all elements and not specifically to pursue maxima.^{70} Therefore, it is not necessary that HVMC yield reliable barrier predictions. The non-S_{N}2 reactions can be categorized into two based on whether the free energy maximum is distinct from the adiabatic ground-state potential maximum at *s* = 0 (CF_{3}CH_{3} + OH) or not (all others). CF_{3}CH_{3} + OH has free energy maximum shifted to the right of the MEP maximum at *s* = 0.1204 [Fig. 4(a)].

Table II shows the number of HVMC trials that yield a maximum at the same *s* as the DFT maximum. The normalized standard deviations in the position of the maxima are also reported. There is no direct correlation between the matrix errors reported in Table I and the number of trials with the correct maximum. More than 90 trials out of 100 yield Δ*G*^{CVT} and *V*^{AG} at *s* = 0 in the case of MADH despite HVMC’s poor recovery of eigenvalue spectra (Fig. S11) and large Frobenius norm error (*μ*_{Err,mat} = 4.30). Based on our limited set of five reactions, it would appear that the maxima are accurately predicted by HVMC when they are sharp, which is the case with Ir catalysis and the MADH enzyme. For the remaining systems with shallower peaks (CF_{3}CH_{3} + OH, CH_{3} + C_{6}H_{6} and C_{2}H_{5} + C_{6}H_{6}), HVMC is less reliable, and less than half of the trials yield correct maxima in two out of the three reactions. The shallow feature in the $VaG$ peak for CF_{3}CH_{3} + OH is consistent with prior observations by Espinosa-García *et al.*^{50} and arises from the strong downward trend in a single vibrational mode in the region of the TS [Fig. 2(e)]. Normalized standard deviations associated with the location of maxima are higher forCF_{3}CH_{3} + OH and C_{2}H_{5} + C_{6}H_{6} compared to the remaining systems. With the exceptions of CF_{3}CH_{3} + OH and Ir catalysis, deviations in Δ*G*^{CVT} are higher owing to increased scatter compared to *V*^{AG}.

. | ΔG^{CVT} (1000 K)
. | V^{AG}
. | ||
---|---|---|---|---|

Reaction . | No. of trials . | σ_{s}
. | No. of trials . | σ_{s}
. |

CF_{3}CH_{3} + OH | 33 | 0.051 | 35 | 0.059 |

CH_{3} + C_{6}H_{6} | 38 | 0.026 | 45 | 0.019 |

C_{2}H_{5} + C_{6}H_{6} | 51 | 0.048 | 64 | 0.040 |

Ir catalysis | 73 | 0.019 | 71 | 0.020 |

MADH enzyme | 91 | 0.026 | 95 | 0.017 |

. | ΔG^{CVT} (1000 K)
. | V^{AG}
. | ||
---|---|---|---|---|

Reaction . | No. of trials . | σ_{s}
. | No. of trials . | σ_{s}
. |

CF_{3}CH_{3} + OH | 33 | 0.051 | 35 | 0.059 |

CH_{3} + C_{6}H_{6} | 38 | 0.026 | 45 | 0.019 |

C_{2}H_{5} + C_{6}H_{6} | 51 | 0.048 | 64 | 0.040 |

Ir catalysis | 73 | 0.019 | 71 | 0.020 |

MADH enzyme | 91 | 0.026 | 95 | 0.017 |

^{a}

The reaction coordinate is normalized such that *s* = 0 and *s* = 1 correspond to reactant and product states, respectively.

Table III presents mean free energy barriers (Δ*G*^{CVT}) that are employed in VTST rate coefficient calculations. Across the range of temperatures examined, HVMC predictions of the barriers on average are very close to $\Delta GDFTCV T$. Despite reduced scatter in free energies along the MEP with decreasing temperature (Fig. S13 of the supplementary material), errors in free energy barriers are less sensitive to temperature. Mean barrier predictions, however, are systematically overestimated. This can be traced back to the reduced ability of HVMC to predict matrix elements corresponding to strongly varying modes. The rank-minimization scheme cannot adequately capture all sharp changes in eigenvalues of the projected Hessian close to *s* = 0 observed in the non-S_{N}2 reactions. As can be seen in Figs. 3 and 4, HVMC underestimates the reduction in ZPE near the TS and, as a result, overestimates Δ*G*^{CVT}. For the MADH enzyme, in particular, Δ*G*^{CVT} is overestimated by 26%–32% as the HVMC-recovered spectra yield linearly varying values for $\omega i2$ between 0.1 and 0.35 (corresponding to *ω*_{i} = 1625–3045 cm^{−1}—Fig. S11 of the supplementary material).

. | $\mu \Delta GHVMCCVT$ $(\Delta GDFTCV T)$ . | |||
---|---|---|---|---|

Reaction . | 100 K . | 200 K . | 500 K . | 1000 K . |

CF_{3}CH_{3} + OH | 4.15 (3.16) | 4.27 (3.29) | 4.81 (3.77) | 5.70 (4.39) |

CH_{3} + C_{6}H_{6} | 10.11 (9.08) | 10.14 (9.11) | 10.34 (9.36) | 10.76 (9.73) |

C_{2}H_{5} + C_{6}H_{6} | 13.27 (12.07) | 13.41 (12.26) | 13.95 (12.98) | 14.97 (14.08) |

Ir catalysis | 18.45 (17.28) | 18.62 (17.46) | 19.55 (18.40) | 21.18 (19.90) |

MADH enzyme | 6.57 (4.98) | 6.60 (5.07) | 6.69 (5.32) | 6.80 (5.36) |

. | $\mu \Delta GHVMCCVT$ $(\Delta GDFTCV T)$ . | |||
---|---|---|---|---|

Reaction . | 100 K . | 200 K . | 500 K . | 1000 K . |

CF_{3}CH_{3} + OH | 4.15 (3.16) | 4.27 (3.29) | 4.81 (3.77) | 5.70 (4.39) |

CH_{3} + C_{6}H_{6} | 10.11 (9.08) | 10.14 (9.11) | 10.34 (9.36) | 10.76 (9.73) |

C_{2}H_{5} + C_{6}H_{6} | 13.27 (12.07) | 13.41 (12.26) | 13.95 (12.98) | 14.97 (14.08) |

Ir catalysis | 18.45 (17.28) | 18.62 (17.46) | 19.55 (18.40) | 21.18 (19.90) |

MADH enzyme | 6.57 (4.98) | 6.60 (5.07) | 6.69 (5.32) | 6.80 (5.36) |

Finally, we examine the accuracies of ZCT transmission coefficients using eigenvalues predicted by HVMC. Figure 5 depicts histograms of HVMC-predicted to target (DFT) *κ*^{ZCT} ratios for non-S_{N}2 reactions. The lowest percentage errors are obtained for the C_{2}H_{5} + C_{6}H_{6} system with all trials yielding less than 20%. Although the remaining reactions can exhibit errors as high as 80%, the impact on the rate coefficient is still within an order of magnitude (Table I). The MADH enzyme reaction shows an error distribution that is distinct from the remaining systems, with most errors above 50%, in line with noisy $VaG$ that are on average higher than the DFT values for this system. There is a systematic underestimation of the *κ*^{ZCT} values with HVMC due to the increased values of $VaG$ along the MEP. Tables S3 and S4 show mean transmission coefficients over 100 trials for *S*_{N}2 and non-*S*_{N}2 reactions, respectively. On average, HVMC predicts ZCT coefficients with less than 53% error for all systems, with the largest error corresponding to the MADH enzyme. Standard deviations in HVMC transmission coefficients exhibit a decrease with increasing system size for the three hydrogen atom transfer reactions. The highest standard deviation in *κ*^{ZCT} is observed for Ir catalysis owing to the presence of multiple strongly varying modes in its spectrum. Even with such a complex system, the average error is less than 40%, equivalent to a 0.7-fold change in rate coefficient.

The preceding analyses demonstrate that by exploiting the low-rank property of the polynomial expansion of potential energy, we can recover eigenvalues of the projected Hessians constituting the MEP by sampling a small subset of elements at random. Sampling requirements decrease with increasing system size, indicating that the algorithm may scale favorably, therefore reducing the computational effort associated with Hessian calculations for larger systems. Extensive examination of HVMC outcomes for non-S_{N}2 reactions demonstrates that the algorithm is robust and captures both quantum and variational effects—ZCT transmission coefficients, ZPEs, free energies, barriers, and barrier positions—with reasonable accuracies. Potential difficulties arise in systems such as CF_{3}CH_{3} + OH in which $VaG$ and *G*_{vib} flatten in the region of *s* = 0, and HVMC yields a lower success rate with identifying the right maxima than it does when the MEP is characterized by a sharp peak at the TS.

Minimum sampling densities are difficult to predict exactly *a priori*. However, we show that these are generally very small $(\u22640.2)$ for reactions that do not involve hydrogen transfer. With the exception of the smallest system (10 atoms), reactions consisting of single hydrogen transfer exhibit decreasing *ρ*_{min} with increasing size, with a peak value of 0.35. The concerted reaction step for the Ir catalysis system that involves simultaneous hydride and two proton transfers has the highest rank and naturally requires about 50% of total Hessian information constituting the reaction path. However, such complexity is rare in most catalytic reactions, and therefore, we believe that the *ρ*_{min} for Ir catalysis represents an upper bound to HVMC sampling requirements for complex reactive systems.

### C. Future directions: Sampling strategy

The primary barrier to a practical HVMC implementation for Hessian eigenvalue recovery lies in the sampling approach. Most matrix completion methods rely on the random sampling of elements. One way to calculate random elements of **C** is by employing finite differences of gradients^{71–73} with some knowledge of the corresponding Hessian eigenvectors from either low-resolution methods (say, molecular mechanics)^{42} or from approximate Hessians constructed in the course of MEP generation via Hessian-based predictor-corrector (HPC) methods.^{74,75} Such an approach, however, is expected to yield approximate or “noisy” eigenvalues, with deviations from target values that are non-random in nature. Therefore, MC methods that are specifically suited for noisy data and incorporate flexible noise models become necessary if we pursue the development of a practical algorithm based on random sampling of elements for use alongside quantum chemistry software.^{76} It is also possible to use “noisy” eigenvector data obtained via the HPC method and apply MC methods to predict the target eigenvector tensor. Such information can pave the way forward to evaluate more accurate tunneling coefficients. Reaction-path curvature leading to corner-cutting can be incorporated to expand quantum effects to SCT or even LCT if eigenvector information can be recovered.^{16}

In computational catalysis studies, full Hessians are routinely calculated to verify and determine the vibrational spectra of stationary points—reactants, products, and TSs. In other words, columns of **C** are available to us rather than random elements. Classical MC approaches, which typically use randomized sampling, fail to accommodate column sampling on account of the objective function consisting purely of an expression for rank minimization. However, there is work that exploits column sampling coupled with information about the row space of the incomplete matrix.^{77,78} Our group is modifying the objective function of HVMC by including an additive term that preserves the near-polynomial character of matrix elements *across* the columns of **C**.^{79} Note that the polynomial character describes the row space of the incomplete matrix. Such an objective function can then be coupled with a pre-processing technique that identifies which additional columns are most different from those corresponding to stationary points (or, in general, those that are already sampled).^{80} Only these columns are then sampled to accurately recover the matrix elements. This is expected to lead to an MC algorithm that is readily compatible with quantum chemistry programs as it identifies the exact points on the MEP for which full Hessian calculations are necessary.

## IV. CONCLUSIONS

Matrix completion methods are widely used to recover signals from noisy, incomplete data with high fidelity. Our goal is to employ these methods as cost-reduction strategies, to recover otherwise expensive eigenvalues of Hessians for points constituting the MEP, and to calculate quantum effects and variational observables for chemical reactions. In our previous work, we developed a harmonic variety-based matrix completion (HVMC) algorithm that utilizes the underlying polynomial structure of potential energies of points on the MEP to construct a low-rank problem. We demonstrated that for S_{N}2 reactions, a small (random) fraction of matrix elements is sufficient for HVMC to complete the missing information and, therefore, predict ZPE and Δ*G*_{vib} with reasonable accuracy. In this work, we expand the suite of reactions to include larger *S*_{N}2 systems and hydrogen transfer reactions (CF_{3}CH_{3} + OH, CH_{3} + C_{6}H_{6}, C_{2}H_{5} + C_{6}H_{6}, metal–ligand cooperative Ir catalysis, and MADH enzy1me catalysis) to evaluate HVMC performance in predicting ZPE, Δ*G*_{vib}, positions and magnitudes of barriers, and ZCT transmission coefficients. Minimum sampling densities (*ρ*_{min}) required for accurate predictions of quantum and variational effects drop with an increase in the system size demonstrating favorable scaling of the algorithm, although they also depend strongly on the problem rank or the number of modes that are strongly coupled to the reaction coordinate. HVMC accurately predicts barriers even though they are systematically overestimated on account of under-prediction of the drop in ZPE in the vicinity of the TS. HVMC predictions of the barrier positions are also accurate but more reliable for reactions that possess sharper peaks in their $VaG$ (and therefore Δ*G*_{vib}) curves. While it is difficult to predict or recommend *ρ*_{min} for an arbitrary reaction, our study shows that 20% is an upper limit for those without hydrogen transfer and about 35% for medium to large systems involving a single hydrogen atom transfer. As random sampling may not constitute an immediately practical strategy for reducing computational cost, we outline ongoing efforts toward developing a novel MC algorithm that is amenable to column sampling and, therefore, will be directly applicable for VTST-MT calculations in computational catalysis studies. We also note that matrix completion methods can be designed to optimize functions of matrices and not just reconstruction error, such as finding maxima.^{70} We propose to consider such approaches in the future to improve barrier predictions.

## SUPPLEMENTARY MATERIAL

Additional information regarding the HVMC algorithm; methods for eigenvalue ordering, column truncation, and rank estimation; quantitative data for tunneling coefficients; and additional plots and data for *S*_{N}2 and non-*S*_{N}2 reaction systems are available in the supplementary material.

## ACKNOWLEDGMENTS

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under Award No. DE-SC0021417. The authors also acknowledge computational resources and support from the National Energy Research Scientific Computing Center (NERSC) and USC’s Center for Advanced Research Computing (CARC). S.J.Q. acknowledges support from the Provost’s Undergraduate Research Fellowship at USC.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

The data that support the findings of this study are available within the article and its supplementary material. The HVMC code along with analysis scripts is available in the RateTheory/HVMC GitHub repository at https://github.com/RateTheory/HVMC.^{69}

## REFERENCES

_{2}H

_{3}+ H → C

_{2}H

_{4}reaction

_{4}at 223–2400 K

_{2}H

_{2}

_{4}and CH

_{3}Cl

_{2})], [Fe (μ–O

_{2})] and Fe (IV)–O cores based on DFT potential energy surfaces

_{3}CH

_{3}+ OH hydrogen abstraction reaction

_{6}H

_{6}+ CH

_{3}/C

_{2}H

_{5}= C

_{6}H

_{5}+ CH

_{4}/C

_{2}H

_{6}reactions

^{−}+ CH

_{4}→ CH

_{4}+ H

^{−}