This work examines the viability of matrix completion methods as cost-effective alternatives to full nuclear Hessians for calculating quantum and variational effects in chemical reactions. The harmonic variety-based matrix completion (HVMC) algorithm, developed in a previous study [S. J. Quiton et al., J. Chem. Phys. 153, 054122 (2020)], exploits the low-rank character of the polynomial expansion of potential energy to recover vibrational frequencies (square roots of eigenvalues of nuclear Hessians) constituting the reaction path using a small sample of its entities. These frequencies are essential for calculating rate coefficients using variational transition state theory with multidimensional tunneling (VTST-MT). HVMC performance is examined for four SN2 reactions and five hydrogen transfer reactions, with each H-transfer reaction consisting of at least one vibrational mode strongly coupled to the reaction coordinate. HVMC is robust and captures zero-point energies, vibrational free energies, zero-curvature tunneling, and adiabatic ground state and free energy barriers as well as their positions on the reaction coordinate. For medium to large reactions involving H-transfer, with the sole exception of the most complex Ir catalysis system, less than 35% of total eigenvalue information is necessary for accurate recovery of key VTST-MT observables.
I. INTRODUCTION
In this work, we assess the viability of a recently developed algorithm, namely, harmonic variety-based matrix completion (HVMC), as a computationally efficient alternative to multiple Hessian calculations otherwise necessary to compute quantum and variational effects in chemical reactions.1 Our larger objective is to enable inexpensive reaction rate predictions for catalytic reactions with direct dynamics approaches. Most computational kinetics studies employ conventional transition state theory (TST).2–4 TST is based on the principle that the potential energy maximum (transition state, a first-order saddle point) in a reaction step defines the dividing surface between reactants and products. It is employed alongside electronic structure calculations and the independent-mode, harmonic oscillator approximation to determine rate coefficients of elementary chemical reactions. However, conventional TST presents an oversimplified picture of the reaction path as bond-breaking/making does not occur at the harmonic limit and the no-recrossing assumption is not valid at higher temperatures. Therefore, it routinely breaks down at high temperatures when the reaction is barrierless or when modes are strongly coupled to the reaction coordinate.5–8 Tunneling, a quantum mechanical phenomenon that is significant at low temperatures particularly when light atoms, such as hydrogen, participate in the reaction step,9,10 is also not captured under the framework of conventional TST.
Variational transition state theory (VTST), also known as canonical variational theory (CVT), pursues the free energy maximum in a reaction step or, equivalently, finds the dividing surface that minimizes recrossing of reactive trajectories.10–12 For loose transition states in barrierless association reactions, variable-reaction-coordinate VTST (VRC-VTST) has been developed.13–15 Fernandez-Ramos and co-workers established frameworks for quantifying multidimensional tunneling (MT) effects on VTST (VTST-MT) reaction rates by means of a multiplicative transmission coefficient.16 MT corrections include vibrational modes via the adiabatic-ground state potential and, hence, contain quantized energies. They are also based on the fact that the tunneling path is shorter than the minimum energy path (MEP) of the reaction (also known as “corner-cutting”). Small-curvature tunneling (SCT) and large-curvature tunneling (LCT) take into account of the corner-cutting, while zero-curvature tunneling (ZCT) neglects mode-coupling and corner-cutting and, therefore, underestimates tunneling.9,16,17 Several programs have been developed to perform VTST-MT rate coefficient calculations. The FORTRAN-based POLYRATE program for polyatomic molecules supports both TST and VTST for reaction rate calculations and ZCT, SCT, and LCT corrections.18 More recently, a Python-based program called Pilgrim has been developed, which calculates thermal rate coefficients with VTST and supports tunneling calculations via ZCT and SCT approximations.19
Despite the availability of well-established theories and software, conventional TST continues to be widely used, particularly in thermal catalysis studies. The practical use of VTST is limited for several reasons, including the ambiguity associated with the treatment of very small vibrational frequencies. The harmonic oscillator is a poor model for these low-frequency or large amplitude modes.20 The multi-structural approximation with the torsional anharmonicity (MS-T) method, available as a Fortran based package called MSTor,21,22 determines torsional correction factors to the harmonic approximation for such modes by using partition functions to calculate thermodynamic properties.23 VTST also requires multiple nuclear Hessians to determine vibrational modes constituting the entire MEP as opposed to conventional TST, which only requires Hessians at two stationary points (initial state and transition state). Truhlar and co-workers developed interpolated VTST (IVTST), which aims to minimize this effort and enables efficient tunneling and rate coefficient calculations by fitting splines to energies, gradients, moments of inertia, and eigenvalues of Hessians constituting the MEP.24–29 Several studies, ranging from gas phase radical chemistry to reactions of transition metal complexes, have employed IVTST to compute rate coefficients with tunneling corrections.30–33 However, it must be noted that some of these studies still required several Hessian calculations. An alternative strategy known as multiconfigurational molecular mechanics (MCMM) has also been proposed by Lin and co-workers, which reduces Hessian costs for non-stationary points by only evaluating some Hessian elements with electronic structure methods and the remaining via interpolation.34 The differing computational treatment, however, involves user intervention to classify constituent atoms into core, vicinal, geminal, and distant groups depending on the extent of participation in bond-breaking or formation events.
Our goal is to leverage advances in signal processing methods, particularly matrix completion (MC),35 to render VTST-MT more tractable. MC methods exploit an underlying property of the system, such as rank or sparsity, to complete/recover missing information. Methods that exploit sparsity, known as compressed sensing (CS), have found widespread applications in imaging,36 analytical chemistry,37–39 thermodynamics,40 and quantum chemistry.1,41–44 In the latter domain, CS has been employed to compute lattice couplings,41 determine large Hessians,42 construct anharmonic potential energy surfaces,43 and identify scaling relationships in heterogeneous catalysis.44 To the best of our knowledge, MC methods that leverage properties other than sparsity have yet to find applications in chemistry. The sole exception is the previous work by our groups toward rendering VTST-related calculations more efficient.1 We constructed a matrix of nuclear Hessian eigenvalues (squares of vibrational frequencies) constituting the MEP based on the reaction path Hamiltonian (RPH) expansion of the potential energy term in projected normal mode coordinates.45,46 We demonstrated that variety-based matrix completion (VMC)47 exploits the low-rank character of this polynomial problem and recovers randomly missing eigenvalues. The algorithm, termed harmonic variety-based matrix completion, recovers the full matrix using only a small subset of its elements with accuracies that yield reliable zero-point and vibrational free energies (temperature up to 1000 K) for several SN2 reactions. Sampling requirements (or cost) decrease with increasing system size, showing that HVMC may scale favorably with size. While this is encouraging, SN2 reactions typically consist of reaction coordinates that are not strongly coupled to the remaining vibrational degrees of freedom. In other words, the robustness of HVMC across a broad spectrum of reactions, where more than one vibrational mode varies strongly in the course of the reaction step, must be examined.
This work addresses the question of HVMC robustness by expanding the scope of algorithm testing in both reaction classes and observables. Five representative reactions spanning gas phase, inorganic, and enzyme chemistry are chosen, each involving the transfer of at least one hydrogen atom/hydride/proton. The SN2 systems are also expanded to include two larger reactions (56 and 66 atoms). Zero-point energies, vibrational free energies, ZCT transmission coefficients, and values and positions of maxima are calculated to assess HVMC performance. The minimum sampling density, or fraction of overall eigenvalue information of the projected Hessians necessary to recover key quantities, is small and generally decreases with increasing system size, indicating favorable scaling of the algorithm. Very high densities are only observed for either very small systems (≤10 atoms) or in very rare instances where the reaction coordinate consists of multiple bond-breaking/making events. HVMC predicts free energy maxima more reliably for reactions in which the peak of the free energy profile is sharp. Higher scatter in HVMC predictions of free energies in the vicinity of transition states makes predictions of maxima for flatter free energy profiles more challenging. This also indicates that an alternative to the current random sampling method is necessary, with strategies that are more compatible with routine quantum chemistry calculations outlined at the end of this study. Robust and scalable HVMC performance, therefore, shows that MC methods offer promising and efficient means for applying VTST-MT approaches to study complex catalytic reactions.
II. METHODS
To calculate variational rate and/or multidimensional tunneling coefficients, knowledge of the reaction path—initial, final, and transition structures (TS), as well as the minimum energy path connecting the three structures—is essential. In addition, vibrational analyses for points constituting the MEP are required, which become prohibitive for large systems or when analytical second derivatives are unavailable. Our goal is to lower the computational effort associated with such vibrational analyses. This section outlines the procedure employed to examine the ability of matrix completion methods, specifically HVMC, to recover (calculate) all vibrational frequencies constituting the MEP by using only a small, randomly available fraction of the total information. Reaction paths with complex variations in vibrational frequencies are chosen to probe the robustness of HVMC.
A. HVMC implementation and performance evaluation
The construction of a polynomial variety problem for an MEP using the reaction path Hamiltonian (RPH)45,46 is described in detail in our previous work.1 In the RPH formalism, the potential energy for a small displacement [V(s, q)] from the MEP [V0(s)] can be expressed as
The reaction coordinate is parameterized by s, which is zero at the TS, negative in the reactant region (reactant → s1), and positive in the product region (product → sM). are eigenvalues of Hessians constituting the MEP, from which translations, rotations, and the reaction coordinate are projected out. The eigenvalues are also the squares of transverse vibrational frequencies. constitute 3N − 7 (or F − 1 where F quantifies vibrational degrees of freedom) projected normal mode displacements from the MEP, where N is the number of atoms in the system. A lifted polynomial equation of the form xTC′ = 0 can be constructed by expressing Eq. (1) for all points on the MEP, where x is the vector of projected normal mode displacements. This representation constitutes a low-rank problem as only a fraction of the overall degrees of freedom of a system changes significantly in the course of a reaction step. By neglecting the top row of C′, which consists of [V(s, q) − V0(s)] terms that are assumed to be small and approximately uniform for very small displacements, we obtain C,
The number of columns of C is equal to the number of points, m, on the MEP. A C matrix with missing entries can be completed using a matrix completion method with rank minimization as its objective function. Our previous work demonstrates that HVMC achieves this goal and accurately predicts zero-point energies and vibrational free energies for model SN2 reactions.1 The HVMC algorithm employed in both the current and previous work is nearly identical to the VMC implementation developed by Ongie and co-workers.47 The algorithm is summarized in Sec. 1 of the supplementary material.
This work determines whether HVMC can recover eigenvalues of the projected Hessians accurately for reactions in which at least one transverse mode is strongly coupled to the reaction coordinate. We note key implementation details and distinctions from our previous work. First, as described in Sec. 2 of the supplementary material, the eigenvalues in each column of C are in ascending order rather than being based on eigenvector overlap between adjacent columns (diabatic ordering)48 employed in our previous work. This is significant because otherwise eigenvectors are required to achieve diabatic ordering of the elements in each column—necessitating the same computational effort that we are attempting to reduce or avoid entirely. The second difference lies in the construction of incomplete C matrices as HVMC input. HVMC performance is evaluated by constructing a complete C matrix and masking elements at random to achieve the desired sampling density, ρ, defined as the number of elements sampled per column. While our previous work utilizes purely random sampling to achieve the desired value of ρ, here we ensure that at least one element is sampled (at random) for every row of C.35 Therefore, the minimum possible value of ρ (ρmin) for any system corresponds to the reciprocal of the number of columns or number of points constituting the MEP.
Similar to our previous study, 100 random trials are performed for each system for a given value of ρ. HVMC performance is assessed based on the statistical outcomes of these trials. The minimum required sampling density for every system is determined based on multiple criteria—Frobenius norm error of the recovered matrix (Err) and prediction errors associated with zero-point energy (ZPE), vibrational free energy [Gvib(T, s)] at high temperature (1000 K), and transmission coefficient based on zero-curvature tunneling at low temperature (300 K). The minimum sampling required, ρmin, at which at least 75 out of 100 random HVMC trials recover these quantities within the error margins described below, is reported and used as the metric to assess HVMC scaling and robustness.
The HVMC recovery error via the Frobenius norm is calculated as
with subscripts HVMC and Target representing model prediction and target values obtained from density functional theory (DFT), respectively. Gvib(T, s) along the reaction coordinate (s) is given by9,48
The threshold for accuracy of ZPE and Gvib(T, s) is chosen as the maximum deviation in either quantity being less than 2.39 kcal/mol, identical to our prior work.1 The treatment of small modes that are not true vibrational degrees of freedom differs from previous work. We had previously neglected frequencies lower than a fixed, pre-determined threshold value in free energy calculations owing to the well-known issue of divergence of the entropy term.1 However, this approach causes an inconsistency in total degrees of freedom. To remedy this issue in our current work, we replace any frequencies below 100 cm−1 with 100 cm−1 and preserve the overall number of degrees of freedom. Matrix errors and minimum sampling densities, therefore, differ between the two studies for the SN2 reactions. We note that the number of eigenvalues below this threshold depends on the system but does not change significantly between HVMC trials and, therefore, does not impact our assessment of HVMC accuracies in predicting Gvib.
Our previous work focusing on HVMC development utilized ZPE and Gvib to assess performance but not multidimensional tunneling effects. In this work, quantum effects are captured using zero-curvature tunneling. ZCT does not rely on the eigenvector information as it neglects curvature and mode coupling,16 which makes it an appropriate choice for our current objectives although it underestimates tunneling contributions compared to SCT or LCT. The adiabatic ground-state potential is calculated by adding harmonically approximated vibrational ZPEs to the electronic energy at every point on the MEP,
The zero-curvature tunneling transmission coefficient is given by the ratio of Boltzmann averages of the semiclassical adiabatic ground-state (SAG) probability to the classical probability,16
β is 1/(kBT), reciprocal of the product of the temperature and Boltzmann constant. The probability function is energy-dependent and given in terms of limits determined by critical points on the adiabatic ground-state potential,16
Here, E0 is the reactant (product) energy for exothermic (endothermic) reaction and the maximum of the energy curve is referred to as VAG. We use DFT-calculated values of E0 and VAG for the calculation of transmission coefficients from eigenvalues predicted by HVMC since κZCT is sensitive to these two end-points. Generally, Hessian calculations of the reactant, product, and transition state are done before MEP computations, and thus, this assumption does not limit the applicability of HVMC. The action integral θ(E) based on the reduced mass μ (equal to unity for ZCT in atomic units) is calculated between the left (s<) and right (s>) turning points (s-values corresponding to given energy on the curve)16 and any additional points (s1 and s2), if present,
We follow the procedure described in the Pilgrim software to calculate ZCT using both the target and HVMC-recovered matrices.19 Splines are used to obtain a smooth potential function, and integration is carried out using Gaussian quadrature. We validate our Python program for transmission coefficient calculations by comparing them against those obtained with Pilgrim. The difference is below 1% for all test systems and temperatures available in the Pilgrim test set. HVMC-based ZCT values are compared with target (DFT) coefficients at 300 K. The accuracy is selected such that the predicted rate is within an order of magnitude of the target rate. Since the transmission coefficient is a multiplicative factor, HVMC accuracy is deemed acceptable if the predicted κZCT is within one order of magnitude of the target value.
B. Reactions
Using the metrics described above, we assess HVMC performance across a broad spectrum of molecular reactions spanning gas phase, enzymatic, and catalytic chemistry. With the exception of SN2 reactions, the eigenvalue spectra consist of one or more modes that are strongly coupled to the reaction coordinate. Therefore, they serve as ideal candidates to test the robustness of HVMC. The following reaction classes and reactions are examined with the corresponding transition structures (TSs) illustrated in Fig. 1 and Hessian eigenvalue spectra constituting the MEP in Fig. 2:
SN2: Four symmetric reactions of alkyl/aryl chlorides with system sizes ranging from 6 to 66 atoms. As the reactions are symmetric, HVMC is employed to recover eigenvalues corresponding to the half-reaction. Complete reaction paths are employed to calculate transmission coefficients.
Hydrogen abstraction: Three reactions are chosen—CF3CH3 + OH → CF3CH2 + H2O,50 C6H6 + CH3 → C6H5 + CH4,51 and C6H6 + C2H5 → C6H5 + C2H6,51 consisting of 10, 16, and 19 atoms, respectively. As tunneling plays a significant role in determining rate coefficients of hydrogen atom transfer reactions, these reactions (and those below) serve to also test HVMC accuracies in predicting quantum effects.
Metal–ligand cooperative homogeneous Ir catalysis: In this reaction, isopropanol is oxidized to acetone catalyzed by an Ir complex via concerted hydride transfer to the metal center and proton transfer via a solvent-assisted proton shuttle to the ligand.52 This reaction represents the most complex example we could find where the reaction coordinate is comprised of at least three stretching modes (one hydride transfer and two proton transfers). It is possible therefore that this reaction represents a high-rank problem that is close to the upper limit of complexity observable in a single-step reaction. The system consists of 46 atoms.
Enzymatic deprotonation: This is the second largest system in our study (64 atoms) and largest among non-SN2 reactions. It describes the oxidation of methylamine by the Methylamine Dehydrogenase (MADH) enzyme.53
Transition structures (TSs) of reactions employed to assess HVMC performance overlaid on reactants and products (transparent). Arrows indicate atom transfer(s) occurring in the TS. Numbers in parentheses represent the total number of atoms in the system. Color scheme—gray: C, white: H, green: Cl, red: O, orange: F, blue: N, cyan: Ir, and yellow: S. Model visualizations are created using the Jmol package.49
Transition structures (TSs) of reactions employed to assess HVMC performance overlaid on reactants and products (transparent). Arrows indicate atom transfer(s) occurring in the TS. Numbers in parentheses represent the total number of atoms in the system. Color scheme—gray: C, white: H, green: Cl, red: O, orange: F, blue: N, cyan: Ir, and yellow: S. Model visualizations are created using the Jmol package.49
Eigenvalues of the projected Hessians constituting the MEP (C matrix elements) for which HVMC performance is examined. Half-MEPs are depicted for SN2 reactions. Squares of frequencies and reaction coordinate s are in atomic units.
Eigenvalues of the projected Hessians constituting the MEP (C matrix elements) for which HVMC performance is examined. Half-MEPs are depicted for SN2 reactions. Squares of frequencies and reaction coordinate s are in atomic units.
The ωB97X/def2-SVP level of theory54,55 is employed in all cases, with the exception of the iridium catalyst system, and all calculations are carried out using the Q-Chem ab initio quantum chemistry software (version 5.2 and higher).56 For the Ir catalysis system, the dispersion-corrected (Grimme’s D3 with the Becke–Johnson damping, D3-BJ)57,58 B3LYP functional59,60 is used. The Stuttgart small-core effective core potential is used to describe the Ir atom,61,62 and the remaining elements are represented using double-ζ 6-31G* basis set. The level of theory is comparable to that utilized in the original work.52
MEPs are calculated in mass-weighted coordinates, also known as intrinsic reaction coordinates (IRCs), using the gradient-based predictor-corrector algorithm in Q-Chem.63–65 We note that employing mass-weighted Cartesian coordinates in vibrational frequency calculations can lead to unphysical modes. Such issues can be eliminated using curvilinear coordinates.66–68 Shapes of the eigenvalue spectrum and adiabatic ground-state potential curves are likely to be affected by the choice of coordinates. However, independent of the coordinate selection, our HVMC method is applicable, and examining performance for C constructed in internal coordinates constitutes future work. The maximum IRC stepsize for each system is chosen to ensure a smooth MEP and is equal to 0.1 a.u. for most reactions, except MADH and hydrogen abstractions (0.05 a.u., except C6H6 + C2H5, for which 0.025 a.u. is used).
Section 3 of the supplementary material describes the procedure used to construct complete C matrices from Hessian calculations for points on the MEP. The procedure ensures that C matrices for all reactions have the identical number of columns or points on the MEP (24) and only differ in the number of rows (3N − 7). As seen in Fig. 2, these systems represent diverse variations in vibrational frequencies in the course of a reaction step from monotonic in SN2 to at least four modes exhibiting significant changes when traversing the MEP in the Ir metal–ligand cooperative catalytic reaction. We demonstrate in this work that zero-point energies, vibrational free energies, and zero-curvature tunneling coefficients are accurately captured with HVMC when only a small (random) subset of vibrational frequency information is available for every system.
This work examines HVMC accuracies relative to DFT in predicting terms constituting the VTST-MT rate coefficient expression rather than to compute and contrast rate coefficients themselves with accurate experimental or computational benchmarks. This is because uncertainties arise from both the chosen level of theory as well as our approximate treatment of small modes, which are not easy to isolate and quantify. All data, the HVMC program, and supporting Python routines to calculate quantum and variational parameters are publicly available on GitHub (https://github.com/RateTheory/HVMC).69
III. RESULTS AND DISCUSSION
A. HVMC accuracy and robustness
Scaling and robustness of HVMC are evaluated based on the minimum sampling density, ρmin, necessary to recover ZPE and Gvib (1000 K) within “chemical accuracy” (maximum deviation ≤ 2.39 kcal/mol or 10 kJ/mol) and ZCT transmission coefficient (300 K) to within an order of magnitude in at least 75 out of 100 trials. Mean errors in Gvib (1000 K) and ZPE are within chemical accuracy for all 100 trials for each of the SN2 and non-SN2 systems examined. Table I describes mean Frobenius norm errors associated with HVMC recovery as well as the percentage of trials that satisfy our criteria for maximum deviations in key quantum and variational observables. The results are divided into SN2 and non-SN2 systems, as the former is employed to demonstrate HVMC scaling and the latter to examine the robustness. Rest is the estimated rank of the full matrix, determined using an information criterion presented in Sec. 4 of the supplementary material. SN2 reactions are all low-rank, while the remaining systems exhibit higher (and varying) complexity, confirming what is visually observable in Fig. 2. The Ir catalysis reaction exhibits the highest rank among all systems, thereby allowing us to probe how low-rank the problem needs to be for HVMC to be a reliable, efficient alternative to full MEP Hessian calculations.
HVMC performance (100 trials/system)—minimum sampling density ρmin, estimated rank (Rest), mean norm error (μErr,mat), number of trials within chemical accuracy (2.39 kcal/mol) of maximum deviation of vibrational free energy (Gvib—1000 K), and ZPE and number of trials within chemical accuracy (one order of magnitude) of the transmission coefficient. Mean ZPE and Gvib errors are within chemical accuracy for all 100 trials across all systems.
. | . | . | . | |Max.deviation| ≤ 2.39 kcal/mol . | . | |
---|---|---|---|---|---|---|
Reaction . | ρmin . | Rest . | μErr,mat ± σ . | Gvib (1000 K) . | ZPE . | . |
SN2(1) | 0.2 | 3 | 2.32 ± 0.81 | 84 | 100 | 100 |
SN2(2) | 0.05 | 2 | 1.45 ± 0.2 | 100 | 100 | 100 |
SN2(3) | 0.045 | 2 | 0.76 ± 0.13 | 100 | 100 | 100 |
SN2(4) | 0.045 | 2 | 0.66 ± 0.13 | 100 | 100 | 100 |
CF3CH3 + OH | 0.6 | 8 | 5.56 ± 1.56 | 86 | 100 | 100 |
CH3 + C6H6 | 0.35 | 6 | 5.57 ± 1.51 | 80 | 100 | 100 |
C2H5 + C6H6 | 0.3 | 6 | 6.10 ± 1.34 | 84 | 100 | 100 |
Ir catalysis | 0.5 | 10 | 2.52 ± 0.74 | 81 | 90 | 100 |
MADH enzyme | 0.1 | 7 | 4.30 ± 0.89 | 75 | 98 | 100 |
. | . | . | . | |Max.deviation| ≤ 2.39 kcal/mol . | . | |
---|---|---|---|---|---|---|
Reaction . | ρmin . | Rest . | μErr,mat ± σ . | Gvib (1000 K) . | ZPE . | . |
SN2(1) | 0.2 | 3 | 2.32 ± 0.81 | 84 | 100 | 100 |
SN2(2) | 0.05 | 2 | 1.45 ± 0.2 | 100 | 100 | 100 |
SN2(3) | 0.045 | 2 | 0.76 ± 0.13 | 100 | 100 | 100 |
SN2(4) | 0.045 | 2 | 0.66 ± 0.13 | 100 | 100 | 100 |
CF3CH3 + OH | 0.6 | 8 | 5.56 ± 1.56 | 86 | 100 | 100 |
CH3 + C6H6 | 0.35 | 6 | 5.57 ± 1.51 | 80 | 100 | 100 |
C2H5 + C6H6 | 0.3 | 6 | 6.10 ± 1.34 | 84 | 100 | 100 |
Ir catalysis | 0.5 | 10 | 2.52 ± 0.74 | 81 | 90 | 100 |
MADH enzyme | 0.1 | 7 | 4.30 ± 0.89 | 75 | 98 | 100 |
For the smallest SN2 reaction, HVMC recovers Gvib accurately in more than 80% of all trials if 20% of the total eigenvalue information is available, with at least one data-point available for every eigenvalue mode (row of C). The corresponding mean norm error (μErr,mat) is 2.32%. In line with our previous work,1 HVMC scales favorably with system size demonstrated by a sharp decrease in both ρmin and μErr,mat from SN2(1) to SN2(4). ρmin hits the minimum allowable limit (4.5%) for SN2(3) and SN2(4), at which at least one element per row is sampled (≈1/24). Maximum deviations of ZPE and Gvib for all trials are within chemical accuracy for the larger systems. ZCT transmission coefficients (κZCT) calculated from HVMC-recovered eigenvalues are within an order of magnitude of DFT-based coefficients. Plots describing HVMC accuracies in predicting Gvib(1000 K), ZPE, and κZCT are presented in Sec. 6.1 of the supplementary material.
While HVMC is both accurate and requires lower sampling for larger SN2 reactions, eigenvalues [Figs. 2(a)–2(d)] vary monotonically along the MEP, indicating that alternative, straightforward approaches, such as curve-fitting or spline-interpolation, may also yield similar results. Therefore, diverse spectra [Figs. 2(e)–2(i)] are employed to test the robustness of HVMC. Representative eigenvalue spectra predicted by HVMC are shown in Figs. S7–S11 of the supplementary material. Table I (bottom) shows that Gvib, ZPE, and κZCT are accurately captured with limited sampling despite larger μErr,mat compared to SN2 reactions. The smallest system, CF3CH3 + OH (10 atoms), has the highest ρmin of 60%, which is higher than SN2 reactions of comparable size. This shows that non-monotonic variations in eigenvalues necessitate greater sampling. Both CH3 + C6H6 and C2H5 + C6H6 eigenvalue spectra, similar to CF3CH3 + OH, possess at least one mode each that varies more strongly with s than others. Reduced sampling requirements for these two systems compared to CF3CH3 + OH, similar to SN2 systems, indicate that a smaller fraction of matrix information is required as systems grow larger. However, even though the error tolerances in observables are met in the case of C2H5 + C6H6, the trough of the strongly varying mode in Fig. 2(g) cannot be completely captured (Fig. S9 of the supplementary material). While this means that higher sampling may be necessary to recover key features of the spectrum, establishing general selection criteria for sampling density a priori across various classes of reactions and system sizes is a difficult task. Instead, we examine the implications of our criteria on HVMC predictions of free energy maxima in Sec. III B.
As shown in Fig. 2(h), the Ir-based metal–ligand cooperative catalysis system consists of the highest number of eigenvalues that change significantly over the course of the reaction among all systems examined (highest Rest). Consequently, a ρmin of about 50% is essential to capture quantum and variational effects accurately. HVMC-recovered spectra for specific trials (Fig. S10 of the supplementary material) demonstrate the method’s ability to capture complex spectra with sufficient sampling. The MADH enzyme model is the largest system examined and is associated with the lowest ρmin among non-SN2 reactions. Similar to the C2H5 + C6H6 system, the strongest varying mode in MADH as well as the smaller dips on the reactant side [Fig. 2(i)] are not adequately captured in HVMC trials (Fig. S11 of the supplementary material) despite ρmin satisfying all our accuracy thresholds.
B. Potential and free energy profiles, barriers, and ZCT coefficients
Table I demonstrates that HVMC is robust as it is able to capture non-monotonic variations in modes for most reactions. It is also necessary to examine the impact of approximately recovered eigenvalues on the resulting adiabatic ground-state potential energy and vibrational free energy curves. For the non-SN2 reactions, Figs. 3 and 4 depict HVMC (predicted) vs DFT (target) adiabatic ground-state potential and vibrational free energy differences (ΔGvib) at elevated temperatures (1000 K), respectively. With the sole exception of Ir catalysis [Fig. 3(d)], ’s obtained from HVMC [Figs. 3(a)–3(c) and 3(e)] are well within desired chemical accuracy (gray shaded region). The majority of outliers in Ir catalysis are in the region of the TS. The deviations of HVMC-predicted values from DFT for the CF3CH3 + OH system are also pronounced near the TS. This is because modes vary stronger in the vicinity of s = 0 compared to regions of the MEP closer to the reactant or product. Increased scatter of in the vicinity of the TS indicates that a different sampling strategy that is less random, or more structured, may be necessary (Sec. III C). errors are more distributed for the MADH enzyme compared to the other systems due to the poorer overall prediction of eigenvalue spectra described earlier (Fig. S11 of the supplementary material). The scatter in can be lowered by increased sampling, shown in Fig. S12 of the supplementary material.
Adiabatic ground-state potential obtained using HVMC (various colors) across 100 trials. Thick black curves represent target (DFT) potentials and gray shaded regions depict the ranges of chemical accuracy (±2.39 kcal/mol).
Adiabatic ground-state potential obtained using HVMC (various colors) across 100 trials. Thick black curves represent target (DFT) potentials and gray shaded regions depict the ranges of chemical accuracy (±2.39 kcal/mol).
Vibrational free energy differences (ΔGvib) at 1000 K along the reaction path for 100 trials. The reference is the HVMC-predicted energy of the reactant state for each trial. Thick black curves represent target (DFT) ΔGvib,1000K, and gray shaded regions depict chemical accuracy (±2.39 kcal/mol) ranges.
Vibrational free energy differences (ΔGvib) at 1000 K along the reaction path for 100 trials. The reference is the HVMC-predicted energy of the reactant state for each trial. Thick black curves represent target (DFT) ΔGvib,1000K, and gray shaded regions depict chemical accuracy (±2.39 kcal/mol) ranges.
The ground-state potential [Eq. (5)] is given by the sum of square roots of the elements constituting a column of the C matrix, whereas free energies are more complex functions of matrix elements [Eq. (4)]. Similar to our previous study, we utilize high-temperature ΔGvib (1000 K) as the goal is to assess HVMC performance for applications in thermal catalysis.1 HVMC predictions of ΔGvib variations along the MEP in Fig. 4 exhibit greater scatter compared to and algorithm performance is less straightforward to interpret. For all the systems, at least some HVMC trials lead to ΔGvib values outside the prescribed error range. Although the extent appears to be smaller for the Ir catalysis system in Fig. 4(e), this is the result of a wider free energy range (y-axis) spanned by the reaction step and not the absence of scatter. Scatter is reduced at lower temperatures (Fig. S13 of the supplementary material). ρmin is therefore determined not only by the complexity of the system but also by the chosen range of reaction temperatures.
Next, we examine HVMC’s ability to accurately predict the location and magnitude of the activation barrier or the difference between the maximum of adiabatic ground state/free energy and the reactant. It must be noted that the algorithm is designed for recovering all elements and not specifically to pursue maxima.70 Therefore, it is not necessary that HVMC yield reliable barrier predictions. The non-SN2 reactions can be categorized into two based on whether the free energy maximum is distinct from the adiabatic ground-state potential maximum at s = 0 (CF3CH3 + OH) or not (all others). CF3CH3 + OH has free energy maximum shifted to the right of the MEP maximum at s = 0.1204 [Fig. 4(a)].
Table II shows the number of HVMC trials that yield a maximum at the same s as the DFT maximum. The normalized standard deviations in the position of the maxima are also reported. There is no direct correlation between the matrix errors reported in Table I and the number of trials with the correct maximum. More than 90 trials out of 100 yield ΔGCVT and VAG at s = 0 in the case of MADH despite HVMC’s poor recovery of eigenvalue spectra (Fig. S11) and large Frobenius norm error (μErr,mat = 4.30). Based on our limited set of five reactions, it would appear that the maxima are accurately predicted by HVMC when they are sharp, which is the case with Ir catalysis and the MADH enzyme. For the remaining systems with shallower peaks (CF3CH3 + OH, CH3 + C6H6 and C2H5 + C6H6), HVMC is less reliable, and less than half of the trials yield correct maxima in two out of the three reactions. The shallow feature in the peak for CF3CH3 + OH is consistent with prior observations by Espinosa-García et al.50 and arises from the strong downward trend in a single vibrational mode in the region of the TS [Fig. 2(e)]. Normalized standard deviations associated with the location of maxima are higher forCF3CH3 + OH and C2H5 + C6H6 compared to the remaining systems. With the exceptions of CF3CH3 + OH and Ir catalysis, deviations in ΔGCVT are higher owing to increased scatter compared to VAG.
Number of HVMC trials in which ΔGvib maxima (ΔGCVT) and VAG coincide with DFT maxima for non-SN2 reactions. The free energy maxima are at s = 0 for all systems, except CF3CH3 + OH for which the maximum is at s = 0.1204. Normalized standard deviationa of s for 100 trials is also reported.
. | ΔGCVT (1000 K) . | VAG . | ||
---|---|---|---|---|
Reaction . | No. of trials . | σs . | No. of trials . | σs . |
CF3CH3 + OH | 33 | 0.051 | 35 | 0.059 |
CH3 + C6H6 | 38 | 0.026 | 45 | 0.019 |
C2H5 + C6H6 | 51 | 0.048 | 64 | 0.040 |
Ir catalysis | 73 | 0.019 | 71 | 0.020 |
MADH enzyme | 91 | 0.026 | 95 | 0.017 |
. | ΔGCVT (1000 K) . | VAG . | ||
---|---|---|---|---|
Reaction . | No. of trials . | σs . | No. of trials . | σs . |
CF3CH3 + OH | 33 | 0.051 | 35 | 0.059 |
CH3 + C6H6 | 38 | 0.026 | 45 | 0.019 |
C2H5 + C6H6 | 51 | 0.048 | 64 | 0.040 |
Ir catalysis | 73 | 0.019 | 71 | 0.020 |
MADH enzyme | 91 | 0.026 | 95 | 0.017 |
The reaction coordinate is normalized such that s = 0 and s = 1 correspond to reactant and product states, respectively.
Table III presents mean free energy barriers (ΔGCVT) that are employed in VTST rate coefficient calculations. Across the range of temperatures examined, HVMC predictions of the barriers on average are very close to . Despite reduced scatter in free energies along the MEP with decreasing temperature (Fig. S13 of the supplementary material), errors in free energy barriers are less sensitive to temperature. Mean barrier predictions, however, are systematically overestimated. This can be traced back to the reduced ability of HVMC to predict matrix elements corresponding to strongly varying modes. The rank-minimization scheme cannot adequately capture all sharp changes in eigenvalues of the projected Hessian close to s = 0 observed in the non-SN2 reactions. As can be seen in Figs. 3 and 4, HVMC underestimates the reduction in ZPE near the TS and, as a result, overestimates ΔGCVT. For the MADH enzyme, in particular, ΔGCVT is overestimated by 26%–32% as the HVMC-recovered spectra yield linearly varying values for between 0.1 and 0.35 (corresponding to ωi = 1625–3045 cm−1—Fig. S11 of the supplementary material).
HVMC-predicted (averaged over 100 trials) and DFT-based free energy barriers (ΔGCVT) in kcal/mol at temperatures ranging between 100 and 1000 K.
. | . | |||
---|---|---|---|---|
Reaction . | 100 K . | 200 K . | 500 K . | 1000 K . |
CF3CH3 + OH | 4.15 (3.16) | 4.27 (3.29) | 4.81 (3.77) | 5.70 (4.39) |
CH3 + C6H6 | 10.11 (9.08) | 10.14 (9.11) | 10.34 (9.36) | 10.76 (9.73) |
C2H5 + C6H6 | 13.27 (12.07) | 13.41 (12.26) | 13.95 (12.98) | 14.97 (14.08) |
Ir catalysis | 18.45 (17.28) | 18.62 (17.46) | 19.55 (18.40) | 21.18 (19.90) |
MADH enzyme | 6.57 (4.98) | 6.60 (5.07) | 6.69 (5.32) | 6.80 (5.36) |
. | . | |||
---|---|---|---|---|
Reaction . | 100 K . | 200 K . | 500 K . | 1000 K . |
CF3CH3 + OH | 4.15 (3.16) | 4.27 (3.29) | 4.81 (3.77) | 5.70 (4.39) |
CH3 + C6H6 | 10.11 (9.08) | 10.14 (9.11) | 10.34 (9.36) | 10.76 (9.73) |
C2H5 + C6H6 | 13.27 (12.07) | 13.41 (12.26) | 13.95 (12.98) | 14.97 (14.08) |
Ir catalysis | 18.45 (17.28) | 18.62 (17.46) | 19.55 (18.40) | 21.18 (19.90) |
MADH enzyme | 6.57 (4.98) | 6.60 (5.07) | 6.69 (5.32) | 6.80 (5.36) |
Finally, we examine the accuracies of ZCT transmission coefficients using eigenvalues predicted by HVMC. Figure 5 depicts histograms of HVMC-predicted to target (DFT) κZCT ratios for non-SN2 reactions. The lowest percentage errors are obtained for the C2H5 + C6H6 system with all trials yielding less than 20%. Although the remaining reactions can exhibit errors as high as 80%, the impact on the rate coefficient is still within an order of magnitude (Table I). The MADH enzyme reaction shows an error distribution that is distinct from the remaining systems, with most errors above 50%, in line with noisy that are on average higher than the DFT values for this system. There is a systematic underestimation of the κZCT values with HVMC due to the increased values of along the MEP. Tables S3 and S4 show mean transmission coefficients over 100 trials for SN2 and non-SN2 reactions, respectively. On average, HVMC predicts ZCT coefficients with less than 53% error for all systems, with the largest error corresponding to the MADH enzyme. Standard deviations in HVMC transmission coefficients exhibit a decrease with increasing system size for the three hydrogen atom transfer reactions. The highest standard deviation in κZCT is observed for Ir catalysis owing to the presence of multiple strongly varying modes in its spectrum. Even with such a complex system, the average error is less than 40%, equivalent to a 0.7-fold change in rate coefficient.
Zero-curvature tunneling transmission coefficient (κZCT) errors for 100 HVMC trials compared to DFT (target) values for non-SN2 reactions. The data are reported in Tables S2 and S4 of the supplementary material. Corresponding κZCT errors for SN2 reactions are shown in Fig. S6 of the supplementary material.
Zero-curvature tunneling transmission coefficient (κZCT) errors for 100 HVMC trials compared to DFT (target) values for non-SN2 reactions. The data are reported in Tables S2 and S4 of the supplementary material. Corresponding κZCT errors for SN2 reactions are shown in Fig. S6 of the supplementary material.
The preceding analyses demonstrate that by exploiting the low-rank property of the polynomial expansion of potential energy, we can recover eigenvalues of the projected Hessians constituting the MEP by sampling a small subset of elements at random. Sampling requirements decrease with increasing system size, indicating that the algorithm may scale favorably, therefore reducing the computational effort associated with Hessian calculations for larger systems. Extensive examination of HVMC outcomes for non-SN2 reactions demonstrates that the algorithm is robust and captures both quantum and variational effects—ZCT transmission coefficients, ZPEs, free energies, barriers, and barrier positions—with reasonable accuracies. Potential difficulties arise in systems such as CF3CH3 + OH in which and Gvib flatten in the region of s = 0, and HVMC yields a lower success rate with identifying the right maxima than it does when the MEP is characterized by a sharp peak at the TS.
Minimum sampling densities are difficult to predict exactly a priori. However, we show that these are generally very small for reactions that do not involve hydrogen transfer. With the exception of the smallest system (10 atoms), reactions consisting of single hydrogen transfer exhibit decreasing ρmin with increasing size, with a peak value of 0.35. The concerted reaction step for the Ir catalysis system that involves simultaneous hydride and two proton transfers has the highest rank and naturally requires about 50% of total Hessian information constituting the reaction path. However, such complexity is rare in most catalytic reactions, and therefore, we believe that the ρmin for Ir catalysis represents an upper bound to HVMC sampling requirements for complex reactive systems.
C. Future directions: Sampling strategy
The primary barrier to a practical HVMC implementation for Hessian eigenvalue recovery lies in the sampling approach. Most matrix completion methods rely on the random sampling of elements. One way to calculate random elements of C is by employing finite differences of gradients71–73 with some knowledge of the corresponding Hessian eigenvectors from either low-resolution methods (say, molecular mechanics)42 or from approximate Hessians constructed in the course of MEP generation via Hessian-based predictor-corrector (HPC) methods.74,75 Such an approach, however, is expected to yield approximate or “noisy” eigenvalues, with deviations from target values that are non-random in nature. Therefore, MC methods that are specifically suited for noisy data and incorporate flexible noise models become necessary if we pursue the development of a practical algorithm based on random sampling of elements for use alongside quantum chemistry software.76 It is also possible to use “noisy” eigenvector data obtained via the HPC method and apply MC methods to predict the target eigenvector tensor. Such information can pave the way forward to evaluate more accurate tunneling coefficients. Reaction-path curvature leading to corner-cutting can be incorporated to expand quantum effects to SCT or even LCT if eigenvector information can be recovered.16
In computational catalysis studies, full Hessians are routinely calculated to verify and determine the vibrational spectra of stationary points—reactants, products, and TSs. In other words, columns of C are available to us rather than random elements. Classical MC approaches, which typically use randomized sampling, fail to accommodate column sampling on account of the objective function consisting purely of an expression for rank minimization. However, there is work that exploits column sampling coupled with information about the row space of the incomplete matrix.77,78 Our group is modifying the objective function of HVMC by including an additive term that preserves the near-polynomial character of matrix elements across the columns of C.79 Note that the polynomial character describes the row space of the incomplete matrix. Such an objective function can then be coupled with a pre-processing technique that identifies which additional columns are most different from those corresponding to stationary points (or, in general, those that are already sampled).80 Only these columns are then sampled to accurately recover the matrix elements. This is expected to lead to an MC algorithm that is readily compatible with quantum chemistry programs as it identifies the exact points on the MEP for which full Hessian calculations are necessary.
IV. CONCLUSIONS
Matrix completion methods are widely used to recover signals from noisy, incomplete data with high fidelity. Our goal is to employ these methods as cost-reduction strategies, to recover otherwise expensive eigenvalues of Hessians for points constituting the MEP, and to calculate quantum effects and variational observables for chemical reactions. In our previous work, we developed a harmonic variety-based matrix completion (HVMC) algorithm that utilizes the underlying polynomial structure of potential energies of points on the MEP to construct a low-rank problem. We demonstrated that for SN2 reactions, a small (random) fraction of matrix elements is sufficient for HVMC to complete the missing information and, therefore, predict ZPE and ΔGvib with reasonable accuracy. In this work, we expand the suite of reactions to include larger SN2 systems and hydrogen transfer reactions (CF3CH3 + OH, CH3 + C6H6, C2H5 + C6H6, metal–ligand cooperative Ir catalysis, and MADH enzy1me catalysis) to evaluate HVMC performance in predicting ZPE, ΔGvib, positions and magnitudes of barriers, and ZCT transmission coefficients. Minimum sampling densities (ρmin) required for accurate predictions of quantum and variational effects drop with an increase in the system size demonstrating favorable scaling of the algorithm, although they also depend strongly on the problem rank or the number of modes that are strongly coupled to the reaction coordinate. HVMC accurately predicts barriers even though they are systematically overestimated on account of under-prediction of the drop in ZPE in the vicinity of the TS. HVMC predictions of the barrier positions are also accurate but more reliable for reactions that possess sharper peaks in their (and therefore ΔGvib) curves. While it is difficult to predict or recommend ρmin for an arbitrary reaction, our study shows that 20% is an upper limit for those without hydrogen transfer and about 35% for medium to large systems involving a single hydrogen atom transfer. As random sampling may not constitute an immediately practical strategy for reducing computational cost, we outline ongoing efforts toward developing a novel MC algorithm that is amenable to column sampling and, therefore, will be directly applicable for VTST-MT calculations in computational catalysis studies. We also note that matrix completion methods can be designed to optimize functions of matrices and not just reconstruction error, such as finding maxima.70 We propose to consider such approaches in the future to improve barrier predictions.
SUPPLEMENTARY MATERIAL
Additional information regarding the HVMC algorithm; methods for eigenvalue ordering, column truncation, and rank estimation; quantitative data for tunneling coefficients; and additional plots and data for SN2 and non-SN2 reaction systems are available in the supplementary material.
ACKNOWLEDGMENTS
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under Award No. DE-SC0021417. The authors also acknowledge computational resources and support from the National Energy Research Scientific Computing Center (NERSC) and USC’s Center for Advanced Research Computing (CARC). S.J.Q. acknowledges support from the Provost’s Undergraduate Research Fellowship at USC.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
DATA AVAILABILITY
The data that support the findings of this study are available within the article and its supplementary material. The HVMC code along with analysis scripts is available in the RateTheory/HVMC GitHub repository at https://github.com/RateTheory/HVMC.69