Structured statistical methods are promising for recovering or completing information from noisy and incomplete data with high fidelity. In particular, matrix completion exploits underlying structural properties such as rank or sparsity. Our objective is to employ matrix completion to reduce computational effort associated with the calculation of multiple quantum chemical Hessians, which are necessary for identification of temperature-dependent free energy maxima under canonical variational transition state theory (VTST). We demonstrate proof-of-principle of an algebraic variety-based matrix completion method for recovering missing elements in a matrix of transverse Hessian eigenvalues constituting the minimum energy path (MEP) of a reaction. The algorithm, named harmonic variety-based matrix completion (HVMC), utilizes the fact that the points lying on the MEP of a reaction step constitute an algebraic variety in the reaction path Hamiltonian representation. We demonstrate that, with as low as 30% random sampling of matrix elements for the largest system in our test set (46 atoms), the complete matrix of eigenvalues can be recovered. We further establish algorithm performance for VTST rate calculations by quantifying errors in zero-point energies and vibrational free energies. Motivated by this success, we outline next steps toward developing a practical HVMC algorithm, which utilizes a gradient-based sampling protocol for low-cost VTST rate computations.

Transition state theory (TST) together with the harmonic oscillator approximation has been invaluable toward understanding reaction mechanisms and quantifying kinetics. While widely accepted and successful, the limitations of conventional TST are well-established.1,2 By presenting a one-dimensional picture of a reaction step wherein the reactive mode is uncoupled from other modes and the TS is a potential energy (and not free energy) maximum, TST constitutes an oversimplified approximation to true kinetics. Moreover, bond-breaking does not occur at the harmonic limit. As a result, harmonic TST routinely breaks down for reactions without potential energy maxima, at high temperature, when tunneling cannot be neglected, when vibrational modes are strongly coupled, or when lattice couplings cannot be neglected.1,3–5

Theories that enable more accurate rate estimates through the inclusion of recrossing, anharmonic coupling, tunneling, and dynamic effects are well-established.6 The most widely used is canonical variational transition state theory (VTST),7–9 also known as canonical variational theory (CVT), in which a temperature-dependent maximum is determined by calculating free energies along the entire minimum energy path (MEP). Wigner and Keck developed the variational theory of reaction rates.10,11 Further theoretical developments and practical VTST implementations based on direct dynamics were led by Garrett, Truhlar, and co-workers.7–9 These implementations utilize interpolation schemes (interpolated VTST or IVTST) to generate functions that describe energy, vibrational frequency, moment of inertia, and reduced mass variations along the MEP.12–16 Steckler et al. also developed a package, POLYRATE, that enables VTST rate computations to be used alongside select quantum chemistry software.17 Advances have also been made in alternative approaches, including molecular dynamics-based transition state ensemble methods that maximize transmission coefficients.18,19 Variable reaction coordinate VTST (VRC-VTST),20–22 which defines the reaction coordinate based on pivot points and utilizes Monte Carlo sampling to compute ensemble averages, is also gaining traction owing to speedups achievable by replacing the electronic structure sampling with system-specific neural network potentials.23 

Despite these advances, VTST is not yet widely accepted or utilized to probe the variational character of chemical reactions and catalytic systems of practical interest. This is because the identification of temperature-dependent free energy maxima requires (a) expensive, sometimes prohibitive nuclear Hessians along the entire MEP at sufficiently high level of theory and (b) estimation of non-negligible entropy contributions (e.g., internal rotational degrees of freedom) that are typically system-specific and difficult to capture. Owing to significant effort associated with computing Hessians for reactions of practical interest, coupled with the dearth of VTST tools that are compatible across multiple quantum chemistry software, most computational catalysis and kinetics studies are limited to applying conventional TST.

To render VTST-based rate computations more routine, our goal is to reduce the effort associated with MEP Hessian calculations. Techniques such as compressed sensing (CS), which recovers incomplete signals by exploiting the sparse nature of a problem in a particular basis,24–27 are being increasingly applied to problems in quantum chemistry. CS has been successfully employed to calculate large Hessians,28 construct anharmonic potential energy surfaces,29 and determine lattice couplings.30 While CS methods exploit only sparsity, matrix completion methods are capable of leveraging several properties of the underlying problem, including variety, rank, and sparsity.31 We adopt a specific class of recently introduced matrix completion algorithms for data that belongs to an algebraic variety. Variety-based matrix completion (VMC) utilizes the underlying algebraic or polynomial character of the problem and constructs a low-rank “lifted” matrix, which can then be completed (or recovered) using a rank-minimization procedure.32–34 Here, we report the implementation of a matrix completion scheme for recovering Hessian eigenvalues by sampling (calculating) a small fraction of all the eigenvalues constituting the MEP.

We employ the fact that a series of harmonic expansions of the effective potential for selected points along the MEP constitute a family of polynomials that define an underlying algebraic variety. This work reports the first step of algorithm development and proof-of-concept of the applicability of VMC for VTST rate computations. We establish a framework for developing the harmonic VMC algorithm, or HVMC, for recovery of transverse Hessian eigenvalues constituting the MEP in mass-weighted coordinates. The framework utilizes the reaction path Hamiltonian (RPH) to create the polynomial variety and construct a matrix of Hessian eigenvalues.35,36 We demonstrate HVMC performance and system size dependence with model SN2 reactions in the gas phase. For Cl + CH3Cl → ClCH3 + Cl, only 65% of total eigenvalue information is necessary for HVMC to recover the remaining eigenvalues. Zero-point energies (ZPE) and Gibbs free energies at 1000 K are recovered to within chemical accuracy. With increasing system size, the problem rank is reduced, and HVMC requires even less information (30%) to completely recover eigenvalues. Therefore, HVMC is ideally suited for VTST. We outline planned modifications to this framework to enable eigenvalue sampling using finite differences of gradients, with the goal of developing a practical, efficient HVMC algorithm for VTST rate computations.

An algebraic variety37 is the solution set to a system of polynomial equations,

V=xRd:p1(x)==pK(x)=0,
(1)

where the pk(x) are polynomial expressions arising from constraints in the underlying problem. It is assumed that d is the maximal degree of all of the pk(x). One of the key properties of algebraic varieties is that the monomial features matrix (matrix comprised of features after mapping the data points into monomials) is rank-deficient if and only if the data points belong to a variety. The resulting matrix after this mapping is called the lifted matrix because the mapping boosts the working dimension. By “lifting,” a full-rank representation into polynomial, low-rank space matrices can be recovered with minimal sampling.33 We create this representation by leveraging the RPH framework.

The reaction path Hamiltonian, introduced by Miller and co-workers35 and further developed by Page and McIver,36 is an elegant formalism to explore mechanisms, rates, and dynamics for a broad spectrum of chemical reactions.38 The Hamiltonian (without angular momentum) for displacements from the MEP along vibrational modes, q, which are orthogonal to the reaction coordinate as well as translational and rotational degrees of freedom, is given by

H=[psqTB(s)p]22[1+qTB(s)]2+12p2+V0(s)+12qTΩ(s)2q.
(2)

The MEP is characterized by arc length s, which is zero at the transition state, positive along the product branch, and negative along the reactant branch. (s, ps) and (q, p) correspond to the reaction path and transverse degrees of freedom, respectively. B(s) consists of coupling constants between various modes and therefore captures a fraction of the system’s anharmonicity. V0(s) is the potential energy at s on the MEP, and Ω2 is a diagonal matrix of eigenvalues {ωk2} of the projected Hessian matrix, corresponding to the squares of transverse vibrational frequencies.

Our work focuses on the potential energy component, V(s, q), of the RPH,

V(s,q)=V0(s)+12qTΩ(s)2q=V0(s)+12Σk=1F1ωk2qk2,
(3)

where F − 1 is the number of transverse modes, with F = 3N − 6 for a nonlinear system consisting of N atoms. The first-order or gradient term is absent in the RPH potential energy expression since all displacements, qk, are perpendicular to the gradient or path of steepest descent. This equation can be rearranged and expressed for all points along the MEP. If the MEP consists of m points (s1, s2, …, sm), m equations are obtained, along the lines of Eq. (1),

0=V0(s1)V(s1,q)+12Σk=1F1ωk2(s1)qk2,0=V0(s2)V(s2,q)+12Σk=1F1ωk2(s2)qk2,0=V0(sm)V(sm,q)+12Σk=1F1ωk2(sm)qk2.
(4)

This can then be expressed as the lifted polynomial equation, xTC = 0, that forms the underlying model for VMC, where x and C are given by

x=1q12q22qF12,  C=2(V0(s1)V(s1,q))2(V0(s2)V(s2,q))2(V0(sm)V(sm,q))ω12(s1)ω12(s2)ω12(sm)ω22(s1)ω22(s2)ω22(sm)ωF12(s1)ωF12(s2)ωF12(sm),
(5)

respectively. Each column in C contains all the eigenvalues corresponding to transverse vibrational modes for a point on the MEP. Although it is difficult to predict the exact rank of C for an arbitrary reaction, we believe that the underlying problem is rank-deficient. This is because only a small fraction of the total vibrational degrees of freedom undergo marked changes over the course of a reaction step. VMC recovers the complete C matrix through limited sampling of matrix elements coupled with a rank minimization method.33 

Matrix completion methods31,39 rely on the underlying low-rank character of data that must be recovered or completed. Recovery is carried out by minimizing the rank of the matrix. Ongie and co-workers exploit the rank-deficient character of the lifted matrix representation of an algebraic variety to develop VMC.33 They use a kernelized representation of the polynomial feature map to overcome the problem of dimensionality and carry out rank minimization using the iterative reweighted least-squares (IRLS) method.40,41 We implement this VMC framework for recovering transverse Hessian eigenvalues constituting the MEP from an incomplete C matrix. Since we retain all the central aspects of the original algorithm, a step-by-step procedure is not described in this work. We direct readers to the pioneering VMC work by Ongie and co-workers for a detailed account of key steps constituting the algorithm.33 

In this work, we demonstrate proof-of-concept that HVMC can efficiently recover eigenvalues constituting C and can serve as a viable alternative to multiple, sometimes prohibitive nuclear Hessian calculations. We do so by first constructing the complete matrix of eigenvalues, C. Elements of C are randomly masked to create missing data. HVMC performance is then assessed based on the minimum data (or sampling density) necessary to recover the full matrix as well as the accuracy of resulting thermochemical quantities. The procedure is described below.

1. Matrix construction

The MEP, or intrinsic reaction coordinate (IRC),42–44 for a model reaction is generated using the built-in IRC method in Q-Chem 5.245 in mass-weighted coordinates by setting the maximum step-size to 0.1a0. We invoke the RPH and project out the reaction coordinate, translations, and rotations from the Hessians. The complete C matrix is constructed from eigenvalues of the projected Hessian by ordering the eigenvectors constituting successive MEP points using scalar products. The top row [Eq. (5)] consisting of potential energy differences is assumed to be constant and small and is therefore neglected in this stage of algorithm development. Matrix elements are randomly masked (removed) to achieve a range of sampling densities, denoted by ρ (0 < ρ < 1), that represents the fraction of available elements or information in each column. The value of ρ is varied from 0.1 to 0.9 in increments of 0.05.

2. HVMC algorithm

We implement a Python 3.7 adaptation of the original MATLAB VMC program developed by Ongie and co-workers to recover the full C matrix via rank minimization.33 The kernel function is defined by

kd=(CTC+1)d,
(6)

where the superscript ⊙ denotes element-wise power and 1 is a matrix of all ones. Since the harmonic RPH potential energy expression is a quadratic polynomial, we set the dimension d = 2. The eigenvalues and eigenvectors obtained upon diagonalization of the kernel function are used to solve the rank-minimization problem.41 We use the Schatten-p quasi-norm in the IRLS method with p = 0.5. Smoothing parameters for the gradient descent step and convergence parameters are optimized based on tests with model systems. HVMC is converged when the IRLS gradient update step is less than 1 · 10−8 or when maximum allowable iterations (10 000) are reached. The algorithm typically converges in less than 4000 iterations.

3. Model performance

The Frobenius norm is used to compute errors in recovery of columns of C,

Erri=CHVMC,iCTrue,i2CTrue,i2,
(7)

where i is the column index corresponding to a point on the MEP and subscripts HVMC and True correspond to the model prediction and true values, respectively. The total error, Errmat, is also computed by calculating a matrix norm in Eq. (7) instead of the column norm. We further quantify HVMC performance based on prediction errors of ZPE and vibrational free energy [Gvib(T, s)] along the reaction coordinate,6,46

Gvib(T,s)=V0(s)+Σk=1F1×ωk(s)2+kBTln1expωk(s)kBT.
(8)

While ZPE errors enable us to assess the recovery of larger eigenvalues, errors in Gvib at high temperatures (say 1000 K) reflect model accuracy in the recovery of smaller eigenvalues. However, since very small eigenvalues typically correspond to rocking modes or internal rotations and not vibrational modes, we calculate Gvib based on a system-dependent cut-off value.

For every value of ρ, we carry out 100 trials with randomly generated masked C matrices. The errors are calculated as differences between HVMC and true values. Both mean and extremum errors across 100 trials are utilized to assess HVMC performance. This is because, for a given trial, mean ZPE and Gvib errors attain desired accuracies even with small ρ but do not capture large deviations that might occur at specific points on the MEP. The desired sampling density is the minimum ρ for which at least 75 HVMC trials yield both mean and extremum errors in ZPE and Gvib within chemical accuracy (10 kJ/mol or 2.39 kcal/mol). This relaxed criterion, rather than successful recovery in all 100 trials, is imposed in recognition of the fact that large errors in ZPE and Gvib are encountered in a small number of trials wherein blocks, or entire submatrices, of C are masked, leading to poor recovery. The mean norm error over 100 trials, μErr,mat, corresponding to the desired accuracies in ZPE and ΔGvib is used as the basis for examining HVMC performance with varying system size.

4. Test systems—SN2 reaction

We calculate the MEP for an archetypal SN2 reaction, Cl + CH3Cl → ClCH3 + Cl,47 treated using the ωB97X/def2-SVP level of theory.48 Since the reaction is symmetric, HVMC is employed to recover eigenvalues associated with one half of the reaction, shown in Fig. 1. Although the variational effect for this reaction is expected to be small, it serves as a good platform for testing the efficacy of interpolation schemes for VTST.8 We choose density functional theory (DFT)49,50 and not correlated wavefunction theory because our goal here is to quantify HVMC performance and determine feasibility for eigenvalue recovery, rather than to accurately calculate absolute rates with VTST.

FIG. 1.

One half of symmetric SN2 reaction: Cl + CH3Cl → ClCH3 + Cl. There are 24 points along this path. VaG is the adiabatic ground state potential obtained upon adding ZPE to V0 (level of theory: ωB97X/def2-SVP). ΔGvib (right axis) is the vibrational free energy change at 1000 K, calculated using vibrational frequencies higher than a cut-off value of 100 cm−1. All energies are in kcal/mol.

FIG. 1.

One half of symmetric SN2 reaction: Cl + CH3Cl → ClCH3 + Cl. There are 24 points along this path. VaG is the adiabatic ground state potential obtained upon adding ZPE to V0 (level of theory: ωB97X/def2-SVP). ΔGvib (right axis) is the vibrational free energy change at 1000 K, calculated using vibrational frequencies higher than a cut-off value of 100 cm−1. All energies are in kcal/mol.

Close modal

HVMC converges to desired tolerances with a sampling density, ρ = 0.65. Figure 2 depicts algorithm performance for recovery of eigenvalues associated with all 24 points along the half-MEP. Most columns are recovered to within 1% error across all 100 trials. By sampling 65% of the elements, we recover ZPE and Gvib (1000 K) with mean errors within chemical accuracy in all 100 trials. Maximum deviations are also within chemical accuracy for ZPE across all 100 trials and for Gvib in 83 trials. Errors in the prediction of Gvib are sensitive to the imposed cut-off value. For this reaction, we find that most rocking/internal rotation modes lie below 100 cm−1, and therefore, these modes, along with imaginary ones, are removed when estimating Gvib. The results are summarized in Fig. 3.

FIG. 2.

HVMC performance (100 trials)—recovery of C for Cl + CH3Cl → ClCH3 + Cl: columns correspond to MEP points, sorted in descending order of recovery error. Column errors (y-axis) are computed using Eq. (7). “Initial” errors correspond to random masking of the C matrix to achieve ρ = 0.65. “HVMC” errors reflect algorithm performance.

FIG. 2.

HVMC performance (100 trials)—recovery of C for Cl + CH3Cl → ClCH3 + Cl: columns correspond to MEP points, sorted in descending order of recovery error. Column errors (y-axis) are computed using Eq. (7). “Initial” errors correspond to random masking of the C matrix to achieve ρ = 0.65. “HVMC” errors reflect algorithm performance.

Close modal
FIG. 3.

HVMC performance (100 trials)—VTST: (top) mean errors and (bottom) maximum deviations observed in ZPE and Gvib (1000 K) with random sampling (ρ = 0.65). All energies are in kcal/mol.

FIG. 3.

HVMC performance (100 trials)—VTST: (top) mean errors and (bottom) maximum deviations observed in ZPE and Gvib (1000 K) with random sampling (ρ = 0.65). All energies are in kcal/mol.

Close modal

The required sampling density for accurate recovery of ZPE alone is much lower than 0.65. All 100 trials recover ZPE within chemical accuracy (both mean and extremum errors) for ρ as low as 0.25. This is an intuitive result because smaller eigenvalues do not contribute significantly to ZPE, and therefore, deviations in their predicted values have little impact on ZPE accuracies. Consequently, if the objective is to only recover VaG(s), the adiabatic ground state potential energy, sampling requirements, and therefore computational costs are lower when accurate Gvib is also desired. While most ΔGvib predictions on the half-MEP are within chemical accuracy for ρ = 0.65, deviations occur more often in the region closer to the transition structure than the product (Fig. 4). This is because the region around the saddle point (s = 0) sees larger variation in certain vibrational degrees of freedom compared to either minimum. Therefore, the sampling strategy in a practical HVMC implementation must ideally be s-dependent, with denser sampling closer to s = 0 to optimize recovery and costs.

FIG. 4.

ΔGvib at 1000 K (100 trials): HVMC prediction vs target values, with each trial represented by a different color. Free energy differences are reported with respect to the product. Shaded region indicates a range of desired accuracy (±2.39 kcal/mol).

FIG. 4.

ΔGvib at 1000 K (100 trials): HVMC prediction vs target values, with each trial represented by a different color. Free energy differences are reported with respect to the product. Shaded region indicates a range of desired accuracy (±2.39 kcal/mol).

Close modal

We note that the choice of 1000 K for assessing HVMC performance is based on our own interest in applying VTST to problems in thermal catalysis for which 1000 K is a reasonable ceiling value. Figure 5 shows the percentage trials that recover Gvib to within desired accuracies as a function of ρ at various temperatures. HVMC sampling requirements depend strongly on the reactions of interest and corresponding temperature ranges. Sampling requirements are low (ρ ≤ 0.25) for reactions at low temperature. On the other hand, HVMC in its current form may not yet be practical (ρ > 0.8) for VTST analysis of similar-sized systems at very high temperatures, as may be the case with combustion kinetics. The system size dependence of HVMC performance is described in Sec. III B.

FIG. 5.

HVMC performance (100 trials)—recovery of Gvib for Cl + CH3Cl → ClCH3 + Cl at various temperatures: (top) μErr,mat ± σ, mean normed matrix recovery error, with the error bar representing one standard deviation, and (bottom) percentage of trials that recover Gvib within desired accuracy at four temperatures—500 K, 1000 K, 1500 K, and 2500 K.

FIG. 5.

HVMC performance (100 trials)—recovery of Gvib for Cl + CH3Cl → ClCH3 + Cl at various temperatures: (top) μErr,mat ± σ, mean normed matrix recovery error, with the error bar representing one standard deviation, and (bottom) percentage of trials that recover Gvib within desired accuracy at four temperatures—500 K, 1000 K, 1500 K, and 2500 K.

Close modal

Since Hessians involve significant computational effort and costs and scale unfavorably with system size, it is important to probe the system size dependence of HVMC performance and efficiency. We turn to the expression for theoretical minimum sampling density, ρtheoretical, derived by Ongie and co-workers for variety-based matrix completion,33 

ρtheoretical=Rm+Rn1Rm,1/d
(9)

where R is the rank of the matrix, n is the number of features in polynomial space, and d is the highest degree of the polynomials (=2 in this case). Although Eq. (9) is a necessary condition, it is a general expression that does not take into account underlying characteristics of the specific matrices being recovered with VMC. Nevertheless, we employ Eq. (9) to help interpret the system-size dependence of HVMC performance.

The rank R is not known a priori for this problem. Therefore, we estimate ρtheoretical using the ceiling value of R, Rmax = min(F − 1, m). n is the dimensionality of monomials up to quadratic terms, {1,q1,q2,,q12,q1q2,,qF12}, given by (F1)+dd. We first consider the model SN2 reaction consisting of 6 atoms and 24 points on the half-MEP. Optimal recovery, based on the performance parameters defined in Sec. III A, is achieved at ρ = 0.65 or higher. If we neglect the role of any underlying problem complexities, this is close to ρtheoretical = 0.69, provided that C is nearly full-rank (RRmax = 11). If this is true, increasing MEP points, m, must still lead to a value of ρ that is close to the theoretical prediction. To test this, we construct C with m = 50 points by setting the IRC step-size to 0.05a0. To recover C with mean norm error close to the MEP with 24 points (μErr,mat = 0.32%), we find that ρ = 0.5, identical to the theoretical value predicted by Eq. (9). Therefore, it is very likely that C for the model reaction, Cl + CH3Cl → ClCH3 + Cl, is close to full-rank.

The critical question then is whether the lifted matrix remains full-rank when applied to larger systems. Our hypothesis is that the problem rank becomes smaller as systems grow in size. This stems from the fact that the number of modes that change significantly in the course of a reaction step becomes an increasingly smaller fraction of the total degrees of freedom constituting the system. Therefore, we expect a reduction in the rank of C with increasing system size. Consequently, true sampling density must be lower than the theoretical bounds predicted by Eq. (9) using Rmax.

To test this hypothesis and examine HVMC performance with increasing system size, we create five additional systems, shown in Fig. 6, using the same SN2 reaction (1) as the template. In each case, the half-MEP is truncated at 24 points to maintain constant m, so we can exclusively probe the impact of increasing system size. It must be noted that if C matrices are full-rank, Eq. (9) predicts ρtheoretical ≈ 1 for all systems except (1). We use mean norm error over 100 trials, μErr,mat, and not thermochemical quantities to assess HVMC performance because Gvib accuracies depend on a system-specific cut-off frequency. As discussed previously, ρ = 0.65 yields μErr,mat = 0.32% for reaction (1). For the remaining five systems, we determine the closest value of ρ for which we observe Errmat in the range 0.32 ± 0.10%, where 0.10% represents the standard deviation (σ) of matrix norm errors over 100 trials.

FIG. 6.

Transition structures of model SN2 reactions chosen to probe HVMC performance with varying system size. Carbon atoms are in cyan, hydrogen in white, and chlorine in green.

FIG. 6.

Transition structures of model SN2 reactions chosen to probe HVMC performance with varying system size. Carbon atoms are in cyan, hydrogen in white, and chlorine in green.

Close modal

Table I describes HVMC performance for (1)–(6). The minimum density required to achieve the same model accuracy as (1) increases by a modest amount for (2) but decreases sharply and plateaus at ρ = 0.3 for (3)–(6). In addition, both μErr,mat and variance steadily decrease from (3) to (6) for the same value of ρ. Therefore, as hypothesized, the problem rank decreases and likely becomes constant as the system size increases. For systems (3)–(6) and m = 24, we find that R ≈ 3 yields ρtheoretical ≈ 0.3 when substituted in Eq. (9).

TABLE I.

HVMC system size dependence (100 trials)—Minimum sampling density, ρ, and mean norm error (μErr,mat).

ReactionρμErr,mat ± σ
(1) 0.65 0.32 ± 0.10 
(2) 0.80 0.45 ± 0.27 
(3) 0.30 0.37 ± 0.19 
(4) 0.30 0.29 ± 0.12 
(5) 0.30 0.25 ± 0.09 
(6) 0.30 0.23 ± 0.08 
ReactionρμErr,mat ± σ
(1) 0.65 0.32 ± 0.10 
(2) 0.80 0.45 ± 0.27 
(3) 0.30 0.37 ± 0.19 
(4) 0.30 0.29 ± 0.12 
(5) 0.30 0.25 ± 0.09 
(6) 0.30 0.23 ± 0.08 

The anomalous value of ρ observed with (2) does not reflect poor HVMC performance but is rather the result of the scalar product-based procedure for ordering eigenvectors. The ordering of two near-degenerate modes in the transition structure and its nearest MEP point, corresponding to symmetric –CH3 and –CH2 stretches, is reversed owing to non-negligible overlap of the eigenvector of the transition structure with three eigenvectors of its nearest neighbor. HVMC performance is poorer as a result of this imperfect ordering. Going forward, we aim to employ a rigorous ordering approach to construct C that takes into account all possible rotations of degenerate or near-degenerate eigenvectors.46 Therefore, with the exception of (2) and for fixed number of points (m = 24) on the MEP, we demonstrate that only 30% of C is essential to recover the complete matrix. Based on the success of the algorithm demonstrated by this proof-of-principle work, we outline key next steps toward developing a practical HVMC implementation for VTST-based rate calculations.

We conceive of two ways in which HVMC can be adapted for canonical VTST rate calculations. The first is where we compute only as many Hessians on the MEP as dictated by the minimum sampling requirement, which we can now determine based on the approximate rank identified in this study. Hessians at minima and the transition structure are typically already available since they are computed to verify stationary points. The projected eigenvalues are then employed to construct C for complete recovery with HVMC. However, this approach can still be challenging in situations where (a) analytical Hessians are intractable either owing to large system size or choice of level of theory and (b) analytical Hessians are unavailable for the chosen level of theory, in which case finite difference Hessians are expensive even for small to medium-sized systems such as those utilized in this study. In such scenarios, stationary point verification can be carried out based on approximate schemes, several of which have been reported in the literature.51–58 Similar approximate schemes, using gradient-based finite differences, can also be employed for sampling eigenvalues. We outline our future work—integration of HVMC with a gradient-based, low-cost sampling scheme—in Sec. III C.

With the goal of constructing and completing C when limited (e.g., stationary points) or no full Hessian information is available, we outline the following steps. To construct the incomplete C matrix, we aim to utilize energies and gradients constituting the MEP along with sufficient eigenvalues to match minimum ρ requirements. The sampling strategy involves calculation of an approximate eigenvalue using finite differences of gradients obtained via small perturbations along the eigenvector, shown to be successful in prior studies.57,59 These eigenvalues can be calculated and ordered using approximate eigenvectors obtained from inexpensive molecular mechanics (MM) Hessians.28 Furthermore, we aim to explore the possibility of incorporating physically meaningful curvilinear internal coordinates to overcome the problem of unphysical eigenvalues that emerge in the projected mass-weighted coordinate system employed in this study.60 

Matrix completion methods rely on incoherence of matrix elements with the standard basis as well as random sampling of elements.31 This conflicts with the rational sampling strategy desired in the case of full MEP recovery, where either entire columns of C are available or denser sampling is desired closer to transition structures (Fig. 4). Our proposed solution is to carry out a change of basis, a common strategy to create an incoherent basis for completion, via either discrete cosine or fast Fourier transforms.28,61 Finally, since MM eigenvectors are approximate, the uncertainty associated with the finite difference eigenvalue must be incorporated in the form of a noise parameter in the HVMC algorithm. Integrating the current HVMC implementation with gradient-based sampling is therefore expected to yield a practical, efficient algorithm for free energy and rate calculations with VTST. This will also set the stage for more sophisticated matrix completion algorithms that leverage additional properties of the problem (e.g., sparsity) to efficiently compute even higher derivatives, or anharmonic terms, for reactions that require accurate treatment of mode coupling.

Matrix completion methods are traditionally employed to recover complete signal or image information based on limited, noisy data. We identify a novel, unique application of these methods—as efficient schemes for recovering otherwise prohibitive nuclear Hessian eigenvalues. We exploit the underlying algebraic structure of the harmonic potential energy expansion for points lying on an MEP. Using the RPH framework, we construct a quadratic, rank-deficient matrix of projected eigenvalues that can be solved using variety-based matrix completion. We call this algorithm HVMC, or harmonic variety-based matrix completion. The method successfully recovers projected Hessian eigenvalues by sampling a small fraction of elements constituting the matrix. Model performance for VTST applications is demonstrated by assessing accuracies of predicted ZPEs and vibrational free energies at elevated temperatures for SN2 reactions. We find that sampling requirements drop to 30% as systems become larger for the same number of points on the MEP. This indicates that matrix rank becomes a constant, small number for larger systems. This is promising for significant cost reduction when we develop a gradient-based, practical HVMC algorithm for VTST applications. The modified algorithm relies on approximate, inexpensive eigenvectors and finite differences calculation of eigenvalues to recover the complete set of projected eigenvalues constituting the MEP. We therefore expect HVMC to become a useful tool for easing costs associated with analyzing the variational character of chemical reactions and catalytic systems of practical importance.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

We acknowledge High-Performance Computing (HPC) at the USC for computing resources and support. S.J.Q. acknowledges support from the Undergraduate Research Associates Program (2019–2020) at the USC. U.M. acknowledges support from the National Science Foundation (Grant Nos. NSF CCF-1718560, NSF CPS-1446901, and NSF CCF-1817200), the Army Research Office (Grant No. ARO W911NF1910269), and the Office of Naval Research (Grant No. ONR N00014-15-1-2550). S.M.S. acknowledges support from startup resources provided by the USC. We are grateful to Dr. Amr Elnakeeb (USC) for valuable discussions.

1.
D. G.
Truhlar
,
B. C.
Garrett
, and
S. J.
Klippenstein
, “
Current status of transition-state theory
,”
J. Phys. Chem.
100
,
12771
12800
(
1996
).
2.
K.
Zinovjev
and
I.
Tuñón
, “
Quantifying the limits of transition state theory in enzymatic catalysis
,”
Proc. Natl. Acad. Sci. U. S. A.
114
,
12390
12395
(
2017
).
3.
X.
Hu
and
W. L.
Hase
, “
Properties of canonical variational transition state theory for association reactions without potential energy barriers
,”
J. Phys. Chem.
93
,
6029
6038
(
1989
).
4.
E.
Gonzalez-Lavado
,
J. C.
Corchado
,
Y. V.
Suleimanov
,
W. H.
Green
, and
J.
Espinosa-Garcia
, “
Theoretical kinetics study of the O(3P) + CH4/CD4 hydrogen abstraction reaction: The role of anharmonicity, recrossing effects, and quantum mechanical tunneling
,”
J. Phys. Chem. A
118
,
3243
3252
(
2014
).
5.
B.
Jackson
and
S.
Nave
, “
The dissociative chemisorption of methane on Ni(111): The effects of molecular vibration and lattice motion
,”
J. Chem. Phys.
138
,
174705
(
2013
).
6.
B.
Peters
, in
Reaction Rate Theory and Rare Events Simulations
(
Elsevier
,
2017
), Chap. 10, pp.
227
271
.
7.
D. G.
Truhlar
and
B. C.
Garrett
, “
Variational transition-state theory
,”
Acc. Chem. Res.
13
,
440
448
(
1980
).
8.
B. C.
Garrett
and
D. G.
Truhlar
, “
Variational transition state theory
,” in
Theory and Applications of Computational Chemistry
, edited by
C. E.
Dykstra
,
G.
Frenking
,
K. S.
Kim
, and
G. E.
Scuseria
(
Elsevier
,
Amsterdam
,
2005
), pp.
67
87
.
9.
J. L.
Bao
and
D. G.
Truhlar
, “
Variational transition state theory: Theoretical framework and recent developments
,”
Chem. Soc. Rev.
46
,
7548
7596
(
2017
).
10.
E.
Wigner
, “
Calculation of the rate of elementary association reactions
,”
J. Chem. Phys.
5
,
720
725
(
1937
).
11.
J. C.
Keck
, “
Variational theory of chemical reaction rates applied to three-body recombinations
,”
J. Chem. Phys.
32
,
1035
1050
(
1960
).
12.
J. C.
Corchado
,
E. L.
Coitiño
,
Y.-Y.
Chuang
,
P. L.
Fast
, and
D. G.
Truhlar
, “
Interpolated variational transition-state theory by mapping
,”
J. Phys. Chem. A
102
,
2424
2438
(
1998
).
13.
A.
Gonzalez-Lafont
,
T. N.
Truong
, and
D. G.
Truhlar
, “
Interpolated variational transition-state theory: Practical methods for estimating variational transition-state properties and tunneling contributions to chemical reaction rates from electronic structure calculations
,”
J. Chem. Phys.
95
,
8875
8894
(
1991
).
14.
V. S.
Melissas
and
D. G.
Truhlar
, “
Interpolated variational transition state theory and tunneling calculations of the rate constant of the reaction OH + CH4 at 223–2400 K
,”
J. Chem. Phys.
99
,
1013
1027
(
1993
).
15.
Y.-Y.
Chuang
and
D. G.
Truhlar
, “
Improved dual-level direct dynamics method for reaction rate calculations with inclusion of multidimensional tunneling effects and validation for the reaction of H with trans-N2H2
,”
J. Phys. Chem. A
101
,
3808
3814
(
1997
).
16.
W.-P.
Hu
,
Y.-P.
Liu
, and
D. G.
Truhlar
, “
Variational transition-state theory and semiclassical tunnelling calculations with interpolated corrections: A new approach to interfacing electronic structure theory and dynamics for organic reactions
,”
J. Chem. Soc., Faraday Trans.
90
,
1715
1725
(
1994
).
17.
R.
Steckler
,
W.-P.
Hu
,
Y.-P.
Liu
,
G. C.
Lynch
,
B. C.
Garrett
,
A. D.
Isaacson
,
V. S.
Melissas
,
D.-h.
Lu
,
T. N.
Truong
,
S. N.
Rai
,
G. C.
Hancock
,
J. G.
Lauderdale
,
T.
Joseph
, and
D. G.
Truhlar
, “
POLYRATE 6.5: A new version of a computer program for the calculation of chemical reaction rates for polyatomics
,”
Comput. Phys. Commun.
88
,
341
343
(
1995
).
18.
B.
Peters
, “
Inertial likelihood maximization for reaction coordinates with high transmission coefficients
,”
Chem. Phys. Lett.
554
,
248
253
(
2012
).
19.
K.
Zinovjev
and
I.
Tuñón
, “
Transition state ensemble optimization for reactions of arbitrary complexity
,”
J. Chem. Phys.
143
,
134111
(
2015
).
20.
S. J.
Klippenstein
, “
Variational optimizations in the Rice–Ramsperger–Kassel–Marcus theory calculations for unimolecular dissociations with no reverse barrier
,”
J. Chem. Phys.
96
,
367
371
(
1992
).
21.
Y.
Georgievskii
and
S. J.
Klippenstein
, “
Transition state theory for multichannel addition reactions: Multifaceted dividing surfaces
,”
J. Phys. Chem. A
107
,
9776
9781
(
2003
).
22.
Y.
Georgievskii
and
S. J.
Klippenstein
, “
Variable reaction coordinate transition state theory: Analytic results and application to the C2H3 + H → C2H4 reaction
,”
J. Chem. Phys.
118
,
5442
5455
(
2003
).
23.
X.
Chen
and
C. F.
Goldsmith
, “
Accelerating variational transition state theory via artificial neural networks
,”
J. Phys. Chem. A
124
,
1038
1046
(
2020
).
24.
S. S.
Chen
,
D. L.
Donoho
, and
M. A.
Saunders
, “
Atomic decomposition by basis pursuit
,”
SIAM Rev.
43
,
129
159
(
2001
).
25.
E. J.
Candès
,
J.
Romberg
, and
T.
Tao
, “
Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information
,”
IEEE Trans. Inf. Theory
52
,
489
509
(
2006
).
26.
D. L.
Donoho
, “
Compressed sensing
,”
IEEE Trans. Inf. Theory
52
,
1289
1306
(
2006
).
27.
E. J.
Candès
and
M. B.
Wakin
, “
An introduction to compressive sampling
,”
IEEE Signal Process. Mag.
25
,
21
30
(
2008
).
28.
J. N.
Sanders
,
X.
Andrade
, and
A.
Aspuru-Guzik
, “
Compressed sensing for the fast computation of matrices: Application to molecular vibrations
,”
ACS Cent. Sci.
1
,
24
32
(
2015
).
29.
P.
Rai
,
K.
Sargsyan
,
H.
Najm
, and
S.
Hirata
, “
Sparse low rank approximation of potential energy surfaces with applications in estimation of anharmonic zero point energies and frequencies
,”
J. Math. Chem.
57
,
1732
1754
(
2018
).
30.
F.
Zhou
,
W.
Nielson
,
Y.
Xia
, and
V.
Ozolinš
, “
Lattice anharmonicity and thermal conductivity from compressive sensing of first-principles calculations
,”
Phys. Rev. Lett.
113
,
185501
(
2014
).
31.
E. J.
Candès
and
B.
Recht
, “
Exact matrix completion via convex optimization
,”
Found. Comput. Math.
9
,
717
(
2009
).
32.
D.
Pimentel-Alarcón
,
G.
Ongie
,
L.
Balzano
,
R.
Willett
, and
R.
Nowak
, “
Low algebraic dimension matrix completion
,” in
2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
(
IEEE
,
2017
), pp.
790
797
.
33.
G.
Ongie
,
R.
Willett
,
R. D.
Nowak
, and
L.
Balzano
, “
Algebraic variety models for high-rank matrix completion
,” in
Proceedings of the 34th International Conference on Machine Learning
(
JMLR
,
2017
), Vol. 70, pp.
2691
2700
.
34.
A.
Elnakeeb
and
U.
Mitra
, “
Variety-based background subtraction for nonlinear trajectory tracking
,” in
2019 53rd Asilomar Conference on Signals, Systems, and Computers
(
IEEE
,
2019
), pp.
69
73
.
35.
W. H.
Miller
,
N. C.
Handy
, and
J. E.
Adams
, “
Reaction path Hamiltonian for polyatomic molecules
,”
J. Chem. Phys.
72
,
99
112
(
1980
).
36.
M.
Page
and
J. W.
McIver
, Jr.
, “
On evaluating the reaction path Hamiltonian
,”
J. Chem. Phys.
88
,
922
935
(
1988
).
37.
D. A.
Cox
,
J.
Little
, and
D.
O’Shea
, “
Geometry, algebra, and algorithms
,” in
Ideals, Varieties, and Algorithms
(
Springer
,
2015
), pp.
1
47
.
38.
E.
Kraka
, “
Reaction path Hamiltonian and the unified reaction valley approach
,”
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
1
,
531
556
(
2011
).
39.
E. J.
Candès
and
Y.
Plan
, “
Matrix completion with noise
,”
Proc. IEEE
98
,
925
936
(
2010
).
40.
M.
Fornasier
,
H.
Rauhut
, and
R.
Ward
, “
Low-rank matrix recovery via iteratively reweighted least squares minimization
,”
SIAM J. Control
21
,
1614
1640
(
2011
).
41.
K.
Mohan
and
M.
Fazel
, “
Iterative reweighted algorithms for matrix rank minimization
,”
J. Mach. Learn. Res.
13
,
3441
3473
(
2012
).
42.
K.
Fukui
, “
Formulation of the reaction coordinate
,”
J. Phys. Chem.
74
,
4161
4163
(
1970
).
43.
K.
Ishida
,
K.
Morokuma
, and
A.
Komornicki
, “
The intrinsic reaction coordinate. An ab initio calculation for HNC–HCN and H + CH4 → CH4 + H
,”
J. Chem. Phys.
66
,
2153
2156
(
1977
).
44.
M. W.
Schmidt
,
M. S.
Gordon
, and
M.
Dupuis
, “
The intrinsic reaction coordinate and the rotational barrier in silaethylene
,”
J. Am. Chem. Soc.
107
,
2585
2589
(
1985
).
45.
Y.
Shao
,
Z.
Gan
,
E.
Epifanovsky
,
A. T. B.
Gilbert
,
M.
Wormit
,
J.
Kussmann
,
A. W.
Lange
,
A.
Behn
,
J.
Deng
,
X.
Feng
,
D.
Ghosh
,
M.
Goldey
,
P. R.
Horn
,
L. D.
Jacobson
,
I.
Kaliman
,
R. Z.
Khaliullin
,
T.
Kuś
,
A.
Landau
,
J.
Liu
,
E. I.
Proynov
,
Y. M.
Rhee
,
R. M.
Richard
,
M. A.
Rohrdanz
,
R. P.
Steele
,
E. J.
Sundstrom
,
H. L.
Woodcock
 III
,
P. M.
Zimmerman
,
D.
Zuev
,
B.
Albrecht
,
E.
Alguire
,
B.
Austin
,
G. J. O.
Beran
,
Y. A.
Bernard
,
E.
Berquist
,
K.
Brandhorst
,
K. B.
Bravaya
,
S. T.
Brown
,
D.
Casanova
,
C.-M.
Chang
,
Y.
Chen
,
S. H.
Chien
,
K. D.
Closser
,
D. L.
Crittenden
,
M.
Diedenhofen
,
R. A.
DiStasio
, Jr.
,
H.
Do
,
A. D.
Dutoi
,
R. G.
Edgar
,
S.
Fatehi
,
L.
Fusti-Molnar
,
A.
Ghysels
,
A.
Golubeva-Zadorozhnaya
,
J.
Gomes
,
M. W. D.
Hanson-Heine
,
P. H. P.
Harbach
,
A. W.
Hauser
,
E. G.
Hohenstein
,
Z. C.
Holden
,
T.-C.
Jagau
,
H.
Ji
,
B.
Kaduk
,
K.
Khistyaev
,
J.
Kim
,
J.
Kim
,
R. A.
King
,
P.
Klunzinger
,
D.
Kosenkov
,
T.
Kowalczyk
,
C. M.
Krauter
,
K. U.
Lao
,
A. D.
Laurent
,
K. V.
Lawler
,
S. V.
Levchenko
,
C. Y.
Lin
,
F.
Liu
,
E.
Livshits
,
R. C.
Lochan
,
A.
Luenser
,
P.
Manohar
,
S. F.
Manzer
,
S.-P.
Mao
,
N.
Mardirossian
,
A. V.
Marenich
,
S. A.
Maurer
,
N. J.
Mayhall
,
E.
Neuscamman
,
C. M.
Oana
,
R.
Olivares-Amaya
,
D. P.
O’Neill
,
J. A.
Parkhill
,
T. M.
Perrine
,
R.
Peverati
,
A.
Prociuk
,
D. R.
Rehn
,
E.
Rosta
,
N. J.
Russ
,
S. M.
Sharada
,
S.
Sharma
,
D. W.
Small
,
A.
Sodt
,
T.
Stein
,
D.
Stück
,
Y.-C.
Su
,
A. J. W.
Thom
,
T.
Tsuchimochi
,
V.
Vanovschi
,
L.
Vogt
,
O.
Vydrov
,
T.
Wang
,
M. A.
Watson
,
J.
Wenzel
,
A.
White
,
C. F.
Williams
,
J.
Yang
,
S.
Yeganeh
,
S. R.
Yost
,
Z.-Q.
You
,
I. Y.
Zhang
,
X.
Zhang
,
Y.
Zhao
,
B. R.
Brooks
,
G. K. L.
Chan
,
D. M.
Chipman
,
C. J.
Cramer
,
W. A.
Goddard
 III
,
M. S.
Gordon
,
W. J.
Hehre
,
A.
Klamt
,
H. F.
Schaefer
 III
,
M. W.
Schmidt
,
C. D.
Sherrill
,
D. G.
Truhlar
,
A.
Warshel
,
X.
Xu
,
A.
Aspuru-Guzik
,
R.
Baer
,
A. T.
Bell
,
N. A.
Besley
,
J.-D.
Chai
,
A.
Dreuw
,
B. D.
Dunietz
,
T. R.
Furlani
,
S. R.
Gwaltney
,
C.-P.
Hsu
,
Y.
Jung
,
J.
Kong
,
D. S.
Lambrecht
,
W.
Liang
,
C.
Ochsenfeld
,
V. A.
Rassolov
,
L. V.
Slipchenko
,
J. E.
Subotnik
,
T.
Van Voorhis
,
J. M.
Herbert
,
A. I.
Krylov
,
P. M. W.
Gill
, and
M.
Head-Gordon
, “
Advances in molecular quantum chemistry contained in the Q-Chem 4 program package
,”
Mol. Phys.
113
,
184
215
(
2015
).
46.
B.
Peters
,
A. T.
Bell
, and
A.
Chakraborty
, “
Rate constants from the reaction path Hamiltonian. I. Reactive flux simulations for dynamically correct rates
,”
J. Chem. Phys.
121
,
4453
4460
(
2004
).
47.
P.
de Sainte Claire
,
G. H.
Peslherbe
,
H.
Wang
, and
W. L.
Hase
, “
Linear free energy of activation relationship for barrierless association reactions
,”
J. Am. Chem. Soc.
119
,
5007
5012
(
1997
).
48.
J.-D.
Chai
and
M.
Head-Gordon
, “
Systematic optimization of long-range corrected hybrid density functionals
,”
J. Chem. Phys.
128
,
084106
(
2008
).
49.
P.
Hohenberg
and
W.
Kohn
, “
Inhomogeneous electron gas
,”
Phys. Rev.
136
,
B864
(
1964
).
50.
W.
Kohn
and
L. J.
Sham
, “
Self-consistent equations including exchange and correlation effects
,”
Phys. Rev.
140
,
A1133
A1138
(
1965
).
51.
Y.
Kumeda
,
D. J.
Wales
, and
L. J.
Munro
, “
Transition states and rearrangement mechanisms from hybrid eigenvector-following and density functional theory: Application to C10H10 and defect migration in crystalline silicon
,”
Chem. Phys. Lett.
341
,
185
194
(
2001
).
52.
P.
Deglmann
and
F.
Furche
, “
Efficient characterization of stationary points on potential energy surfaces
,”
J. Chem. Phys.
117
,
9535
9538
(
2002
).
53.
M.
Reiher
and
J.
Neugebauer
, “
A mode-selective quantum chemical method for tracking molecular vibrations applied to functionalized carbon nanotubes
,”
J. Chem. Phys.
118
,
1634
1641
(
2003
).
54.
R. A.
Olsen
,
G. J.
Kroes
,
G.
Henkelman
,
A.
Arnaldsson
, and
H.
Jónsson
, “
Comparison of methods for finding saddle points without knowledge of the final states
,”
J. Chem. Phys.
121
,
9776
9792
(
2004
).
55.
A. L.
Kaledin
, “
Gradient-based direct normal-mode analysis
,”
J. Chem. Phys.
122
,
184106
(
2005
).
56.
A.
Sawamura
, “
A new approach to find a saddle point efficiently based on the Davidson method
,”
JSIAM Lett.
3
,
17
19
(
2011
).
57.
S. M.
Sharada
,
A. T.
Bell
, and
M.
Head-Gordon
, “
A finite difference Davidson procedure to sidestep full ab initio Hessian calculation: Application to characterization of stationary points and transition state searches
,”
J. Chem. Phys.
140
,
164115
(
2014
).
58.
M.
Bergeler
,
C.
Herrmann
, and
M.
Reiher
, “
Mode-tracking based stationary-point optimization
,”
J. Comput. Chem.
36
,
1429
1438
(
2015
).
59.
A.
Heyden
,
A. T.
Bell
, and
F. J.
Keil
, “
Efficient methods for finding transition states in chemical reactions: Comparison of improved dimer method and partitioned rational function optimization method
,”
J. Chem. Phys.
123
,
224101
(
2005
).
60.
C. F.
Jackels
,
Z.
Gu
, and
D. G.
Truhlar
, “
Reaction-path potential and vibrational frequencies in terms of curvilinear internal coordinates
,”
J. Chem. Phys.
102
,
3188
3201
(
1995
).
61.
D.
Needell
and
J. A.
Tropp
, “
CoSaMP: Iterative signal recovery from incomplete and inaccurate samples
,”
Commun. ACM
53
,
93
100
(
2010
).