Structured statistical methods are promising for recovering or completing information from noisy and incomplete data with high fidelity. In particular, matrix completion exploits underlying structural properties such as rank or sparsity. Our objective is to employ matrix completion to reduce computational effort associated with the calculation of multiple quantum chemical Hessians, which are necessary for identification of temperature-dependent free energy maxima under canonical variational transition state theory (VTST). We demonstrate proof-of-principle of an algebraic variety-based matrix completion method for recovering missing elements in a matrix of transverse Hessian eigenvalues constituting the minimum energy path (MEP) of a reaction. The algorithm, named harmonic variety-based matrix completion (HVMC), utilizes the fact that the points lying on the MEP of a reaction step constitute an algebraic variety in the reaction path Hamiltonian representation. We demonstrate that, with as low as 30% random sampling of matrix elements for the largest system in our test set (46 atoms), the complete matrix of eigenvalues can be recovered. We further establish algorithm performance for VTST rate calculations by quantifying errors in zero-point energies and vibrational free energies. Motivated by this success, we outline next steps toward developing a practical HVMC algorithm, which utilizes a gradient-based sampling protocol for low-cost VTST rate computations.
I. INTRODUCTION
Transition state theory (TST) together with the harmonic oscillator approximation has been invaluable toward understanding reaction mechanisms and quantifying kinetics. While widely accepted and successful, the limitations of conventional TST are well-established.1,2 By presenting a one-dimensional picture of a reaction step wherein the reactive mode is uncoupled from other modes and the TS is a potential energy (and not free energy) maximum, TST constitutes an oversimplified approximation to true kinetics. Moreover, bond-breaking does not occur at the harmonic limit. As a result, harmonic TST routinely breaks down for reactions without potential energy maxima, at high temperature, when tunneling cannot be neglected, when vibrational modes are strongly coupled, or when lattice couplings cannot be neglected.1,3–5
Theories that enable more accurate rate estimates through the inclusion of recrossing, anharmonic coupling, tunneling, and dynamic effects are well-established.6 The most widely used is canonical variational transition state theory (VTST),7–9 also known as canonical variational theory (CVT), in which a temperature-dependent maximum is determined by calculating free energies along the entire minimum energy path (MEP). Wigner and Keck developed the variational theory of reaction rates.10,11 Further theoretical developments and practical VTST implementations based on direct dynamics were led by Garrett, Truhlar, and co-workers.7–9 These implementations utilize interpolation schemes (interpolated VTST or IVTST) to generate functions that describe energy, vibrational frequency, moment of inertia, and reduced mass variations along the MEP.12–16 Steckler et al. also developed a package, POLYRATE, that enables VTST rate computations to be used alongside select quantum chemistry software.17 Advances have also been made in alternative approaches, including molecular dynamics-based transition state ensemble methods that maximize transmission coefficients.18,19 Variable reaction coordinate VTST (VRC-VTST),20–22 which defines the reaction coordinate based on pivot points and utilizes Monte Carlo sampling to compute ensemble averages, is also gaining traction owing to speedups achievable by replacing the electronic structure sampling with system-specific neural network potentials.23
Despite these advances, VTST is not yet widely accepted or utilized to probe the variational character of chemical reactions and catalytic systems of practical interest. This is because the identification of temperature-dependent free energy maxima requires (a) expensive, sometimes prohibitive nuclear Hessians along the entire MEP at sufficiently high level of theory and (b) estimation of non-negligible entropy contributions (e.g., internal rotational degrees of freedom) that are typically system-specific and difficult to capture. Owing to significant effort associated with computing Hessians for reactions of practical interest, coupled with the dearth of VTST tools that are compatible across multiple quantum chemistry software, most computational catalysis and kinetics studies are limited to applying conventional TST.
To render VTST-based rate computations more routine, our goal is to reduce the effort associated with MEP Hessian calculations. Techniques such as compressed sensing (CS), which recovers incomplete signals by exploiting the sparse nature of a problem in a particular basis,24–27 are being increasingly applied to problems in quantum chemistry. CS has been successfully employed to calculate large Hessians,28 construct anharmonic potential energy surfaces,29 and determine lattice couplings.30 While CS methods exploit only sparsity, matrix completion methods are capable of leveraging several properties of the underlying problem, including variety, rank, and sparsity.31 We adopt a specific class of recently introduced matrix completion algorithms for data that belongs to an algebraic variety. Variety-based matrix completion (VMC) utilizes the underlying algebraic or polynomial character of the problem and constructs a low-rank “lifted” matrix, which can then be completed (or recovered) using a rank-minimization procedure.32–34 Here, we report the implementation of a matrix completion scheme for recovering Hessian eigenvalues by sampling (calculating) a small fraction of all the eigenvalues constituting the MEP.
We employ the fact that a series of harmonic expansions of the effective potential for selected points along the MEP constitute a family of polynomials that define an underlying algebraic variety. This work reports the first step of algorithm development and proof-of-concept of the applicability of VMC for VTST rate computations. We establish a framework for developing the harmonic VMC algorithm, or HVMC, for recovery of transverse Hessian eigenvalues constituting the MEP in mass-weighted coordinates. The framework utilizes the reaction path Hamiltonian (RPH) to create the polynomial variety and construct a matrix of Hessian eigenvalues.35,36 We demonstrate HVMC performance and system size dependence with model SN2 reactions in the gas phase. For Cl + CH3Cl → ClCH3 + Cl, only 65% of total eigenvalue information is necessary for HVMC to recover the remaining eigenvalues. Zero-point energies (ZPE) and Gibbs free energies at 1000 K are recovered to within chemical accuracy. With increasing system size, the problem rank is reduced, and HVMC requires even less information (30%) to completely recover eigenvalues. Therefore, HVMC is ideally suited for VTST. We outline planned modifications to this framework to enable eigenvalue sampling using finite differences of gradients, with the goal of developing a practical, efficient HVMC algorithm for VTST rate computations.
II. ALGORITHM DEVELOPMENT
A. Algebraic variety
An algebraic variety37 is the solution set to a system of polynomial equations,
where the pk(x) are polynomial expressions arising from constraints in the underlying problem. It is assumed that d is the maximal degree of all of the pk(x). One of the key properties of algebraic varieties is that the monomial features matrix (matrix comprised of features after mapping the data points into monomials) is rank-deficient if and only if the data points belong to a variety. The resulting matrix after this mapping is called the lifted matrix because the mapping boosts the working dimension. By “lifting,” a full-rank representation into polynomial, low-rank space matrices can be recovered with minimal sampling.33 We create this representation by leveraging the RPH framework.
B. Reaction path Hamiltonian
The reaction path Hamiltonian, introduced by Miller and co-workers35 and further developed by Page and McIver,36 is an elegant formalism to explore mechanisms, rates, and dynamics for a broad spectrum of chemical reactions.38 The Hamiltonian (without angular momentum) for displacements from the MEP along vibrational modes, q, which are orthogonal to the reaction coordinate as well as translational and rotational degrees of freedom, is given by
The MEP is characterized by arc length s, which is zero at the transition state, positive along the product branch, and negative along the reactant branch. (s, ps) and (q, p) correspond to the reaction path and transverse degrees of freedom, respectively. B(s) consists of coupling constants between various modes and therefore captures a fraction of the system’s anharmonicity. V0(s) is the potential energy at s on the MEP, and Ω2 is a diagonal matrix of eigenvalues of the projected Hessian matrix, corresponding to the squares of transverse vibrational frequencies.
C. Lifted matrix representation for matrix completion
Our work focuses on the potential energy component, V(s, q), of the RPH,
where F − 1 is the number of transverse modes, with F = 3N − 6 for a nonlinear system consisting of N atoms. The first-order or gradient term is absent in the RPH potential energy expression since all displacements, qk, are perpendicular to the gradient or path of steepest descent. This equation can be rearranged and expressed for all points along the MEP. If the MEP consists of m points (s1, s2, …, sm), m equations are obtained, along the lines of Eq. (1),
This can then be expressed as the lifted polynomial equation, xTC = 0, that forms the underlying model for VMC, where x and C are given by
respectively. Each column in C contains all the eigenvalues corresponding to transverse vibrational modes for a point on the MEP. Although it is difficult to predict the exact rank of C for an arbitrary reaction, we believe that the underlying problem is rank-deficient. This is because only a small fraction of the total vibrational degrees of freedom undergo marked changes over the course of a reaction step. VMC recovers the complete C matrix through limited sampling of matrix elements coupled with a rank minimization method.33
D. Variety-based matrix completion
Matrix completion methods31,39 rely on the underlying low-rank character of data that must be recovered or completed. Recovery is carried out by minimizing the rank of the matrix. Ongie and co-workers exploit the rank-deficient character of the lifted matrix representation of an algebraic variety to develop VMC.33 They use a kernelized representation of the polynomial feature map to overcome the problem of dimensionality and carry out rank minimization using the iterative reweighted least-squares (IRLS) method.40,41 We implement this VMC framework for recovering transverse Hessian eigenvalues constituting the MEP from an incomplete C matrix. Since we retain all the central aspects of the original algorithm, a step-by-step procedure is not described in this work. We direct readers to the pioneering VMC work by Ongie and co-workers for a detailed account of key steps constituting the algorithm.33
E. Implementation of HVMC—Recovering C
In this work, we demonstrate proof-of-concept that HVMC can efficiently recover eigenvalues constituting C and can serve as a viable alternative to multiple, sometimes prohibitive nuclear Hessian calculations. We do so by first constructing the complete matrix of eigenvalues, C. Elements of C are randomly masked to create missing data. HVMC performance is then assessed based on the minimum data (or sampling density) necessary to recover the full matrix as well as the accuracy of resulting thermochemical quantities. The procedure is described below.
1. Matrix construction
The MEP, or intrinsic reaction coordinate (IRC),42–44 for a model reaction is generated using the built-in IRC method in Q-Chem 5.245 in mass-weighted coordinates by setting the maximum step-size to 0.1a0. We invoke the RPH and project out the reaction coordinate, translations, and rotations from the Hessians. The complete C matrix is constructed from eigenvalues of the projected Hessian by ordering the eigenvectors constituting successive MEP points using scalar products. The top row [Eq. (5)] consisting of potential energy differences is assumed to be constant and small and is therefore neglected in this stage of algorithm development. Matrix elements are randomly masked (removed) to achieve a range of sampling densities, denoted by ρ (0 < ρ < 1), that represents the fraction of available elements or information in each column. The value of ρ is varied from 0.1 to 0.9 in increments of 0.05.
2. HVMC algorithm
We implement a Python 3.7 adaptation of the original MATLAB VMC program developed by Ongie and co-workers to recover the full C matrix via rank minimization.33 The kernel function is defined by
where the superscript ⊙ denotes element-wise power and 1 is a matrix of all ones. Since the harmonic RPH potential energy expression is a quadratic polynomial, we set the dimension d = 2. The eigenvalues and eigenvectors obtained upon diagonalization of the kernel function are used to solve the rank-minimization problem.41 We use the Schatten-p quasi-norm in the IRLS method with p = 0.5. Smoothing parameters for the gradient descent step and convergence parameters are optimized based on tests with model systems. HVMC is converged when the IRLS gradient update step is less than 1 · 10−8 or when maximum allowable iterations (10 000) are reached. The algorithm typically converges in less than 4000 iterations.
3. Model performance
The Frobenius norm is used to compute errors in recovery of columns of C,
where i is the column index corresponding to a point on the MEP and subscripts HVMC and True correspond to the model prediction and true values, respectively. The total error, Errmat, is also computed by calculating a matrix norm in Eq. (7) instead of the column norm. We further quantify HVMC performance based on prediction errors of ZPE and vibrational free energy [Gvib(T, s)] along the reaction coordinate,6,46
While ZPE errors enable us to assess the recovery of larger eigenvalues, errors in Gvib at high temperatures (say 1000 K) reflect model accuracy in the recovery of smaller eigenvalues. However, since very small eigenvalues typically correspond to rocking modes or internal rotations and not vibrational modes, we calculate Gvib based on a system-dependent cut-off value.
For every value of ρ, we carry out 100 trials with randomly generated masked C matrices. The errors are calculated as differences between HVMC and true values. Both mean and extremum errors across 100 trials are utilized to assess HVMC performance. This is because, for a given trial, mean ZPE and Gvib errors attain desired accuracies even with small ρ but do not capture large deviations that might occur at specific points on the MEP. The desired sampling density is the minimum ρ for which at least 75 HVMC trials yield both mean and extremum errors in ZPE and Gvib within chemical accuracy (10 kJ/mol or 2.39 kcal/mol). This relaxed criterion, rather than successful recovery in all 100 trials, is imposed in recognition of the fact that large errors in ZPE and Gvib are encountered in a small number of trials wherein blocks, or entire submatrices, of C are masked, leading to poor recovery. The mean norm error over 100 trials, μErr,mat, corresponding to the desired accuracies in ZPE and ΔGvib is used as the basis for examining HVMC performance with varying system size.
4. Test systems—SN2 reaction
We calculate the MEP for an archetypal SN2 reaction, Cl + CH3Cl → ClCH3 + Cl,47 treated using the ωB97X/def2-SVP level of theory.48 Since the reaction is symmetric, HVMC is employed to recover eigenvalues associated with one half of the reaction, shown in Fig. 1. Although the variational effect for this reaction is expected to be small, it serves as a good platform for testing the efficacy of interpolation schemes for VTST.8 We choose density functional theory (DFT)49,50 and not correlated wavefunction theory because our goal here is to quantify HVMC performance and determine feasibility for eigenvalue recovery, rather than to accurately calculate absolute rates with VTST.
One half of symmetric SN2 reaction: Cl + CH3Cl → ClCH3 + Cl. There are 24 points along this path. is the adiabatic ground state potential obtained upon adding ZPE to V0 (level of theory: ωB97X/def2-SVP). ΔGvib (right axis) is the vibrational free energy change at 1000 K, calculated using vibrational frequencies higher than a cut-off value of 100 cm−1. All energies are in kcal/mol.
One half of symmetric SN2 reaction: Cl + CH3Cl → ClCH3 + Cl. There are 24 points along this path. is the adiabatic ground state potential obtained upon adding ZPE to V0 (level of theory: ωB97X/def2-SVP). ΔGvib (right axis) is the vibrational free energy change at 1000 K, calculated using vibrational frequencies higher than a cut-off value of 100 cm−1. All energies are in kcal/mol.
III. RESULTS AND DISCUSSION
A. Algorithm performance: 2 reaction
HVMC converges to desired tolerances with a sampling density, ρ = 0.65. Figure 2 depicts algorithm performance for recovery of eigenvalues associated with all 24 points along the half-MEP. Most columns are recovered to within 1% error across all 100 trials. By sampling 65% of the elements, we recover ZPE and Gvib (1000 K) with mean errors within chemical accuracy in all 100 trials. Maximum deviations are also within chemical accuracy for ZPE across all 100 trials and for Gvib in 83 trials. Errors in the prediction of Gvib are sensitive to the imposed cut-off value. For this reaction, we find that most rocking/internal rotation modes lie below 100 cm−1, and therefore, these modes, along with imaginary ones, are removed when estimating Gvib. The results are summarized in Fig. 3.
HVMC performance (100 trials)—recovery of C for Cl + CH3Cl → ClCH3 + Cl: columns correspond to MEP points, sorted in descending order of recovery error. Column errors (y-axis) are computed using Eq. (7). “Initial” errors correspond to random masking of the C matrix to achieve ρ = 0.65. “HVMC” errors reflect algorithm performance.
HVMC performance (100 trials)—recovery of C for Cl + CH3Cl → ClCH3 + Cl: columns correspond to MEP points, sorted in descending order of recovery error. Column errors (y-axis) are computed using Eq. (7). “Initial” errors correspond to random masking of the C matrix to achieve ρ = 0.65. “HVMC” errors reflect algorithm performance.
HVMC performance (100 trials)—VTST: (top) mean errors and (bottom) maximum deviations observed in ZPE and Gvib (1000 K) with random sampling (ρ = 0.65). All energies are in kcal/mol.
HVMC performance (100 trials)—VTST: (top) mean errors and (bottom) maximum deviations observed in ZPE and Gvib (1000 K) with random sampling (ρ = 0.65). All energies are in kcal/mol.
The required sampling density for accurate recovery of ZPE alone is much lower than 0.65. All 100 trials recover ZPE within chemical accuracy (both mean and extremum errors) for ρ as low as 0.25. This is an intuitive result because smaller eigenvalues do not contribute significantly to ZPE, and therefore, deviations in their predicted values have little impact on ZPE accuracies. Consequently, if the objective is to only recover , the adiabatic ground state potential energy, sampling requirements, and therefore computational costs are lower when accurate Gvib is also desired. While most ΔGvib predictions on the half-MEP are within chemical accuracy for ρ = 0.65, deviations occur more often in the region closer to the transition structure than the product (Fig. 4). This is because the region around the saddle point (s = 0) sees larger variation in certain vibrational degrees of freedom compared to either minimum. Therefore, the sampling strategy in a practical HVMC implementation must ideally be s-dependent, with denser sampling closer to s = 0 to optimize recovery and costs.
ΔGvib at 1000 K (100 trials): HVMC prediction vs target values, with each trial represented by a different color. Free energy differences are reported with respect to the product. Shaded region indicates a range of desired accuracy (±2.39 kcal/mol).
ΔGvib at 1000 K (100 trials): HVMC prediction vs target values, with each trial represented by a different color. Free energy differences are reported with respect to the product. Shaded region indicates a range of desired accuracy (±2.39 kcal/mol).
We note that the choice of 1000 K for assessing HVMC performance is based on our own interest in applying VTST to problems in thermal catalysis for which 1000 K is a reasonable ceiling value. Figure 5 shows the percentage trials that recover Gvib to within desired accuracies as a function of ρ at various temperatures. HVMC sampling requirements depend strongly on the reactions of interest and corresponding temperature ranges. Sampling requirements are low (ρ ≤ 0.25) for reactions at low temperature. On the other hand, HVMC in its current form may not yet be practical (ρ > 0.8) for VTST analysis of similar-sized systems at very high temperatures, as may be the case with combustion kinetics. The system size dependence of HVMC performance is described in Sec. III B.
HVMC performance (100 trials)—recovery of Gvib for Cl + CH3Cl → ClCH3 + Cl at various temperatures: (top) μErr,mat ± σ, mean normed matrix recovery error, with the error bar representing one standard deviation, and (bottom) percentage of trials that recover Gvib within desired accuracy at four temperatures—500 K, 1000 K, 1500 K, and 2500 K.
HVMC performance (100 trials)—recovery of Gvib for Cl + CH3Cl → ClCH3 + Cl at various temperatures: (top) μErr,mat ± σ, mean normed matrix recovery error, with the error bar representing one standard deviation, and (bottom) percentage of trials that recover Gvib within desired accuracy at four temperatures—500 K, 1000 K, 1500 K, and 2500 K.
B. System size and rank
Since Hessians involve significant computational effort and costs and scale unfavorably with system size, it is important to probe the system size dependence of HVMC performance and efficiency. We turn to the expression for theoretical minimum sampling density, ρtheoretical, derived by Ongie and co-workers for variety-based matrix completion,33
where R is the rank of the matrix, n is the number of features in polynomial space, and d is the highest degree of the polynomials (=2 in this case). Although Eq. (9) is a necessary condition, it is a general expression that does not take into account underlying characteristics of the specific matrices being recovered with VMC. Nevertheless, we employ Eq. (9) to help interpret the system-size dependence of HVMC performance.
The rank R is not known a priori for this problem. Therefore, we estimate ρtheoretical using the ceiling value of R, Rmax = min(F − 1, m). n is the dimensionality of monomials up to quadratic terms, , given by . We first consider the model SN2 reaction consisting of 6 atoms and 24 points on the half-MEP. Optimal recovery, based on the performance parameters defined in Sec. III A, is achieved at ρ = 0.65 or higher. If we neglect the role of any underlying problem complexities, this is close to ρtheoretical = 0.69, provided that C is nearly full-rank (R → Rmax = 11). If this is true, increasing MEP points, m, must still lead to a value of ρ that is close to the theoretical prediction. To test this, we construct C with m = 50 points by setting the IRC step-size to 0.05a0. To recover C with mean norm error close to the MEP with 24 points (μErr,mat = 0.32%), we find that ρ = 0.5, identical to the theoretical value predicted by Eq. (9). Therefore, it is very likely that C for the model reaction, Cl + CH3Cl → ClCH3 + Cl, is close to full-rank.
The critical question then is whether the lifted matrix remains full-rank when applied to larger systems. Our hypothesis is that the problem rank becomes smaller as systems grow in size. This stems from the fact that the number of modes that change significantly in the course of a reaction step becomes an increasingly smaller fraction of the total degrees of freedom constituting the system. Therefore, we expect a reduction in the rank of C with increasing system size. Consequently, true sampling density must be lower than the theoretical bounds predicted by Eq. (9) using Rmax.
To test this hypothesis and examine HVMC performance with increasing system size, we create five additional systems, shown in Fig. 6, using the same SN2 reaction (1) as the template. In each case, the half-MEP is truncated at 24 points to maintain constant m, so we can exclusively probe the impact of increasing system size. It must be noted that if C matrices are full-rank, Eq. (9) predicts ρtheoretical ≈ 1 for all systems except (1). We use mean norm error over 100 trials, μErr,mat, and not thermochemical quantities to assess HVMC performance because Gvib accuracies depend on a system-specific cut-off frequency. As discussed previously, ρ = 0.65 yields μErr,mat = 0.32% for reaction (1). For the remaining five systems, we determine the closest value of ρ for which we observe Errmat in the range 0.32 ± 0.10%, where 0.10% represents the standard deviation (σ) of matrix norm errors over 100 trials.
Transition structures of model SN2 reactions chosen to probe HVMC performance with varying system size. Carbon atoms are in cyan, hydrogen in white, and chlorine in green.
Transition structures of model SN2 reactions chosen to probe HVMC performance with varying system size. Carbon atoms are in cyan, hydrogen in white, and chlorine in green.
Table I describes HVMC performance for (1)–(6). The minimum density required to achieve the same model accuracy as (1) increases by a modest amount for (2) but decreases sharply and plateaus at ρ = 0.3 for (3)–(6). In addition, both μErr,mat and variance steadily decrease from (3) to (6) for the same value of ρ. Therefore, as hypothesized, the problem rank decreases and likely becomes constant as the system size increases. For systems (3)–(6) and m = 24, we find that R ≈ 3 yields ρtheoretical ≈ 0.3 when substituted in Eq. (9).
HVMC system size dependence (100 trials)—Minimum sampling density, ρ, and mean norm error (μErr,mat).
Reaction . | ρ . | μErr,mat ± σ . |
---|---|---|
(1) | 0.65 | 0.32 ± 0.10 |
(2) | 0.80 | 0.45 ± 0.27 |
(3) | 0.30 | 0.37 ± 0.19 |
(4) | 0.30 | 0.29 ± 0.12 |
(5) | 0.30 | 0.25 ± 0.09 |
(6) | 0.30 | 0.23 ± 0.08 |
The anomalous value of ρ observed with (2) does not reflect poor HVMC performance but is rather the result of the scalar product-based procedure for ordering eigenvectors. The ordering of two near-degenerate modes in the transition structure and its nearest MEP point, corresponding to symmetric –CH3 and –CH2 stretches, is reversed owing to non-negligible overlap of the eigenvector of the transition structure with three eigenvectors of its nearest neighbor. HVMC performance is poorer as a result of this imperfect ordering. Going forward, we aim to employ a rigorous ordering approach to construct C that takes into account all possible rotations of degenerate or near-degenerate eigenvectors.46 Therefore, with the exception of (2) and for fixed number of points (m = 24) on the MEP, we demonstrate that only 30% of C is essential to recover the complete matrix. Based on the success of the algorithm demonstrated by this proof-of-principle work, we outline key next steps toward developing a practical HVMC implementation for VTST-based rate calculations.
We conceive of two ways in which HVMC can be adapted for canonical VTST rate calculations. The first is where we compute only as many Hessians on the MEP as dictated by the minimum sampling requirement, which we can now determine based on the approximate rank identified in this study. Hessians at minima and the transition structure are typically already available since they are computed to verify stationary points. The projected eigenvalues are then employed to construct C for complete recovery with HVMC. However, this approach can still be challenging in situations where (a) analytical Hessians are intractable either owing to large system size or choice of level of theory and (b) analytical Hessians are unavailable for the chosen level of theory, in which case finite difference Hessians are expensive even for small to medium-sized systems such as those utilized in this study. In such scenarios, stationary point verification can be carried out based on approximate schemes, several of which have been reported in the literature.51–58 Similar approximate schemes, using gradient-based finite differences, can also be employed for sampling eigenvalues. We outline our future work—integration of HVMC with a gradient-based, low-cost sampling scheme—in Sec. III C.
C. Next steps: HVMC with gradient-based sampling
With the goal of constructing and completing C when limited (e.g., stationary points) or no full Hessian information is available, we outline the following steps. To construct the incomplete C matrix, we aim to utilize energies and gradients constituting the MEP along with sufficient eigenvalues to match minimum ρ requirements. The sampling strategy involves calculation of an approximate eigenvalue using finite differences of gradients obtained via small perturbations along the eigenvector, shown to be successful in prior studies.57,59 These eigenvalues can be calculated and ordered using approximate eigenvectors obtained from inexpensive molecular mechanics (MM) Hessians.28 Furthermore, we aim to explore the possibility of incorporating physically meaningful curvilinear internal coordinates to overcome the problem of unphysical eigenvalues that emerge in the projected mass-weighted coordinate system employed in this study.60
Matrix completion methods rely on incoherence of matrix elements with the standard basis as well as random sampling of elements.31 This conflicts with the rational sampling strategy desired in the case of full MEP recovery, where either entire columns of C are available or denser sampling is desired closer to transition structures (Fig. 4). Our proposed solution is to carry out a change of basis, a common strategy to create an incoherent basis for completion, via either discrete cosine or fast Fourier transforms.28,61 Finally, since MM eigenvectors are approximate, the uncertainty associated with the finite difference eigenvalue must be incorporated in the form of a noise parameter in the HVMC algorithm. Integrating the current HVMC implementation with gradient-based sampling is therefore expected to yield a practical, efficient algorithm for free energy and rate calculations with VTST. This will also set the stage for more sophisticated matrix completion algorithms that leverage additional properties of the problem (e.g., sparsity) to efficiently compute even higher derivatives, or anharmonic terms, for reactions that require accurate treatment of mode coupling.
IV. CONCLUSIONS
Matrix completion methods are traditionally employed to recover complete signal or image information based on limited, noisy data. We identify a novel, unique application of these methods—as efficient schemes for recovering otherwise prohibitive nuclear Hessian eigenvalues. We exploit the underlying algebraic structure of the harmonic potential energy expansion for points lying on an MEP. Using the RPH framework, we construct a quadratic, rank-deficient matrix of projected eigenvalues that can be solved using variety-based matrix completion. We call this algorithm HVMC, or harmonic variety-based matrix completion. The method successfully recovers projected Hessian eigenvalues by sampling a small fraction of elements constituting the matrix. Model performance for VTST applications is demonstrated by assessing accuracies of predicted ZPEs and vibrational free energies at elevated temperatures for SN2 reactions. We find that sampling requirements drop to 30% as systems become larger for the same number of points on the MEP. This indicates that matrix rank becomes a constant, small number for larger systems. This is promising for significant cost reduction when we develop a gradient-based, practical HVMC algorithm for VTST applications. The modified algorithm relies on approximate, inexpensive eigenvectors and finite differences calculation of eigenvalues to recover the complete set of projected eigenvalues constituting the MEP. We therefore expect HVMC to become a useful tool for easing costs associated with analyzing the variational character of chemical reactions and catalytic systems of practical importance.
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
ACKNOWLEDGMENTS
We acknowledge High-Performance Computing (HPC) at the USC for computing resources and support. S.J.Q. acknowledges support from the Undergraduate Research Associates Program (2019–2020) at the USC. U.M. acknowledges support from the National Science Foundation (Grant Nos. NSF CCF-1718560, NSF CPS-1446901, and NSF CCF-1817200), the Army Research Office (Grant No. ARO W911NF1910269), and the Office of Naval Research (Grant No. ONR N00014-15-1-2550). S.M.S. acknowledges support from startup resources provided by the USC. We are grateful to Dr. Amr Elnakeeb (USC) for valuable discussions.