We report permutationally invariant polynomial (PIP) fits to energies and gradients for 15-atom tropolone. These include standard, augmented, and fragmented PIP bases. Approximately, 6600 energies and their associated gradients are obtained from direct-dynamics calculations using DFT/B3LYP/6-31+G(d) supplemented by grid calculations spanning an energy range up to roughly 35 000 cm^{−1}. Three fragmentation schemes are investigated with respect to efficiency and fit precision. In addition, several fits are done with reduced weight for gradient data relative to energies. These do result in more precision for the H-transfer barrier height. The properties of the fits such as stationary points, harmonic frequencies, and the barrier to H-atom transfer are reported and compared to direct calculations. A previous 1D model is used to obtain the tunneling splitting for the ground vibrational state and qualitative predictions for excited vibrational states. This model is applied to numerous fits with different barrier heights and then used to extrapolate the H and D atom tunneling splittings to values at the CCSD(T)-F12 barrier. The extrapolated values are 2.3 and 0.14 cm^{−1}, respectively for H and D. These are about a factor of two larger than experiment, but within the expected level of agreement with experiment for the 1D method used and the level of the electronic structure theory.

## I. INTRODUCTION

In the past 15 years, there has been dramatic progress in developing methods to represent *ab initio*-based potential energy surfaces (PESs) for complex chemical reactions with up to eight atoms, molecules with symmetric minima, and non-covalent complexes with up to 10 atoms. Much of this progress has been made in the development of non-parametric, machine learning (ML) approaches to fit large datasets of electronic energies for polyatomic molecules and clusters.^{1–15} Among these, there are several that have specifically been applied to molecules (as opposed to homogeneous materials). They are permutationally invariant polynomials (PIPs),^{1,14} Neural Networks (NNs),^{3–6,16–19} NN with PIP inputs,^{10–13,20,21} Gaussian Process Regression (GPR),^{8,15,22} and GPR with PIP inputs.^{23} These methods are now basically routine for 4–8 atoms. There are a number of issues to go beyond 10 atoms. One that is common to all fitting methods is the highly non-linear increase in the computational effort in using the “gold standard” level of *ab initio* theory, i.e., coupled cluster singles, doubles, and perturbative triples [CCSD(T)], for larger molecules. Other issues include the number of variables, which is equivalent to the mathematical dimensionality of the space. For PIP methods, both direct PIP regression and using PIPs for inputs for NN and GPR, the number of polynomials grows rapidly with the number of variables. This particular bottleneck has been the motivation for a recent strategy from our group to break the 10-atom limit.^{24–26} The basic idea (given in more detail below) is to fragment the PIP basis. We investigate this approach here for 15-atom tropolone. While this is a significant step beyond the 10-atom limit, we are able to generate a full PIP basis (of maximum polynomial order of 3) and this provides a “benchmark” PES.

Before we present the new work on tropolone, we make some general remarks about the plethora of methods mentioned above. They all have in common that they do not rely on a model for representing potentials, e.g., Lennard-Jones, LEPS, exp/6, and force-fields. In this sense, they are all non-parametric in the language of machine-learning. They also use a universal set of inputs, also known as descriptors, for the fitting. For PIP, PIP-NN, and PIP-GPR, they are all the internuclear distances or transformations, e.g., Morse variables. This is distinct from approaches using models, where the variables are molecule-specific. Hence, based on these basic aspects, the above are all machine-learning methods; however, there is a fundamental difference in the way the data are used in these methods. In PIP and NN, the parameters contained in those approaches are optimized, generally using a least squares minimization criterion. In PIP, where parameters are linear, this is linear-least-squares. In NN fitting, the parameters are non-linear and so the least squares optimization is non-linear. The number of parameters in both approaches can be thousands or even tens of thousands, depending on the dimensionality of the space. As an aside, it is worth noting that if Morse variables are used as the inputs for PIP or PIP-NN, then the range parameter(s) of the variables could be treated as additional non-linear parameters. The number of such parameters is very small, however, compared to the thousands of parameters just mentioned. In any case, in these methods, the data are used in the optimizations, but not for prediction. By contrast, other machine-learning methods use the data explicitly for prediction. GPR is a prominent example of this method. Modified Shepard interpolation^{27} and kernel ridge regression^{28} are other examples. All these methods have the same goals, which are a precise representation of the known data (GP can reproduce the data exactly, but, since this generally leads to a singular matrix for prediction, “noise” is added to the data) and smooth and accurate predictions. It is of course of interest to compare the performance of these different ML approaches. We refer the interested reader to a recent assessment of GPR and PIP regression for four case studies, where both methods perform with about equal high quality.^{23}

Of course, all machine-learning approaches aim for ever increasing ranges of applicability, in particular, to larger molecules. As noted above, the PIP approach has been applied for molecules with as many as 10 atoms and this value has been cited in the literature as the practical limit for the PIP approach, while the PIP-NN approach has been applied to the seven atom OH + CH_{4} reaction.^{29} The “10-atom limit” using the PIP approach was broken for the 12-atom *trans*- and *cis-N-methyl acetamide*.^{24,25} The major point of that paper, which is preliminary to the present one, was to describe a fragmented PIP approach that is able to extend the PIP method to larger molecules. We also note very recent progress for larger molecules using other methods. These include Gaussian Process Regression (using a different fragmentation strategy) to obtain a PES for the 19-atom protonated imidazole dimer^{30} and a high-dimensional NN approach to develop PESs for H-atom transfers in three molecules with up to 15 atoms^{31} using the PhysNet software.^{32}

Returning to the PIP approach, we recently suggested that there is a regime of applicability for molecules with more than 10 atoms but probably less than hundreds of atoms where the PIP approach could be extended.^{24,25} This suggestion arises from the fact that Morse variables go asymptotically to zero at large inter-nuclear distances. Thus, for large molecules, with necessarily a large number of large inter-nuclear distances, many Morse variables are essentially zero, so that any PIP basis function containing these variables is zero and can be dropped from the total basis. This observation allows one to fragment the larger target molecule into smaller moieties for which PIP basis sets can be generated efficiently.^{24,25} This approach is indeed successful, and it was used with recent, extended MSA software that includes gradient data for fits.^{33} However, several issues with the algorithm were noted. These included redundant terms in the basis and also the cost of evaluating gradients. New software, described in a recent paper,^{26} solves these two problems. In addition, a novel approach to prune or augment a full PIP basis was also proposed in that paper; more details are now provided.

In this paper, we apply a combination of full basis, fragmentation, augmentation, and pruning to develop potential energy surfaces for the 15-atom molecule tropolone. Like the previously studied malonaldehyde,^{34,35} tropolone is of interest because of its hydrogen transfer dynamics.^{36} The small mass of hydrogen allows it to tunnel through the barrier of a double well, resulting in a splitting of vibrational levels. Building on substantial FTIR work by Redington and co-workers,^{37–44} as well as jet-cooled infrared fluorescence dip spectroscopy,^{45} Vaccaro and co-workers recently used two-color resonant four-wave mixing techniques to measure the splitting in tropolone as a function of vibrational level.^{46} Although there have been several previous theoretical approaches to this problem,^{47–50} these new measurements beg for a detailed explanation, one that no doubt will require an accurate potential energy surface as well as a quantum dynamics approach for determining the splitting and its dependence on mode. Because the double well in tropolone is symmetric, the surface needs to maintain permutational symmetry, posing yet another challenge.

In Sec. II, we briefly describe the various new approaches to developing a PES for tropolone; then, we present the details of the database of energies and gradients followed by an analysis of the various PESs in Sec. III. In Sec. IV, by presenting an approximate calculation of the H-atom tunneling dynamics obtained from the PESs as well as qualitative predictions of mode-specific tunneling, we make close contact with the recent experiments performed by the Vaccaro group.^{46} A short summary and conclusions are given in Sec. V.

## II. BRIEF RECAP OF THEORY

To begin, recall that the PIP approach represents the potential as follows using compact notation:

where *c*_{i} are coefficients, *p*_{i} are permutationally invariant polynomials, denoted as PIPs, (the basis set), and *n*_{p} is the total number of polynomials for a given maximum polynomial order. The *p*_{i} are typically functions of Morse variables, which themselves are functions of the inter-atomic distances, *r*_{α,β} (by the usual exponential relationship exp(−*r*_{α,β}/*λ*), where *λ* is commonly chosen to be equal to 2 bohrs). Following the practice of the MSA software,^{51} Morse variables are in this work denoted by *x*_{l}. We further stipulate that the basis functions should be chosen to maintain permutational invariance among identical atoms or at least the most important of them. The linear coefficients are obtained using standard least squares fits to large datasets of electronic energies at “scattered” geometries.

In standard PIP approaches, the computational issue arises when the basis set for the parent molecule becomes completely cumbersome—too large to be practically useful either because the number of coefficients is so large that the least squares optimization becomes problematic or because calculating the proper basis set takes too long. The size of the basis varies in a complicated and generally non-linear way with respect to the maximum polynomial order, the number of Morse variables, and the order of the symmetric group.^{1} This growth in the size of the PIP basis is the basic consideration in stating the 10-atom limit for the approach.

However, as noted above, the fragmented basis approach is an effective way to break this limit. Clearly, by fragmenting a parent molecule into groups of smaller molecular moieties, the basis for each smaller moiety can be calculated rapidly and then combined with those of other fragments to provide a compact and hopefully still precise representation of the potential energy.^{24} To be specific, consider a simple example of a 5-atom molecule with atoms labeled 1–5 and a scheme in which the molecule is fragmented into three fragments, say {1,2,3}, {2,3,4}, and {3,4,5}. In this 3-fragment scheme, the potential is given compactly by

where {*p*}, {*p*′}, and {*p*″} are PIP bases for the *n*th fragment, *n* = 1, 2, 3, {*c*}, {*c*′}, and {*c*″} are the corresponding linear coefficients, *x*_{n} represent the set of corresponding Morse variables, and *m*_{n} indicate a set of monomials built from the Morse variables. Morse variables between atoms 1 and 4, atoms 1 and 5, and atoms 2 and 5 are assumed to be zero and, hence, not in the fragmented bases. In this example, there are some Morse variables in common among the fragments, and it should be clear that there are indeed some redundant basis functions in this expression in terms of common Morse variables.

These issues were pointed out previously;^{24,25} however, they were not “fatal” ones because the linear least squares method used was able to deal with a modest number of identical basis functions. Nevertheless, there is compelling motivation to eliminate these redundant basis functions and thereby reduce the size of the basis. We do note that the redundant-term issue is similar to an issue two of us identified earlier for developing PIP representations of interaction potentials that should rigorously vanish in asymptotic regions where there is no interfragment interaction. In this case, the issue concerned basis functions involving Morse variables of fragments that do not go to zero at large internuclear distances where rigorously there is no inter-fragment interaction. An effective “empirical” pruning procedure was then employed to eliminate such basis functions and applied to several systems.^{52–54}

An “empirical” pruning or adding approach is also developed here as a post-processing task performed on the standard complete PIP bases of the fragments.

## III. COMPUTATIONAL METHODS

The development of an accurate, permutationally invariant potential energy surface for tropolone is based on the choice and application of *ab initio* methods. Because the surface requires energies and gradients at many geometries, we chose for most of the work to use density functional theory at the B3LYP level with a 6-31+G(d) basis set, as implemented in Molpro.^{55,56} The characterization of stationary points on the surface at the B3LYP/6-31+G(d) level was first carried out for the geometries of the global minimum (GM), for the transition state for hydrogen transfer between the two oxygen atoms, for conformer I in which the OH is pointed away from the second oxygen, and for the transition state between conformer I and the global minimum. Optimum geometries and harmonic vibrational frequencies were determined at each of these stationary points.

An *ab initio* molecular dynamics program was used to calculate B3LYP/6-31+G(d) energies and gradients at geometries encountered on trajectories that started from specific locations. For example, 6000 steps were calculated starting from the optimized GM for energies of 4000 cm^{−1}, 10 000 cm^{−1}, 20 000 cm^{−1}, 30 000 cm^{−1}, and 40 000 cm^{−1}, while 3000 steps were calculated starting from the optimized geometry for conformer I at an energy of 6000 cm^{−1}. The value of 40 000 cm^{−1} for the highest trajectory energy was chosen because (a) we wanted to have good accuracy at least up to the zero point energy (ZPE), which, for tropolone, is about 25 000 cm^{−1} (see Sec. IV B), and (b) the peak of the distribution of potential values occurs at 20 000 cm^{−1} for a trajectory run at 40 000 cm^{−1} of total energy. For each trajectory, a dataset of geometries, energies, and gradients was constructed by taking data from every tenth step of each trajectory, giving data at 3300 geometries. The additional 3301 data points were taken from calculations on grids centered on the geometries of the GM and of the H-transfer transition state. This dataset of energies and gradients at 6601 geometries was used for fitting all PESs. Based on the 6601 geometry dataset, two smaller datasets were also investigated: a 6249 geometry set with an upper energy cutoff of 20 000 cm^{−1} and a 5717 geometry set with an upper energy cutoff of 15 000 cm^{−1}. The 6601 geometry set was augmented to 6768 later to fix the “holes” found in the diffusion Monte Carlo (DMC) calculation of the zero-point energy. This 6768 set was only used in two of the best fits. A smaller set of 3601 geometries, including all the *ab initio* molecular dynamics points and 301 grid points centered on the transition state to hydrogen transfer, was also investigated in some fits. Finally, a “test” dataset was also taken from the trajectories using approximately every 40th point and separated maximally from the points used in the main dataset; the test set contained energies and gradients at 825 geometries.

Information about the GM and the transition state for H-atom transfer was also obtained using CCSD(T)-F12 with the aVDZ basis sets. The optimized geometries were calculated, and harmonic vibrational frequencies were obtained.

The fits are made to an *ab initio* set of, typically, 6601 energies and (3*N*) × 6601 gradient components, where *N* = 15 is the number of atoms in the parent molecule, tropolone. A distribution of energies is shown in Fig. 1. Plots showing the correlation between the *ab initio* data and the fit are shown for the two most accurate surfaces in Figs. SI-4 and SI-5. The fit is compared to the original *ab initio* data by calculating the root-mean squared errors (RMSE) for comparison of the potential energies and the gradient components between the fit and the *ab initio* data as well as by determining the time it takes to evaluate the potential energy and gradient components for a particular geometry. The times we will report are those for evaluating 3000 geometries to obtain an energy and to obtain 45 gradient components for each. There is usually a trade-off between the speed and accuracy of the calculation, and typically, the larger the basis set, the slower the calculation but the higher the accuracy. Thus, choosing the basis set presents a challenge: we need to keep the basis set small enough so that the calculation is efficient while keeping it big enough to ensure an adequate fit to the *ab initio* data.

Extensive use of fragmentation^{25,26} and permutationally invariant polynomial basis sets allowed us to investigate several possible fits to the potential energy surface. First, software described and made available elsewhere^{26} was used to investigate fits based on fragmentation. It allows one to specify various fragments of the parent molecule so as to reduce the size of the basis set of polynomials. The software ensured the elimination of duplicate polynomials, and it also provides for more rapid evaluation of gradient components, if desired. Second, we used software, reported previously,^{26} that allows one either to prune from or to add to an existing basis set in a manner that eliminates the least important existing polynomials or adds the most important new ones.

Both the fragmentation software and the pruning/adding software are based on the same idea, that is, to emphasize polynomials that involve Morse variables of high value. The Morse functions are transformed interatomic distances, *x*(*r*_{α,β}) = exp(−*r*_{α,β}/*λ*), where *λ* is commonly chosen to be equal to 2 bohrs. Thus, the Morse variables range from 0 to 1. Because the interactions of atoms at large distance do not create much attractive or repulsive contribution to the potential, it makes sense not to include their Morse variables whose values are nearly zero. The first step in determining which polynomials are most or least important is to evaluate all the Morse variables for all the geometries included in the *ab initio* dataset and to record the maximum value of each Morse variable. An example of this evaluation, for the 6768 dataset, is given in Table SI-3. One then examines each of the polynomials of interest to determine its value when evaluated with the group of maximal Morse variable values. An ordered list of polynomial values is thus developed. The next step for pruning is to use the list in order to eliminate a number of polynomials, starting from that with the lowest value and moving upward in value until the desired number of polynomials has been eliminated. For adding polynomials, the current software looks at all pair-wise products of existing polynomials. If the product is a polynomial already in use, it skips to the next pair. If not, it evaluates the new polynomial using the maximal Morse values. Using an ordered list of existing polynomial values, the program examines pairs in a sequence such that the largest-valued possible products are examined first and then lesser-valued ones are evaluated subsequently. The program stops when the desired number of new polynomials has been identified. Further steps in the software involve deleting duplicates, renumbering the polynomials, adding derivatives, etc., but these are all steps that have already been developed and included in the fragmentation software.^{26} Note that either for adding polynomials or for pruning polynomials, we maintain the permutational symmetry of the original basis set of polynomials. This preservation is due to the fact that each polynomial added or deleted has the correct permutational symmetry.

As described elsewhere,^{26} it is convenient to establish an arbitrary but consistent numbering scheme for the atoms of the parent molecule. One numbering was used for a basis set of full permutational invariance and is shown in Fig. 2(a), which depicts the minimum energy structure of tropolone. A second numbering was used in the fragmented basis sets and is shown in Fig. 2(b). The equilibrium geometry is of planar symmetry, and the primary molecular interest to chemical physics is the transfer of H atom at the top of each diagram between the two oxygen atoms over a symmetrical barrier.

Tables I–III summarize several of the fragmentation schemes that we have investigated for the quality of fit and the time for execution. The basis sets are denoted by the number of fragments and the atoms in each fragment: 1 fragment of 15 atoms in Table I, 2 fragments of 11 and 12 atoms in Table II, and 4 fragments of 9, 9, 8, and 5 atoms in Table III. In each basis, we investigated fits to the full dataset of 6601 geometries (whose histogram is shown in Fig. 1), as well as to subsets of this dataset with 6249, 5717, and 2717 geometries, corresponding to upper energy cutoffs of 20 000 cm^{−1}, 15 000 cm^{−1}, and 15 000 cm^{−1}, respectively, where the 2717 set is based only on the 3601 geometries obtained from the *ab initio* molecular dynamics and 301 of the grid points. The quality of the fit may be judged from the RMSE values for the potential and gradient components as measured by the difference between the fit and the relevant dataset using inverse energy weighting [wRMSE(pot) and wRMSE(grad)], as described in Sec. SI-1. For each fit, the height of the barrier to H transfer with respect to the global minimum was determined; these values are shown in cm^{−1}. Finally, the times for evaluating 3000 potential energies and 45 × 3000 gradient components are shown; these are relative to a value of 100 for the 1-(15)/25103 basis. We next describe each basis and its results.

Basis/ncoef . | 1-(15) full sym/25103 . | 1-(15) full sym/14843 . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|

N (points) | 2717 | 5717 | 6249 | 6601 | 6601^{a} | 2717 | 5717 | 6249 | 6601 | 6768 | 6768^{a} |

wRMSE (pot) | 16 | 21 | 24 | 25 | 10 | 27 | 26 | 29 | 30 | 38 | 16 |

wRMSE (grad) | 14 | 18 | 21 | 23 | 8 | 23 | 24 | 27 | 29 | 34 | 12 |

Barrier | 2064 | 2010 | 1983 | 1971 | 2083 | 2002 | 1974 | 1952 | 1940 | 1931 | 2055 |

Splitting (H) | 3.89 | 4.32 | 4.50 | 4.58 | 3.78 | 4.23 | 4.56 | 4.69 | 4.77 | 4.85 | 3.90 |

Splitting (D) | 0.29 | 0.34 | 0.36 | 0.36 | 0.28 | 0.33 | 0.36 | 0.38 | 0.38 | 0.40 | 0.29 |

Time (pot/grad) | 1.45/100.0 | 1.03/46.5 |

Basis/ncoef . | 1-(15) full sym/25103 . | 1-(15) full sym/14843 . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|

N (points) | 2717 | 5717 | 6249 | 6601 | 6601^{a} | 2717 | 5717 | 6249 | 6601 | 6768 | 6768^{a} |

wRMSE (pot) | 16 | 21 | 24 | 25 | 10 | 27 | 26 | 29 | 30 | 38 | 16 |

wRMSE (grad) | 14 | 18 | 21 | 23 | 8 | 23 | 24 | 27 | 29 | 34 | 12 |

Barrier | 2064 | 2010 | 1983 | 1971 | 2083 | 2002 | 1974 | 1952 | 1940 | 1931 | 2055 |

Splitting (H) | 3.89 | 4.32 | 4.50 | 4.58 | 3.78 | 4.23 | 4.56 | 4.69 | 4.77 | 4.85 | 3.90 |

Splitting (D) | 0.29 | 0.34 | 0.36 | 0.36 | 0.28 | 0.33 | 0.36 | 0.38 | 0.38 | 0.40 | 0.29 |

Time (pot/grad) | 1.45/100.0 | 1.03/46.5 |

^{a}

The weight of gradient components is reduced to 1/3 of the corresponding energy.

Basis/ncoef . | 2-(1112)/10271 . | 2-(1112)/11001 . | ||||||
---|---|---|---|---|---|---|---|---|

N (points) | 2717 | 5717 | 6249 | 6601 | 2717 | 5717 | 6249 | 6601 |

wRMSE (pot) | 55 | 50 | 54 | 55 | … | 49 | 53 | 54 |

wRMSE (grad) | 110 | 129 | 133 | 134 | … | 129 | 133 | 133 |

Barrier | 1903 | 1911 | 1886 | 1873 | … | 1916 | 1890 | 1877 |

Splitting (H) | 5.06 | 5.24 | 5.48 | 5.61 | … | 5.15 | 5.40 | 5.53 |

Splitting (D) | 0.42 | 0.43 | 0.46 | 0.48 | … | 0.42 | 0.45 | 0.47 |

Time (pot/grad) | 0.680/29.4 | 0.740/32.4 |

Basis/ncoef . | 2-(1112)/10271 . | 2-(1112)/11001 . | ||||||
---|---|---|---|---|---|---|---|---|

N (points) | 2717 | 5717 | 6249 | 6601 | 2717 | 5717 | 6249 | 6601 |

wRMSE (pot) | 55 | 50 | 54 | 55 | … | 49 | 53 | 54 |

wRMSE (grad) | 110 | 129 | 133 | 134 | … | 129 | 133 | 133 |

Barrier | 1903 | 1911 | 1886 | 1873 | … | 1916 | 1890 | 1877 |

Splitting (H) | 5.06 | 5.24 | 5.48 | 5.61 | … | 5.15 | 5.40 | 5.53 |

Splitting (D) | 0.42 | 0.43 | 0.46 | 0.48 | … | 0.42 | 0.45 | 0.47 |

Time (pot/grad) | 0.680/29.4 | 0.740/32.4 |

Basis/ncoef . | 4-(9985)/16138 . | |||
---|---|---|---|---|

N (points) | 2717 | 5717 | 6249 | 6601 |

wRMSE (pot) | 56 | 54 | 58 | 61 |

wRMSE (grad) | 136 | 169 | 171 | 171 |

Barrier | 1943 | 1929 | 1905 | 1889 |

Splitting (H) | 4.78 | 5.00 | 5.24 | 5.37 |

Splitting (D) | 0.39 | 0.41 | 0.43 | 0.45 |

Time (pot/grad) | 0.641/35.6 |

Basis/ncoef . | 4-(9985)/16138 . | |||
---|---|---|---|---|

N (points) | 2717 | 5717 | 6249 | 6601 |

wRMSE (pot) | 56 | 54 | 58 | 61 |

wRMSE (grad) | 136 | 169 | 171 | 171 |

Barrier | 1943 | 1929 | 1905 | 1889 |

Splitting (H) | 4.78 | 5.00 | 5.24 | 5.37 |

Splitting (D) | 0.39 | 0.41 | 0.43 | 0.45 |

Time (pot/grad) | 0.641/35.6 |

We start with the full-symmetry basis description of tropolone for which the MSA software can provide a permutationally invariant basis set with maximum polynomial order of three. We designate this basis as 1-(15), indicating that there is one fragment (the parent) of 15 atoms. All 105 Morse variables are included, and there are 11 522 monomials that must be evaluated before the polynomials can be evaluated. The number of polynomials is the number of coefficients in the fit. The permutational symmetry is designated as {1,2,2,2,2,2,2,1,1}, the atom numbering is {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}, and the numbering scheme is that of Fig. 2(a). Thus, the following pairs of atoms permute {2,3}, {4,5}, {6,7}, {8,9}, {10,11}, and {12,13}. The full permutational symmetry ensures that as the H atom at position 1 moves between the two oxygen atoms, the barrier will be symmetrical. As given in Table I, the use of a large basis set (25 103 polynomials) and a complete set of 105 Morse variables gives an excellent fit to the full 6601 geometry dataset, with weighted RMS errors for the potential and the gradient components of 25 cm^{−1} and 23 cm^{−1}/bohr, respectively. (We defer discussion of the 6768 datasets for the moment.) The pruning of the basis set to 14 843 coefficients still provides a good fit, with weighted RMS values of 30 cm^{−1} and 29 cm^{−1}/bohr, and the speed is improved by more than a factor of 2. In general, for this 1-(15) example as well as for the others, the weighted RMSE values decrease as the number of points is decreased (i.e., they decrease as the energy cutoff is decreased). The calculated barrier heights increase as the energy cutoff decreases. For comparison, the barrier height obtained by comparing the optimized H transition state to the optimized GM at the B3LYP 6-31+G(d) level used for all dataset point calculations is 2131 cm^{−1}. We note that the fitting errors are small compared to the intrinsic inaccuracy of the density functional theory (DFT) and the basis set adopted so that fit approximations to the barrier height that are within 50–100 cm^{−1} are not too bad. At the CCSD(T)-F12/aVDZ level, the barrier is 2374 cm^{−1}, about 1 kcal/mol larger than that of the best of our fits.

We now turn to the fragmentation results. Designing a fragmentation scheme is partly art and partly science. While there is no requirement that the fragments be connected in the way they are in the parent molecule, most successful schemes put in one fragment mostly atoms that are likely to be close to one another; these have the largest values of the Morse variables. One can intentionally omit Morse variables involving pairs of distant atoms by ensuring that these atoms never appear in the same fragment. However, even if all the Morse variables were included, the number of monomials and polynomials would still be fewer for the fragments than for the parent molecule as a whole. This is because the fragmentation also determines the cross-terms between the Morse variables that appear in their products up to the polynomial order used in the fit. Only cross terms between Morse variables of pairs atoms that appear together in a fragment can appear. Fragments will often overlap, that is, they will have atoms in common. The delete duplicates program ensures that Morse variables, monomials, and polynomials based on the same pairs of atoms are not included more than once. For example, in the 2-(1112) fragmentation (described below), the tropolone is considered as having an upper “half” (atoms 1–8, 12, 14, and 15) and a lower “half” (atoms 1, 5–15); they overlap since atoms 1, 5–8, 12, 14, and 15 appear in both fragments.

Two fragmentation schemes are included here, among many that were studied. The first, designated 2-(1112), has two fragments, one of 11 atoms and one of 12. The basis set uses 93 of the 105 Morse variables, and there are 6643 monomials in addition to the 10 271 polynomials. The permutation symmetries of the two fragments are {2,2,2,2,2,1} and {2,2,2,2,2,1,1} with atom identification numbers {1,5,2,4,6,14,7,15,8,12,3} and {1,5,6,14,7,15,8,12,9,13,10,11}, respectively. The atoms that permute, now using the numbering system of Fig. 2(b), are the same ones that permute in the full symmetry case. Consequently, this fragmentation scheme likewise maintains the symmetry of the barrier to H transfer. A problem with this fragmentation scheme, however, is that in order to permute, atoms must be contained in the same fragment. For example, consider the distant atom pairs {7,15} and {9,13} in Fig. 2(b). These will have small Morse variables, and from a fragmentation point of view, it would be better to have each member of the pair in a separate fragment so that we could eliminate these small-valued Morse variables. However, they would then not permute symmetrically, and the transition barrier would no longer be symmetrical. The price we pay for the fragmentation is that the fit to the datasets is not as good. For the 6601 geometry dataset, the RMSE values are 55 cm^{−1} for the potential and 134 cm^{−1}/bohr for the gradient components. We see from this example that the goals of fragmentation and those of permutational symmetry can often be at conflict with one another. An attempt to increase the accuracy by adding new polynomials to a total of 11 001 was not particularly effective, as can be seen from the right three columns.

The second fragmentation scheme, designated 4-(9985), has four fragments of 9, 9, 8, and 5 atoms. The basis set uses 89 of the 105 Morse variables, and there are 317 monomials and 16 138 polynomials. The fragment permutational symmetries are {2,1,1,1,1,1,1,1}, {2,1,1,1,1,1,1,1}, {1,1,1,1,1,1,1,1}, and {1,1,1,1,1} with atom identifications of {2,4,1,3,5,6,7,8,9}, {2,4,1,3,5,12,13,14,15}, {6,8,9,10,11,12,13,14}, and {1,5,7,10,15}, respectively. Only the two oxygen atoms [numbered 2 and 4 in Fig. 2(b)] are assumed to permute, which means that without further effort this scheme cannot guarantee a symmetrical transition barrier. We circumvent this problem by duplicating the basis set in such a way that whenever a geometry appears, the same geometry with permuted atom identities also appears carrying the same energy and gradient components, as described in Sec. SI-2. This fragmentation scheme provides less accurate fits to the *ab initio* datasets.

The 4-(9985) scheme was also used to see if the fit to the 3601 geometry dataset also fits the “test” set (825 geometries). These test-set geometries were not included in the determination of the coefficients, so this comparison offers a good estimate on how the fit works for unknown geometries. The test gave evenly weighted RMSE values for the potential and gradient components of 74 cm^{−1} and 118 cm^{−1}/bohr, whereas the fit to the 3601 geometry set gave 73 cm^{−1} and 181 cm^{−1}/bohr. Thus, it is expected that the fit should be as good for unknown geometries as for the known ones used in the large dataset.

## IV. RESULTS AND DISCUSSION

### A. Tropolone stationary points

For each of the tropolone fitting schemes listed in Tables I–III and using the 5717 dataset, we determined the harmonic vibrational frequencies for the global minimum (GM); the transition state TS(H) to H atom transfer between the oxygen atoms; Conf I in which the O–H bond is pointed away from rather than toward the remaining O; and TS(I), the transition state to Conf I from the GM. A comparison of each of these frequency sets to those for the optimized structures [at the B3LYP/6-31+G(d) level] gave a mean absolute error (MAE). These values (in cm^{−1}) are shown in Table IV. The 1-(15) full symmetry and the 2-(1112) fitting schemes provide excellent agreement with the *ab initio* data, while the 4-(9985) scheme has very good, but somewhat less accurate, agreement.

Basis/structure . | GM . | TS (H) . | Conf I . | TS (I) . |
---|---|---|---|---|

1-(15)/25103 | 1.7 | 8.4 | 22.4 | 30.4 |

1-(15)/14843 | 1.8 | 8.7 | 17.6 | 28.9 |

2-(1112)/10271 | 10.5 | 14.4 | 20.2 | 26.2 |

2-(1112)/11001 | 10.5 | 14.7 | 21.2 | 27.1 |

4-(9985)/16138 | 12.7 | 18.6 | 32.4 | 39.3 |

Basis/structure . | GM . | TS (H) . | Conf I . | TS (I) . |
---|---|---|---|---|

1-(15)/25103 | 1.7 | 8.4 | 22.4 | 30.4 |

1-(15)/14843 | 1.8 | 8.7 | 17.6 | 28.9 |

2-(1112)/10271 | 10.5 | 14.4 | 20.2 | 26.2 |

2-(1112)/11001 | 10.5 | 14.7 | 21.2 | 27.1 |

4-(9985)/16138 | 12.7 | 18.6 | 32.4 | 39.3 |

Based on the B3LYP/6-31+G(d) optimization results for the stationary points, the energy landscape for tropolone is shown in Fig. 3. The barrier to hydrogen transfer between the two oxygen atoms is about 2131 cm^{−1}. The energy of conformer I lies at about 4523 cm^{−1}, and the barrier between that conformer and the ground state is about 6345 cm^{−1} above the latter, although this latter barrier may have significant (10%–20%) errors.

Benchmark CCSD(T)-F12/aVDZ calculations were performed using Molpro.^{56} The energy of the optimized GM was found to be −420.156 093 72 hartree, while that of the optimized transition state to H transfer was found to be −420.145 277 07 Hartree. Thus, the height of the transition state above the GM is 2374 cm^{−1}. The gradient components for the GM and transition state were 10^{−7} cm^{−1}/bohr or less. Geometries for the optimized GM and TS(H) are listed in Table SI-1, while harmonic frequencies are listed in Table SI-2.

### B. Diffusion Monte Carlo calculations

Diffusion Monte Carlo (DMC) simulation is an approach to compute quantum zero-point energy (ZPE) of a molecule.^{57,58} In this work, the simple unbiased algorithm^{59} was applied. In brief, we use an ensemble of random walkers to represent the nuclear wavefunction of the molecule. At each step, a random displacement is assigned to each walker, and a walker may remain alive or be killed by comparing its potential energy with a reference energy. The reference energy, *E*_{r}(*τ*), is updated after each step using the equation

where *τ* is the imaginary time, *V*(*τ*) is the average potential over all the walkers that are alive, *N*(*τ*) is the number of live walkers at time *τ*, and *α* is a parameter that can control the fluctuations of the number of walkers and the reference energy. Finally, the average of the reference energy over the imaginary time gives an estimate of ZPE.

Five DMC simulations were performed, using the fit that is based on 1-(15)/14843 basis and 6768 geometry dataset, and in each simulation, 20 000 walkers were equilibrated for 10 000 steps and then were propagated for another 50 000 steps to compute the ZPE, with a step size of 5.0 a.u.

Our estimate of tropolone ZPE is 25 085 ± 20 cm^{−1}. This is the average of the five simulations, and the uncertainty is the standard deviation of these five simulations. The harmonic ZPE using the same PES is 25 315 cm^{−1}; therefore, the anharmonicity is more than 200 cm^{−1}.

### C. Tunneling dynamics

As noted above, the ground vibrational state H tunneling splitting has been measured accurately to be 0.96 cm^{−1}.^{43,46} The need for gradients and the size of tropolone (15 atoms) requires the use of a lower level of electronic theory for the PES, DFT as compared to, say, CCSD(T). While we succeeded in developing a good PES at the DFT level of theory, it would clearly be a challenge to expect that the surface should reproduce the experimental data for such a fine spectroscopic feature as the tunneling splitting. We do note that challenge was met in the case of malonaldehyde using a PIP PES that was a fit to highly accurate focal point energies,^{34} but malonaldehyde (9 atoms) is smaller than tropolone and the tunneling splittings are ∼20 times larger.^{60,61} The present PES is based on fitting to DFT electronic energies that as noted above underestimate the H-atom transfer barrier by almost 1 kcal/mol. Thus, even an “exact” calculation of the tunneling splitting would almost certainly be larger than the experimental value just due to the error in the barrier height. Such calculations of small splittings are possible using the ring-polymer instanton method, as illustrated for the formic acid dimer (FAD), where the even smaller splitting of 0.013 cm^{−1} was reproduced with remarkable accuracy^{62} using an accurate PIP PES for the formic acid dimer developed by two of us.^{63}

With the above caveat in mind, we did calculate the ground-state tunneling splitting for several fits using a fast 1D *Q*_{im}-path tunneling method.^{35,64} In this approach, the potential along the imaginary frequency normal mode of the H-atom TS is fully relaxed with respect to all other modes. In this way, a rectilinear reaction path is established. This path is shown in Fig. 4, and the potential is shown as a symmetric double-well. The details of this potential vary a bit depending on the fit. However, in all cases, the numerically exact eigenvalues of this potential were obtained, and the ground state splitting is then the prediction from this theory. The results, shown in Tables I–III above, vary from around 3 cm^{−1} to 5 cm^{−1}, depending on the fit. As expected, these are larger than the experiment at least due to the error in the barrier height. Of course, the 1D method also has errors. For example, for malonaldehyde, this method produced a tunneling splitting of 24 cm^{−1}, which is in error by around 3 cm^{−1} both from the experiment and essentially exact calculations. For FAD, the 1D method produced a splitting of 0.44 cm^{−1}, which again is larger than the experiment and the result from the ring-polymer instanton calculation using the same PES. Thus, it is likely that a calculation of the ground state splitting using that method would produce a result less than the few cm^{−1} reported here. With the new PESs available, hopefully, such a calculation can be done in the near future.

As noted already, the DFT method yields an electronic barrier that is 243 cm^{−1} lower than the CCSD(T) one. Thus, even an exact calculation of the tunneling splitting using a perfect PES fit would not (should not) produce quantitative agreement with the experimental splittings, 0.96 cm^{−1} and 0.05 cm^{−1} for H and D, respectively.

From the results given in Tables I–III, it is clear that there is a substantial percentage difference of the H-atom tunneling splitting among the PESs. These data provide a means to examine a possible correlation between the splitting and barrier height. We did this for H and also D-atom tunneling splitting using these PESs (see the supplementary material for methods). All calculations are done with the 1D *Q*_{im} path method, which produces semi-quantitative results, as discussed already. Figures SI-6 and SI-7 show the plots of the splitting vs barrier height for H and D, respectively, along with precise fits to the semi-classical model using a quartic potential to model the barrier. These fits can be used to extrapolate to the CCSD(T) barrier height. Doing so produces splittings of 2.3 cm^{−1} and 0.14 cm^{−1} for the H and D-transfer, respectively. An alternative fitting and extrapolation method is shown in Figs. SI-8 and SI-9 with similar results. These results are larger than the experiment by factors of 2.3 and 2.7, respectively. We find these results to be satisfying because the same PES is used for both sets of splittings (and extrapolations) and the factors in error with the experiment are almost the same. The overestimate of the splittings using the 1D *Q*_{im} method is consistent with previous rigorous tests of this method, discussed above. Hence, if more rigorous methods are used for the tunneling splittings, e.g., the ring polymer molecular dynamics instanton method,^{62,65} an extrapolation to the more accurate CCSD(T) barrier may well produce quantitative agreement with the experiment for both H and D-transfer.

While the above correlations between the splitting and barrier height look very reasonable, it must be acknowledged that this is a very simple correlation that perhaps “hides” other aspects of the barriers of the various PESs, e.g., the imaginary frequency of the saddle point. This frequency does change with the barrier height, and a plot of this relationship is shown in Fig. SI-10. The relationship in the range shown is nearly linear, and so extrapolation to the CCSD(T) barrier height can be done with some confidence. Doing so produces an imaginary frequency of 1389*i*. This is in semi-quantitative agreement with the result obtained from the direct CCSD(T) optimization, which is 1423*i*. One final point about this correlation between the barrier height and imaginary frequency is a cautionary one for tunneling calculations. Changing the barrier height from, say, a low level calculation to high level one without the consideration of the change in imaginary frequency may introduce an error that could possibly be avoided.

Next, we briefly address the interesting question of how the splitting varies with vibrational excitation. This has been examined experimentally for excitations up to around 1500 cm^{−1}.^{46} Even with a realistic full dimensional PES, as developed here, the calculation of the effect of excitation of tropolone vibrational modes on the splitting is a major challenge. We developed a simple projection model to predict mode-specific effects in tunneling splittings and applied it successfully to malonaldehyde.^{35} A similar model termed sudden vector projection (SVP) was developed independently by Guo and co-workers and aimed at predicting mode-specific effects in chemical reactions.^{66} In these models, normal modes of the molecule are projected onto the imaginary frequency normal mode of the transition state, and those modes with large projections are predicted to have large positive effects to enhance tunneling in the former case and in reactivity in the latter cases. We performed this projection analysis using the present PES for all the normal modes of tropolone. The results are given, along with the depiction and frequency of these modes, in the supplementary material. Among these, we single out modes 3 and 4. The projections of these two modes are roughly 0.19 and 0.18, respectively, whereas for other modes in this spectral range, the projections are zero or near zero, except for mode 7, which has a projection of 0.18 magnitude. Modes 3, 4, and 7 in our ordering scheme (simply by increasing harmonic frequencies) correspond to modes 36, 35, and 33, respectively, in the experimental scheme, which is in terms of decreasing experimental frequency.^{46} The measured tunneling splittings for the fundamental of modes 36 and 35 are 5.1 cm^{−1} and 1.3 cm^{−1}, respectively. The experimental splitting for mode 33 was not reported; however, by visual examination of normal modes, it was predicted to enhance tunneling.

Normal modes 3 and 4 are depicted in Fig. 5; this figure also shows that they do involve O atom and the transferring H atom motions. Thus, pictorially at least, it is clear that these are H-transfer promoting modes.

The *Q*_{im}-path model (unlike the SVP one) can make quantitative predictions of the effects of mode excitation and that was done successfully for malonaldehdye.^{35} Doing so here is beyond the scope of this paper; however, we plan to do this in the future for the complete and extensive set of experimental data.^{46}

We note that while our paper was under review, we became aware of a one-dimensional, semi-classical VPT2 calculation of the tunneling splittings of H and D transfer for tropolone.^{67} The key quantities in that approach were obtained using the MP2 level of theory with large basis sets and that the MP2 barrier height was replaced by a CCSD(T) one at the MP2 saddle point. Agreement with the experiment for H transfer was excellent but off by a factor of 2 for the D transfer.

### D. Recommended PES

We now return to Tables I–III to determine which of the fits might be best, with emphasis on the H-transfer dynamics. The fits using fragmentation are less accurate but only slightly faster than the pruned 1-(15) basis with 14 843 polynomials. The most accurate surface, not surprisingly, is the full-symmetry surface with 25 103 coefficients. The fit using the smallest dataset (2717 geometries, 15 000 cm^{−1} energy cutoff) gives very good wRMSE values and a close fit to the optimized barrier (at 2131 cm^{−1}), but the low cutoff energy and a sparse number of geometries suggest that a fit to a larger dataset might have fewer problems and be of more general use. In examining the 6601 geometry dataset with DMC, we noted that there were some “holes” in the surface, which were fixed by adding potentials and gradient components at a modest number of new geometries; this new dataset has 6768 geometries. Finally, the calculation of the barrier was closest to the optimized value when we under-weighted the gradient components in the fit by a factor of 3 relative to the potential energies. Thus, the most accurate fit is the one with 25 103 coefficients using the 6768^{*} dataset.

Unfortunately, this accurate surface is also the slowest to calculate, particularly for molecular dynamics simulations, where the calculation of the gradient components dominates the time. The corresponding 14 843 coefficient version using the 6768^{*} dataset is twice as fast for this purpose with nearly the same accuracy with regard to the TS(H) barrier, the splitting, and the wRMSE values. The most accurate surface and the 6768 geometry dataset are provided in the supplementary material, and both surfaces and the dataset are available from the authors.^{68}

## V. SUMMARY AND CONCLUSIONS

We reported permutationally invariant polynomial (PIP) fits to energies and gradient components for 15-atom tropolone. These include standard, pruned, augmented, and fragmented PIP bases. Approximately, 6600 energies and their associated gradient components are obtained from direct-dynamics calculations using DFT/B3LYP/6-31+G(d) supplemented by grid calculations spanning an energy range up to roughly 35 000 cm^{−1}. Three fragmentation schemes were investigated with respect to efficiency and fit precision. Properties such as stationary points, harmonic frequencies, and the barrier to H-atom transfer are reported and compared to direct calculations. A 1D model to obtain the tunneling splitting for the ground vibrational state was applied to each fit, and the splittings were found to be in the range of 3–5 cm^{−1} for H transfer and 0.25–0.5 cm^{−1} for D transfer, in reasonable comparison to the experiment (0.96 cm^{−1} and 0.05 cm^{−1}), given the quality of the electronic structure methods. Based on new CCSD(T)-F12/aVDZ calculations, the barrier for H transfer for the DFT-based PESs is probably low by about 1 kcal/mol, and this fact is at least partially responsible for the overestimate of the experimental splitting. Extrapolation of the splittings to the CCSD(T)-F12/aVDZ barrier height gives values that are in better agreement with the experiment but are still 2–3 times too high.

The fits used five basis sets and five datasets and employed inverse energy weighting and an adjustment of the weights between the *ab initio* potentials and gradient components. The most accurate fit was the 1-(15) full symmetry basis with 25 103 coefficients, but it was also the slowest. Pruning this basis to 14 843 coefficients improved the speed by a bit more than a factor of 2 while maintaining most of the accuracy. Fits using the two fragmentation schemes, involving either two or four fragments, were not as accurate as the two full-symmetry fits, but slightly faster. The largest dataset with 6601 geometries (later augmented to 6768) provides the most stable and accurate surface. It is clear that most of the data required for fitting the large number of coefficients are the result of the gradient components. However, this strong reliance on the gradients adversely affects the calculation of the energies, in particular, the energy of the barrier to H-atom transfer. A significant improvement in the fit for this barrier was obtained by de-emphasizing the gradients in the fit by dividing their weight by a factor of 3. Although we explored this option only for the most accurate two fits, it is likely to be effective in other systems where most of the *ab initio* data are given by the gradient components.

The prospects for extending these methods to somewhat larger molecules are good, but because at least 10^{4} coefficients will be needed, it is clear that gradient components as well as potential energies must be used in the fit. While these are readily available from DFT methods, they are much more costly for CCSD(T) calculations. One 15-atom molecule, acetyl acetone, is being studied at this level by our group and that of Meuwly^{31} with promising results.

Another challenge will be to apply the basis set techniques used here, including fragmentation and pruning/adding of polynomials, to reactive systems, especially molecules with significant re-arrangements. Bonds that are long in one conformation may become short in others. It is clear that rigid systems will be easier to handle than floppy ones. Dissociative systems are also of interest. In order to describe the dissociation, Morse variables involving bonds between dissociation products must be kept in the basis set even when they become long, at least until the products are independent of one another. Determining which Morse variables can be neglected becomes more problematic in these systems. Nonetheless, it seems possible, for example, to reduce the basis set of a dissociating molecule by choosing fragments that either (a) remain in one of the products or (b) are the simplest fragments that represent the dissociation. Thus, a parent molecule composed of four multiatom units, A–B–C–D, and dissociating to A–B + C–D might have a basis set composed of fragments A–B, B–C, and C–D, which would still allow partial or complete neglect of Morse variables between atoms of A and C, A and D, and B and D. In the end, of course, the choice of basis polynomials and the *ab initio* dataset are closely linked. If the dataset is chosen, it is easy to examine it to see which Morse variables are most and least important. However, if the dataset is subsequently extended, say, to examine a particular conformer or product, the choice of the most or least important Morse variables must be re-evaluated. A hurdle at present is that the path for examining multiple basis sets for efficiency is still a crooked one. A promise is that the preliminary results are encouraging and that there is room for creative improvements.

## SUPPLEMENTARY MATERIAL

See the supplementary material for (I) weighting of *ab initio* data, (II) duplication of *ab initio* data (Fig. SI-1), (III) CCSD(T)-F12/AVDZ results (Tables SI-1 and SI-2), (IV) Morse variable values for the 6768 dataset (Table SI-3), (V) normal modes and projections of these onto the imaginary frequency normal mode (Figs. SI-2 and SI-3), (VI) correlation between *ab initio* energies and fit energies (Figs. SI-4 and SI-5), and (VII) fit and extrapolation of vibrational splitting as a function of barrier height: method for a quartic potential, alternative methods, results for H atom transfer, and results for D atom transfer (Figs. SI-6–SI-9). Extrapolation of imaginary frequency with barrier height is shown Fig. SI-10. It also includes a zip file containing the fitting program using the 6768 dataset with reduced weights for gradient components and a program for calculating potentials and gradient components using the full-symmetry/25103 basis set.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## ACKNOWLEDGMENTS

J.M.B. thanks the Army Research Office (DURIP Grant No. W911NF-14-1-0471) for funding a computer cluster where most of the calculations were performed.