Molecular Hessian matrices from a machine learning random forest regression algorithm

In this article we present a machine learning model to obtain fast and accurate estimates of the molecular Hessian matrix. In this model, based on a random forest, the second derivatives of the energy with respect to redundant internal coordinates are learned individually. The internal coordinates together with their specific representation guarantee rotational and translational invariance. The model is trained on a subset of the QM7 data set, but is shown to be applicable to larger molecules picked from the QM9 data set. From the predicted Hessian it is also possible to obtain reasonable estimates of the vibrational frequencies, normal modes and zero point energies of the molecules.


I. INTRODUCTION
The molecular Hessian matrix, namely the matrix of second derivatives of the potential energy with respect to nuclear positions, has an important role in quantum chemistry.It describes the curvature of the Potential Energy Surface (PES), and the knowledge of it is important for geometry relaxation as well as for the determination of transition states 1,2 .In the context of molecular geometry optimization, all quasi-Newton methods require the knowledge of an approximate Hessian matrix, as an inexpensive initial guess for the first step.The knowledge of the Hessian matrix also helps to interpret infrared (IR) spectra of molecules: in the harmonic approximation, the vibrational frequencies and the vibrational normal modes are calculated from the eigenvalues and eigenvectors of the mass-weighted Hessian matrix [3][4][5] .
In general, molecular vibrations are motions of the whole molecule, but some normal modes are strongly localized on particular functional groups.For instance, bond stretching and some angle bending modes often involve only few atoms.Dihedral torsions, on the other hand, are more collective in character and are often mixed with other degrees of freedom in low-frequency collective mode, making it more difficult to identify them.The frequencies of localized vibrations depend on the atom species involved, on the bond order which connects the atoms, and to a smaller extent on the chemical environment.These frequency changes very little among different molecules, and tabulated values are available for them 6 .
The ab initio calculation of the Hessian is computationally expensive and therefore several methods to approximate the Hessian have been proposed.For geometry optimization, the Hessian is often calculated using approximate quantum mechanical methods [7][8][9] or approaches based on force fields expressed in internal coordinates [10][11][12] .To approximate the Hessian matrix in internal coordinates is particularly convenient, because a) Electronic mail: giorgio.domenichini@univie.ac.at b) Electronic mail: christoph.dellago@univie.ac.at they are rotationally and translationally invariant, and because they are less coupled than Cartesian coordinates (the Cartesian Hessian may have many non-diagonal elements comparable in magnitude with the diagonal ones).
An alternative way to compute an approximate Hessian is to use Machine Learning (ML), which is currently having a significant impact in many ares of computational chemistry. 13,14Many molecular properties can be predicted directly [15][16][17][18][19][20][21][22][23] from ML models and recent works showed that it is possible to predict the shape of a PES using Redundant Internal Coordinates (RICs). 24,25 common way to calculate the Hessian with ML methods is by differentiation of the potential energy expressed using kernel in Kernel Ridge Regression (KRR) models [26][27][28][29] , or neural networks. 30However, recent methods have been developed for the direct prediction of vibrational spectra 31,32 , and Hessian matrices, but not across chemical space . 33,34 this paper we present an alternative ML approach to predict directly, element by element, the Hessian matrix in RICs, across chemical compound space. 35The central idea of the method is to learn the elements of the Hessian using a random forest of decision trees.Trained on Hessians computed quantum mechanically for subset of small molecules (up to 7 non-hydrogen atoms) extracted from the QM7 dataset 36,37 , the model succeeds in predicting vibrational frequencies and Zero point vibrational energies (ZPE) for some larger molecules, sampled from the QM9 38,39 dataset (molecules up to 9 non-hydrogen atoms).
The remainder of this article is organized as follows.In Section II we briefly summarize the definitions of RICs, and the transformations of the Hessian between coordinate systems.We also present a new set of coordinatespecific representations that can be used to train a Random Forest Regression (RFR) model for the diagonal elements of the Hessian matrix.In Section III we validate the prediction for the diagonal Hessian elements, as well as for some of the most important non-diagonal elements.Some conclusions are provided in Section IV.

A. Redundant internal Coordinates
The choice of internal coordinates is not unique, and many types of internal coordinates have been developed, such as the Z-matrix 40 , the normal coordinates 41 , the natural internal coordinates developed by Pulay and coworkers [42][43][44] , the delocalized internal coordinates developed by Baker et al. 45 , and others. 2he redundant internal coordinates introduced by Schlegel 46 , which we use in this work, describe the geometry of a molecule in terms of bonds, bond angles and dihedrals, which can be easily and uniquely defined.The number of internal coordinates exceeds the number of degrees of freedom of the molecule.This redundancy, however, guarantees a complete description of molecular distortions.In the context of geometry optimization, RICs are commonly used to build the initial guess for the Hessian matrix. 1,10,47

Definition
The RICs used in this work are defined as follows 48 : 1.A bond I-J is established if the distance between two atoms I and J is shorter than 1.3 times the sum of their covalent radii.
2. An angle I-J-K spanned by atoms I, J and K is established if atom I and J as well as J and K are bonded and the angle is larger than 45°.
3. A dihedral is established from two consecutive angles I-J-K and J − K − L. If one of the two angles is 180°, i.e., three atoms lie on a line, they are used as a new base for a dihedral (I-J = K = L − M ).
Molecules with dihedrals defined using 5 atoms (e.g.I-J = K = L − M ) will be excluded from the dataset of this article, because they generate inconsistencies in the machine learning representation.A separate treatment would be required for them.

Derivatives expression
In this work, Hessians were calculated using DFT in Cartesian coordinates, then transformed to RICs to be used as training data for the ML algorithm.The ML model predicts Hessians in RICs, which have to be transformed back to Cartesian coordinates, to calculate vibrational frequencies and zero point energies.[50] For a given set of Cartesian coordinates x and RICs q, the Wilson B matrix, 51,52 is defined as follows: The transformation of the gradient from internal (g q ) to Cartesian coordinates (g x ) is then given by: Since in general B is not a square matrix, we define for B a right pseudo-inverse matrix B inv , The transformation of the gradient from Cartesian coordinates back to RICs is: Defining B ′ as the derivative of B with respect to the Cartesian coordinates, the transformation of the Hessian matrix H from RICs to Cartesian coordinates can be written as: Accordingly, the back transformation from Cartesian coordinates to RICs reads

B. Representations
It is a convenient choice to predict the Hessian in RICs element by element using a specific local representation for every matrix element of interest.Since predicting all elements of the Hessian can be a challenging task, we will focus only on some of the most important terms while we set to zero the less relevant elements.The diagonal elements are the most important ones and they will be treated in detail, but we note our machine learning method can be extended also to non-diagonal terms (see Sec.III B).
Every diagonal element of the RIC Hessian corresponds to the second derivative of the energy with respect to an internal coordinate (bond, angle or dihedral).For each internal coordinate we construct a specific representation describing it and its chemical environment.These representations contain lists of molecular parameters, such as nuclear charges, bond lengths, bond orders, bond angles, and dihedrals.
Inside the representations, nuclear charges and bond lengths are expressed in atomic units (proton charges and Bohr), bond angles and dihedrals are expressed using the periodic form 1 + cos(α), which prevents a discontinuity at α = π, similarly to previously developed symmetry functions. 53,54Bond orders were calculated using the formula from Ref. 55 , which for a closed shell system has the form where he indices I and J refer to atoms, P is the monoelectronic density matrix and S is the atomic orbital overlap matrix.In order to improve precision, we imposed a model selection, prior to the actual regression, based on two criteria.The first is a classification based on the type of atoms which define the coordinate, e.g., to predict the C-O bond component of the Hessian, we use a model trained only on other C-O bonds.The second classification is whether the internal coordinate is part a ring system.Different machine learning models are associated with internal coordinates which have or have not two or more atoms belonging to the same ring system (we call them ring coordinates and acyclic coordinates, respectively).

Bond-bond elements
To represent the bond between two atoms I and J correctly, it is important to create a rule to label I and J uniquely.The choice was to label I and J according to their: 1. Nuclear charge: Z I ≥ Z J .
2. Hybridization: if Z I = Z J , I must have an s character higher than J.
3. Nuclear charges of the bonded atoms: if I and J have the same nuclear charge and hybridization, the atoms bonded to I should have a higher nuclear charge than the ones bonded to J.
The representation is a 26-element vector, which collects the parameters of I −J and the spatial arrangement of the atoms bonded to I and J.The first two elements of the representation are the bond length and order of I − J.After that, for every atom I n bonded to I and different from J the following parameters are added: 1. its nuclear charge, 2. the bonding order with I, 3. the distance from I, and 4. the width of the angle I n − I − J with I and J Then, the analogous parameters are added or every atom J n bonded to J. The atoms I n (J n ) are included in decreasing order of nuclear charge and bond order with I (J).Since molecules in the QM7 dataset do not have atoms with more than 4 bonds, n has values form 1 to 3; if I or J have less than 4 bonded atoms, null values are inserted to maintain the same representation size.

Angle-angle elements
In order to give a unique representation to the angle I −J −K, the atoms I and K need to be labelled uniquely according to the following criteria: For an atom J n bonded to J5 (different from I,K), the quantities are: the nuclear charge of J n ,the bond order of J − J n ,the distance J − J n ,the angle Finally, for an atom K n bonded to K (different from J), are added to the representation: the nuclear charge of K n , the bond order of K − K n , the distance K − K n , the angle J −K −K n , and the dihedral I −J −K −K n .Since each atom has forms no more than four bonds, there are at most eight atoms bonded to I, J or K and in total the list includes at most 40 elements, which together with the previous 6 elements gives a representation of length 46.(Note, that if there are less then 8 atoms bonded to I, J, or K the remaining elements are filled with zeros.) If K is a hydrogen atom, it does not have any bonded atoms (K n ) other than J and the representation defined above would have many zeros.To compensate for this lack of descriptors, we include the five parameters of the lists above for the atoms J1 n bonded to J 1 as shown in the second panel of Fig 2 .If both K and I are hydrogens, we include the five parameters of the lists above for the atoms J1 n bonded to J 1 , and also for the J2 n bonded to J 2 .
FIG. 2. The representations for the angles I − J − K is built listing the above mentioned parameters for the atoms In,Jn,Kn (highligheted in light blue), bonded to I,J, or K.
In the benzene molecule, where K is a hydrogen, the Kn are replaced in the representation by the J1n bonded to J1.

Dihedral-dihedral elements
The representation of a dihedral, used to predict the diagonal dihedral-dihedral terms of the Hessian, was built in a similar way to what was done for bonds and angles.The label ordering of the dihedral I − J − K − L was chosen such that Z J ≥ Z K , and if Z J = Z K , than Z I ≥ Z L .In case of a symmetrical atom arrangement, where Z J = Z K and Z I = Z L , it is imposed that the bond orders O IJ > O KL .The representation for a dihedral consists 65 elements.The magnitude of the dihedral, the bond lengths and bond orders for the bonds I − J,J − K and K − L, as well as the angles I − J − K and J − K − L constitute the first 9 elements.Then other 6 elements describe how and if the dihedral is part of a ring system.Finally, 50 more elements are given by all the up to 10 atoms bonded to I, J, K or L, of which we list nuclear charges, the orders and lengths of the bonds to I, J, K or L, the angles and the dihedrals formed with I, J, K or L.
For further clarification on the definition of the representations the reader is encouraged to consult the code and the dataset published on Zenodo 56 .

C. Random Forest of decision trees
The machine learning was performed using a random forest regression model [57][58][59] , as it is implemented in SciKitLearn 60 .][63] We found that in our case random forest regression performs better than neural networks and kernel ridge regression models.In fact, random forests allow more flexible representations and usually produce accurate results even for small training set sizes.
Training a random forest is also fast since the creation of every individual decision tree can be performed in parallel.In general, more decision trees in the forest lead to better (until the model saturates), but more expensive predictions.
SciKitLearn default hyper-parameters were used: those impose no restrictions on the sizes of the leaves, on the depths trees, or on the number of features chosen to perform a split; each forest contained 100 decision trees (estimators), trained with bootstrapping.
We chose then 100 decision trees as a compromise between precision and computing time.In fact, the Mean Absolute Error (MAE) for 100 decision trees is higher than the MAE for 1000 decision trees by approximately 1% (see SI), but the predictions are ten times faster.

D. Computational Details
All quantum mechanical calculations were performed using PySCF 64 .DFT calculations were executed at the B3LYP 65,66 level of theory, with the Dunning's correlation consistent cc-pVDZ atomic orbital basis set 67 .Geometry relaxation, RICs selection, and coordinate transformations were performed using a locally modified version of PyBerny 68 .Machine learning was performed using SciKitLearn 60 .All through our work were used Python 69,70 3.8 scripts, together with Numpy 71 , Scipy 72 and Matplotlib 73 libraries.Molecules were plotted using RdKit 74,75 and Py3DMol 76 , a Python interface to the Java-based 3Dmol 77 .Molecular IUPAC names were generated using Stout 78 .

A. Cross validation of the Hessian's diagonal terms
The dataset for training and testing was built from the 6810 QM7 molecules, whose dihedrals are defined only by consecutive atoms (Section II A 1).The structures of this subset were relaxed and the Hessians were calculated at the B3LYP/cc-pVDZ level of theory.The molecules were then distributed randomly between training set and the test set.Since internal coordinates of the same molecule are in general not independent quantities, it is important that the train-test splits are performed on molecules and not on individual coordinates, so that coordinates of the same molecule either all belong to the training set, or they all belong to the test set.The internal coordinates were classified according to their chemical composition and to the inclusion in a ring system (as explained in section II B).For each of these groups, different machinelearning models were trained independently.In an ideal machine learning model the learning curves 79,80 should exhibit, in a log-log plot, a linearly decaying trend.In our cases, this linear trend is approximated very well for bonds; for angles and dihedrals there is a small deviation, but still learning is achieved throughout the chosen training set.
Diagonal elements of the Hessian predicted by the machine learning models for all bonds, angle and dihedrals are compared to the reference data in Figure 4.As can be inferred from the figure, the accuracy is higher for bonds than for angles and and particularly for dihedrals.For bond derivatives the only outlier is a single C-C bond inside a conjugated atom chain.The reason is that the machine learning model treats this bond as a single bond while the true value of this Hessian element is closer to that of a double bond.
The performance of our method was estimated for all coordinates and all atomic element combinations within our QM7 dataset.The MAPEs resulting from ten traintest splits are summarized in Figure 5.
The prediction error is around 2% for bond-bond, around 5% for angle-angle, and around 10% for dihedraldihedral terms.Bond-bond Hessian elements can be predicted more accurately, probably because bond stretching is a more localized motion than angle scissoring or dihedral torsion, which can be associated with wider molecular motions and may depend more strongly on the nonbonded atomic surroundings.For similar reasons, the most accurate predictions are those for bonds between hydrogens and a heavier element (C, N, O), the MAPEs of which are below 0.2%.
Hessian elements related to C-N, C-C, and C-O bonds are also well predicted (with an error of about 0.5% if not inside a ring), because they occur frequently in the training set.
On the contrary, the biggest errors were made for S-C, O-N, and N-N bonds, because their occurrence in the training set is limited.
Also for angles and dihedrals, those including hydrogen atoms have the lowest prediction errors.In fact, the rotation of a hydrogen atom is usually not correlated with other molecular motions and thus it is more consistently repeated in chemical space.The highest errors were obtained for dihedrals and angles which have a limited number of occurrences in the dataset, mostly involving S, O, and N angles.

B. Non-diagonal terms
While the non-diagonal elements of the Hessian are usually small and often assumed zero in many approximations 10 , some of them can be successfully predicted using the machine learning framework presented in this article.We applied machine learning to some of the most important non-diagonal elements, assuming that the geometrical proximity of the internal coordinates is correlated with the magnitude of the Hessian term.In particular, we predicted the following terms of the Hessian: consecutive (sharing one atom) bond-bond terms; included (the bond is a side of the angle) and adjacent (sharing one vertex) bond-angle terms; adjacent (sharing one side and the vertex), consecutive (sharing one side), and opposite (sharing the vertex) angle-angle terms; and the mixed terms between a dihedral angle and the bonds (internal and external) between the atoms which defining the dihedral.The combinations of internal coordinates, for which we predicted the non-diagonal Hessian elements, are schematically shown in the various panels of Fig. 6.For each type of these non-diagonal terms we constructed a specific representation, built from the enumeration of the type and the positions of the atoms surrounding the involved internal coordinates, similar to what was done for the diagonal terms in Sec.II B. As for the diagonal terms of the Hessian, coordinates pairs are now classified according to the chemical elements which constitute them, so that different chemical elements are predicted with different machine learning models.We refer the reader to the code published on the Zenodo repository for a detailed definition of the representation 56 .The scatter plots shown in Figure 6 are for one train test split of our data.In the figure, data obtained for every possible combination of atomic species, are shown together.Values are expressed in atomic units (a.u.), which correspond to Hartree/Bohr pairs involved.
As can be inferred from Figure 6, the correlation between true (calculated) and predicted values is particularly good for panels a, b and c, which are related to bond-bond and bond-angle terms.The errors are largest in the prediction of consecutive angle-angle terms (panel e), and bond-dihedral elements (panels g and h).All of these non-diagonal elements are small in magnitude and can be zero.For this reason, instead of the percentage error, we report in Table I the mean absolute error (MAE), averaged over 10 train test splits.This error is on the order of 10 −3 a.u., which is small compared to the range ±0.1 a.u. of possible values shown in Fig. 6.
C. Prediction of QM9 Hessians FIG. 7. Vibrational frequencies of molecule Nr. 10,000 of the QM9 dataset predicted by ML (solid red line) and calculated with B3LYP-DFT (dashed black line).The frequencies were convoluted with a Gaussian with fixed height and standard deviation (equal to 32cm − 1).In the figure are also depicted some selected normal modes of the ML predicted (green arrows) and of the calculated (grey arrows) frequencies.
Given the locality of our representation, it is possible to predict the Hessian matrix for molecules of any given size.While we trained the machine learning model on the QM7 data set, we will now test it by predicting the Hessian for molecules of the QM9 dataset.As an example, we picked the molecules of QM9 whose index is a multiple of 10,000, from molecule 10,000 to 130,000.These 13 molecules possess 8 or 9 heavy atoms, so they are bigger than any molecule in the training set.
At first, we optimized the molecular geometries using the B3LYP/cc-pVDZ level of theory, and calculated the Hessians explicitly, in order to compared them with the ML predictions.For the ML Hessian expressed in RICs, we calculated all the diagonal elements as well as the non-diagonal elements listed in section III B. The transformation from internal (H q ) to Cartesian (H x ) coordinates was performed according to Equation 7.After the transformation to Cartesian coordinates, it is possible to calculate the expected harmonic vibrational frequencies, as the eigenvalues of the mass-weighted Hessian matrix while the corresponding eigenvectors correspond to the vibrational normal modes 4,5,81 (Eq.9).

H MW
x I x J := Here, x I is a Cartesian coordinate of atom I and m I is its atomic mass.
While the simulation of an IR spectrum should include the explicit calculation absorption intensities and peak broadenings [82][83][84] , here we applied a Gaussian convolution with a fixed intensity and broadening to graphically compare the calculated and the ML-predicted vibrational frequencies.As an example, Fig. 7 shows a comparison between the harmonic frequencies calculated using B3LYP with the cc-pVDZ basis set and the ML predictions for molecule Nr 10,000 of QM9 ((1-cyano-2-oxoethyl)formate) .From the cross-validation scores, shown in section III A, it is no surprise that the frequencies associated with bond stretching are predicted fairly accurately by the machine learning method proposed.Around 3000 cm −1 one clearly recognizes the C-H stretching modes, which differ from the calculated ones by less than 5cm −1 .The vibrational stretchings of the cyano group(2360 cm −1 ) and the carbonyl groups(1840-1890 cm −1 ), as well as the scissoring of the carbonyl O=C-H angles, are predicted with an error lower than 10cm −1 .For lower frequency vibrations the predictions are not as accurate for mainly two reasons: the error in the prediction of Hessian elements associated with angles and dihedrals is larger than for bonds, and, more importantly, such vibrations are strongly non-local, and involve larger parts of the molecule.It is not easy to predict these vibrations using just a local representation.
Within the harmonic approximation, the sum of the vibrational frequencies is the vibrational zero point energy (ZPE) of the molecules ZP E = (1/2)ℏ i ω i .This quantity represents the ground state vibrational energy of a molecule and can be easily computed from both the calculated and the ML predicted Hessian.In table III C we summarize the results for the 13 QM9 molecules analyzed: the ML approximations always underestimate the ZPE with an error which ranges from 1.3% to 14.5%.

IV. CONCLUSION
We proposed a machine-learning based method to approximate diagonal as well as non-diagonal elements of the Hessian of a molecule.The representation used is specific for every internal coordinates, and takes explicitly into account the bond order, which is sensible because a single point DFT calculation is computationally considerably less expensive that the explicit calculation of the Hessian.We trained our ML model on a relatively small dataset (subset of QM7) of less than 7000 molecules.The Hessian was computed at the B3LYP/cc-pVDZ level of theory.The agreement between ML and DFT was satisfactory.In particular, the calculated MAPE for bond stretching force constant was below 2%, and was particularly small for bonds involving hydrogen atoms because they point outwards and are less affected by the chemical environment.The MAPE for bending and torsion was of 5% and 10%, respectively.From the ML model trained on QM7 we were also able to predict the Hessian of some molecules representative of the QM9 dataset.The Hessian predicted in internal coordinates was then transformed into the mass-weighted Cartesian Hessian, the diagonalization of which yields the harmonic vibrational frequencies and normal modes, that can be compared to the ones calculated explicitly from DFT.
High frequency vibrations and normal modes were predicted accurately, while lower frequency ones were not.This behaviour is analogous to the IR spectroscopy theory, where stretchings and bendings can be identified accurately, while torsion and delocalized vibrations are more difficult to be interpreted.
The approximate Hessian obtained with ML is computational inexpensive and can be used as an initial Hessian guess for geometry optimization, or in the context of alchemical geometry relaxation [85][86][87][88] .A good starting Hessian may speed up the convergence of the geometrical optimization.An in detail assessment of the performance of the ML Hessian proposed is not yet pro-vided, but should carefully take into account many parameters on which the optimization depends, e.g. the type of molecule, the initial geometry, the optimization algorithm, and the Hessian update scheme.

FIG. 1 .
FIG. 1.The representation for the I −J bond, used to predict the bond-bond term in the Hessian, is built from listing bond lengths (L), bond orders (O), atom charges (Z) and angles (W), for I, J and all atoms In and Jn bonded to I and J

1 . 4 .
Nuclear charge: Z I ≥ Z K 2. Bond order with J: O IJ > O JK 3. Hybridization: I must have an s character higher than K Nuclear charges of the bonded atoms: the atoms bonded to I should have a higher nuclear charge than the ones bonded to K Each angle representation consists of 46 elements.The first elements are the magnitude of the angle, the bond lengths I −J, J −K, and the bond orders O IJ ,O JK ,O IK .After these six parameters, for every atom I n ,J n ,K n bonded to I,J, or K, respectively, the several other quantities are listed.More specifically, for an atom I n bonded to atom I (and different from J) the list include 1. the nuclear charge of I n 2. the bond order of I − I n 3. the distance I − I n 4. the angle J − I − I n 5. the dihedral K − J − I − I n

FIG. 3 .
FIG.3.Learning curves for bonds, angles and dihedrals between carbon atoms not included in a ring.The data come from the average of ten train-test splits of our dataset for various training set sizes and a fixed test set size of 400 molecules.

FIG. 4 .
FIG. 4.ML predictions vs. the calculated values of all diagonal Hessian elements, plotted for one particular traintest split.

FIG. 5 .
FIG. 5. MAPEs for Hessian elements belonging to bonds, angle and dihedrals for different combination of elements.Each MAPE is obtained as average over 10 different train-test splits (fixed 3:1 size ratio).Only MAPEs for coordinates that appear at least 100 times in the dataset are shown.

FIG. 6 .
FIG. 6. Predicted vs. calculated non-diagonal terms of the Hessian for different combinations of internal variables: a) consecutive bond-bond, b) included bond-angle, c) adjacent bond-angle, d) adjacent angle-angle, e) consecutive angleangle, f) opposite angle-angle, g) external bond-dihedral, and h) internal bond-dihedral.The scatter plots shown were obtained for a single train test split of our dataset.

TABLE II .
Vibrational ZPE for 13 QM9 molecules whose indices are multiples of 10,000 predicted by ML predictions and calculated explicit with B3LYP-DFT.Energy values are expressed in atomic units (milli Hartree ).