The effect of the presence of Ar on the isomerization reaction HCN ⇄ CNH is investigated via machine learning. After the potential energy surface function is developed based on the CCSD(T)/aug-cc-pVQZ level *ab initio* calculations, classical trajectory simulations are performed. Subsequently, with the aim of extracting insights into the reaction dynamics, the obtained reactivity, that is, whether the reaction occurs or not under a given initial condition, is learned as a function of the initial positions and momenta of all the atoms in the system. The prediction accuracy of the trained model is greater than 95%, indicating that machine learning captures the features of the phase space that affect reactivity. Machine learning models are shown to successfully reproduce reactivity boundaries without any prior knowledge of classical reaction dynamics theory. Subsequent analyses reveal that the Ar atom affects the reaction by displacing the effective saddle point. When the Ar atom is positioned close to the N atom (resp. the C atom), the saddle point shifts to the CNH (HCN) region, which disfavors the forward (backward) reaction. The results imply that analyses aided by machine learning are promising tools for enhancing the understanding of reaction dynamics.

## INTRODUCTION

Chemical reactions are dynamical processes in which atoms move from one configuration to another. Because the motion of atoms is governed by quantum mechanics, or its classical approximation, a chemical reaction can be understood as a solution to mechanical equations of motion. A molecular system may begin in a given initial state, and its evolution over time is determined by the rules of mechanics. Studies in the field of reaction dynamics^{1} conducted over the last few decades have led to precise state-to-state understandings of chemical reactions. Examples of experimental evidence supporting the importance of dynamical viewpoints in studying chemical reactions include, but are not limited to, the control of reactions by selecting initial rotational and/or vibrational states of molecules,^{2–7} selectivity in biochemical synthesis,^{8} and reaction control via strong laser fields.^{9–16}

A notable finding achieved in the study of reaction dynamics is the existence of “reactivity boundaries”^{14–43} in the phase space that describes atomic motions. In classical mechanics, owing to the deterministic nature of the equations of motion, the initial positions and momenta of the atoms contained in a system uniquely determine whether the reaction will occur or not. Therefore, the phase space, an abstract space that is spanned by all the atomic positions and momenta as coordinates, is divided into two domains: one is the set of all “reactive” initial conditions that lead to a reaction, and the other is the set of all “non-reactive” initial conditions. Between these two domains lies a boundary, which is called a *reactivity boundary* in this study. It is also worth mentioning that quantum mechanical versions of reactivity boundaries have also been developed through various studies.^{15,16,41}

In this study, the reactivity of the isomerization reaction HCN ⇄ CNH is investigated in the phase space under the influence of one argon atom as a buffer gas. This isomerization reaction has long been drawing interest as one of the simplest isomerization reactions, as well as for applications in interstellar chemistry^{44} and as a prototypical system for studying the phase space structures of chemical reactions.^{27,28} In gas-phase chemistry, the buffer atoms act as energy sources that activate target molecules through collisions. As the density of the system increases, they become solvents, influencing the reaction by changing the potential energy landscape experienced by the reacting molecules. The HCN–Ar cluster studied here may be regarded as a first-step model for investigating the solvent effect.

Recently, machine learning (ML) has attracted substantial attention in various fields ranging from business to the basic sciences, as studies have demonstrated that computers can solve pattern recognition problems, as humans do, via machine learning.^{45–48} Applications of machine learning to solve molecular problems are also being actively explored,^{49–51} in particular the prediction of rate constants^{52} and state-to-state cross sections.^{53,54} The performance of machine learning, which is known to occasionally outperform human recognition, would contribute to the evolution of computational analyses to improve the current understanding of molecular phenomena. Importantly, in some cases, machine learning can be used to extract essential dominant factors from data^{55–58} without resorting to any prior knowledge of physical laws.

In this study, a systematic method is presented for the analysis of the reaction dynamics to reveal the effects of the Ar atom, seen as a fundamental model for buffer gas or solvent, on the reactivity of HCN. The focus is given to the dynamics occurring in the vicinity of the saddle point lying on the potential energy surface (PES) between the reactant and product wells. The dynamics in this region is crucial to the branching of trajectories into the product and reactant wells. To explain it, the process of chemical reaction can be, for simplicity, viewed as three consecutive steps. First, the system in the reactant well is excited and climbs up the PES to reach the vicinity of the saddle point. Then, the dynamics in the region around the saddle point determine the bifurcation into the reactant or product well. Finally, the system falls into the product well (or back into the reactant well) and relaxes there. In the present work, we focus on the second step, that is, the dynamics occurring in the vicinity of the saddle point that determine the branching into the product or reactant well. The saddle region dynamics has been extensively investigated from the viewpoint of phase space geometry^{14–43} and, at least under certain conditions, it has been proved that there exist clear boundaries between reacting and non-reacting trajectories. The present work investigates how machine learning (ML) and accompanying analyses can extract such insights from trajectory data without prior knowledge of dynamics theory. Random forest (RF) classifiers learn the final state of the reaction, that is, whether the system ends in the HCN state or in the CNH state, as a function of the initial positions and momenta of the four atoms (H, C, N, and Ar). The method allows rational dimensional reductions to extract a small number of important coordinates whose values are essentially required to predict the reaction direction. After the dimensional reduction, molecular pictures are presented to interpret why those coordinates are important.

## MATERIALS AND METHODS

### Potential energy surface and trajectory calculation

*i*refers to the atoms in the system (H, C, N, and Ar). The position of an atom is described by a three-dimensional vector $ri=xi,yi,zi$, and its conjugate momentum is described by $pi=pxi,pyi,pzi$. The mass of each atom

*i*is denoted as

*m*

_{i}. For the potential energy $VrH,rC,rN,rAr$, the following function is used in the present study:

*V*

_{inter},

*ab initio*calculations were performed at the CCSD(T)/aug-cc-pVQZ level, and the intermolecular energies were obtained with counterpoise correction. The calculation points are distributed in a range including the regions close to the HCN minimum, the CNH minimum, and the saddle point between them. The concrete positions of these points are explicitly given in the supplementary material to this paper. Data of 1097 points with intermolecular energy less than 2 kJ mol

^{−1}, which is regarded as the “typical” range of the Ar–HCN interaction considering the binding energy of $\u2248136cm\u22121$ of the Ar–HCN complex, were least-squares fitted to the following analytical form:

*A*

_{i},

*B*

_{i},

*C*

_{ij}, and

*F*

_{ik}are the fitting parameters. The resulting fitting coefficients, as well as the structure and energy at the data points used for the fitting, are available in the supplementary material to this paper. The data points are concentrated on the configurations with HCN adopting the HCN minimum structure, the CNH minimum structure, and the HCN–CNH saddle point structure. Points with displacements of cos

*γ*, where

*γ*is the Jacobi angle (Fig. 2), by up to ±0.2 from these three points were also included to cover a region around the saddle point rather than the single point. The deviation of ±0.2 in cos

*γ*roughly corresponds to the region where the dynamics is investigated in the present study, as will be expressed in Eqs. (14) and (15) below. The mean square residual error of the fitting was 0.18 kJ mol

^{−1}, which is reasonably small compared to the range of the sampled intermolecular energy (∼±2 kJ mol

^{−1}) and the activation energy of 146 kJ mol

^{−1}for the HCN → CNH isomerization reaction. Figure 1 shows the contour plots of the PESs obtained in this study with respect to the position of the H atom for some fixed positions of the C, N, and Ar atoms.

It was previously pointed out^{60} that exponential functions, rather than Lennard-Jones type functions, are required to describe the inner repulsive wall accurately. This is probably because their interest is in the dissociation reactions that occur with high energy, while the present study focuses on the isomerization reaction. Based on the accuracy of the fitting, we consider that the present PES correctly describes at least the essential features of the influence of the Ar atom on the HCN isomerization reaction. A more detailed examination of the effect of the form of repulsion is left for future work. Additionally, since we fit the intermolecular component *V*_{inter} separately from the intramolecular component *V*_{HCN}, one can replace the latter with a more recent *ab initio* potential function for HCN,^{61} which gives a higher barrier height of 200 kJ mol^{−1} compared to 146 kJ mol^{−1} of Ref. 59, for further quantitative investigations in the future while using the same intermolecular potential of this work. In the present work, the focus is given to the dynamics in the saddle region, which is the pivotal stage governing the branching of trajectories into the reactant or product well. While the barrier height may certainly affect the motion climbing up the PES in the reactant well toward the saddle region, the dynamics of the saddle region is determined by the local morphology of the PES near the saddle point, and its height relative to the reactant minimum does not actually play a significant role. As we fit the intermolecular potential with data points concentrated in the saddle region (see Fig. S2 in the supplementary material), we judge that the present treatment of the PES is sufficient for the purpose while leaving room for improvement in the future investigation on the reactant excitation stage.

From a given initial condition, the equations of motion were numerically integrated by the fourth-order Runge–Kutta method with variable time steps.^{62} To monitor the reaction channels, Jacobi angle *γ* was defined as the angle between the vector connecting the C and N atoms and that connecting the H atom from the CN mass center, as shown in Fig. 2. The linear HCN structure corresponds to *γ* = 0, and the CNH structure to *γ* = π. In the simulation, when the angle *γ* became larger than 1.5 rad, the system was judged to have fallen into the CNH product, and when the angle *γ* became smaller than 1.0 rad, the system was judged to have fallen into the HCN product.

### Initial condition sampling

*x*-axis, with their mass center at the origin (Fig. 3) and having zero velocity. The molecular plane was identified with the

*xy*-plane. In the form of equations, the initial positions and momenta of H, C, N, and Ar atoms were sampled as follows:

The structure of the HCN molecule was sampled in a neighborhood of the saddle point (*γ* ≈ 1.2 rad ≈ 70°) for the isomerization. The Ar atom position was then sampled in a region near the HCN molecule. This simulated a situation in which the Ar atom collided with the HCN molecule in the middle of an isomerization reaction and affected the outcome of the proceeding reaction. Both forward and backward time propagations were calculated from each initial condition, and the reactant and product states were assigned. Consequently, there are four possible types of trajectories: forward reaction (HCN → CNH), backward reaction (CNH → HCN), no reaction from HCN (HCN → HCN), and no reaction from CNH (CNH → CNH). The type of trajectory was regarded as a function of the initial condition specified by the 11 variables (*x*_{H}, *y*_{H}, *p*_{xH}, *p*_{yH}, *x*_{N}, *p*_{xN}, *p*_{yN}, *x*_{Ar}, *y*_{Ar}, *p*_{xAr}, and *p*_{yAr}) as described earlier and was used as the input to the machine learning.

In the sampling explained earlier, the center of mass of the four-atom system is not necessarily at the origin and can also have non-vanishing momentum. The sampling was performed uniformly in the rectangular region, as described earlier. This sampling method makes the explanatory variables independently distributed. This makes the interpretation of the role of each variable more accessible from the analyses described in the section titled Results and Discussion. As an example of correlated sampling, the results of microcanonical sampling are presented in Appendix A.

In addition to the full-dimensional sampling, trajectory calculations on the points sampled with fixed Ar positions and momenta were also performed with the aim of providing further analyses on the effect of the Ar atom on the reaction. Eight Ar positions were selected by seeking “representative” positions that illustrated the effect of the Ar atom as revealed by the machine learning results of the full-dimensional sample. For each Ar position, 200 000 trajectory data were generated.

### Machine learning

In this study, we used a random forest (RF) classifier. RF is an ensemble learning method that constructs multiple decision trees using training data and classifies the input data based on majority voting.^{63} The RF classifier was trained using 180 000 data points, and 20 000 data points were used for the prediction test. All the RF calculations were conducted using the scikit-learn library (ver. 0.23.2) with the following default hyperparameters:^{64} number of trees (n_estimators) is 100, maximum depth of each tree (max_depth) is “none” (unlimited depth), minimum number of samples required to split an internal node (min_samples_split) is 2, minimum number of samples required to be at a leaf node (min_samples_leaf) is 1, and maximum number of features to consider for each split (max_features) is “auto” (square root of number of features). Before the test calculation, we checked the generalization performance of the trained model by using the fivefold cross-validation technique.^{65} In the test calculation, we utilize “accuracy” to see the Random Forest model’s ability to correctly predict the reactivity of the HCN isomerization reaction in the presence of Ar based on the given initial conditions. Accuracy is a performance metric commonly used in machine learning to assess the model’s predictive ability. It measures the proportion of correctly predicted outcomes over the total number of predictions.

To evaluate the importance of the features in the prediction test, we employed permutation importance,^{63} which quantifies the reduction in predictive performance resulting from the random shuffling of a specific feature. Specifically, for each feature, we randomly permuted its values across the dataset and measured the resulting decrease in prediction accuracy. This allowed us to assess the individual impact of each feature on the model’s performance.

In addition to the importance of permutation, we utilized the SHapley Additive exPlanation (SHAP) analysis, which is a game-theoretic method widely utilized for interpreting machine learning models.^{66,67} The SHAP analysis assigns a numerical value, known as the SHAP value, to each feature, representing its contribution to the prediction for a particular sample. The SHAP values provide insights into the direction and magnitude of the feature’s impact on the predicted reactivity of the HCN isomerization reaction in the presence of Ar. Positive SHAP values indicate that the feature positively contributes to the predicted reactivity, meaning that higher values of the feature are associated with an increased likelihood of the reaction occurring. Conversely, negative SHAP values indicate a negative contribution, indicating that lower values of the feature are associated with a higher likelihood of the reaction.

## RESULTS AND DISCUSSION

### Full dimensional learning and dimension reduction

Each sample point, specified with the 11 variables (*x*_{H}, *y*_{H}, *p*_{xH}, *p*_{yH}, *x*_{N}, *p*_{xN}, *p*_{yN}, *x*_{Ar}, *y*_{Ar}, *p*_{xAr}, *p*_{yAr}), was assigned one of the four reaction channels (HCN → CNH, CNH → HCN, HCN → HCN, CNH → CNH) from the forward and backward trajectory calculations as explained in the last section. RF machine learning was then applied to predict the reaction channels from the values of the 11 variables. The results are shown in the first row of Table I. When all 11 variables are used, the accuracy is as high as 0.95. This indicates that the present eleven-dimensional (11D) RF model can correctly predict the reaction channels of the HCN reaction for 95% of the initial conditions.

Variables . | Dimension . | Acc . | Cross_max . | Cross_min . |
---|---|---|---|---|

x_{H}, y_{H}, p_{xH}, p_{yH}, x_{N}, p_{xN}, p_{yN}, x_{Ar}, y_{Ar}, p_{xAr}, p_{yAr} | 11 | 0.952 | 0.953 | 0.948 |

x_{H}, y_{H}, p_{xH}, p_{yH}, x_{N}, p_{xN}, x_{Ar}, y_{Ar}, p_{xAr}, p_{yAr} | 10 | 0.860 | 0.860 | 0.859 |

x_{H}, y_{H}, p_{xH}, p_{yH}, p_{yN}, x_{Ar}, y_{Ar} | 7 | 0.947 | 0.949 | 0.944 |

x_{H}, y_{H}, p_{xH}, p_{yH}, x_{Ar}, y_{Ar} | 6 | 0.858 | 0.857 | 0.856 |

x_{H}, y_{H}, p_{xH}, p_{yH}, p_{yN} | 5 | 0.927 | 0.926 | 0.924 |

Variables . | Dimension . | Acc . | Cross_max . | Cross_min . |
---|---|---|---|---|

x_{H}, y_{H}, p_{xH}, p_{yH}, x_{N}, p_{xN}, p_{yN}, x_{Ar}, y_{Ar}, p_{xAr}, p_{yAr} | 11 | 0.952 | 0.953 | 0.948 |

x_{H}, y_{H}, p_{xH}, p_{yH}, x_{N}, p_{xN}, x_{Ar}, y_{Ar}, p_{xAr}, p_{yAr} | 10 | 0.860 | 0.860 | 0.859 |

x_{H}, y_{H}, p_{xH}, p_{yH}, p_{yN}, x_{Ar}, y_{Ar} | 7 | 0.947 | 0.949 | 0.944 |

x_{H}, y_{H}, p_{xH}, p_{yH}, x_{Ar}, y_{Ar} | 6 | 0.858 | 0.857 | 0.856 |

x_{H}, y_{H}, p_{xH}, p_{yH}, p_{yN} | 5 | 0.927 | 0.926 | 0.924 |

With the aim of using the machine learning results to obtain molecular-level insights into the chemical reaction, it is here intended to eliminate some unimportant variables and predict the reaction channel by using as few variables as possible to obtain a simpler picture of the reactivity. First, we conducted a permutation analysis^{63} to evaluate the importance of the variables (called “features” in the machine learning field). Briefly, the values of a variable were randomly permuted in the samples, and the prediction accuracy was then computed for these samples. The decrease in accuracy because of this permutation is regarded as indicating the importance of that variable for the prediction. The results shown in Fig. 4 reveal that the *x*-coordinate and its conjugate momentum of the H atom are the most important variables required for determining the reactivity. This is intuitively understandable because the title reaction HCN ⇄ CNH can be approximately regarded as the migration of the H atom from the C side to the N side owing to the smaller mass of H compared to those of C and N. Next to the H variables, the most important variable required for the prediction is *p*_{yN}, which approximately represents the bending motion of the HCN molecule. In fact, the elimination of this variable from the RF model decreases the accuracy of prediction by 10% (10D model, the second row in Table I). Note that the 10D model was trained using the same full-dimensional data as those used for the 11D learning. This implies that the variable *p*_{yN} has a non-negligible contribution to predicting the reactivity in the full-dimensional sample. By contrast, the decrease in accuracy induced by the elimination of *x*_{N}, *p*_{xN}, *p*_{xAr}, and *p*_{yAr} is negligible; the 7D model (third row in Table I) shows a prediction accuracy of ∼95%. This result is consistent with the permutation importance results shown in Fig. 4, which demonstrate that the importance of *x*_{N}, *p*_{xN}, *p*_{xAr}, and *p*_{yAr} is negligible. The elimination of *p*_{yN} from the 7D model results in a 6D model. The fourth row in Table I shows that this model decreases the prediction accuracy to 86%, as can be expected from the results of the 10D model. The Ar positions *x*_{Ar} and *y*_{Ar} contribute to the prediction to some extent, whereas the contributions of the Ar momenta *p*_{xAr} and *p*_{yAr} are found to be negligible. The elimination of the Ar coordinates decreases the prediction accuracy by 2% (5D model, the last row in Table I). Figure 4 also shows that although the importance of the Ar position coordinates is not as high as that of *p*_{yN}, it is not negligible.

### Molecular interpretation of the full dimensional results

Figure 1(a) shows the contours of the potential energy experienced by the H atom. The saddle point for the isomerization reaction and IRC are also shown. Near the saddle point, the direction of the IRC is approximately parallel to the *x*-direction, that is, the direction of the CN axis. Therefore, in this region, the chemical reaction almost corresponds to the movement of the H atom along the *x*-direction. This explains the high importance of *x*_{H} and *p*_{xH} as well as the low importance of *y*_{H} and *p*_{yH}.

To elucidate more concretely how *x*_{H} and *p*_{xH} affect the occurrence of the reaction, the reactivity boundaries in a section of phase space are shown in Fig. 5. For visualization, a two-dimensional plane spanned by *x*_{H} and *p*_{xH} is selected, with the other variables fixed at the following values: *y*_{H} = 1.112 Å, *x*_{N} = 0.526 Å, *x*_{Ar} = −2.50 Å, *y*_{Ar} = 3.56 Å, and *p*_{yH} = *p*_{xN} = *p*_{yN} = *p*_{xAr} = *p*_{yAr} = 0. As can be observed, the plane is divided into four domains in terms of the reaction channels: two reactive channels (HCN → CNH and CNH → HCN) and two non-reactive channels (HCN → HCN and CNH → CNH). The boundaries of these four regions are called the “reactivity boundaries.” In the calculation, the plane was divided into 200 × 200 grid points, and the boundaries were detected by assigning a reaction channel to each grid point. Reaction-channel assignment was performed both by the trajectory simulations and by the RF prediction, and their results are compared in the figure. It is seen that the RF prediction satisfactorily reproduces the reactivity boundaries obtained from trajectory simulations.

The positioning of the four regions and their boundaries shown in Fig. 5 is essentially the same as that found in previous studies conducted on dynamical chemical reaction theory.^{14–43} A large positive (negative) *p*_{xH} implies that the H atom has sufficiently high energy to overcome the reaction barrier from the C side to the N side (from the N side to the C side, resp.). On the other hand, *x*_{H}, the position of the H atom along the *x*-direction, mainly distinguishes the two non-reactive channels (HCN → HCN and CNH → CNH) for small $pxH$; hence, the kinetic energy is low and the system is confined within either the HCN or the CNH potential well.

A schematic is shown in Fig. 6 to interpret the importance of *p*_{yN}. Suppose a case with small H atom momentum. If CN does not move, the H atom is reflected by the reaction barrier and falls back into the HCN potential well [panel (a)]. If the N atom has some positive momentum along the *y* direction, the CN axis rotates counterclockwise, as shown in the figure [panel (b)]. The N atom then approaches the moving H atom. Finally, the H atom is caught by the N atom, thereby forming the CNH product. To verify this interpretation quantitatively, the reactivity boundaries are drawn in Fig. 7. The figure shows a two-dimensional section of the phase space spanned by *p*_{yN} and *p*_{xH}, with the other variables fixed at the following values: *x*_{H} = −0.574 Å, *y*_{H} = 1.112 Å, *x*_{N} = 0.526 Å, *x*_{Ar} = *y*_{Ar} = 100 Å, and *p*_{yH} = *p*_{xN} = *p*_{xAr} = *p*_{yAr} = 0. As observed in Fig. 5, the RF prediction satisfactorily reproduces the reactivity boundaries obtained via the trajectory simulations. It is noted that the boundaries slant downward as *p*_{yN} increases, enlarging the region of the HCN → CNH reaction. This indicates that for positive *p*_{yN}, the HCN → CNH reaction occurs more easily, which confirms the interpretation illustrated in Fig. 6.

The mechanical picture obtained here, as shown in Fig. 6, is intuitively reasonable and is computationally confirmed by plotting the reactivity boundaries as in Fig. 7. This plot is, however, possible only after one has chosen the coordinates (in this case, *p*_{yN} and *p*_{xH}) to plot the boundaries in the phase space. The machine learning helped to choose the coordinates for the plot by identifying the coordinates that have a significant influence on the reaction outcome, as done by inspecting the permutational importance in Fig. 4.

By contrast, *x*_{N} and *p*_{xN}, which approximately represent the CN stretching motion, are not important for predicting the reaction. This implies that the isomerization reaction is dynamically independent of the CN stretching motion.

### The effect of Ar atom position analyzed by SHAP

Machine learning and the subsequent analyses via permutation importance have revealed that the presence of Ar affects the reactivity of the HCN system. To obtain further insights into the role of the Ar atom, the SHAP values^{66,67} of the variables *x*_{Ar} and *y*_{Ar} were evaluated for the 7D model here. Briefly, the SHAP value of a variable approximately describes the extent of the variable's contribution to making a prediction. Figure 8 shows the SHAP values of *x*_{Ar} and *y*_{Ar} plotted against the two-dimensional plane of (*x*_{Ar}, *y*_{Ar}). For example, it is observed from the top right panel of Fig. 8 that the variable *x*_{Ar} makes a large positive contribution in predicting that the given initial condition corresponds to the CNH → CNH channel when the Ar atom is located at the points $xAr,yAr\u2248\u22121.0A\u030a,2.9A\u030a$.

Eight regions can be roughly recognized in the (*x*_{Ar}, *y*_{Ar})-plane from the plots of SHAP values in Fig. 8. To clarify, the top right panel of Fig. 8 is reproduced in the left panel of Fig. 9, wherein the right panel shows the division of the left-panel figure into eight regions. Regions (F) and (G) are characterized by the greatest effect of the Ar atom position $xAr,yAr$ on the reactivity. These two regions also show the most contrasting SHAP values because the effect of the Ar-atom positions experienced in the region (F) is in the opposite direction to that experienced in the region (G). For example, for the CNH → CNH channel, the variable *x*_{Ar} has a large positive effect in the region (F), whereas it has a large negative effect in the region (G). Compared to regions (F) and (G), the effect of the Ar-atom position is weaker in regions (A)–(E) and (H), and the contrast between these regions is not necessarily clear. In the molecular picture, these regions correspond to the configuration wherein the Ar atom is located rather far from the HCN molecule; therefore, they may exhibit weaker intermolecular interactions.

### Machine learning with single initial Ar position

To obtain further insights into the effect of the Ar-atom position, one representative point was chosen from each of the eight regions identified in the previous subsection, as shown in Fig. 9. Hereafter, these points are denoted by **a**, **b**, …, **h** in bold Roman letters. Machine learning was performed on datasets in which the initial position of the Ar atom was set at each of the points **a****–****h**, and the remaining variables were randomly sampled in the region described in the section titled Materials and Methods. The learning results obtained for different Ar-atom positions were compared to assess the influence of the Ar-atom position on the reaction. In the sampling, the initial momentum of the Ar atom was fixed at zero because the initial momentum of the Ar atom (*p*_{xAr}, *p*_{yAr}) was found to have negligible effects on reactivity in full-dimensional learning. Based on the high performance of the “7D model” with the input (*x*_{H}, *y*_{H}, *p*_{xH}, *p*_{yH}, *p*_{yN}, *x*_{Ar}, *y*_{Ar}) found for the full-dimensional data, machine learning was applied here with respect to the five variables *x*_{H}, *y*_{H}, *p*_{xH}, *p*_{yH}, and *p*_{yN}, since *x*_{Ar} and *y*_{Ar} were not used as variables in this analysis. In addition, machine learning was also performed for the configuration in which the Ar atom was placed sufficiently distant from the HCN molecule so that the intermolecular interaction between them was negligible. This condition is denoted as “Ar*∞*.”

As shown in Table II, the machine learning results exhibit ∼95% or higher prediction accuracy for every case, while the Ar-position dependence of accuracy is negligible. This demonstrates the consistent performance and effectiveness of the model in predicting the reactivity for each Ar position. Moreover, the permutation-importance results shown in Fig. 10 indicate high similarity between results achieved for **a****–****e**, **h**, and Ar*∞*. This further supports the molecular view that regions (A)–(E) and (H) are characterized by weak intermolecular interactions, and the reactions there are almost the same as those in the isolated HCN without Ar buffer gas. By contrast, the permutation-importance results obtained for points **g** and **h** show the increased importance of the motions of the hydrogen atom in the *y*-direction, parameterized by *y*_{H} and *p*_{yH}.

. | $xAr$ (Å) . | $yAr$ (Å) . | Acc . | Cross_max . | Cross_min . |
---|---|---|---|---|---|

a | −2.5 | 3.56 | 0.963 | 0.962 | 0.958 |

b | −1.2 | 3.56 | 0.962 | 0.961 | 0.960 |

c | 0.3 | 3.56 | 0.959 | 0.962 | 0.960 |

d | 2.0 | 3.56 | 0.960 | 0.961 | 0.958 |

e | −2.5 | 2.89 | 0.963 | 0.961 | 0.958 |

f | −1.2 | 2.89 | 0.968 | 0.970 | 0.967 |

g | 0.3 | 2.89 | 0.947 | 0.949 | 0.945 |

h | 2.0 | 2.89 | 0.960 | 0.960 | 0.959 |

Ar∞ | 100 | 0 | 0.961 | 0.960 | 0.959 |

. | $xAr$ (Å) . | $yAr$ (Å) . | Acc . | Cross_max . | Cross_min . |
---|---|---|---|---|---|

a | −2.5 | 3.56 | 0.963 | 0.962 | 0.958 |

b | −1.2 | 3.56 | 0.962 | 0.961 | 0.960 |

c | 0.3 | 3.56 | 0.959 | 0.962 | 0.960 |

d | 2.0 | 3.56 | 0.960 | 0.961 | 0.958 |

e | −2.5 | 2.89 | 0.963 | 0.961 | 0.958 |

f | −1.2 | 2.89 | 0.968 | 0.970 | 0.967 |

g | 0.3 | 2.89 | 0.947 | 0.949 | 0.945 |

h | 2.0 | 2.89 | 0.960 | 0.960 | 0.959 |

Ar∞ | 100 | 0 | 0.961 | 0.960 | 0.959 |

### Cross tests

To further demonstrate the distinction between points **f**, **g** and the other cases, the results of “cross tests” were examined. A cross-test means that the result of learning in one case is used to predict the reaction channel in another case, and the prediction accuracy is evaluated. As shown in Table III, the result of learning by using the dataset Ar*∞* can predict the reaction outcome for points **a****–****e** and **h** with almost perfect accuracy, further supporting the view that reactions that occur in the regions (A)–(E) and (H) are essentially similar to the unimolecular reaction of isolated HCN. By contrast, the learning result obtained for the unimolecular reaction shows a low performance in the prediction of reaction outcomes for **f** and **g**, and the cross-test results between **f** and **g** exhibit the worst performance.

Training . | Ar∞
. | a
. | b
. | c
. | d
. | e
. | f
. | g
. | h
. |
---|---|---|---|---|---|---|---|---|---|

Test | |||||||||

Ar∞ | 0.961 | 0.960 | 0.961 | 0.960 | 0.961 | 0.960 | 0.616 | 0.772 | 0.961 |

a | 0.961 | 0.963 | 0.958 | 0.961 | 0.961 | 0.960 | 0.619 | 0.780 | 0.961 |

b | 0.960 | 0.960 | 0.962 | 0.958 | 0.959 | 0.961 | 0.632 | 0.765 | 0.959 |

c | 0.959 | 0.959 | 0.957 | 0.959 | 0.960 | 0.958 | 0.622 | 0.773 | 0.959 |

d | 0.960 | 0.962 | 0.959 | 0.959 | 0.960 | 0.961 | 0.624 | 0.773 | 0.960 |

e | 0.962 | 0.961 | 0.963 | 0.960 | 0.962 | 0.963 | 0.622 | 0.762 | 0.960 |

f | 0.617 | 0.616 | 0.624 | 0.613 | 0.619 | 0.622 | 0.968 | 0.479 | 0.616 |

g | 0.773 | 0.774 | 0.764 | 0.776 | 0.769 | 0.765 | 0.479 | 0.947 | 0.773 |

h | 0.961 | 0.960 | 0.959 | 0.959 | 0.960 | 0.959 | 0.613 | 0.768 | 0.960 |

Training . | Ar∞
. | a
. | b
. | c
. | d
. | e
. | f
. | g
. | h
. |
---|---|---|---|---|---|---|---|---|---|

Test | |||||||||

Ar∞ | 0.961 | 0.960 | 0.961 | 0.960 | 0.961 | 0.960 | 0.616 | 0.772 | 0.961 |

a | 0.961 | 0.963 | 0.958 | 0.961 | 0.961 | 0.960 | 0.619 | 0.780 | 0.961 |

b | 0.960 | 0.960 | 0.962 | 0.958 | 0.959 | 0.961 | 0.632 | 0.765 | 0.959 |

c | 0.959 | 0.959 | 0.957 | 0.959 | 0.960 | 0.958 | 0.622 | 0.773 | 0.959 |

d | 0.960 | 0.962 | 0.959 | 0.959 | 0.960 | 0.961 | 0.624 | 0.773 | 0.960 |

e | 0.962 | 0.961 | 0.963 | 0.960 | 0.962 | 0.963 | 0.622 | 0.762 | 0.960 |

f | 0.617 | 0.616 | 0.624 | 0.613 | 0.619 | 0.622 | 0.968 | 0.479 | 0.616 |

g | 0.773 | 0.774 | 0.764 | 0.776 | 0.769 | 0.765 | 0.479 | 0.947 | 0.773 |

h | 0.961 | 0.960 | 0.959 | 0.959 | 0.960 | 0.959 | 0.613 | 0.768 | 0.960 |

Interestingly, the permutation importance evaluated for the prediction of **f** by the learning of **g** and vice versa (Fig. 11) shows negative values for *x*_{H} and *p*_{yH}. This implies that the information on *x*_{H} or *p*_{yH} worsens the prediction accuracy if the reactivity at **f** is predicted by using the learning results obtained for **g**. In other words, the roles played by *x*_{H} and *p*_{yH} in determining the reaction channels are likely to be opposite in direction between regions (F) and (G).

### Molecular interpretation for the effect of the Ar position

Figures 1(b) and 1(d) show the potential energy contours experienced by the H atom for the Ar-atom positions corresponding to points **f** and **g**, respectively. The saddle point is marked by a cross symbol whose arms show the accompanying normal directions. As can be observed, the position of the saddle point shifts along the *x*-direction, mainly because of the repulsive interaction with the Ar atom. Qualitatively, a saddle point on the PES divides the space into the “reactant” and “product” regions (at least approximately for sufficiently low energies). Therefore, the initial conditions wherein *p*_{xH} is so low that the reaction barrier cannot be overcome are classified into the HCN → HCN (CNH → CNH) channel if *x*_{H} < *x*_{sdl} (*x*_{H} > *x*_{sdl}, resp.). Here, *x*_{sdl} denotes the *x*-coordinate of the saddle point. Because the value of *x*_{sdl} changes significantly between points **f** and **g**, classifying the trajectories via learning on one of the points results in significant errors when applied to the other point. The error may be so large that it is better not to use the information in *x*_{H} at all. This explains the negative importance found in the cross-test results obtained for **f** and **g**. This interpretation can be confirmed by plotting the reactivity boundaries, as shown in Fig. 12, for the Ar-atom positions **f** and **g**. The topological positioning of the four regions is similar to that shown in Fig. 5, which is drawn with the Ar atom located at **a** and using the learning results for full-dimensional data. A comparison of the two panels in Fig. 12 with each other and with Fig. 5 reveals the shift of the reactivity boundaries in the direction of *x*_{H} in accordance with the saddle point.

As can be observed in Figs. 1(b) and 1(d), the direction of the reaction coordinate (by normal mode approximation at the saddle) has a small *y*-component for points **f** and **g**, whereas it is nearly parallel to the *x*-direction in the absence of the Ar atom. This highlights the importance of *p*_{yH} in determining the reactivity of points **f** and **g**. Moreover, the sign of the *y*-component of the reaction coordinate direction for **f** is opposite to that for **g**. This implies that the contributions of *p*_{yH} to the reactivity are opposite in direction at points **f** and **g**, explaining the negative importance of *p*_{yH} found in the cross-tests.

## SUMMARY AND OUTLOOK

In the present study, the reaction dynamics of the isomerization reaction HCN ⇄ CNH was investigated under the effect of an Ar atom as the buffer gas. Reaction trajectories were simulated on the potential energy function newly constructed with CCSD(T)/aug-cc-pVQZ level *ab initio* calculations. The four reaction channels (HCN → HCN, HCN → CNH, CNH → CNH, and CNH → HCN) assigned by the trajectory simulation were regarded as a function of the initial positions and momenta of all the atoms in the system. Machine learning by the RF method was performed on the data obtained by the trajectory simulation. Its prediction accuracy was ∼95%, which indicated that the RF model captured the features of the phase space affecting the reactivity. Subsequent analyses of the contributions of the variables via permutation importance and SHAP values enabled the extraction of variables essential to determine reactivity. Corresponding molecular pictures of the roles played by these variables were provided. The primary importance of the position and momentum of the H atom along the *x*-direction, where the *x*-axis is taken parallel to the CN bond, can be understood from the fact that the reaction is essentially the motion of the H atom because of its small mass and that the IRC is nearly parallel to the *x*-axis near the saddle point. Machine learning results revealed that *p*_{yN} is also important in predicting reactivity, which corresponds to the initial rotation of the CN. This is interpreted as meaning that the motion of the H and N atoms toward each other facilitates the formation of the HN bond. Central to the interest of the present study, the effect of the Ar atom was also elucidated through the analyses of the machine-learning results, and it was successfully interpreted in terms of the displacement of the saddle point by the repulsive interaction between the H and Ar atoms.

It is noted that the machine learning found out those variables essential to the dynamics of the reaction without any prior knowledge of this reaction. The data used for the learning were only the trajectory simulation results obtained for the sampled initial conditions. Starting from the set of all the variables in the system, the machine learning and subsequent analyses extracted a reduced number of variables regarded as important for the reaction. While the molecular interpretations provided for the roles played by these variables are clear after the extraction of the variables, it was not *a priori* trivial that only those variables were of essential importance before the machine learning analyses. The present results imply that thorough statistical analyses provided by machine learning can further advance the fundamental understanding of polyatomic reaction dynamics. It is known in many reactions that differences in initial conditions lead to significant differences in reaction dynamics.^{68–73} It would be interesting to apply the machine learning analysis of the present study to elucidate the mechanism of such chemical reactions.

In this present study, we have chosen a relatively simple system of short-time passage over an isomerization saddle and focused on the applicability of machine learning to extract insights into the dynamics. For the purpose of machine learning, the initial condition was sampled uniformly in the rectangular region with the planar geometry of the four-atom system. In fact, it is shown in Appendix A that this uniform sampling is more efficient for the purpose of evaluating the importance of each variable than microcanonical sampling, which holds the total energy of the system constant. Concerning the planarity, a partial investigation of the effect of the out-of-plane motions is presented in Appendix B. In the range investigated, the results do not change the main conclusion of this study. Moreover, a complete picture of a chemical reaction beyond the passage over the saddle will include excitation in the reactant basin before entering the saddle region and relaxation in the product basin after the passage over the saddle. Indeed, in the case of the HCN isomerization considered in the present study, the system will keep passing back and forth between the HCN and CNH wells if relaxation does not occur in the well after the passage over the saddle. More complicated behaviors like fractal dependence on initial conditions may arise in those dynamics in the well or when one proceeds to analyze larger molecular systems with the methods presented in this work. Those problems would present intriguing challenges in the technical aspects of machine learning in future investigations. Technical developments in machine learning combined with the analysis methods proposed in our current study would open the way to elucidate the dynamics of such complicated systems. For example, the use of curvilinear coordinates in contrast to the cartesian coordinates as used in the present study may offer advantages in terms of interpretability. In this respect, it is encouraging that machine learning can handle analytical formulas,^{56} which may enable it to find the best curvilinear coordinates to describe the reaction dynamics. It is also noteworthy that the concept of mapping the initial phase space by analyzing the minimum dynamic path (MDP), as proposed by Unke *et al.*,^{74} holds promise for enabling efficient sampling of reactive trajectories, which can further enhance the effectiveness of our machine learning-based analysis. However, the precise relationship between the MDP analysis and the present ML-based analysis is not yet clear and warrants further investigation as an intriguing topic for future research from a physical viewpoint. Since the reaction path is necessarily curvilinear, it would be interesting to compare the results of machine learning using curvilinear coordinates with their findings in future investigations.

Another interesting future direction may be to validate the present results through experiments, e.g., scattering experiments with molecular beams.^{1} The energy and direction of the collision can be controlled by aligned molecular beams.^{1,68,75} The initial state of the molecule can be prepared by laser excitation. With the precise control of the initial state by these experimental techniques, the reaction outcome can be controlled by selecting the reactive or non-reactive initial conditions revealed by the present study. While the present study primarily focused on the saddle region dynamics rather than the entire reaction process including the initial excitation, the obtained insights already carry some implications for experimental investigations. For instance, it was found that the position of the Ar atom affects the branching of trajectories in the saddle region, directing them into the reactant or product well. This suggests that for the same given excitation method (e.g., laser irradiation), isomerization can be promoted by colliding the Ar atom from the C side rather than the N side of the HCN molecule (see Fig. 12). This direction would open an interesting avenue for controlling chemical reactions based on clear molecular insights.

## SUPPLEMENTARY MATERIAL

See the supplementary material for the data of the electronic state calculation and the potential energy surface fitting.

## ACKNOWLEDGMENTS

This research was supported by the JSPS KAKENHI (Grant No. 16KT0050). S.K. was also supported by JSPS KAKENHI (Grant No. 16K17852). Part of the computation was performed by the supercomputers of ACCMS, Kyoto University, and TSUBAME (Grant Nos. hp210141 and hp220134). T.Y. also acknowledges the support from Scientific Research [Grant Nos. JSPS KAKENHI (C) 21K03482 and (C)18K05025], the GAP Fund (UTokyo), and the Program for Promoting Research on the Supercomputer Fugaku (Application of Molecular Dynamics Simulation to Precision Medicine Using Big Data Integration System for Drug Discovery, Grant Nos. JPMXP1020200201, hp210172, and hp220164).

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Takefumi Yamashita**: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Project administration (equal); Supervision (equal); Validation (equal); Visualization (equal); Writing – original draft (equal). **Naoaki Miyamura**: Data curation (supporting). **Shinnosuke Kawai**: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Project administration (equal); Supervision (equal); Validation (equal); Visualization (equal); Writing – original draft (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available within the supplementary material.

### APPENDIX A: MACHINE LEARNING RESULTS FOR MICROCANONICAL SAMPLING

In the main text, machine learning was performed on data points uniformly sampled in the rectangular region of initial conditions [Eqs. (14)–(24)]. In this sampling, all the explanatory variables are independent of each other. One may consider more chemically “natural” ways of sampling such as the microcanonical ensemble, where energy conservation is taken into account, sampling with constant energy and angular momentum, or even a quasi-classical sampling, where all the quantum numbers are held constant in the initial condition. As a primary investigation for such sampling, here we report the results of machine learning on the microcanonical sampling.

To obtain sample points with a constant energy, a 60 ns-long trajectory simulation of HCN was performed, and the coordinates and momenta were extracted every 60 fs. Then 5000 points with 1.0 < *γ* < 1.5 were randomly selected. The energy of the trajectory was set at 69.4 kJ mol^{−1} above the saddle point, which roughly allows excitation of the vibrational mode orthogonal to the reaction coordinate with one quantum number, considering that the highest-frequency normal mode at the saddle point has the frequency 3100 cm^{−1} ≈ 35 kJ mol^{−1}. The total angular momentum was set to zero. The momenta were re-sampled every 20 fs during the trajectory. After the microcanonical sample of HCN was obtained, the Ar atom was put at random positions in the regions −1.24 < *x*_{Ar}/10^{−10} m < 0.27 and 2.89 < *y*_{Ar}/10^{−10} m < 3.56, and the momenta *p*_{xAr} and *p*_{yAr} were sampled by the normal distribution with a standard deviation of 2.0 × 10^{−23} kg m s^{−1}.

The results of the machine learning on the microcanonical sample are summarized in Table IV, corresponding to Table I in the main text, and the values of permutation importance are shown in Fig. 13, corresponding to Fig. 4 in the main text. Somewhat counter-intuitively, a non-negligible contribution of *y*_{H} is found. This is in contrast to what was found on the rectangular sampling shown in the main text (Fig. 4), and it is counterintuitive from the fact that the reaction coordinate is nearly parallel to the *x*-direction and the motion along *y* is a vibrational mode orthogonal to the reaction direction [Fig. 1(a)]. This can be interpreted as follows: As the vibration along the *y*-direction is excited, the energy available in the reaction direction is reduced due to the constancy of the total energy. Hence, the magnitude of *p*_{xH} is inversely correlated with *y*_{H} in the sample, and a higher value of *y*_{H} implies a smaller $pxH$, which makes the initial condition less likely to be reactive. From this interpretation, the reactivity itself is directly affected by the value of *p*_{xH}, but due to the correlation between *y*_{H} and $pxH$ existing in the sample, the machine simply finds a correlation between *y*_{H} and the reactivity. Therefore, for the purpose of obtaining molecular insights from the results of machine learning, it is recommended to prepare rectangular sample points so that the explanatory variables are independently distributed. Nevertheless, it would be interesting for future work to devise methodologies to extract sensible interpretations even from correlated samples.

Variables . | Dimension . | Acc . | Cross_max . | Cross_min . |
---|---|---|---|---|

x_{H}, y_{H}, p_{xH}, p_{yH}, p_{xC}, p_{yC}, x_{N}, p_{xN}, p_{yN}, x_{Ar}, y_{Ar}, p_{xAr}, p_{yAr} | 13 | 0.943 | 0.944 | 0.936 |

x_{H}, y_{H}, p_{xH}, p_{yH}, p_{xC}, p_{yC}, x_{N}, p_{xN}, x_{Ar}, y_{Ar}, p_{xAr}, p_{yAr} | 12 | 0.936 | 0.940 | 0.932 |

x_{H}, y_{H}, p_{xH}, p_{yH}, x_{Ar}, y_{Ar}, p_{xAr}, p_{yAr} | 8 | 0.928 | 0.926 | 0.919 |

x_{H}, y_{H}, p_{xH}, p_{yH}, x_{N}, x_{Ar}, y_{Ar} | 7 | 0.933 | 0.931 | 0.926 |

x_{H}, y_{H}, p_{xH}, p_{yH}, x_{Ar}, y_{Ar} | 6 | 0.933 | 0.933 | 0.926 |

x_{H}, y_{H}, p_{xH}, p_{yH} | 4 | 0.919 | 0.917 | 0.912 |

Variables . | Dimension . | Acc . | Cross_max . | Cross_min . |
---|---|---|---|---|

x_{H}, y_{H}, p_{xH}, p_{yH}, p_{xC}, p_{yC}, x_{N}, p_{xN}, p_{yN}, x_{Ar}, y_{Ar}, p_{xAr}, p_{yAr} | 13 | 0.943 | 0.944 | 0.936 |

x_{H}, y_{H}, p_{xH}, p_{yH}, p_{xC}, p_{yC}, x_{N}, p_{xN}, x_{Ar}, y_{Ar}, p_{xAr}, p_{yAr} | 12 | 0.936 | 0.940 | 0.932 |

x_{H}, y_{H}, p_{xH}, p_{yH}, x_{Ar}, y_{Ar}, p_{xAr}, p_{yAr} | 8 | 0.928 | 0.926 | 0.919 |

x_{H}, y_{H}, p_{xH}, p_{yH}, x_{N}, x_{Ar}, y_{Ar} | 7 | 0.933 | 0.931 | 0.926 |

x_{H}, y_{H}, p_{xH}, p_{yH}, x_{Ar}, y_{Ar} | 6 | 0.933 | 0.933 | 0.926 |

x_{H}, y_{H}, p_{xH}, p_{yH} | 4 | 0.919 | 0.917 | 0.912 |

### APPENDIX B: EFFECTS OF THE *z*-DIRECTION

*z*

In the main text, the sampling was performed with all four atoms in the *x*–*y* plane [Eqs. (10)–(13)] because it is expected that the effect of the Ar atom can be best observed when it is on the same plane with HCN. In this paragraph, we show some preliminary results for the effect of displacing the Ar atom in the *z*-direction. Figure 14 shows plots of the reactivity boundaries in the *x*_{H}–*p*_{xH} plane, as in Fig. 12 in the main text. In the left panel, the Ar atom is given the initial momentum of *p*_{zAr} = 1.0 × 10^{−22} kg m s^{−1}, which is the same as the maximum value given to *p*_{xAr} and *p*_{yAr} in the sampling conducted in the main text [Eqs. (23) and (24)]. It is seen that the effect of the initial *p*_{zAr} on the reactivity boundaries is negligible. In the right panel of the same figure, the Ar atom is initially displaced off the HCN plane with $zAr=1.0A\u030a$. Here, the reactivity boundaries for **f** and **g** approach each other and become more similar to those shown in Fig. 5, which was drawn for the case **a**, where the Ar effect is weak. This suggests that the effect of the Ar atom diminishes as it moves away from the HCN in the z-direction. Therefore, in either case, the effect of the *z*-direction is easily understandable. We consider that this provides a good reason to focus, at least as the first-step of the investigation, on the motion confined to the *x*–*y* plane.

## REFERENCES

*Molecular Reaction Dynamics*

_{3}reaction inhibits CH bond cleavage

*Ab initio*molecular dynamics and wavepacket dynamics of highly charged fullerene cations produced with intense near-infrared laser pulses

_{1}

*p*-terphenyl and carbon tetrachloride: Mode-specific acceleration of vibrational dephasing in reactant molecule

_{3}D) on the steps and terraces of Pt(211)

*Optical Control of Molecular Dynamics*

*Frontiers of Chemical Dynamics*

*NATO ASI Series Vol. 470*

*Geometrical Structures of Phase Space in Multidimensional Chaos: Applications to Chemical Reaction Dynamics in Complex Systems*

*Advances in Chemical Physics Vols. 130A and 130B*

*Advancing Theory for Kinetics and Dynamics of Complex, Many-Dimensional Systems: Clusters and Proteins*

*Advances in Chemical Physics Vol. 145*

_{6}

*Adv. Chem. Phys.*

*Advancing Theory for Kinetics and Dynamics of Complex, Many-Dimensional Systems: Clusters and Proteins*

*Advances in Chemical Physics Vol. 145*

*via*asymptotic trajectories

*Numerical Recipes: The Art of Scientific Computing*

*The Elements of Statistical Learning: Data Mining, Inference, and Prediction*

*ab*

*initio*potential energy surface and comparison with molecular beam experiments

^{1}

*D*) + N

_{2}O → NO + NO: Classification of reaction paths and vibrational distribution

^{1}D) + N

_{2}O → NO + NO