We present new results on the underlying guest–host interactions and spectral characterization of a CO2 molecule confined in the cages of the sI clathrate hydrate. Such types of porous solids raise computational challenges, as they are of practical interest as gas storage/capture materials. Accordingly, we have directed our efforts toward addressing their modeling in a proper manner, ensuring the quality of the input data and the efficiency of the computational approaches. The computational procedure for spectral simulations, within the multi-configurational time-dependent Hartree framework, involves the development of a fully coupled Hamiltonian, including an exact kinetic energy operator and a many-body representation of the potential, along with dipole moment surfaces, both obtained through neural network machine learning techniques. The resulting models were automatically trained and tested on extensive datasets generated by PW86PBE-XDM calculations, following the outcome of previous benchmark studies. Our simulations enable us to explore various aspects of the quantized dynamics upon confinement of CO2@D/T, such as constrained rotational–translational quantum motions and the averaged position/orientation of the CO2 guest in comparison to the experimental data available. Particularly notable are the distinct energy patterns observed in the computed spectra for the confined CO2 in the D and T cages, with a considerably high rotational–translational coupling in the CO2@T case. Leveraging reliable computations has proved instrumental, highlighting the sensitivity of the spectral features to the shape and strength of the potential interactions, with the explicit description of many-body contributions being significant.
I. INTRODUCTION
In the context of filled water ices, clathrate hydrates, or gas hydrates, represent a scientifically significant and practically important class of solid state materials. These ice-like structures share some similarities with ice lattices, such as the hydrogen-bonded framework (host-lattice), which shows unique properties due to the formation of polyhedral water open structure cavities with the ability to accommodate guest molecules. Such solid structures were first synthesized in 1810,1 when Cl gas was trapped in a water network under high pressure. Initially, research efforts were focused on gas hydrates as a scientific curiosity, while later on in the mid-30s, naturally formed gas hydrates were discovered to exist in deep oceans, permafrost regions, and extraterrestrial environments,2–4 which led to a greater practical interest focused on potential applications from an industrial and technological perspective, such as gas separation and storage materials, sequestration and desalination techniques, or energy resources.5–13
The most common natural gas hydrates belong to three crystal structures, namely, cubic structure I (sI), cubic structure II (sII), and hexagonal structure H (sH), which are built from the combination of five different cages of varying sizes, such as 512, 435663, 51262, 51264, and 51268 (xy, where x represents the number of sides per cage face and y represents the number of those face types that compose a specific cage). The formation of any of these structures depends on various factors such as temperature, pressure, and the guest atom/molecule entrapped within the clathrate cages; thus, for instance, clathrates containing CH4 and CO2 naturally tend to form sI hydrates,2 while they can also be found in sII hydrates14 when a promoter is included,15 or even binary sH ones.16
Gaining a deeper understanding of the various aspects related to the formation, stabilization, dissociation, phase transitions, and thermodynamic properties of such guest–host structures is crucial to overcome the challenges associated with their practical application in various fields. In this sense, the evolution of computer simulations, in conjunction with theoretical methods (software) and computer architecture (hardware), provides powerful tools for advancing our understanding of clathrate hydrates, offering valuable insights to unravel their behavior and drive innovation in their utilization and development. In the realm of computational modeling, the quality and accuracy of results and predictions hold paramount importance, dependent on factors such as the quality of the input data, chosen computational techniques, and underlying approximations.
On the one hand, the application of machine learning (ML) methods dramatically accelerated the computational algorithms, and predictive data-driven models of potential energy surfaces (PESs) have already been reported.17–21 On the other hand, the critical dependence on the reliability of any ML approach relies on the availability of extensive datasets for training the model.20,22 It is, therefore, essential to possess valuable high-level outcomes to ensure the effectiveness of the ML approach. The direct consequence of this is the significant added value that is conferred on benchmark studies, fostering knowledge exchange and collaboration within the scientific community. The collective effort has led to advancements in the field and facilitates the development of general-purpose open-sourced software, such as scikit-learn, PyTorch, and TensorFlow, for easily implementing ML approaches23–25 such as neural networks (NNs) and kernel-based ML models.
Following current challenges in the field, we have directed our efforts to address proper data-driven modeling of the underlying guest–host interactions using well-supported accelerated NN frameworks, such as TensorFlow,25 and then employed such descriptions in quantum spectral computations of clathrate hydrates. The present study is focused on the case of a CO2 molecule inside the small 512 or D and large 51262 or T sI hydrate cages. The CO2@sI clathrate hydrate has been experimentally investigated by solid-state nuclear magnetic resonance (NMR), infrared and Raman spectroscopy, single crystal x-ray, powder x-ray, and neutron diffraction, as well as scattering studies.26–36 The crystal x-ray and powder x-ray experimental studies have reported data on the CO2 orientation in the sI cages, while the analysis of the FTIR spectra at low temperatures has provided information on some vibrational transitions of CO2 isotopes in the sI cages.31–35 From a computational point of view, both electronic structure and molecular dynamics studies have been reported in the literature, providing information on guest–host interactions, energetic and thermodynamic stability, structural, spectroscopic, physical/chemical/mechanical properties, cage multi-occupancy, and phase transitions.37–54
The results from both experimental and theoretical investigations revealed the effect of the size, shape, and composition of the host sI cage on the structure and spectroscopy of the encapsulated guest CO2 molecule, which motivates further research on the quantum dynamics of such systems and, in particular, on the quality of the underlying PES employed. In this sense, we have recently reported both fully coupled quantum nuclear dynamics simulations43,54 within the multi-configurational time-dependent Hartree (MCTDH) framework55–57 and benchmark studies on the performance of first-principles methodologies, such as density functional theory (DFT) approaches, CCSD(T)/CBS, and explicitly correlated DLPNO-CCSD(T)-F12, DLPNO-MP2, or DLPNO-MP2-F12, in describing the underlying guest–host interactions in both finite-size clathrate-like cages and periodic crystals for CO2 clathrate hydrates.47,49–52 In addition, different representations based on a sum-of-potentials approach have also been assessed43,47,54 employing pair interactions between CO2 and H2O, as given by both effective semiempirical potentials58–63 and ab initio MP2 and CCSD(T)-F12 CO2–H2O PESs available in the literature.64–67 Such computations have shown that DFT calculations using the PW86PBE functional with the XDM dispersion correction offer a reasonable alternative to describe the CO2-cage/lattice interactions. Moreover, high sensitivity has been observed in the rotational–translational density of states, obtained from fully coupled quantum MCTDH calculations,43,54 that clearly reflects the impact of the anisotropy, anharmonicity, and relative strength of the interaction and calls for improved model representations of such noncovalent forces, as those presented in this study. Thus, by combining NN ML techniques with reliable first-principles DFT-D approaches, the purpose of this research is to generate both potential energy and dipole moment surface (DMS) models trained on reference data for such guest–host systems and then employ them in quantum simulations for spectral characterization of such nanoconfined species.
II. COMPUTATIONAL METHODS, RESULTS, AND DISCUSSION
A. Cage’s systems, coordinate system, and Hamiltonian operator setup
As in our previous studies on nanoconfined systems,43,54,68–70 we used a set of coordinates to derive the Hamiltonian operator, which includes the position and orientation of the CO2 molecule inside the small (D) or large (T) sI clathrate cages, as shown in Fig. 1.
B. Electronic structure DFT-D calculations, NN-trained ML PES/DMS models, and quality diagnostics
As mentioned above, on the basis of our previous benchmark computations,47,49–52 we have considered the PW86PBE-XDM functional to describe the interaction between the CO2 guest molecule and the D and T sI clathrate-like cages. All electronic structure calculations were carried out using the Quantum Espresso (QE) code.74,75 The DFT calculations were performed for the isolated (empty and CO2-filled) individual sI clathrate hydrate cages at the Γ-point only. A cubic simulation cell of volume 30 × 30 × 30 Å3 was employed, and the Makov–Payne method of electrostatic interaction correction was considered for these aperiodic systems, where the convergence of the energy is determined by the longest-ranged forces, which are usually electrostatic interactions, evaluated in the limit of large supercells.76 We used the standard implementation in the LIBXC library,77 with the dispersion corrections included through the exchange dipole model (XDM).78 The XDM contains three-body intermolecular dispersion contributions, and it has been parameterized for the PW86PBE functional.75,79 Single-point and geometry optimizations using the BFGS quasi-Newton algorithm,74,75 setting the convergence criteria at 0.0014 eV and 0.026 eV/Å in energy and forces components, respectively, were carried out, relaxing only the CO2 atomic positions and keeping fixed the D and T sI cages. The binding and interaction energies are then obtained as energy differences between the CO2-filled and empty sI cage systems and the CO2 molecule, as and , at the geometry-optimized and any selected (single point energy) configurations, respectively.
In this way, we generated datasets with up to 22 750 and 37 989 configurations, dipole moments, and interaction energies for the CO2@D and CO2@T systems, respectively, which were then employed in the training/testing procedure for building up the corresponding NN ML DMS and PES for each case. In Fig. 2, the energy distributions for the whole set of data for each case are shown. The sampling of the configuration space is of central importance in the model training, so we have chosen to sample configurations along internal coordinates by distributing 5 and 7 grid points in each of the x, y, z, and θ, ranging between −0.65/−2.0 and 0.65/2.0 Å for the CO2@D/CO2@T and 0°–90°, respectively, while 13 and 15 points were used in the ϕ coordinate to cover the whole range from 0° to 360°. One can see that most configurations are in the attractive region (negative energies), ensuring a good description of the interaction, although a significant amount of data is also available in the repulsive regime (positive energies). To adapt such grids to the requirements of the following bound states and spectra calculations, we used neural network regression, as included in the TensorFlow package.25 Given the energy distribution of the configurations, a weight function (see Fig. 2) was introduced for a better representation of the lower energy region of the potential surface, with n = 6/12, ΔEmin the binding energies of −1925.51 and −2492.51 cm−1, and ΔEmax the cutoff values in the interaction energies at 5482.45 and 9999.89 cm−1 for the CO2@D and CO2@T cage systems, respectively.
We have run many tests considering different combinations of optimized hyperparameters, such as the number of layers and nodes in a NN and the scaling of input/output data. In this way, we obtained good results using the StandardScaler function,23 the rectified linear unit activation function, and up to 7 hidden layers with a size of 1024 neurons each, up to a total of 7 355 393 training parameters. The best-performing hyperparameter set was selected after up to 15 optimization iterations using the automated routines in the TensorFlow package.25 The quality of the resulting NN ML-DMS and PES models was validated by monitoring regression diagnostics, such as the mean absolute error (MAE), calculated as the average of the absolute errors between the values predicted (Vi) by the model and the actual reference values , , where N is the total number of calculated configurations/dipoles/energies in the training and/or testing data sets. We have employed a k-fold cross-validation scheme to select the best performing NN ML-PES models, and in Figs. S1 and S2, we summarize our results. The total data are randomly split into k = 10 partitions, with their energy distributions (see Fig. S1) being similar, as well as the corresponding MAE values (see Fig. S2) averaged for both training (k − 1) and testing datasets over all iterations for each CO2@D and CO2@T cage, respectively.
In turn, in Fig. 3, correlation plots demonstrate the performance of the NN ML-PESs against the DFT/PW86PBE-XDM energies, together with the corresponding distributions of the MAE values as a function of DFT-D energies. We found that the averaged MAE values for the best-performing NN PES models are 5.64 cm−1 with respect to the entire datasets for the CO2@D case and 46.20 cm−1 for the CO2@T, while for the attractive region of the PES, we can see that the NN ML-PESs show smaller error values less than 4 and 10 cm−1, with an average MAE value of 2.7 and 5.2 cm−1 for the CO2@D and CO2@T cases, respectively.
Next, particular characteristics of the NN ML-PESs for CO2@D and CO2@T were also analyzed. Figure 4 presents representative 2D contour plots in (x, y), (x, z), (y, z), and (θ, ϕ) planes, where one can clearly check the smoothness of the NN ML-PESs in both CO2@D and CO2@T cages (see left-side columns of each panel). We found a global minimum at an energy of −1924.51 and −2558.65 cm−1 for the CO2 in the D and T sI cages, respectively, with the NN ML PESs also showing local minima along the rotational coordinates.
As mentioned, traditionally, only effective pair potentials are available in the literature, which offers a simplified representation of such CO2 guest–host (H2O) interactions.43,47,58–67 Such potential models lack the contribution of explicit non-additive many-body terms, which can significantly affect the energetics and structural properties of these systems.54 By incorporating such nonadditive contributions, the NN ML-PESs provide the first explicit many-body representation of interactions within such cage systems, as described by the DFT/PW86PBE-XDM computations. Therefore, for comparison reasons, in Fig. 4, 2D contour plots of the pairwise PESs (see right-side columns of each panel) based on the sum of two-body ab initio CO2–H2O potentials64 are also shown (see right-side columns of each panel). At first glance, the topology of the additive two-body and explicit many-body NN ML-PESs looks quite similar for both CO2@D and CO2@T cages, although one can observe certain differences between them, such as the well-depth values, especially in the CO2@D case (see left panels), as well as in their anisotropy and anharmonicity, which, as we will clearly see below, influence the pattern and density of states in the quantum dynamics calculations.
In the same vein, NN ML-DMS models, trained on the DFT-D datasets, were also constructed, and Fig. 5 shows 1D cuts of the μX, μY, μZ components of the obtained NN ML-DMS together with the dipole moment values computed directly from the PW86PBE-XDM calculations. One can observe that the corresponding NN ML-DMSs represent very well the PW86PBE-XDM data, with MAE values of 6.9 × 10−4, 4.6 × 10−4, and 7.1 × 10−4 D in the CO2@D case, and 1.7 × 10−3, 1.5 × 10−3, and 2.1 × 10−3 D in the CO2@T case for the μX, μY, and μZ, respectively. We should note the nearly zero value of μX, μY, μZ along the θ and ϕ coordinates in the CO2@D cage, which is expected to affect the spectra intensities in rotational transitions, as we will discuss later on.
C. Quantum MCTDH calculations, rotational–translational states, and spectral simulations
Once we have completed the quantitative analysis of the NN ML PES and DMS, we proceed with their implementation in quantum calculations within the MCTDH framework55–57 in order to investigate the effect of the interaction on translational–rotational (T–R) states and facilitate the corresponding spectroscopic data. The MCTDH package of codes80 was used to solve the multidimensional Schrödinger equation associated with the Hamiltonian operator given in Eq. (1) for CO2 inside the D and T sI cages. Within the MCTDH method, the wavefunction is expanded into a sum of products on a time-dependent basis, namely, single-particle functions (SPFs), and the Hamiltonian matrix elements are expressed in a product structure.57 The POTFIT algorithm81 was employed to transform the NN ML potential and dipole operators into a sum of products of monoparticle operators, namely, natural potentials (NPs). The improved relaxation (IR) and block improved relaxation (BIR) methods,82,83 as implemented in the MCTDH package80 were used to calculate the ground and excited T–R states of the CO2 into the D and T sI cages.
Figure 6 shows the 1D minimum energy profiles of the NN ML-PESs as a function of the translational x, y, z (upper panels) and rotational θ, ϕ (lower panels) coordinates obtained by minimizing the CO2–cage interaction with respect to the corresponding coordinate while keeping the remaining ones at their equilibrium values. In these figures, we also display the calculated n = 0–19 T–R energy levels together with the 1D probability distribution of the ground n = 0 state along each corresponding coordinate.
In Tables S1 and S2, we list the ground state energies, E0, and the excitation energies, δE, for the computed T–R states, selected 2D probability distributions, and the corresponding state assignments for the CO2@D and CO2@T cages, respectively. One can see that the E0 values are −1807.54 and −2427.48 cm−1 for the CO2 inside the D and T cages, with the corresponding zero-point energies (ZPE) estimated at 116.97 and 131.17 cm−1, respectively, which compares to 138.8 and 103.5 cm−1 from the additive pairwise ab initio PES, counting differences of 20 and 30 cm−1, respectively. In Fig. 6, we should also note that the n = 0 1D distribution functions indicate different positions and orientations of the CO2 in the D and T cages. In the CO2@D case, we found that the molecule is located almost at the center of mass of the D cage, with an average off-center shift (⟨x⟩ = −0.0321, ⟨y⟩ = 0.0045, and ⟨z⟩ = 0.0105 bohr) of 0.0341 bohr or 0.0180 Å, and an orientation of ⟨θ⟩ = 62.8°, although we should also note that the close lying in energy, around 7 cm−1, n = 1 state is located in the nearby minima of the PES at θ around 80°, and ⟨ϕ⟩ = 106.7°, while inside the T cage, the CO2 is found at the average off-center position of 0.9986 bohr or 0.5284 Å for ⟨x⟩ = 0.698, ⟨y⟩ = −0.713, and ⟨z⟩ = −0.040 bohr, and ⟨θ⟩ = 84° and ⟨ϕ⟩ = 215.6° orientation. These values are compared with those obtained from the additive pairwise ab initio PES of zero displacement inside the D cage and 1.5849 bohrs or 0.8387 Å in the T cage.54 Such results from the n = 0 ground state analysis correspond to zero temperature and should be considered as the low limit values when comparisons are made with the experimental or theoretical data at high temperatures. In particular, the average crystal CO2@sI structure has been reported by diffraction and spectroscopy experiments, such as crystal, powder x-ray, neutron diffraction, 13C solid-state NMR, and computer simulations, such as molecular dynamics and quantum chemistry.28,30,40,43–45,47,50,54,84–88 Most of the recent studies have predicted average preferred CO2 orientations within the sI cages, such as an off-center position of the CO2 by 0.5755 Å and θ angle values between 83.4° and 75.6° in the T cage and almost at the center of the D cage, oriented along the lines connected opposite pentagons. The predicted off-center shift and orientation of the CO2 molecule inside the D and T sI cages from the present NN ML-PES are the best reported values with respect to the experimental data available.
Regarding the excited T–R states (see Tables S1 and S2), one can see in their assignment the effect of close lying minima in the θ coordinate, with the rotational, vθ, ϕ, and translational, vx+y, fundamentals at 25.05 and 36.71 cm−1, respectively, in the CO2@D case, while the translational, vx−y, fundamental of the CO2 in the T cage was found at 26.82 cm−1, and next the coupled vz,ϕ state at 34.18 cm−1. Such fundamental mode frequencies are comparable to those reported from the additive pairwise ab initio PES54 in the CO2@D case, while for the CO2@T, we found that the assignment of the T–R states is distinct between the two PESs, and the T–R mode couplings are significant even at low energies in the NN ML PES. One can also observe that in the CO2@D cage, the density of states is higher from the NN ML PES than those with the additive pairwise ab initio PES.54
Next, in Fig. 7, the FIR spectra for the CO2 inside the D (see upper panel) and T (see lower panel) sI cages obtained from the present MCTDH calculations using the NN ML-PES are shown, together with the stick spectra of energy states listed in Tables S1 and S2, and the assignment of the spectral peaks. The 2D probability density distributions of each state show the main character of its excitation. Therefore, in the CO2@D case, we found that the n = 1 and 3 states show only rotational excitations with very low intensities as they depend on dipole moment operators [see Fig. 5 and Eq. (2)]. The n = 4 and 5 correspond to the translational fundamental in the x + y mode, while the coupling between translations and rotations starts at the n = 6 state. One can clearly observe that the high intensity peaks in the spectrum correspond to the n = 4, 5, 6, 7, 9, 10, 11, 14, 16, and 19 states (shown with magenta color vertical lines), with all of them assigned mainly to translational excitations (see in the upper panel of Fig. 7 the corresponding 2D probability density distributions).
In turn, the CO2@T spectrum peaks at energies of the n = 1, 2, 3, 4, 5, 12/13, 14/15, and 16/17 states, although low intensity peaks also appear at n = 7–11 states. One can see that all of them correspond mainly to translational excitations, with the n = 1, 4, and 9 being progressions of the x − y mode, and the n = 3 and 12 of the x + y, while the n = 2 and 8 of the coupled T–R z, ϕ one. The higher energy n = 14–18 states were found to correspond to translational x + y, x + z, and y + z coupled modes.
For comparison reasons, in Fig. 7, we also displayed the stick energy spectra reported54 using the additive two-body ab initio PES (see green color vertical lines). As can be seen, the energy patterns are quite distinct, especially for the CO2@D cage, indicating the importance of many-body contributions in the potential interactions.
III. CONCLUSIONS
The motivation for our present research arises from the need to accurately model the encapsulation of gas molecules, such as greenhouse CO2, in hydrostructures. The balance of guest–host interactions is essential for comprehending the formation, stability, and growth of such crystal structures, with potential applications in gas control technologies for designing CO2 capture and storage materials.
In this context, we have presented a first attempt to develop an explicit many-body description of the intermolecular guest–host interactions in the CO2@sI clathrate hydrate cages by combining NN ML techniques with high-quality training information from reliable first-principles methodologies. Thus, we have performed extensive DFT-D electronic structure calculations by sampling representative and diverse sets of configurations of the CO2 molecule inside each sI cage. We have chosen the PW86PBE functional approach with the XDM dispersion correction scheme on the basis of systematic benchmark studies reported previously on both finite-size and periodic crystal CO2@ clathrate-hydrate structures. Following current advances in the field of machine learning, we have built up new potential and dipole moment surface representations through proper and automated NN machine-learned modeling of the underlying noncovalent guest–host interactions. The best-performing PES and DMS models have been identified and validated against reference PW86PBE-XDM data by monitoring error diagnostics.
Hence, with the objective of exploring the effects of nanoconfinement on the quantized intermolecular dynamics and spectral features of the guest CO2 molecule, quantum MCTDH simulations have been carried out. The CO2@D and CO2@T systems were modeled by means of a 5D Hamiltonian describing the translational and rotational motions of the linear CO2 in the D and T sI cages, considering all couplings and anisotropies through the exact kinetic energy operator and the NN ML potentials, while the corresponding NN ML-DMS were also employed for calculating spectra intensities. We have reported energies and wavefunction distributions for the 20 lower T–R bound states for the CO2 in each sI cage. By comparing with previously reported ZPE and excited bound state energies obtained employing additive pairwise potential models, the present results reveal the influence on energetics and spectra of the explicit nonadditive many-body contributions in the potential representation. We have found distinct energy and coupling mode patterns, providing valuable insights into their spectral characterization. In particular, our results confirm the off-center location and preferential orientation of a CO2 molecule in the T sI cage, in quantitative accord with experimental x-ray diffraction data, while in the D cage, the NN ML-PES predicts a higher density of states than the additive pairwise PES.
Our findings highlight the importance of explicitly incorporating higher-order many-body effects through reliable first-principles computations in the spectroscopic characterization of such guest–host systems. The availability of such ML models will enable further investigations to explore formation processes and phase transitions, aiming to identify multi-occupancy capacity and stability conditions, which are crucial in assessing the viability of CO2 capture and storage in these hydrostructured media.
SUPPLEMENTARY MATERIAL
See the supplementary material for the k-fold cross-validation scheme and corresponding MAE values for the NN ML-PESs, as well as the T–R energy levels of the CO2@D and CO2@T sI clathrate cages.
ACKNOWLEDGMENTS
The authors thank the “Centro de Calculo del IFF, SGAI (CSIC),” and the CESGA-Supercomputing Center for the allocation of computer time. This work has been supported by MCIN Grant No. PID2020-114654GB-I00, the Universidad Nacional de Colombia Hermes code: 57465, and COST Actions CA18212 (MD-GAS) and CA21101 (COSY).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Álvaro Valdés: Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Resources (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Rita Prosmiti: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Supervision (equal); Validation (equal); Writing – original draft (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are available within the article and its supplementary material.