Despite the significance of gas hydrates in diverse areas, a quantitative knowledge of hydrate formation at a molecular level is missing. The impediment to acquiring this understanding is primarily attributed to the stochastic nature and ultra-fine scales of nucleation events, posing a great challenge for both experiment and simulation to explore hydrate nucleation. Here we employ advanced molecular simulation methods, including forward flux sampling (FFS), *p*_{B} histogram analysis, and backward flux sampling, to overcome the limit of direct molecular simulation for exploring both the free energy landscape and molecular pathways of hydrate nucleation. First we test the half-cage order parameter (H-COP) which we developed for driving FFS, through conducting the *p*_{B} histogram analysis. Our results indeed show that H-COP describes well the reaction coordinates of hydrate nucleation. Through the verified order parameter, we then directly compute the free energy landscape for hydrate nucleation by combining both forward and backward flux sampling. The calculated stationary distribution density, which is obtained independently of nucleation theory, is found to fit well against the classical nucleation theory (CNT). Subsequent analysis of the obtained large ensemble of hydrate nucleation trajectories show that although on average, hydrate formation is facilitated by a two-step like mechanism involving a gradual transition from an amorphous to a crystalline structure, there also exist nucleation pathways where hydrate crystallizes directly, without going through the amorphous stage. The CNT-like free energy profile and the structural diversity suggest the existence of multiple active transition pathways for hydrate nucleation, and possibly also imply the near degeneracy in their free energy profiles among different pathways. Our results thus bring a new perspective to the long standing question of how hydrates crystallize.

## I. INTRODUCTION

Gas hydrates are multicomponent crystals, composed of gas molecules (e.g., CO_{2}, CH_{4}) and water molecules. These crystals typically form under high pressure and low temperature, which often find their formation in ocean sediments and permafrost, and inside oil and gas pipelines. Although gas hydrates are much less commonly encountered than ice on a daily basis, these compounds play substantial roles in energy recovery,^{1} flow assurance,^{2} gas storage/transportation,^{3,4} global climate change,^{5} and CO_{2} sequestration.^{6–8}

Gas hydrates are similar to ice in many aspects: both appear “ice-like,” and both form in aqueous environment as a result of ordering of water molecules. However the nucleation of gas hydrates has been generally regarded as a more complex process^{9–14} than ice nucleation. As hydrate is a nonstoichiometric, multicomponent system, the nucleation process was often found to involve multiple steps. In particular, molecular simulations^{12,13,15,16} showed that the initial hydrate nucleus is often composed of disordered polyhedral cages, and does not resemble those crystalline phases. Studies^{17,18} further showed that the initial non-crystalline hydrate nucleus can slowly transform into the crystalline phase as it grows. A two-step nucleation mechanism^{11–13} was then proposed to rationalize the observed nucleation behaviors. Such multistep nucleation pathways of hydrate resemble the nucleation behaviors of minerals,^{19} proteins,^{20} and colloids^{21} that were identified to proceed in a non-classical fashion. This is in contrast to the nucleation of ice, which has been found well described by the classical nucleation theory (CNT).^{22–24} It is thus fascinating that two important and comparable solids crystallize via drastically different molecular pathways.

Although general consensus has been reached on the non-crystalline structure of the hydrate nucleus upon formation, a few recent studies^{18,25–27} also reported the existence of crystalline ordering in hydrate nucleation, albeit carrying much less statistical weight than those non-crystalline pathways. These findings thus provide an alternative to the two-step nucleation mechanism, and also call into question whether the formation of an amorphous nucleus is a necessary step for hydrate formation. However, it is unclear whether the observed structural diversity is robust against the thermostat employed in the studies.^{26,27} It also remains to be explored whether the co-existence of multiple pathways can be generalized under a broad range of conditions and/or with different types of guests. In particular, in view of the non-classical nature of hydrate nucleation, the co-existence of multiple nucleation pathways also raises an interesting question regarding the free energy landscape of hydrate nucleation. As a non-classical, multistep nucleation is often featured by multiple barriers separated by saddle points,^{28} it is of interest to examine free energy profile of hydrate nucleation, particularly if both indirect and direct nucleation pathways co-exist.

The complex nature of hydrate nucleation has attracted many theoretical efforts, mainly through molecular simulations, aiming at elucidating the key mechanisms governing the nucleation process. The field has evolved rapidly from obtaining the molecular details of hydrate nucleation on the basis of unbiased trajectories obtained in direct simulation under high driving force,^{12,13,16,17,27,29–34} to gaining the statistical insight of hydrate nucleation under more realistic conditions through advanced sampling methods such as forward flux sampling (FFS),^{18} well-tempered metadynamics,^{35} equilibrium path sampling,^{36} and generalized replica exchange method.^{37} In our previous study,^{18} we have successfully applied the FFS method^{38} to study methane hydrate nucleation under a thermodynamic condition where spontaneous nucleation becomes too slow to occur in direct molecular simulation. The FFS method also allows obtaining ensemble of hydrate nucleation trajectories which can be used to validate nucleation mechanism with statistical significance.

The application of advanced sampling methods such as FFS on nucleation study often requires the definition of order parameter(s) or collective variable(s) which effectively drives and characterizes the reaction pathway. Order parameters for hydrate nucleation have been defined based on guest species,^{39–41} host molecules (water),^{18,41,42} or both guests and hosts.^{43} It is noted that the order parameter is distinguished from the actual reaction coordinates, as the latter specifically define the pathway. In fact one of the advantages of FFS lies in its exploratory nature, acquiring knowledge of nucleation relying on natural dynamics of the system. To exploit this advantage, an appropriate order parameter should be chosen such that it should be effective but not constrictive on how nucleation occurs. In particular, a good order parameter should follow reaction coordinates closely.

Although the exact reaction coordinates are usually unknown in advance, one can quantify the error of order parameter through conducting *p*_{B} histogram analysis,^{44} which provides the distribution of committor *p*_{B} at the transition state. Using this approach, Barnes *et al.*^{36} tested the quality of their developed Mutually Coordinated Guest (MCG-1) order parameter.^{43} With a relatively small standard deviation *σ* of 0.22, the test concluded that the MCG-1 is a valid order parameter for studying methane hydrate nucleation.^{43} In a recent study,^{18} we developed a half-cage order parameter (H-COP) based on the topological analysis of host molecules only. Since the H-COP order parameter was constructed on a different basis, it is necessary to conduct a comparative study in a methane hydrate system to understand its effectiveness, because a meaningful order parameter is required for understanding the key nucleation mechanism.

In this study, we try to address the fundamental questions on both the free energy profile and molecular pathways of hydrate nucleation. In answering these questions, we first test the quality of the H-COP order parameter by employing the *p*_{B} histogram analysis. The results confirm that the H-COP order parameter describes the reaction pathways nearly equally well as the MCG-1 order parameter in the methane hydrate system. We then explore the free energy profile for nucleation in a different hydrate system containing a large and soluble guest, by computing the stationary distribution density through combining both the forward flux sampling and backward flux sampling (BFS).^{45} The independently obtained free energy profile was found to fit the classical nucleation theory well, which appears rather surprising given the non-classical nature of hydrate nucleation, but in fact is consistent with the recent study in methane hydrate.^{36} For both hydrate systems, the subsequent structural analysis of the obtained ensemble of hydrate nucleation trajectories indeed shows that on average, hydrate nucleation proceeds with a gradual transition from an amorphous to a crystalline structure, consistent with previous findings. Importantly, the analysis also provides robust evidence of diversity for hydrate nucleation behaviors. In particular, it is found that in both hydrate systems, there also exist nucleation pathways where hydrate forms directly into the crystalline phase, without necessarily going through the amorphous stage. The diversity suggests the existence of multiple active hydrate nucleation channels, which, along with the obtained CNT-like free energy curves, allows us to hypothesize that hydrate nucleation is an entropic process with multiple competing pathways with close free energy profiles.

The rest of the paper is organized as follows. In Sec. II, we briefly review the FFS technique and the H-COP order parameter. We then provide the details for computing the stationary distribution density through combining FFS and BFS. In Sec. III, we report our *p*_{B} histogram analysis of the order parameter and the computed free energy profile for hydrate nucleation. Finally the molecular details of hydrate formation are analyzed and discussed, followed by a brief discussion on the hypothesized co-existence of multiple active nucleation pathways in hydrates.

## II. METHODS

### A. Forward flux sampling method

The FFS method efficiently samples the transition paths from basin *A* to basin *B* which are separated by a free energy barrier, and allows computing the corresponding rate constant *R _{AB}* directly. The main idea of FFS is to divide the transition path into consecutive segments through interfaces defined by the order parameter(s)

*λ*.

^{38}The rate constant

*R*is then obtained by the effective positive flux expression:

_{AB}^{46}$RAB=\Phi \u0307\lambda 0P(\lambda B|\lambda 0)$, where $\Phi \u0307\lambda 0$ is the effective flux rate reaching the first interface

*λ*

_{0}from the basin

*A*, and

*P*(

*λ*|

_{B}*λ*

_{0}) is the conditional growth probability for a transition starting from interface

*λ*

_{0}and eventually reaching basin

*B*. The sampled interfaces facilitate the computation of the typically very small growth probability $P(\lambda B|\lambda 0)=\u220fi=1nP(\lambda i|\lambda i\u22121)$, through the calculated individual crossing probability

*P*(

*λ*|

_{i}*λ*

_{i−1}) between interfaces

*λ*

_{i−1}and

*λ*. Although the computation of

_{i}*P*(

*λ*|

_{B}*λ*

_{0}) usually constitutes most of the overall cost of FFS, it was noted

^{18}that the sufficient sampling of the initial flux rate $\Phi \u0307\lambda 0$ is crucial for the overall convergence of rate constant, particularly for a heterogeneous system where the parameter space in basin

*A*is large. Employing this approach, we have successfully studied the nucleation of semiconductors,

^{47,48}ice,

^{23,24,49,50}and gas hydrate.

^{18}More details of nucleation rate calculation can be found in Refs. 18 and 48.

### B. Order parameter H-COP

The interfaces in FFS are defined by the order parameter *λ*, which is usually chosen to be the size of the largest crystalline cluster when studying crystallization of a solid. For ice nucleation, *λ* was commonly defined based on the number of “ice-like” water molecules that can be characterized using the Steinhardt bond order.^{23,51,52} For hydrate nucleation, it becomes less transparent of what the size of hydrate cluster refers to, because hydrate is a multicomponent, nonstoichiometric system. In our recent work,^{18} we have developed the H-COP order parameter on the basis of topological hierarchy of the hydrate structure (see Fig. 1). A computational protocol was developed to recognize a hydrate cluster through identifying the water ring structure, half-polyhedral cages, and their connectivity, on the basis of topological analysis of the tetrahedral network. The size of a hydrate cluster is defined as the total number of water molecules contained in the largest hydrate cluster. The H-COP capitalizes the structural signature of hydrate phase–polyhedral cages, and is capable of efficiently distinguishing hydrate from ice and liquid water while allowing the formation of different phases of the hydrate.

### C. Obtaining free energy profile from forward flux sampling and backward flux sampling

The free energy profile along a sequence of the order parameter *q* can be obtained through Δ*G*(*q*) = − *k _{B}T*ln

*ρ*(

_{n}*q*), where

*ρ*(

_{n}*q*) is the

*normalized*probability distribution density of observing the order parameter

*q*. The FFS method allows examining the free energy profile through obtaining directly the unnormalized probability distribution density

*ρ*(

*q*) =

*Cρ*(

_{n}*q*),

^{45}where

*C*is the normalizing constant, via

Here *ψ _{A}*(

*q*) and

*ψ*(

_{B}*q*) are the contributions to the probability density from those trajectories coming from basin

*A*and

*B*, respectively. In molecular dynamics, both

*ψ*(

_{A}*q*) and

*ψ*(

_{B}*q*) can be measured by the time average of the frequency where the order parameter

*q*is visited. Specifically,

Here *p _{A}* is the probability for the system to be in basin

*A*, and can be obtained through

where *R _{BA}* is the backward rate constant for transition from

*B*to

*A*.

*τ*

_{+}(

*q*;

*λ*

_{0}) is the total average time spent at

*q*for a trajectory coming from interface

*λ*

_{0}, and the + sign indicates the contribution from the forward flux, i.e., from

*A*to

*B*. Under the framework of FFS,

*τ*

_{+}(

*q*;

*λ*

_{0}) can be calculated by

where *π*_{+}(*q*; *λ _{j}*) is the average time that a

*trial run*spends at

*q*starting from the interface

*λ*. In FFS, a trial run starting from interface

_{j}*λ*is terminated when either reaching the next adjacent interface

_{i}*λ*

_{i+1}or returning to basin

*A*. Note that

*τ*

_{+}(

*q*;

*λ*

_{0}) is distinguished from

*π*

_{+}(

*q*;

*λ*

_{0}) as the former involves the contributions not only from the trial shootings that visit

*q*directly (i.e.,

*π*

_{+}(

*q*;

*λ*

_{0})) but also from those that visit

*q*indirectly, i.e., the trajectories that do not visit

*q*explicitly during trial runs. The indirect contribution can be evaluated with the assistance of the sampled interfaces

*λ*, and it is re-weighted according to the transition probability

_{j}*P*(

*λ*

_{j+1}|

*λ*). When

_{j}*q*is far away from

*λ*

_{0},

*τ*

_{+}(

*q*;

*λ*

_{0}) is mostly measured by the indirect contributions, particularly from those trajectories fired from interfaces near

*q*.

The contribution *ψ _{B}* can be calculated in a similar fashion, by conducting the backward flux sampling (BFS) from basin

*B*to

*A*,

where *p _{B}* = 1 −

*p*, $\Phi \u0307\lambda n$ is the flux rate reaching the interface

_{A}*λ*from the basin

_{n}*B*,

*τ*

_{−}(

*q*;

*λ*) is total average time that a backward trajectory starting from

_{n}*λ*spends at

_{n}*q*,

The BFS can be carried out by reversing the direction of FFS. However two points should be mentioned for implementing BFS. First, although the BFS can be certainly conducted based on the same set of interfaces *λ _{i}* used in the FFS, Eq. (6) does not require that this has to be the case. In fact we find sometimes it is more efficient to employ a slightly different set of interfaces in BFS. This is because a high forward transition probability

*P*(

*λ*|

_{i}*λ*

_{i−1}) usually implies a low backward transition probability

*P*(

*λ*

_{i−1}|

*λ*). When the forward flux has well crossed the barrier, the forward transition probability becomes close to one, therefore in practice the interfaces near basin

_{i}*B*are often placed well separated from each other. Thus the

*λ*employed in the FFS would yield a very low backward transition probability near basin

_{i}*B*. To circumvent this problem, additional interfaces can be placed in between to facilitate the backward crossing.

Second, because basin *B* usually corresponds to the thermodynamically stable state, e.g., solid, the reverse transition (melting) needs to overcome a much higher free energy barrier to return to basin *A*. Therefore it becomes very difficult and unnecessary to conduct BFS by starting from the fully crystallized solid. In fact the melting of crystal may follow a quite different pathway from the freezing of liquid due to the nature of first-order phase transition. Instead a virtual reflecting wall *B*′ can be placed well beyond the critical interface to replace the true basin *B*.^{45} Trial runs that lead to the crossing the virtual wall *B*′ will be rejected.

For the nucleation of a solid cluster containing *N* particles from liquid, the free energy landscape Δ*G* can be obtained by the cluster size distribution *p*(*N*) of all the solid clusters. The distribution *p*(*N*) may be reliably sampled in direct simulation for very small clusters, i.e., in the immediate vicinity to basin *A*, but has poor statistics for large *N*, as the occurrence of large solid clusters becomes exponentially rare. Fortunately, for large *N*, the distribution *p*(*N*) can be well approximated by the distribution of the largest cluster *p*(*λ*), i.e., *p*(*N*) ≈ *p*(*λ*). Therefore by choosing *q* = *λ* in Eqs. (1)–(6), we are readily to compute the free energy profile away from the basin *A*,

where *c* = *k _{B}T*ln

*C*.

### D. Simulation details

Our molecular dynamics simulations were carried out employing the monatomic water model mW.^{53} The water-guest interactions are represented by the two-body interaction of the mW model,^{31,54} which can be tuned to mimic a range of guest molecules with different size and solubility. The current study involves two types of guest molecules: the medium (M) and the large (L) guests.^{31} The M model emulates the methane molecule, yielding a very low solubility in water: 0.0038 M faction at 178 atm and 313 K,^{54} while the L model is miscible in water, exhibiting many properties comparable to oxetane.^{31} Both M and L guests form stable hydrate clathrates, with the sI phase being the thermodynamically stable phase under the pressure (500 bars)^{31} in this study. The simulation cell for the M model contains 6912 water molecules and 1280 guest molecules, with a equilibrated dimension of 104 × 52 × 52 Å^{3} at 220 K and 500 bars. Because of the low solubility of M, the water and guest molecules are phase separated by two interfaces with a separation distance about 72 Å due to the employed periodic boundary condition. In comparison, the simulation cell for the L guest contains 7737 water molecules and 455 guests, yielding a guest-water ratio of 1/17, which is consistent with that used in the previous study by Jacobson *et al.*^{31} Because L guest is miscible in water, the simulation cell is homogeneous, with a dimension about 63 × 63 × 63 Å^{3} at 270 K and 500 bars. The studied thermodynamic conditions (220 K for M, and 270 K for L) correspond to about 72% and 83% of the equilibrium melting points of the M and L guests at 500 bars, respectively, and these are both 10 K above the temperatures where spontaneous nucleation becomes accessible in direct molecular dynamics simulations.^{12,31} The isobaric-isothermal canonical ensemble is employed with the relaxation time of 1 and 15 ps for controlling the temperature and pressure, respectively. To further distinguish different phases of the hydrate, we employ the modified vertex order parameter *ν _{jklm}*,

^{18,41}where

*j*,

*k*,

*l*, and

*m*refer to the number of 5

^{12}, 5

^{12}6

^{2}, 5

^{12}6

^{3}, and 5

^{12}6

^{4}cages surrounding a water vertex of the polyhedral cage to be identified, respectively. The crystalline phase sI and sII have well-defined patterns of

*ν*, thus allowing probing the local structure of the hydrate nucleus.

_{jklm}## III. RESULTS AND DISCUSSIONS

### A. *p*_{B} histogram analysis of the H-COP order parameter

To understand whether the developed order parameter H-COP is a good approximation to the actual reaction coordinate, we first test the quality of the order parameter by conducting the committor distribution analysis, i.e., *p*_{B} histogram analysis. The committor *p*_{B}, which is defined as the probability committed to the final state *B*, can be conveniently obtained through FFS as a function *λ _{i}*, by $pB(\lambda i)=\u220fin\u22121P(\lambda j+1|\lambda j)$. The critical size

*λ*

^{∗}can then be defined as the

*λ*that yields

*p*

_{B}= 0.5. If the order parameter indeed describes the actual reaction pathway well, then

*p*

_{B}for the configurations collected at the critical interface should be normally distributed. This provides a post-justification for the quality of the chosen order parameter.

In our previous study,^{18} the nucleation rate of the M-hydrate at 220 K and 500 bars was computed based on the developed order parameter H-COP using FFS. The calculated committor *p*_{B} yields an estimate of the critical size *λ*^{∗} ≈ 197 ± 5, as shown in Fig. 2(a). We carried out the *p*_{B} histogram analysis at the interface *λ* = 200: A total of 127 configurations were selected, each receiving 20 trial shootings with Boltzmann distributed velocities. Each trial run is terminated when the trajectory reaches either basin *A* or *B*.

As shown in Fig. 2(b), the calculated distribution is peaked at *μ* = 0.502, validating the estimated critical size *λ*^{∗}. More importantly, the overall distribution agrees well with the intrinsic Gaussian distribution with the calculated standard deviation of *σ* = 0.247, which is comparable to those obtained in other complex systems, e.g., Refs. 55–57. This indicates that the developed order parameter H-COP indeed describes the reaction coordinates reasonably well.

It is worth noting that a recent study by Barnes *et al.*^{43} has developed the Mutually Coordinated Guest (MCG) order parameter by incorporating the ordering of both the guest and host molecules. The authors have also reported a committor analysis of the MCG order parameter which yielded a *μ* = 0.50, *σ* = 0.22.^{36} This is comparable to the obtained result in the current study. It is thus interesting to understand that both sets of order parameters describe the nucleation pathway of hydrate well, although that the MCG order parameter enforces geometric constraints of both guests and waters, while H-COP only concerns the water structure. The fact that both order parameters have been shown effective may imply their approximate equivalence for describing hydrate nucleation. This may not be surprising, considering that the nucleation and growth of the hydrate are initiated and facilitated by guest molecules, therefore the formation of the hydrate skeleton must involve the ordering of guest molecules. Furthermore, since neither order parameter places constraints on how the unit structures assemble (guest-ring in MCG and half-cage in H-COP), both allow the occurrence of interface/amorphous-like hydrate structures in hydrate formation pathway, which was indeed found to play a key role in hydrate nucleation.

Similarities aside, however, there exists some difference between the two sets of order parameters. Since H-COP does not impose explicit constraints on guest molecules, it is capable of detecting the formation of empty water cages, which were indeed found in simulations^{18,31,37,59} where the mW model was employed. Although the fast dynamics of the mW model may contribute to the occurrence of the empty cages, they were also observed in simulations where the more realistic atomistic water models were used.^{27,60–63} For large guests that do not fit in a small cage, as in L-hydrate, the formation of empty cages is not unexpected. In addition, since H-COP does not restrict the ordering of guests, it also allows identifying the formation of water cages containing multiple guests.

### B. Free energy profile of hydrate nucleation

Equipped with the verified order parameter H-COP, we are now able to examine the free energy profile for hydrate nucleation through a meaningful sequence of ordering. Fig. 3(a) shows the calculated free energy profile as a function of *λ* for the nucleation of L-hydrate at 270 K and 500 bars. Since the free energy profile was computed through the FFS/BFS method without subject to any theory of nucleation, it is then of interest to understand how the independently obtained free energy landscape compares to the classical nucleation theory (CNT).

In CNT, the free energy barrier is assumed to be a sole function of the size of the new phase,^{64} i.e.,

where Δ*μ* is the chemical potential difference between the new phase and parent phase, *γ* is the surface free energy, and *V*, *S*, *r* are the volume, surface area, and radius of the new phase, respectively. If one further assumes Δ*μ* and *γ* are size independent, and the shape of the new phase remains unchanged during the nucleation process, it is easy to show that Eq. (8) can be expressed as a single function of the number of molecules in the nucleus *N*,

where *a* and *b* are constants, which depend on Δ*μ*, *γ*, and the geometry of the nucleus. Comparing Eq. (9) with Eq. (7), one obtains

We thus fit the calculated distribution density profile ln*ρ*(*λ*) against CNT according Eq. (10). As shown in Fig. 3(a), the CNT is found to describe the free energy landscape of hydrate nucleation reasonably well overall, with the noticeable differences, particularly at the small sizes. The deviation from the CNT is not surprising given the non-classical nature of hydrate nucleation. As discussed in Section C, the small hydrate clusters are usually found to bear no obvious structural resemblance to the final product: Polyhedral cages of different types are often irregularly packed and transform into each other at the initial stage of nucleation. As a result, the shape of a hydrate cluster was found to fluctuate significantly, as shown in Fig. 4. Here it is noted that although the original CNT assumes the sphericity of the new phase, Eq. (10) does not depend on this assumption. Instead, it holds as long as the shape of the nucleus stays unchanged, i.e., *S*/*V*^{2/3} is a constant. As nucleation proceeds, a more ordered structure emerges at the nucleus’ surface and the nucleus becomes more compact.

It is also worth pointing out that the free energy profile for hydrate nucleation has been previously obtained in methane hydrate using the equilibrium path sampling.^{36} Interestingly, the calculated free energy curve was also found to reasonably fit the CNT profile. In addition, the free energy curves calculated in two different hydrate systems exhibit similar shapes, being very steep before the maximum and rather flat afterward. The asymmetry near the top of the barrier leads to a less accurate estimate of the critical size by only locating the maximum on the free energy curves, if one does not take the statistical uncertainty into account. Instead, the more reliable estimate of the critical size should be obtained through the calculated *p*_{B} (see Fig. 2(a)), which yields *λ*^{∗} ≈ 360 ± 10 for the L-hydrate. The resemblance of the free energy curves obtained in two different systems based on different order parameters may well suggest that this can apply to a general hydrate nucleation. It also suggests that the “two-step” nucleation of hydrate is at variance from that of minerals and proteins, in particular for the absence of multiple nucleation barriers.

For comparison, it is also useful to examine the free energy landscape for a known classical nucleation through the FFS/BFS approach. Our previous studies have shown that the nucleation of ice, both homogeneously from bulk supercooled water^{23,49} and heterogeneously on a graphitic surface,^{24} can be well described by the CNT on the variation of nucleation rates with temperature, when the mW water model is employed. In particular, the size of the ice nucleus was also demonstrated to be an effective order parameter,^{24} in line with the key assumption of CNT. We thus chose the homogeneous ice nucleation as the case study and computed the probability distribution density profile for homogeneous ice nucleation at 220 K. As shown in Fig. 3(b), the best fit to Eq. (10) yields a perfect agreement with the CNT profile. This reinforces the conclusion for the classical nature of ice nucleation.

### C. Molecular details of hydrate nucleation

Although CNT is found to provide a reasonable description of nucleation landscape, it employs the assumption that the nucleus bears the same structure of the thermodynamically stable phase. This assumption has been often proved inaccurate by many computer simulations of nucleation for a variety of materials, including gas hydrate. With the obtained large ensemble of hydrate nucleation pathways through FFS, we are able to obtain details of hydrate nucleation at the molecular level and to analyze the statistics of structural evolution.

Fig. 4 shows a typical nucleation trajectory from the obtained ensemble. In general, the fluctuation nature of nucleation makes it challenging to trace exactly the dynamics of each cage. To better illustrate the formation and agglomerate of different types of hydrate cages, we chose three tracing guests, which are marked with different colors and highlighted in Fig. 4. The nucleation of L-hydrate starts with the formation of a non-crystalline cage 5^{12}6^{3}, enclosing the first tracing guest (shown as the silver ball, Fig. 4(a)). The neighboring tracing guests (pink and purple) are also found “locked” by incomplete cages. The non-crystalline 5^{12}6^{3} cage soon transforms into the large, crystalline cage 5^{12}6^{4}, by addition of a pair of water molecules (Fig. 4(b)).

With the three tracing guests locked in place, the neighboring cages start emerging. As shown in Figs. 4(d) and 4(e), a three-way structure composed of four small, empty 5^{12} cages forms, and appears to serve a structural motif for the formation of surrounding polyhedral cages. It is noted that at this stage, the 5^{12}6^{4} cage containing the first tracing guest breaks and releases the guest molecule which gradually diffuses away from the hydrate nucleus. The other two tracing guests (pink and purple) are instead stabilized by their enclosing cages (Fig. 4(h)). A steady core structure of the hydrate nucleus is already formed at this stage, and the hydrate continues to expand by growing new cages based on the template of the core cages. The growing hydrate nucleus does not appear to have a long range crystalline order yet, although the crystalline sI and sII motifs are evident in different parts of the hydrate nucleus.

The molecular details reveal some interesting features of hydrate nucleation. First, kinetics plays an important role for hydrate formation, particularly at the early stage. Fluctuation constantly nucleates and dissolves polyhedral cages. The cages that form at the early stage of nucleation are important for inducing the formation of surrounding cages, but themselves may not necessarily participate in the growth of the hydrate nucleus in the later stage. The arrangement of cages at the early stage can certainly be different from those in the perfect crystalline structure, and is largely affected by the configurations of guest molecules. Second, the transformation of cages among different types are observed to occur frequently in the obtained crystallization trajectories. For example, the occurrence of two cage transformations on the tracing guests is evident in Fig. 4, i.e., 5^{12}6^{3} → 5^{12}6^{4} around the silver guest (Figs. 4(a) and 4(b)), and 5^{12}6^{3} → 5^{12}6^{2} around the purple guest (Figs. 4(h) and 4(i)). In fact our preliminary analysis of the obtained ensemble of nucleation trajectories has identified cage transformations among all types of cages. This is in line with the previous study^{25} that shows cage transformations occur via water pair addition/removal and rotation. The cage transformation, observed to occur mostly at the exterior of the hydrate nucleus for the ease of rearranging water molecules without involving structural change of other completed cages, contribute to the overall structural evolution of the hydrate nucleus.

### D. Co-existence of multiple nucleation pathways

With the ensemble of nucleation trajectories obtained from the FFS simulations, we can assess the variation of crystallinity along the nucleation pathway. Fig. 5(a) shows the mean fraction of each phase as a function of the order parameter *λ*. Consistent with our previous study in M-hydrate,^{18} the overall crystallinity of the L-hydrate nucleus is found to increase steadily with the size of the nucleus. It thus provides another support to the “two-step” nucleation mechanism^{11–13} of hydrate that suggests hydrate nucleation may proceed with the formation of an amorphous nucleus first, followed by the transformation into the crystalline structure. A typical crystallization trajectory of this kind is shown in Fig. 5(c).

Interestingly, the analysis of the ensemble also clearly identifies the structural diversity of hydrate nucleation pathways. While the majority of nucleation trajectories are found to begin with the amorphous core, we do identify a few pathways that lead to the direct formation of a crystalline structure, bypassing the amorphous stage. Fig. 6 shows a representative of such nucleation pathways. It is found that upon nucleation, the cages in the core of the hydrate nucleus immediately adopt the sII crystalline structure motif. As crystallization proceeds, more cages form following the sII template, and grow into a visually ordered structure. The analysis of the water molecules contained in each phase (Fig. 6(a)) also shows that this nucleation trajectory is indeed distinguished from the average pathways as shown in Fig. 5(c).

The structural diversity can also be identified by the calculated distribution of the crystallinity *f _{x}*, which is defined as the fraction of crystalline phases in the hydrate nucleus, from all the hydrate nuclei collected at different sizes

*λ*in our FFS simulations. As shown in Fig. 5(b), the early stage of hydrate nucleation proceeds with the formation of nearly pure amorphous and crystalline cores, with the crystalline outnumbered by amorphous, which yields an asymmetric bimodal distribution of crystallinity. As the hydrate nucleus grows, the high crystallinity side of the distribution is found to gradually smear out, enriching the structural diversity of nucleation pathways. Near the critical size (∼360), the distribution of

*f*is featured by a sharp peak at low crystallinity and a flat peak at high crystallinity. Upon further growth, the bimodal distribution of

_{x}*f*is found to gradually transform into a nearly unimodal distribution centered around 0.7, which leads to the increase of the overall crystallinity of nucleus, as attested in Fig. 5(a). Clearly, the analysis indicates the existence of different nucleation channels of hydrate formation, even at the very early stage of nucleation.

_{x}The structural diversity has also been found in our previous study of M-hydrate nucleation,^{18} and a few other studies.^{25–27} It thus suggests that the existence of multiple pathways can be a generic scene for hydrate nucleation under a wide range of conditions. Although further detailed studies are clearly needed to validate and rationalize it, we conjecture that the multiple pathways originate from the complex crystalline structures, and the close stability of different polymorphs of hydrate. In particular, under the studied condition, gas hydrates exhibit a thermodynamically stable phase sI and a metastable phase sII which are distinguished from each other only by a small chemical potential difference.^{31} This may lead to a virtual degeneracy in the free energy profiles among the different pathways for cage ordering and packing, particularly when the size of the hydrate nucleus is small. Indeed, as FFS usually capitalizes the fastest nucleation pathways, and if there had existed a pathway which is much more favored than the rest, then those much slower pathways would have disappeared in the ensemble. Therefore the co-existence of different pathways in the obtained ensemble may well indicate the proximity in their free energy landscapes.^{17} Consequently, which pathway hydrates adopt to grow would become rather an entropic process. Since the direct crystallization pathway only represents a small fraction of entire pathway ensemble (as by definition, amorphous includes all other possible non-crystalline structures), its overall statistical weight would be small. Indeed a few simulation studies^{18,25–27} and this work identified the direct formation of a crystalline hydrate, but also found it is far less frequent than the non-direct pathways. Clearly, further studies are needed to validate this hypothesis, desirably through direct experimental observation. We note that the co-existence of the direct and indirect nucleation pathways was indeed recently observed experimentally in calcium carbonate through *in situ* transmission electron microscopy^{65} and protein crystal via atomic force microscope.^{66}

## IV. CONCLUSIONS

In this study, we present an advanced computational strategy integrating forward flux sampling, backward flux sampling, and *p*_{B} histogram analysis to investigate hydrate nucleation under the general conditions where spontaneous nucleation becomes too slow to occur in standard molecular simulation. Through this strategy, we pursue a quantitative, “first-principle” description of the free energy landscape and molecular pathways of hydrate formation. The computational strategy is based on the order parameter H-COP that we previously developed to drive the FFS. The quality of H-COP was verified by the *p*_{B} histogram analysis in the M-hydrate system, which showed that the host-only order parameter, e.g., H-COP, can describe hydrate nucleation pathways nearly equally well as the guest-host integrated order parameter, e.g., MCG.^{36} By combining both the forward and backward flux sampling, we implemented an approach to obtain the free energy profile through calculating the stationary distribution density as a function of the order parameter *λ*. Interestingly the independently calculated distribution profile in L-hydrate was found to fit reasonably well against the classical nucleation theory, which is consistent with recent finding in methane hydrate.^{36} The CNT-like free energy profile appears rather intriguing, given the clear non-classical molecular nucleation pathways revealed from a number of molecular simulations including the current study. Indeed, the structural analysis of the obtained ensemble of nucleation pathways obtained in both hydrate systems clearly suggests that, on an average, hydrate formation is facilitated through a “two-step” like, amorphous to crystal transition. However the analysis also clearly reveals the structural diversity in nucleation pathways, particularly the existence of the direct crystallization trajectories without going through the amorphous stage. The diversity highlights the complexity of hydrate nucleation, and confirms the existence of multiple active nucleation pathways that were reported in a few previous studies.^{18,25–27} The overall classical-like free energy landscape and the structural diversity identified in different systems and conditions further imply that hydrate nucleation could be an entropically driven, kinetic process, which proceeds via multiple pathways that have similar free energy profiles. This conjecture brings a new perspective to hydrate nucleation and needs to be verified by future studies.

## Acknowledgments

The work is supported by NSF through Award No. CBET-1264438. T.L. also thanks the Sloan Foundation through the Deep Carbon Observatory for supporting this work.