Computational studies of crystal nucleation can be impacted by finite size effects, primarily due to unphysical interactions between crystalline nuclei and their periodic images. It is, however, not always feasible to systematically investigate the sensitivity of nucleation kinetics and mechanism to system size due to large computational costs of nucleation studies. Here, we use jumpy forward flux sampling to accurately compute the rates of heterogeneous ice nucleation in the vicinity of square-shaped model structureless ice nucleating particles (INPs) of different sizes and identify three distinct regimes for the dependence of rate on the INP dimension, *L*. For small INPs, the rate is a strong function of *L* due to the artificial spanning of critical nuclei across the periodic boundary. Intermediate-sized INPs, however, give rise to the emergence of non-spanning “proximal” nuclei that are close enough to their periodic images to fully structure the intermediary liquid. While such proximity can facilitate nucleation, its effect is offset by the higher density of the intermediary liquid, leading to artificially small nucleation rates overall. The critical nuclei formed at large INPs are neither spanning nor proximal. Yet, the rate is a weak function of *L*, with its logarithm scaling linearly with 1/*L*. The key heuristic emerging from these observations is that finite size effects will be minimal if critical nuclei are neither spanning nor proximal and if the intermediary liquid has a region that is structurally indistinguishable from the supercooled liquid under the same conditions.

## I. INTRODUCTION

The main premise of molecular simulations is to use the information obtained from simulating finite-sized systems to predict their behavior in the thermodynamic limit. The accuracy of such predictions, however, can depend strongly on the size of the simulated system, as estimates of thermodynamic,^{1–8} structural,^{9} and transport^{10} properties and nucleation rates^{11} in small systems can deviate from those in the thermodynamic limit in a statistically significant manner. Such a dependence on system size is typically referred to as *finite size effects*, which can be fairly strong for very small systems, while being mostly unnoticeable for larger systems. Therefore, finite size effects can, in principle, be mitigated by simulating very large systems, a task that is only computationally feasible for simple model systems.^{12,13} Fortunately, this is not always necessary as similar conclusions can usually be obtained from simulating “sufficiently large” computationally tractable systems.^{12,14} What constitutes “sufficiently large,” however, is subject to the property that is being estimated or the process that is being studied. For instance, a system comprised of several hundred molecules is usually large enough for accurately estimating the thermodynamic, structural, and transport properties of liquids^{1–3,9,10,15} but might be too small for studying collective phenomena such as cavitation,^{16} condensation,^{17} and crystal nucleation.^{12,18,19} It is therefore critical to develop heuristics for determining what qualifies as “sufficiently large” for studying such collective phenomena in order to ensure the accuracy and reliability of the conducted simulations.

One such collective phenomenon that has been extensively studied using molecular simulations is crystal nucleation. As such, understanding the role of finite size effects in the thermodynamics and kinetics of crystal nucleation has been a topic of interest for decades.^{11} Nucleation is a process in which a sufficiently large nucleus of the new phase forms within the old metastable phase and is usually the rate-limiting step of a first-order phase transition when the underlying thermodynamic driving force is small.^{20} Finite size effects in nucleation primarily arise due to periodic boundary conditions, which can result in an unphysical confinement of the metastable phase between the nucleus and its periodic images,^{15,18,19,21} or the formation of nuclei that span across the periodic boundary.^{16,22} In the case of crystal nucleation, the effect of periodic boundaries might be stronger due to the extension of the diffuse crystal–liquid interface beyond the nucleus.^{12} However, finite size effects can also arise due to other factors, such as solute depletion in multi-component systems,^{23,24} or peculiarities of the employed ensemble.^{14,17} These effects can collectively lead to unphysical nucleation rates in both homogeneous and heterogeneous nucleation and have also been found to impact crystal growth.^{25–27} Indeed, the findings of several high-profile computational studies of nucleation are believed to have been strongly impacted by finite size effects. For instance, the observation of Matsumoto *et al.*^{28} of homogeneous ice nucleation in a system of 512 water molecules represented using the fully atomistic TIP4P^{29} model has never been reproduced in larger systems and was later shown to be an artifact of strong finite size effects.^{30} Earlier computational studies of surface freezing—or surface-induced homogeneous ice nucleation^{31,32}—by Vrbka and Jungwirth^{33,34} and Pluhaŕová *et al.*^{35} are also believed to be impacted by finite size effects.^{36}

Early efforts to characterize finite size effects in crystal nucleation focused on homogeneous nucleation in the simple Lennard-Jones (LJ) liquid.^{37} For instance, Honeycutt and Andersen^{18,19} simulated systems of up to 1500 LJ particles at a reduced density of 0.95 and a reduced temperature of 0.45 and concluded that the occurrence of the “catastrophic crystal growth” observed in earlier simulations of the deeply supercooled LJ liquid is not due to the emergence of a critical nucleus and is instead an artifact of periodic boundaries. Indeed, they later demonstrated that critical nuclei form way earlier than the catastrophic growth, but their average sizes and the time needed for their formation both tend to increase with system size, pointing to strong finite size effects even prior to catastrophic growth.^{19} Later, Swope and Andersen^{12} conducted large-scale MD simulations of 15 000 and 10^{6} LJ particles under similar conditions and observed that the properties of the 15 000-particle system were similar to the average properties of the 64 subsystems within the million-particle simulation box. They therefore concluded that the 15 000-particle system is large enough to be devoid of finite size effects. A similar conclusion was reached by Huitema *et al.*^{38} who examined homogeneous nucleation in systems with as many as 10 000 particles. According to these studies, avoiding finite size effects requires simulating systems that are at least three orders of magnitude larger than the characteristic critical nucleus size, a very stringent requirement that can only be satisfied for simple model systems. Consequently, such large-scale simulations of finite size effects in other systems are rare.^{13} This heuristic is, however, based on observations in the high-rate regime, i.e., where nucleation can occur during a computationally tractable MD trajectory, and therefore, the liquid structure is pre-disposed to freezing. It is therefore plausible to expect finite size effects to be weaker in the low-rate regime in which nucleation events are spatially isolated. Moreover, this heuristic can only apply to homogeneous nucleation at best, as the dependence of rate on system size might be completely different for heterogeneous nucleation.

Unfortunately, many of these questions are yet to be investigated in a systematic manner. Consequently, there are no rigorous guidelines or heuristics for avoiding finite size effects in crystal nucleation studies, and different authors have resorted to different *ad hoc* approaches to avoid finite size artifacts. While some have tested the robustness of their findings by repeating their simulations in computationally tractable larger systems,^{39–41} others have argued that finite size effects will be absent if the average distance between the critical nucleus and its periodic images is larger than half the box dimensions.^{42,43} This latter heuristic usually translates to critical nuclei that are at least an order of magnitude smaller than the system size in homogeneous nucleation, a heuristic that is sometimes referred to as the “10% rule.” There are, however, reasons to doubt the adequacy of these approaches. The former approach can only be conclusive if rate calculations are conducted for a wide range of system sizes, an undertaking that is usually not feasible. The size and the distance thresholds invoked in the latter approach, on the other hand, are not based on any rigorous analysis and can be fairly sensitive to the particulars of the algorithm utilized for detecting crystalline nuclei.

Here, we attempt to address some of these questions by systematically investigating how the rate and mechanism of heterogeneous ice nucleation within supercooled supported water nanofilms in the vicinity of a model structureless ice nucleating particle (INP) are affected by system size. Due to the surface-dominated nature of heterogeneous nucleation, the relevant “system size” is the dimension of the INP. We use our recently developed jumpy forward flux sampling (jFFS)^{44} algorithm to compute nucleation rates for 16 different system sizes, comprising between 1600 and 50 176 water molecules. We identify three distinct regimes for the dependence of rate and mechanism on system size and identify the critical nuclei characteristics that signify each regime. Based on our observations, we devise a rigorous set of heuristics for assessing whether a particular nucleation rate calculation is impacted by finite size effects. Moreover, we provide a scaling approach for estimating the rate in the thermodynamic limit for system sizes where finite size effects are minimal, but the rate is still a weak function of system size.

## II. METHODS

### A. System description and molecular dynamics simulations

We consider heterogeneous nucleation in supported nanofilms of supercooled water in the vicinity of a model structureless INP at a temperature of 235 K. Unlike some earlier studies^{43,45} of heterogeneous ice nucleation in which the liquid film is sandwiched between the INP and its periodic image, the films considered in this work only touch the INP on one side and are exposed to vacuum at the other interface [Fig. 1(a)]. Despite being more expensive computationally, we believe that our setup constitutes a more faithful representation of heterogeneous nucleation in nature, where the isolated INPs are in contact with a sea of the supercooled liquid. The water nanofilms considered in this work are approximately 4.8 nm thick, which makes it extremely unlikely for the free interface to impact the kinetics and mechanism of nucleation in a meaningful manner.^{46}

We model water molecules using the monoatomic water (mW) potential,^{47} a popular coarse-grained model of water obtained by re-parameterizing the Stillinger–Weber (SW) potential originally developed for modeling group IV elements, such as carbon and silicon.^{48} The model INP is square-shaped and is located in the *xy* plane. It interacts with water molecules via the Lennard-Jones (LJ) 9-3 potential^{49} with *ϵ* = 1.2 kcal mol^{−1} and *σ* = 0.32 nm and with a cutoff of 0.8 nm. These parameters are chosen so that the underlying INP does not induce ice nucleation very strongly. The utilized *ϵ* value, in particular, is the smallest *ϵ* for which heterogeneous nucleation is observed in unbiased 50-ns long MD simulations at 215 K. All MD simulations are performed in the canonical (NVT) ensemble using the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) package.^{50} It is necessary to emphasize that the presence of a vapor–liquid interface in our simulation setup makes it possible for the liquid film to freely “adjust” to density changes during nucleation. This is equivalent to simulating nucleation under an effective zero pressure. Our calculations are therefore not susceptible to artifacts due to unphysical density fluctuations in bulk finite systems simulated in the NVT ensemble.^{14} Equations of motion are integrated using the velocity-Verlet algorithm with a time step of 5 fs, while temperature is controlled using the Nosé–Hoover^{51,52} thermostat with a time constant of 0.5 ps. While conventional thermostats are incapable of accounting for the effect of latent heat of fusion, they have been previously shown to perform reasonably well in predicting the nucleation rate due to the non-Gaussian distribution of temperatures of pre- and post-critical nuclei.^{53} As such, we do not expect any of our findings to be strongly impacted by the employed thermostat.

As mentioned above, we utilize *L*, the dimension of the structureless INP, as a proxy for system size. We consider a total of 16 different *L*’s, ranging from 3.19 nm to 17.85 nm, and use the following procedure for initializing the water nanofilms. We first generate a properly sized slab of cubic ice comprised of *n* × *n* × 8 unit cells. The oxygen–oxygen distance in each unit cell, *r*_{OO}, is adjusted so that the target *L* is an integer multiple of $Lc=4rOO/3$, the unit cell dimension of cubic ice.^{54} For most *L*’s, we use the value of *r*_{OO} = 0.276 nm, while for a few smaller system sizes, *r*_{OO} is slightly adjusted in order to fit an integer number of unit cells within the box. These include boxes with lateral dimensions of 3.38 nm, 3.61 nm, 4.06 nm, and 4.28 nm for which the *r*_{OO} values of 0.24 nm, 0.26 nm, 0.25 nm, and 0.26 nm are utilized, respectively. Each ice film is then melted at a temperature of 300 K. A minimum of 100 configurations are saved along the melted trajectory every 50 ps, which are then quenched to the target temperature of 235 K at a cooling rate of 7.69 ps K^{−1}. A list of all *L*’s and the number of water molecules within each film is given in Table I.

### B. Rate calculations

#### 1. Order parameter

We compute nucleation rates using our recently developed jFFS algorithm,^{44} which is a generalized variant of the forward flux sampling (FFS) algorithm^{55} that has been extensively utilized to study rare events.^{56} Similar to most other advanced sampling techniques, conducting an FFS calculation requires an order parameter, a mathematical function $\lambda :Q\u2192R$ that quantifies the progress of the corresponding rare event, in this case, heterogeneous ice nucleation. Here, $Q$ is the configuration space that includes all the microscopic degrees of freedom of the corresponding system, i.e., the positions of all water molecules. In this work, we use the number of molecules in the largest crystalline nucleus as the order parameter. Each molecule *i* is classified as solid-like or liquid-like based on *q*_{6}(*i*), the Steinhart bond order parameter,^{57} given by

Here, *N*_{b}(*i*) is the number of water molecules within a distance of *r*_{c} = 0.32 nm from *i* and *r*_{c} corresponds to the locus of the first valley of the radial distribution function. Therefore, the neighbors of each molecule are those that lie within its first hydration shell. **q**_{l}(*i*) is a (2*l* + 1)-component complex-valued vector, and its components are given by

with *θ*_{ij} and *ϕ*_{ij} being the polar and azimuthal angles corresponding to the separation vector **r**_{ij} = **r**_{j} − **r**_{i} and *Y*_{lm}(·, ·)’s being the spherical harmonic functions. Consistent with our earlier studies,^{36,44,58–60} we classify molecules with *q*_{6} ≥ 0.5 as solid-like, cluster the solid-like molecules that are within a distance of *r*_{c} into crystalline nuclei, and apply the chain exclusion algorithm of Reinhardt *et al.*^{61}

#### 2. jFFS calculations

With *λ*(·) at hand, the transition region between the supercooled liquid basin $A\u2254{x\u2208Q:\lambda (x)<\lambda A}$ and the crystalline basin $B\u2254{x\u2208Q:\lambda (x)\u2265\lambda B}$ is partitioned into *N* non-overlapping regions using *N* milestones *λ*_{A} < *λ*_{0} < *λ*_{1}⋯*λ*_{N} = *λ*_{B}, which are level sets of *λ*(·). As demonstrated earlier,^{44} our utilized *λ*(·) is jumpy as it undergoes high-frequency high-amplitude fluctuations along an MD trajectory. We therefore need to use jFFS in order to accurately capture the potentially subtle changes in rate upon changing *L*. In order to minimize the number of FFS iterations, we follow the approach described in Sec. III B 1 of Ref. 44. For each *L*, we first conduct conventional MD trajectories within *A* with a combined duration of at least 0.5 *µ*s and monitor for first crossings of *λ*_{0}. The configurations corresponding to such crossings $C0={x1(0),x2(0),\u2026,xN0(0)}$ are stored, and Φ_{0}, the flux of trajectories that leave *A* and cross *λ*_{0}, is given by

with $T$ being the combined duration of the MD trajectories and *L*^{2} being the surface area of the underlying INP. The next step is to compute the probability that a trajectory initiated from $C0$ reaches *B* by recursively computing the transition probabilities between successive milestones. In the simplified scheme of jFFS utilized here, the intermediate milestones are chosen on the fly. In particular, *λ*_{k} is chosen so that it is beyond $\lambda k\u22121,max=maxx\u2208Ck\u22121\lambda (x)$, wherein $Ck\u22121$ contains all the configurations obtained upon a first crossing of *λ*_{k−1}. After choosing the next target milestone *λ*_{k}, a large number of trial trajectories are initiated from the configurations in $Ck\u22121$ with their momenta sampled from the Maxwell–Boltzmann distribution, and each trajectory is terminated upon crossing *λ*_{k} or returning to *A*. The transition probability *P*(*λ*_{k}|*λ*_{k−1}) is then computed as the fraction of trial trajectories that cross *λ*_{k}. Note that *λ*_{B} is not known *a priori*. Instead, the calculation is terminated when *P*(*λ*_{k}|*λ*_{k−1}) is statistically indistinguishable from unity. The nucleation rate $J$ is then computed from

The statistical uncertainty of the computed $J$ is estimated using the approach described in Ref. 55. All reported error bars correspond to 95% confidence intervals, i.e., twice the standard errors obtained from this approach. Further technical details of the jFFS calculations, including crossing statistics, FFS milestones, and transition probabilities, and the average length of successful and unsuccessful trajectories in each iteration are given in the supplementary material.

It has been previously demonstrated in numerous studies^{43,62} that *λ*(·) is a good reaction coordinate for crystal nucleation. This implies that the critical nucleus size can be accurately determined from the committor probability given by

More precisely, *N*^{*}, the critical nucleus size, is estimated by fitting *p*_{c}(*λ*) to the following expression:

with the reported error bars corresponding to the range of *λ*’s for which 0.35 ≤ *p*_{c}(*λ*) ≤ 0.65. This choice is motivated by the fact that in an FFS rate calculation, only a handful of FFS milestones are close to the transition region, i.e., the range of *λ*’s for which the committor probability is close to 50%. As a result, the attained estimates of *p*_{c}(*λ*) (and *N*^{*}) can be potentially sensitive to milestone placement in FFS. Therefore, other measures of uncertainty, such as the confidence interval of the fit parameter of Eq. (6), might underestimate the true uncertainty in *N*^{*}, while the approach outlined here is more robust to factors such as milestone placement. It is necessary to emphasize that none of these issues impact the overall rate estimates, which are known to be insensitive to the number and location of milestones.^{44}

In probing the properties of critical configurations, we identify $\lambda k*$, the closest milestone to *N*^{*}, and denote all $x\u2208Ck*$ as critical. For most systems, this results in configurations with 32% ≤ *p*_{c}(*λ*(*x*)) ≤ 68%. In three systems, i.e., *L* = 7.65 nm, 12.11 nm, and 14.02 nm, *N*^{*} is not sufficiently close to any of the milestones, so we choose the configurations obtained from first crossings of the two closest milestones, which results in committor probabilities no smaller than 16% and no larger than 78%.

#### 3. Analysis of nucleation mechanism and identification of bottlenecks

Since a successful nucleation pathway in jFFS is generated sequentially by concatenating the trial trajectories between successive milestones, we can trace back the ancestry of any configuration in $CN$ and identify the *surviving configurations* in earlier milestones, i.e., those with some progeny at *λ*_{B}. We denote the surviving subset of $Ck$ as $Cks$. By comparing the properties of $Cks$ and $Ck$, we can identify the important features that play a key role in successful nucleation. More specifically, for a given mechanical observable $\mu :Q\u2192R$, one can compute $\mu s(\lambda k)=\u27e8\mu (x)\u27e9x\u2208Cks$ and $\mu a(\lambda k)=\u27e8\mu (x)\u27e9x\u2208Ck$. We denote the milestone at which *μ*_{s}(*λ*) and *μ*_{a}(*λ*) cross each other a *bottleneck* for *μ*(·) and the corresponding average *μ*_{b}. As demonstrated in our prior applications of this approach,^{36,44,59,63} the existence of a noticeable bottleneck indicates that *μ*(·) is likely an important feature (orthogonal to the order parameter) that determines the likelihood of an early-milestone configuration to succeed in having progeny at *λ*_{B}.

### C. Characterizing the proximity of crystalline nuclei to their periodic images

Here, we describe the procedure for quantifying the extent by which a crystalline nucleus is impacted by its closest periodic images. Let $N={r1,r2,\u2026,rm}$ be a crystalline nucleus comprised of *m* molecules within a box with side vectors **b**_{x}, **b**_{y}, and **b**_{z}. The periodic images of $N$ can be obtained by adding integer linear combinations of **b**_{x}, **b**_{y}, and **b**_{z} to **r**_{i}’s. More precisely, the *n*_{x}*n*_{y}*n*_{z}th periodic image of $N$ will be given by

We are particularly interested in the closest periodic images of $N$ and thus in a subset of displacement vectors $P$ that produce such closest images. For heterogeneous nucleation on the square-shaped INPs considered in this work, $P$ is given by $P={(\xb1L,0,0),(0,\xb1L,0)}$. It must, however, be noted that $P$, in general, will depend on the shape of the simulation box and on whether nucleation is homogeneous or heterogeneous. For instance, for homogeneous nucleation within a cubic box of side *L*, $P$ will be given by {(±*L*, 0, 0), (0, ±*L*, 0), (0, 0, ±*L*)}.

The most extreme manifestation of finite size effects is when $N$ is *spanning*, i.e., when it is connected to some, or all, of its periodic images. In order to determine whether $N$ is spanning, we label each molecule within $N$ with a set of (up to three) integers ** ξ** = (

*ξ*

_{x},

*ξ*

_{y},

*ξ*

_{z}) that describe the side of the boundary that the molecule resides with respect to a reference molecule. For heterogeneous nucleation, spanning is only possible across two dimensions so only two indices are needed. At the beginning of the analysis, we set all

*ξ*_{i}’s to zero, except for that of an arbitrary reference molecule

*j*within $N$, which is set to

**1**(

**1**stands for a vector of all 1’s). We then recursively loop through the nearest neighbors of each molecule in $N$ (starting with

*j*) and set the

**’s of the unvisited molecules depending on whether reaching that neighbor requires crossing the boundary. More specifically, if**

*ξ**q*is an unvisited neighbor of

*p*and is located at the same side of the periodic boundary as

*p*,

*ξ*_{q}is set to

*ξ*_{p}. Otherwise, the index corresponding to the boundary that needs to be crossed is multiplied by −1. For instance, if reaching

*q*requires crossing the

*x*boundary,

*ξ*

_{q,x}is set to −

*ξ*

_{p,x}. In order for $N$ to be spanning, it needs to contain two neighboring molecules with differing

**’s, but located at the same side of the periodic boundaries.**

*ξ*In order to explore the structure of the liquid that is confined between a non-spanning nucleus $N$ and its closest periodic image, we determine **u**, the shortest vector connecting $N$ to its closest periodic image and calculate the liquid density profile along **u**. In order to determine **u**, we first identify the range of *x*, *y* (or *z*) values for all $ri\u2208N$ that have identical *ξ*_{i}’s and use the attained intervals to displace all molecules in the system so that (the shifted) $N$ is not partitioned by the periodic boundaries. Note that relabeling such a shifted nucleus will yield ** ξ** =

**1**for all its constituent molecules. It must also be emphasized that such shifting can be conducted for non-spanning nuclei only, since spanning nuclei will always cross some periodic boundaries. Therefore, the labeling procedure described above is a pre-requisite for such shifting as it enables us to confirm that a particular nucleus is non-spanning. The inter-image vector

**u**can then be determined as

We call **u** the *inter-image vector* and the supercooled liquid that resides along it the *inter-image liquid*. Our next step is to compute *ρ*_{u}(*r*), the inter-image liquid density profile along **u**, with *r* being the minimum distance from $r\u0129$ or $rj\u0303p\u0303$. In order to compute *ρ*_{u}(*r*), we enumerate the average number of molecules that reside within cylindrical bins of radius *r*_{c} and thickness 0.02 nm along **u**. We use a bin radius of *r*_{c} not only to obtain better statistics but also to assure that all the water molecules that are less than a hydration distance away from **u** are included in the average. In order to obtain an accurate measure of density when either $r\u0129$ or $rj\u0303p\u0303$ is at the immediate vicinity of INP, each cylindrical bin is intersected with a plane parallel to the INP and located at the smallest *z* at which the supercooled liquid density becomes nonzero.

## III. RESULTS AND DISCUSSIONS

### A. Summary of nucleation rates

We first explore the dependence of the nucleation rate on *L*, the dimension of the square-shaped INP. The computed rates are shown in Fig. 1(b) and listed in Table I. We can, in particular, identify three distinct regimes for the dependence of rate on *L*. For small INPs, i.e., for *L* ≤ 4.46 nm, the rate is a strong function of *L* and changes by as much as four orders of magnitude, indicative of strong finite size effects and potentially spurious behavior. For large INPs, i.e., for *L* ≥ 7.65 nm, the rate is a weak and monotonic function of *L* and increases by less than two orders of magnitude upon increasing *L*. These two regimes are highlighted in Figs. 1(b) and 1(c) with shaded green and red, respectively. There is, however, a third intermediate regime that is highlighted with a gradient shade in Figs. 1(b) and 1(c) and that also exhibits a weak and monotonic dependence of rate on *L*. Unlike the second regime, however, the rate decreases slightly upon increasing *L* in this intermediate regime.

N_{p}
. | L (nm)
. | N^{*}
. | $log10J[m\u22122s\u22121]$ . |
---|---|---|---|

1 600 | 3.1869 | 145 ± 9 | 12.6243 ± 0.1882 |

2 304 | 3.3803 | 153 ± 10 | 11.5403 ± 0.1334 |

2 304 | 3.6056 | 178 ± 10 | 9.9990 ± 0.1468 |

2 304 | 3.8243 | 196 ± 10 | 9.1136 ± 0.1446 |

3 136 | 4.0563 | 230 ± 13 | 8.9292 ± 0.1302 |

3 136 | 4.2817 | 248 ± 14 | 8.4352 ± 0.1380 |

3 136 | 4.4617 | 271 ± 17 | 8.7846 ± 0.1218 |

4 096 | 5.0991 | 272 ± 17 | 8.3486 ± 0.1802 |

5 184 | 5.7365 | 276 ± 17 | 8.1461 ± 0.1474 |

9 216 | 7.6487 | 280 ± 18 | 8.2822 ± 0.1430 |

12 544 | 8.9234 | 268 ± 15 | 8.5019 ± 0.1336 |

16 384 | 10.1983 | 271 ± 17 | 8.8893 ± 0.1378 |

23 104 | 12.1104 | 272 ± 17 | 9.0289 ± 0.1246 |

30 976 | 14.0226 | 274 ± 18 | 9.2391 ± 0.1210 |

40 000 | 15.9348 | 272 ± 18 | 9.7301 ± 0.1178 |

50 176 | 17.8470 | 274 ± 18 | 9.9583 ± 0.1160 |

N_{p}
. | L (nm)
. | N^{*}
. | $log10J[m\u22122s\u22121]$ . |
---|---|---|---|

1 600 | 3.1869 | 145 ± 9 | 12.6243 ± 0.1882 |

2 304 | 3.3803 | 153 ± 10 | 11.5403 ± 0.1334 |

2 304 | 3.6056 | 178 ± 10 | 9.9990 ± 0.1468 |

2 304 | 3.8243 | 196 ± 10 | 9.1136 ± 0.1446 |

3 136 | 4.0563 | 230 ± 13 | 8.9292 ± 0.1302 |

3 136 | 4.2817 | 248 ± 14 | 8.4352 ± 0.1380 |

3 136 | 4.4617 | 271 ± 17 | 8.7846 ± 0.1218 |

4 096 | 5.0991 | 272 ± 17 | 8.3486 ± 0.1802 |

5 184 | 5.7365 | 276 ± 17 | 8.1461 ± 0.1474 |

9 216 | 7.6487 | 280 ± 18 | 8.2822 ± 0.1430 |

12 544 | 8.9234 | 268 ± 15 | 8.5019 ± 0.1336 |

16 384 | 10.1983 | 271 ± 17 | 8.8893 ± 0.1378 |

23 104 | 12.1104 | 272 ± 17 | 9.0289 ± 0.1246 |

30 976 | 14.0226 | 274 ± 18 | 9.2391 ± 0.1210 |

40 000 | 15.9348 | 272 ± 18 | 9.7301 ± 0.1178 |

50 176 | 17.8470 | 274 ± 18 | 9.9583 ± 0.1160 |

In order to understand the origin of these contrasting behaviors, we focus on the geometric features of the critical nuclei (see Sec. II B 2 for our definition of critical configurations). We first determine the fraction of critical nuclei that are spanning using the approach described in Sec. II C. Such spanning nuclei exhibit artificial directionality and spurious long-range crystalline order along the *x* and/or *y* dimensions of the simulation box. Figure 1(c) depicts the fraction of critical configurations with spanning crystalline nuclei. While such nuclei are prevalent for very small *L*’s, their fraction decreases upon increasing *L* and eventually drops to zero at *L* = 5.74 nm. Figure 2 depicts typical critical nuclei for systems of different sizes, including a spanning nucleus for *L* = 3.19 nm and a non-spanning nucleus at *L* = 3.82 nm. The presence of an appreciable number of spanning critical nuclei is the main feature that distinguishes the small-INP regime from the other two. We will therefore refer to the shaded green region of Figs. 1(b) and 1(c) as the *spanning regime*. The other two regimes, however, lack an appreciable fraction of spanning critical nuclei and thus exhibit a weaker dependence of rate on *L*.

Finite size effects can, however, exist in the absence of spanning critical nuclei, as the proximity of a critical nucleus to its periodic image can still render noticeable finite size effects even in the absence of spanning. In order to devise a more rigorous measure of such proximity, we inspect the structure of the supercooled liquid that resides within inter-image regions of non-spanning nuclei (defined in Sec. II C). Figure 3 depicts the inter-image liquid density, *ρ*_{u}(*r*), for several system sizes where the spanning fraction is negligible. The inter-image liquid is fairly structured in the immediate vicinity of the nucleus, as evident in the three peaks of *ρ*_{u}(*r*) for *r* < 0.77 nm. The fact that a crystalline surface structures the liquid at its vicinity is a universal feature of crystal–liquid interfaces and is not limited to ice nucleation. Note that the heights and loci of these peaks are independent of the system size, so their emergence is unlikely to be impacted by finite size effects. Beyond this threshold, which we denote by *r*_{c,p} = 0.77 nm, the liquid density reaches a plateau vs *r*. We therefore call the nuclei with |**u**| ≤ 2*r*_{c,p} “proximal” since the inter-image liquid confined between them and their closest periodic images clearly lacks bulk-like behavior. Since |**u**| = 0 for a spanning cluster, all spanning clusters are also proximal. Figure 1(c) depicts the fraction of proximal critical nuclei as a function of *L*. Clearly, the presence of an appreciable fraction of proximal critical nuclei in the intermediate regime, and their absence in the large-INP regime, is the main feature that distinguishes those two regimes. We refer to the large-INP regime as the *non-spanning regime*. The intermediate regime, however, constitutes a transition from the lack of spanning and proximity in the non-spanning regime to a preponderance of spanning and proximal critical nuclei in the spanning regime. We will analyze these three regimes separately and will explain in detail how the differences in the spanning behavior and proximity can affect the magnitude and the nature of finite size effects.

### B. The spanning regime

Due to the close proximity of critical nuclei and their periodic images in the spanning regime, the kinetics and mechanism of nucleation exhibit a strong dependence on system size. Overall, we observe that an increase in the fraction of spanning configurations results in an increase in rate [Fig. 1(b)] and a decrease in the size of the critical nucleus (Table I). This indicates that the ability to form spanning nuclei artificially promotes nucleation. There is indeed a reasonably good, but imperfect, correlation between the fraction of spanning configurations and $log10J$, as shown in Fig. 4. This imperfect correlation, however, suggests that collective variables other than the spanning fraction might be better predictors of the nucleation rate in the spanning regime. At a fundamental level, such a predictor needs to account for the effect of proximity between crystalline nuclei and their periodic images prior to the occurrence of spanning. One such suitable candidate is $dminproj=|(ux,uy,0)|2$ or the minimum *xy*-projected distance between a crystalline nucleus and its closest periodic image. Note that $dminproj=0$ for a spanning nucleus. Focusing on this lateral distance enables us to detect and quantify peculiarities that arise in the absence of spanning. Figure 5 shows $dminproj$ as a function of crystalline nucleus size for all seven films in the spanning regime. In addition to all configurations at a given milestone, we compute $dminproj$ for the surviving configurations as well. As expected, both $dmin,sproj(\lambda )$ and $dmin,aproj(\lambda )$ are strictly decreasing functions of *λ*, but their decline is faster in smaller films. At earlier FFS milestones and prior to the bottlenecks (the green circles in Fig. 5), however, $dmin,sproj(\lambda )$ is consistently larger than $dmin,aproj(\lambda )$, irrespective of the system size. This suggests that at initial stages of nucleation, a less spread-out nucleus is better suited to survive and to contribute to the nucleation pathway. Beyond the bottlenecks, which are always smaller than *N*^{*}, $dmin,sproj(\lambda )$ remains consistently smaller than $dmin,aproj(\lambda )$. This suggests that the more compact surviving nuclei of earlier milestones reach a geometry at the bottleneck that facilitates their further growth toward their periodic images to form spanning or proximal critical nuclei. The bottlenecks of $dminproj(\u22c5)$ therefore constitute pivotal stages in the nucleation process, and the properties of the crystalline nuclei therein are likely to play an important role in the kinetics and mechanism of nucleation.

This hypothesis is borne out by the strong correlation between $dmin,bproj$ and $log10J$, as depicted in Fig. 6. The correlation is equally strong whether $dmin,bproj$ is computed only for the non-spanning configurations [Fig. 6(a)] or all configurations [Fig. 6(b)] at the bottleneck. This is not surprising since the overwhelming majority of crystalline nuclei at the bottleneck are non-spanning, as shown in Fig. 7(b). Moreover, a large fraction of configurations at the final crystalline basin emanate from non-spanning surviving configurations at the bottleneck, as shown in Fig. 7(a). It must, however, be noted that while spanning is much less pronounced at the bottlenecks, it can still play a role in determining the kinetics of nucleation. Indeed, including the spanning configurations in the average, $dmin,bproj$, slightly improves the fit, as shown in Fig. 6(b).

### C. Non-spanning and intermediate regimes

As mentioned in Sec. III A, the non-spanning regime is characterized by critical nuclei that are neither spanning nor proximal, while the intermediate regime contains an appreciable fraction of proximal but non-spanning critical nuclei. More precisely, 19% and 1.8% of the critical nuclei within the two systems that belong to the intermediate regime, namely, *L* = 5.10 nm and 5.74 nm, are proximal, respectively. (A tiny fraction of critical nuclei (0.2%) in the smaller 5.10-nm system are spanning.) It can thus be argued that the dependence of rate on *L* in these two regimes is likely governed by the properties of the supercooled liquid within the plateau region of Fig. 3 considering the prevalence of non-proximal configurations in both regimes.

In order to test this hypothesis, we compute the average inter-image liquid density within the plateau region of *ρ*_{u}(*r*) and compare it to the density of the supercooled liquid under the same conditions. Like other solid surfaces, however, the INP considered in this work structures the liquid at its vicinity, as shown in Fig. 8(b). In particular, there are three distinct liquid layers near the INP, with average densities different from that of the bulk liquid. As such, the average liquid density within the inter-image plateau region should be compared to the average liquid density along **u**, which can, in general, be different from the bulk density. This comparison is, however, not trivial when the inter-image vector **u** passes through multiple liquid layers. We therefore simplify our analysis by computing inter-image plateau densities for configurations in which **u** is fully located within a single liquid layer. Indeed, around 50% of all critical configurations in the non-spanning and intermediate regimes have inter-image vectors lying within one of the first three liquid layers. For any such configuration, the inter-image plateau density is computed by enumerating *N*_{p}, the number of water molecules that reside within a cuboid of length $dminproj\u22122rc,p$, width $w$ = 0.64 nm, and height *h* along **u**_{xy} = (*u*_{x}, *u*_{y}, 0) and centered at the midpoint between the nucleus and its closest periodic image. We choose *w* so that all in-layer nearest neighbors of molecules along **u**_{xy} are included within the cuboid, while *h* is chosen as the thickness of the liquid layer, $L$ determined from Fig. 8(b). A typical cuboid for a configuration with a **u** within the first liquid layer is shown from three different angles in Fig. 8(a). The average inter-image plateau density for each layer is then estimated as the weighted average of individual densities $\rho p(x)=Np(x)/wh[dminproj(x)\u22122rc,p]$ and is given by

where $x\u2208L$ is a configuration whose **u** fully lies within $L$. Note that weighting enhances the statistical accuracy of $\rho pL$ by favoring configurations with larger inter-image plateau regions. The average liquid density within $L$ is estimated by integrating the density profile of Fig. 8(b) within each layer.

Figure 8(c) depicts the $\rho pL$’s of the three layers highlighted in Fig. 8(b) for the critical configurations in the intermediate and non-spanning regimes. For the two films in the intermediate regime, the inter-image plateau densities are significantly larger than the per-layer supercooled liquid densities. This effect likely arises from the negatively sloped melting curve of water, i.e., the fact that the liquid is denser than the crystal under ambient conditions. More precisely, the incorporation of new water molecules into the growing crystalline nucleus results in an increase in its volume and the compression of the inter-image liquid during the out-of-equilibrium nucleation process. Since the thermodynamic driving force for crystallization is a decreasing function of density in water, larger inter-image densities will result in larger nucleation barriers and smaller nucleation rates. While this effect is partly offset by the existence of an appreciable fraction of proximal critical nuclei, it results in rates that are generally smaller than those in the other two regimes. It is necessary to emphasize that this effect is not an artifact of using the canonical ensemble as the free interface is able to freely adjust to density changes during nucleation.

This difference almost disappears in the non-spanning regime, presumably due to the larger size of the inter-image region, which better “absorbs” the growing nucleus front. Indeed, the inter-image plateau densities are statistically indistinguishable from the corresponding liquid densities for *L* ≥ 7.65 nm. These findings generally indicate that the inter-image region is large enough to not be impacted by strong finite size effects in the non-spanning regime. While these findings are based on the configurations with inter-image vectors entirely within one of the first three layers, we expect a qualitatively similar behavior if configurations with inter-layer minimum image connections are also included.

#### 1. Linear scaling of $log10\mathcal{J}$ with 1/*L* in the non-spanning regime

Despite the lack of any quantifiable structural signatures, the nucleation rate is still a weak function of *L* in the non-spanning regime. Indeed, the rate increases by around 1.7 orders of magnitude upon increasing *L* from 7.65 nm to 17.85 nm. The dependence of rate on system size can be satisfactorily described using a linear correlation between $log10J$ and 1/*L*, as depicted in Fig. 9. It has indeed been reported that the averages of properties such as critical temperature,^{2} free energy difference,^{6} structure factor,^{9} diffusivity,^{10} surface tension,^{4} and percolation threshold^{64} have a power law dependence on the system size. The linear relationship between $log10J$ and 1/*L* can therefore be rationalized by invoking the formalism of classical nucleation theory (CNT)^{65} according to which the nucleation rate is given by

Here, *A*, *k*_{B}, and *T* are the kinetic prefactor, the Boltzmann constant, and temperature, respectively. Δ*G*_{nuc} is the nucleation barrier and is given by

with *γ*_{sl}, Δ*μ*, *ρ*_{s}, and *θ* being the solid–liquid surface tension, the chemical potential difference between the two phases, the solid number density, and the solid–liquid-INP contact angle, respectively. All these quantities are known to be impacted by finite size effects,^{4–8} and some of them exhibit a power-law dependence on system size.^{4,6} Consequently, the linear correlation between $log10J$ and 1/*L* is a likely consequence of the power-law dependence of some of these underlying thermodynamic properties, as demonstrated in the Appendix. This scaling enables us to use the rates computed in sufficiently large finite systems to estimate the heterogeneous nucleation rate in the thermodynamic limit. According to the correlation depicted in Fig. 9, the nucleation rate at the thermodynamic limit (*L* → *∞*) is given by $log10J\u221e=10.99\xb10.58$.

## IV. A PROPOSED HEURISTIC TO ASSESS AND QUANTIFY FINITE SIZE EFFECTS IN CRYSTAL NUCLEATION STUDIES

We use our observations to propose a set of heuristics for assessing whether a particular nucleation rate calculation is impacted by finite size artifacts. These heuristics constitute a decision tree comprised of the following steps.

Determine

*N*^{*}, the critical nucleus size, using an approach such as the mean first passage time (MFPT) method^{66}or committor analysis.^{67}Note that the particular classification and clustering algorithm utilized for detecting crystalline nuclei will impact the precise value of*N*^{*}but will not affect whether a particular configuration is classified as critical, i.e., whether it has a committor probability close to 50%.Obtain $C*$, a collection of critical configurations, by analyzing long MD trajectories or scanning the configurations collected at FFS or transition interface sampling (TIS)

^{68}milestones. Ideally, the largest crystalline nucleus in each such configuration should be comprised of exactly*N*^{*}molecules. In practice, though, such a strict filter will limit the number of configurations in $C*$ and will, in turn, lead to poor statistics. Therefore, a less strict criterion can be adopted for constructing $C*$, e.g., the committor probability of each configuration is between 30% and 70%.Use the approach described in Sec. II C to determine whether any configuration in $C*$ spans across the periodic boundary, in which case the rate calculation is likely impacted by strong finite size effects. Note that the utilized classification and clustering algorithm can significantly impact the fraction of spanning configurations or whether any spanning configuration is detected at all. Therefore, while the presence of spanning configurations is a strong indication of finite size artifacts, their absence does not rule out the existence of such artifacts.

Determine the inter-image vector

**u**for every non-spanning configuration in $C*$ and compute the profile of one (or several) relevant structural properties along**u**by averaging over all non-spanning configurations. In most systems, density is the most relevant quantity to monitor. Depending on the nature of the liquid and crystalline phases, however, other structural features might be more relevant. Examples include charge density in systems of polyelectrolytes, ionic liquids, or solutions, local orientational order parameters^{69}in systems of rigid or semi-rigid anisotropic building blocks, or composition in mixtures and solutions. If any of these profiles fails to converge to a plateau, the rate calculation is likely impacted by strong finite size effects. Otherwise, identify the decay length scale for each profile, and denote the nuclei with |**u**|’s shorter than twice the largest decay length scale as proximal. If an appreciable fraction of configurations in $C*$ are proximal, the calculation is likely affected by finite size artifacts. It is necessary to emphasize that whether a particular configuration is proximal or not is not very sensitive to the classification and clustering algorithm employed for detecting crystalline nuclei. Therefore, the proximity of a nucleus to its closest periodic image is a more robust measure of finite size artifacts than its apparent spanning, or lack thereof, across the periodic boundary.If only a tiny fraction of configurations in $C*$ are proximal, compute the average inter-image plateau value for each profiled property and compare it with that of the liquid under the same conditions. If no statistically significant difference is observed, the calculation is likely free of strong finite size effects. As discussed in Sec. III C, all liquid properties (including density) become functions of position in the presence of an interface. Therefore, in the case of heterogeneous nucleation, not only liquid properties will be functions of position but also the inter-image plateau averages will depend on the location of the inter-image vector, and any comparison between the two should take this into consideration.

Considering the limited scope of the calculations conducted in this work (heterogeneous ice nucleation on a structureless INP using a coarse-grained model of water), it is not clear whether the heuristics proposed here can be universally applied to crystal nucleation (homogeneous and heterogeneous) in other systems. Even in the case of ice nucleation, finite size effects might play out differently if an atomistic model (such as TIP4P/Ice) is utilized, or if the underlying INP is structured. Therefore, further follow-up studies are necessary to assess the robustness of these heuristics. Even if these heuristics are proven to be robust, there are two major outstanding challenges still to be addressed. First of all, it is not always *a priori* clear what structural features are relevant for quantifying proximity. While potential candidates can be surmised by considering the symmetries and composition of the underlying liquid, it is always possible that hidden structural features with long correlation lengths could impact nucleation even in very large systems. The second challenge is that the extent by which the rate scales with system size might also vary considerably depending on the system type (e.g., the presence of long-range electrostatic interactions or molecules with long-range connectivity such as polymers) and the magnitude of the thermodynamic driving force. This can be particularly problematic in the non-spanning regime where the proposed heuristic will predict minimal finite size effects, but the rate might still change considerably upon changing the system size.

We wish to note that even though these heuristics are to be applied to the findings of a *completed* rate calculation, they can be utilized in conjunction with other methods such as seeding^{30} to decide the proper size of a simulation box *before* a rate calculation is conducted. More precisely, one can create seeds of different sizes of a target crystal within a bath of supercooled liquid (or supersaturated solution) and equilibrate them using a method such as umbrella sampling^{70} or hybrid Monte Carlo.^{71,72} The structure of the liquid within the inter-image regions can then be inspected to determine the proper decay length scales for relevant liquid properties (such as density). The approximate size of a critical nucleus can then be determined separately from the seeding method. Assuming that the characteristic length scale of a critical nucleus and the largest decay length scale of all profiled liquid properties are given by *l*_{cr} and *l*_{d}, respectively, a simulation box with dimensions sufficiently larger than *l*_{cr} + 2*l*_{d} will likely be free of strong finite size effects.

## V. CONCLUSION

In this work, we use MD simulations and jumpy forward flux sampling to explore the sensitivity of the heterogeneous ice nucleation kinetics and mechanism to *L*, the size of a model structureless INP, and find that finite size effects that arise due to periodic boundary conditions can significantly affect heterogeneous nucleation rates over a wide range of INP sizes. We observe that in this particular system, nucleation rates change by as much as four orders of magnitude within the range of *L*’s considered in this work and are also non-monotonic functions of *L*. We identify three distinct regimes for the dependence of rate on *L* based on whether critical crystalline nuclei span across the periodic boundary, and if not, whether they are proximal, i.e., the liquid that separates them from their closest periodic images is fully structured. In the spanning regime, an appreciable fraction of critical nuclei span across the periodic boundary, while the overwhelming majority of those that do not span are proximal. This results in a strong dependence of rate on *L*, which we explain by analyzing the average minimum projected distance of subcritical crystalline nuclei at the bottleneck FFS milestones. We demonstrate that the fraction of spanning bottleneck configurations and the average minimum projected distance for the non-spanning ones collectively explain the variations in rate in the spanning regime. The second regime, which is observed for intermediate-sized INPs and is thus called the intermediate regime, is characterized by the emergence of proximal, but non-spanning, critical nuclei. Within this regime, the rate is a weak function of *L*, and its variations are governed by the fraction of critical nuclei that are proximal and the inter-image plateau liquid density value for those that are not. In the third regime, which is observed for large INPs and which we denote as the non-spanning regime, critical nuclei are neither spanning nor proximal, and the rate is a weakly increasing function of *L*. We are able to demonstrate that there is a linear scaling between $log10J$ and 1/*L*, which we use to estimate the nucleation rate in the thermodynamic limit.

The key heuristic that emerges from this work is that finite size effects are minimal or absent in systems where the critical nuclei are neither spanning nor proximal, and the inter-image liquid profiles in the plateau region do not deviate significantly from that of the liquid under the same conditions. A more algorithmic description of these heuristics is outlined in Sec. IV. Note that these criteria can be unambiguously tested for both homogeneous and heterogeneous crystal nucleation in all systems and irrespective of the particular definition of the order parameter. While whether a nucleus spans across the periodic boundary will depend on the classification algorithm utilized for detecting solid-like molecules, its proximity to its closest periodic image—or lack thereof—and its inter-image plateau density—if it is not proximal—are independent of such details. It must be noted that the particular system size beyond which these conditions are satisfied will depend on the type of the system and the thermodynamic state. It might therefore be risky to use *ad hoc* heuristics, such as the “10% rule” discussed in Sec. I to determine whether a rate calculation is impacted by finite size effects. Determining how these heuristics translate into proper system sizes in different systems can be the topic of future studies. Moreover, even when these conditions are satisfied, the rate can still be a weak function of the system size, presumably due to the size dependence of the underlying thermodynamic properties. Such a dependence might be particularly strong in systems with long-range electrostatic interactions.

As discussed in Sec. IV, the heuristics proposed in this work are expected to be applicable to homogeneous nucleation studies as well. The fact that homogeneous nucleation occurs endogenously within a liquid makes it simpler to analyze, as no particular shape or polymorph composition will be exerted onto the crystalline nuclei by an external interface. Moreover, due to the translational isotropy of bulk liquids, averages of structural properties such as density will not be position-dependent and can thus be readily calculated with high statistical precision. This makes comparing inter-image plateau properties with those of the bulk liquid more straightforward. On the other hand, validating these heuristics for homogeneous nucleation will be more expensive computationally not only because the homogeneous nucleation rate is generally much smaller but also because the system size scales as *L*^{3} (rather than *L*^{2}), with *L* being the characteristic dimension of the simulation box.

It is widely known that the nucleation rate is extremely sensitive to the mathematical form and the parameters of the employed force-field, as well as thermodynamic variables such as pressure, temperature, and concentration. Even subtle changes in any of these parameters can alter the nucleation rate considerably sometimes by as much as tens of orders of magnitude.^{60} It might therefore be argued that finite size effects are not that significant in comparison, as the ensuing errors appear to be modest. It is not, however, clear that errors due to finite size effects will always be limited to a few orders of magnitude (as observed in this work). Instead, it is totally plausible that the extent to which nucleation rates in finite systems deviate from those in the thermodynamic limit might depend heavily on the magnitude of the nucleation barrier. It is therefore important to determine the conditions under which finite size effects will be minimal. Otherwise, it will be difficult to trust the accuracy of rates computed as a function of quantities such as temperature, electric field, concentration, and force-field parameters, particularly when the computed rates differ by many orders of magnitude and exhibit anomalous or nonclassical behavior. In addition, there are instances in which one might need to compute the nucleation rate in systems with different geometries but at a single thermodynamic state point and using the same force-field. Since the underlying thermodynamic properties remain mostly unchanged under such circumstances, the ensuing changes in rate are usually modest, and resolving them accurately requires having robust tools to assess whether the corresponding calculations are impacted by finite size effects.

The strong sensitivity of rate to *L* in the spanning regime has important implications for detecting finite size artifacts in computational studies of nucleation, as it points to the unpredictable nature of such effects when the system size is too small. Indeed, the rates computed for certain *L*’s in this regime are fortuitously identical to those computed for much larger *L*’s in the other two regimes. This suggests that the common practice of validating nucleation rates by comparing them to rates computed for a larger system might yield misleading results since both systems might be affected by finite size effects, but for different reasons.

## SUPPLEMENTARY MATERIAL

See the supplementary material for technical details of the conducted FFS calculations, including the locations of milestones, and the number of trial trajectories and successful crossings for each iteration.

## ACKNOWLEDGMENTS

A.H.-A. gratefully acknowledges the support from the National Science Foundation CAREER Award (Grant No. CBET-1751971). These calculations were performed on the Yale Center for Research Computing. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation Grant No. ACI-1548562.^{73}

## DATA AVAILABILITY

The underlying data supporting the findings of this study are available from the corresponding author upon reasonable request.

### APPENDIX: POWER LAW SCALING OF $log\u2009\u2009\mathcal{J}$ WITH *L*

Let the physical properties in Eq. (11) have the following power-law dependence on *L*:

with *α*, *β*, *ν*, *κ* > 0 and *a*, *b*, *p*, *k* being real-valued constants. Note that we consider the most general case in which all model parameters in CNT have a power-law dependence on *L* even though such behavior is not established for all of them. We would, however, note that the analysis presented here can be easily revised if some of these exponents are zero, without changing our final conclusion. Using the Taylor expansion (1 + *x*)^{q} ∼ 1 + *qx* yields

where *b*′ is given by

These scaling relationships can thus be used to determine the scaling of Δ*G*_{nuc} with *L*,