The propensity of water to remain in a metastable liquid state at temperatures below its equilibrium melting point holds significant potential for cryopreserving biological material such as tissues and organs. The benefits conferred are a direct result of progressively reducing metabolic expenditure due to colder temperatures while simultaneously avoiding the irreversible damage caused by the crystallization of ice. Unfortunately, the freezing of water in bulk systems of clinical relevance is dominated by random heterogeneous nucleation initiated by uncharacterized trace impurities, and the marked unpredictability of this behavior has prevented the implementation of supercooling outside of controlled laboratory settings and in volumes larger than a few milliliters. Here, we develop a statistical model that jointly captures both the inherent stochastic nature of nucleation using conventional Poisson statistics as well as the random variability of heterogeneous nucleation catalysis through bivariate extreme value statistics. Individually, these two classes of models cannot account for both the time-dependent nature of nucleation and the sample-to-sample variability associated with heterogeneous catalysis, and traditional extreme value models have only considered variations of the characteristic nucleation temperature. We conduct a series of constant cooling rate and isothermal nucleation experiments with physiological saline solutions and leverage the statistical model to evaluate the natural variability of kinetic and thermodynamic nucleation parameters. By quantifying freezing probability as a function of temperature, supercooled duration, and system volume while accounting for nucleation site variability, this study also provides a basis for the rational design of stable supercooled biopreservation protocols.

## I. INTRODUCTION

Understanding how solid phases nucleate from the melt is a complex problem of widespread relevance. The nucleation of ice, in particular, governs the formation of clouds in our atmosphere,^{1} plays a key role in the survival of organisms living in extreme climates,^{2,3} and determines the quality of our refrigerated and frozen foods.^{4,5}

Ice nucleation also sets the current limit for the preservation of life-saving organs for transplantation.^{6} The clinical standard for *ex vivo* storage of organs involves placing an organ on ice (at roughly 0–4 °C) and only enabling brief periods of preservation (4–6 h for the heart and 12–18 h for the liver^{7}) before the organ expires and is unfit for transplantation. Due to the Arrhenius-like nature of metabolic reactions, reducing the preservation temperature further could extend these durations and, as a result, remove significant logistical hurdles, improve immunological matching, and expand access to life-saving organs.^{8} Critically, however, the freezing point of water (0 °C) represents a fundamental barrier, as freezing biological systems is often fatal. Reducing the temperature of preservation below 0 °C without ice formation thus holds significant implications for global health.

Despite manifesting at human and even global scales, the fundamental process of nucleation involves the coordinated dynamics of a handful of molecules occurring at or even below the nanoscale.^{1,9} This immense separation of scales makes making predictions about how nucleation occurs extremely difficult. Additionally, nucleation occurs almost always heterogeneously, which means that not only the physics of the liquid itself but also the interactions with all surfaces it finds itself in contact with must be understood. The result is that, depending on the catalyzing efficiency of the nucleating surfaces present in a given volume of water, ice may nucleate anywhere from just below the equilibrium melting point of 0 °C down to the homogeneous nucleation limit around −40 °C (pure water at atmospheric pressure).^{9} Similarly, when held at a constant subcooled temperature, water may freeze within seconds or remain in a metastable liquid state for years at a time.

In theory, if one were to characterize every surface present in a system, then the nucleation behavior could be predicted for that system and subsequent systems of the same heterogeneous composition. This has been the approach of many researchers, for example, in the atmospheric sciences, who have thoroughly quantified the nucleation behavior of many common aerosol particulate materials in order to understand larger phenomena such as cloud formation.^{10} These studies often seek to describe the time-dependent nature and inter-sample variability observed in freezing experiments, capturing active site variability through distributions of contact angle, surface area, or combinations thereof.^{11–14}

This scenario is uncommon, however, since the typical engineer seeking to predict the freezing behavior of arbitrary systems cannot realistically characterize each and every system, especially not *in situ*.^{15} What is more, nucleation is known to be sensitive to variation in impurities to such an extent that the nucleation behavior of samples taken from a single source may even vary significantly from one another.^{16} This phenomenon is known as non-self-averaging behavior and results from catalyst sites possessing a wide distribution of nucleation rates.^{16–18} Since the nucleation rate is proportional to the exponential of the nucleation barrier, large variations in characteristic nucleation efficiency between active sites can result in one or a few active sites making up the entirety of the total system nucleation rate. This is thought to be responsible for the common experimental observation of nucleation occurring repeatably at individual sites on a surface.^{19,20}

The recognition that the single most potent active site may be responsible for nucleation is equivalent to the so-called weakest-link hypothesis common in the study of fracture mechanics^{21} (i.e., a chain is only as strong as its weakest link, and similarly, a material placed under stress will fracture due to its weakest defect). The weakest link analogy is an example of the statistical theory of extreme values,^{22} which describes situations with a large number of random variables, *N*, for which we are interested in the probability distribution of the largest (or smallest) of the *N* random variables. Extreme value statistics are also used to model, for example, extreme weather and financial events.^{22}

In the study of nucleation, the smallest nucleation barrier, highest nucleation temperature, or shortest nucleation induction time are relevant extreme variables. Extreme value theory can thus be recognized as the foundation of the classic singular model for heterogeneous nucleation, which describes nucleation by a single characteristic nucleation temperature.^{23,24} The singular model has been found to describe the distribution of nucleation temperature among a collection of transiently cooled droplets and to predict how nucleation temperature changes with the size of the droplets.^{25,26}

Although this formulation of the classic singular model does not describe a time dependence of nucleation, extreme value theory has also been applied to experiments that measure the time until nucleation occurs for droplets held at a constant temperature. For systems exhibiting non-self-averaging behavior, the measured nucleation rate of a collection of droplets strays from the exponential decay predicted by Poisson statistics, instead becoming a stretched exponential.^{27}

Ultimately, the randomness of nucleation represents a far-reaching scientific challenge and specifically poses significant barriers to the deployability of supercooling for the preservation of biological systems in clinical settings. While the inherently stochastic nature of nucleation results in a finite probability for nucleation at all times when in a supercooled metastable state, at low undercooling, and when adequately characterized, the probability is vanishingly small.^{28} On the other hand, the heterogeneous catalysis of nucleation by random active sites adds an additional layer of unpredictability to supercooling stability. Even in the purest volumes of water, the presence of minute insoluble impurities as well as certain soluble macromolecules can result in widely varying freezing behavior, even in samples from the same source.

In this study, we perform an extensive series of both constant cooling rate and isothermal ice nucleation experiments and develop a statistical model of nucleation rooted in both the observed inter-sample variability and the intra-sample stochasticity. The time and temperature dependent stochasticity is captured by conventional Poisson statistics, and the active site variability is captured by a new bivariate extreme value model describing the variability of both the kinetic and thermodynamic heterogeneous nucleation parameters.

This article is structured as follows: In Sec. II A, an overview of the constant cooling rate and isothermal nucleation experiments is provided. In Sec. II B, the probability distributions that describe nucleation stochasticity are derived from Poisson statistics, and in Sec. II C, extreme value statistics are introduced and formally connected to Poisson statistics in order to describe heterogeneous nucleation variability.

In Sec. III A, the results from the constant cooling rate experiments are presented, which are then, in Sec. III B, evaluated within the new extreme value statistics framework. The model is then validated by assessing its ability to predict how the nucleation behavior scales with system size in Sec. III C and by measuring isothermal nucleation probabilities in Sec. III D.

By developing a unified framework for characterizing both sources of randomness associated with nucleation processes, this study uncovers the fundamental properties of heterogeneous nucleation and provides a basis for the rational design of safe and robust supercooled biopreservation protocols.

## II. METHODS

### A. Nucleation experiments

The principal goal of supercooled biopreservation is to stably hold biological material in a supercooled liquid state in order to suppress metabolic activity and achieve a certain degree of suspended animation. Nucleation is a random process, and directly characterizing supercooling stability would require conducting experiments under the experimentally relevant timescales (days, possibly weeks) and repeating experiments many times in order to build statistical confidence. Because this is generally impractical, we may alternatively seek to indirectly predict supercooling stability by characterizing the system nucleation rate. The nucleation rate describes the rate of formation of critical ice clusters in a supercooled liquid and is related to nucleation probability under the framework of Poisson statistics (see Sec. II B).

Experiments for characterizing nucleation rates generally fall into two categories: (1) isothermal and (2) constant cooling rates. In isothermal experiments, a sample is quickly quenched and held at a constant supercooled temperature, and the time at which nucleation occurs is recorded. In constant cooling rate experiments, the temperature of a sample is continually decreased at a constant rate, and the temperature at which nucleation occurs is recorded. Through the application of Poisson statistics, equivalent information (namely the nucleation rate) can be obtained from these two techniques. In this study, constant cooling rate experiments were chosen for collecting the bulk of nucleation data since they allow data to be collected more quickly. Since the isothermal supercooling experiments replicate the scenario employed during supercooled biopreservation, an additional campaign of isothermal experiments was performed at various temperatures and compared to the statistical predictions of the model formed from the constant cooling rate data.

The experimental setup employed in this study has been previously described by Consiglio *et al.*^{29} and consists of a rigid Al-7075 chamber with petrolatum coated surfaces (Vaseline, Unilever, UK). The chamber is filled completely with liquid, excluding bulk gas. The combination of hydrophobic surfaces and rigid confinement has been found to enhance the stability of supercooled solutions^{29–32} and is employed here since the practical objective of this work is to design robust supercooled biopreservation protocols.^{28,33} We also hypothesize that these conditions suppress nucleation from the container surface and external perturbations and, thus, isolate the effect of nucleation catalysts present in the solution. The petrolatum coating, which is freshly applied before each experiment, has the additional benefit of providing maximal consistency across repeated experiments and devices. All experiments were conducted with phosphate-buffered saline [PBS (1X) ID: 10 010 023, Thermo Fisher Scientific, USA], the melting point of which is approximately −0.5 °C. This solution has a physiological osmolarity of 280–315 mOsm/kg, which mimics many of the commonly utilized cold storage organ preservation solutions.

Within each constant cooling rate experiment, the temperature is lowered at a constant rate. Nucleation is monitored via an externally mounted strain gauge that enables detection of the pressure rise attributed to ice growth within the confined volume. Upon detection of nucleation, the system is rewarmed to +5 °C to melt the ice and re-equilibrate the system. After a rewarming hold period of 5 min, the cooling cycle is repeated. This process is repeated multiple times in order to construct an empirical distribution of nucleation temperatures. Each constant cooling rate (CCR) experiment consisted of an average of ∼80 freeze/thaw cycles.

The primary constant cooling rate experiments were conducted in a 5 ml chamber (*V* = 5.3 ml, *SA* = 19 cm^{2}) with a cooling rate of 2 °C/min. A second set of constant cooling rate experiments were conducted in a 20 ml chamber (*V* = 21.2 ml, *SA* = 41.9 cm^{2}) with a cooling rate of 1 °C/min. In order to study the variability of heterogeneous catalysts present across the samples, these experiments were repeated a total of 23 times in the 5 ml systems (i.e., 23 different 5 ml samples underwent ∼80 freeze/thaw cycles each) and ten times in the 20 ml systems (i.e., ten different 20 ml samples underwent ∼80 freeze/thaw cycles each). Isothermal experiments were conducted in the 5 ml systems at temperatures of −12 °C (n = 10), −13 °C (n = 6), and −14 °C (n = 18). A cutoff of 2 h was imposed in order to maintain experimental throughput.

Since this study aims to develop methods for informing the design of supercooled biopreservation protocols, it is important to consider how these experiments, which only investigate the supercooling of physiological solutions, relate to the supercooling of an organ. A convenient property of biological materials is that they are generally poor ice nucleating substrates and only freeze at low temperatures or under the influence of external ice.^{34,35} There are, of course, notable exceptions for certain bacteria,^{36} proteins,^{37} and other macromolecules;^{38} however, these are generally not present in mammalian systems. This poor nucleating property enables us to study the nucleation behavior of solutions and containers with the understanding that this behavior will extend to equivalent systems that also contain biological material.

### B. Poisson statistics of nucleation processes

^{39}This yields the result that, for a volume of uniform temperature and composition, the probability of observing

*k*nucleation events in the time interval [0,

*t*] is given by the Poisson distribution,

*λ*is the rate parameter (s

^{−1}) and

*t*is the time (s). Since the rate parameter is constant here, Eq. (1) describes a homogeneous Poisson process (note that this is distinct from the homogeneous descriptor of nucleation occurring in the bulk of a fluid and not at an interface). From this, we can find that the probability of zero nucleation events (

*k*= 0) occurring in a system with a constant nucleation rate

*J*(s

^{−1}) follows an exponential decay,

*T*

_{m},

*S*, which is defined as $SXx=Pr(X\u2265x)$ and represents the probability that a random variable,

*X*, will take on a value greater than or equal to

*x*. In the context of nucleation, the survival function represents the probability that nucleation does not occur before a given time [Eq. (2) for constant temperature] or degree of supercooling [Eq. (4) for constant cooling rate]. The survival function is related to the cumulative distribution function (CDF),

*F*

_{X}(

*x*), by

*F*

_{X}(

*x*) = 1 −

*S*

_{X}(

*x*).

*J*(

*T*) is needed. Classical nucleation theory (CNT) provides a framework for describing the nucleation process

^{40}and can be used to describe the heterogeneous nucleation rate (m

^{−2}s

^{−1}) as

^{41,42}

*n*

_{s}is the number of water molecules per unit area in contact with the ice nucleus,

*k*

_{B}is the Boltzmann’s constant,

*h*is the Planck’s constant, Δ

*F*

_{diff}is the energy of activation for a water molecule to cross the water/ice interface, $\Delta Ghom*$ is the homogeneous free energy of formation for a critical ice embryo, and

*f*

_{het}is the geometric compatibility factor describing the reduction of the free energy barrier due to the presence of an ice nucleating surface.

*G*

_{hom}, often dominates the variation of the nucleation rate with temperature.

^{43}By grouping together the kinetic terms and approximating the temperature dependence of the free energy barrier as $\Delta Ghom*\u223cT\Delta T\u22122$, we can arrive at an oft employed two-parameter relation,

^{44,45}

*T*=

*T*

_{m}−

*T*is the degree of supercooling, and

*A*and

*B*are constants. Parameter

*A*is often referred to as the pre-exponential term or kinetic factor (s

^{−1}), and parameter

*B*is the nucleation barrier or thermodynamic factor (K

^{5}). With this relation, we arrive at cumulative distribution and probability density functions, respectively, for a constant cooling rate process,

The cumulative distribution and probability density functions for a characteristic system with *T*_{m} = 0 °C and nucleation parameters *A* = 1.5 × 10^{19} s^{−1} and *B* = 1.5 × 10^{11} K^{5} are shown in Fig. 1(b). The average nucleation temperature and standard deviation for this process are −13.7 and 0.2 °C, respectively.

*n*th central moment is described generally by

The goal of nucleation experiments, to quantify the nucleation rate, is achieved by fitting the cumulative distribution function from Eq. (8) to an empirical distribution of nucleation temperatures from a constant cooling rate experiment. This nucleation rate can then be used with Eq. (3) to predict the probability that this system would freeze for an arbitrary thermal history as a function of time.

Although this procedure can be used to accurately characterize individual samples, its extension to the prediction of freezing probabilities for arbitrary systems (of nominally equivalent composition) is complicated by the fact that nucleation in real bulk systems is a heterogeneously catalyzed process. Samples of water, even when extensively purified,^{46} inevitably contain varying amounts and populations of heterogeneous nucleating agents, and this often results in different nucleation rates between seemingly identical samples.^{16,17}

*N*is the total number of active sites and

*A*

_{i}and

*B*

_{i}are the nucleation parameters of the

*i*th nucleating particle. Nucleating particles often contain surface features such as crevices that make an outsized contribution to the particle’s total nucleation rate. In this case, the particle’s total surface area is not a relevant parameter. Otherwise, the term

*A*

_{i}would contain the total surface area of the nucleating particle. Equation (11) can be used in place of the single nucleation rate from Eq. (7) in the above statistical relations.

This approach has been widely adopted by the atmospheric science community to characterize the nucleating behavior of specific aerosol materials.^{42} In these experiments, precise quantities of rigorously characterized material are added to the system. The nucleation rate of this system can then be measured, and any observed difference can be attributed to the added material if the nucleating agents result in nucleation at higher temperatures than the base solution.^{47–49}

Unfortunately, this methodology cannot be applied if the purity of a solution or the exact properties of all active heterogeneous catalysts are not known (such as in the case of a bottle of standard laboratory-grade phosphate-buffered saline). Additionally, if the nucleation rate is strongly dependent on surface features as well as surface chemistry, the observed nucleation rate between samples of the same absolute composition may not be consistent.^{20,50} Very few tools are at our disposal for predicting nucleation in these instances. In Sec. II C, we leverage the observation that nucleation often occurs at the single most potent active site,^{17,24,51} enabling the application of methods from extreme value statistics for describing the effect of random impurities and circumventing the need to fully characterize every impurity.

### C. Extreme value statistics of heterogeneous nucleation

Any model of heterogeneous nucleation must describe, in some manner, the inherent variability of nucleation active sites that contributes to the total system nucleation rate as described in Eq. (11).^{52} When specified quantities of characterized impurities are added to a solution, the nucleation parameters are known (i.e., can be characterized). Conversely, in the case of a system containing uncharacterized impurities, the total number of active sites, *N*, as well as the nucleation parameters, *A* and *B*, become random variables.^{17}

A common experimental observation is that ice nucleates repeatably at individual sites, such as a scratch on a container’s surface.^{19,46} As discussed earlier, this behavior is characteristic of quenched disorder and often leads to non-self-averaging behavior. The result is a nucleation behavior that varies randomly between seemingly identical samples.

^{23,24,26}The theory is described generally as follows: A series of independent and identically distributed random variables,

*X*

_{1}, …,

*X*

_{n}, is described by the cumulative distribution function,

*F*

_{X}(

*x*), and maximum, $Mn=maxX1,\u2026,Xn$. Although the distribution of the maximum value of this set is exactly given by $PrMn\u2264z=F(z)n$, the distribution

*F*

_{X}(

*x*) is often not known, such as in the case of heterogeneous nucleation with a distribution of active sites. In the asymptotic limit of

*n*→ ∞, however, the distribution of the maximum is described, regardless of the underlying distribution

*F*

_{X}(

*x*), by a finite family of distributions known as the generalized extreme value (GEV) distribution,

^{22,53}

*G*(

*x*) is the cumulative distribution function,

*g*(

*x*) is the probability density function, and

*μ*,

*σ*, and

*ξ*are the location, scale, and shape parameters, respectively. The specific distribution is determined by the shape parameter, which governs the tail behavior. While the Gumbel distribution is unbounded in both limits, the Weibull distribution has an upper bound and the Fréchet distribution has a lower bound. The sub-classes of the GEV distribution are summarized in Table I and depicted in Fig. 1(c) for location and scale parameters

*μ*= 0 and

*σ*= 1, respectively. Even though we cannot realistically know the complete distribution of impurities present in a solution, in the asymptotic limit, the distribution of the most potent sites is predicted to follow one of these three distributions.

Type . | Name . | Shape factor . | Cumulative distribution function . |
---|---|---|---|

I | Gumbel | ξ = 0 | $Gx=exp\u2212exp\u2212x\u2212\mu \sigma $ |

II | Fréchet | ξ > 0 | $G(x)=exp\u22121+\xi x\u2212\mu \sigma \u22121/\xi x>00x\u22640$ |

III | Weibull | ξ < 0 | $G(x)=exp\u22121+\xi x\u2212\mu \sigma \u22121/\xi x<01x\u22650$ |

Type . | Name . | Shape factor . | Cumulative distribution function . |
---|---|---|---|

I | Gumbel | ξ = 0 | $Gx=exp\u2212exp\u2212x\u2212\mu \sigma $ |

II | Fréchet | ξ > 0 | $G(x)=exp\u22121+\xi x\u2212\mu \sigma \u22121/\xi x>00x\u22640$ |

III | Weibull | ξ < 0 | $G(x)=exp\u22121+\xi x\u2212\mu \sigma \u22121/\xi x<01x\u22650$ |

In order for extreme value theory to be applied to heterogeneous nucleation experiments, a few additional assumptions must first be introduced: (1) The variability in the number of active sites between systems is much smaller than the total number of active sites, and (2) the underlying distribution of active sites is the same between systems. These assumptions are reasonable for individual samples taken from the same source.

#### 1. Determining the relevant extreme variables

The specific physical quantity that is chosen as the extreme variable confers additional constraints on the problem. For example, the classic singular model of nucleation prescribes a characteristic activation temperature for each active site and, thus, the nucleation temperature is the extreme variable. In reality, nucleation is stochastic and does not occur at a single temperature but rather over a distribution of temperatures, even for a single active site. For extreme value statistics to be a good descriptor of the nucleation temperature, the variability of the nucleation temperature for individual active sites should be small in comparison to the variability of characteristic nucleation temperature between systems. This is the case for the system studied here, as shown in Figs. 2 and 3, where the standard deviation of nucleation temperatures for a single sample is on average about 0.33 °C, while the standard deviation of nucleation temperatures between samples is ∼1.5 °C.

The classic singular model is typically applied to data from constant cooling rate experiments in which nucleation temperature is the measured quantity. Alternatively, in the case of isothermal experiments, the relevant extreme variable is the nucleation induction time, and the active site with the minimum characteristic induction time is responsible for nucleation.^{54} For extreme value statistics to have reasonable predictive power here, the spread of induction times for individual active sites caused by inherent stochasticity should likewise be small compared to the spread of induction times between samples.

These two applications of extreme value statistics are the most readily applied since the extreme variable is a measured quantity. Alternatively, we can consider the molecular mechanisms of nucleation, and instead of attributing nucleation to the active site with the highest nucleation temperature or shortest nucleation induction time, we can describe the potency of the active site by the nucleation rate itself. In this way, nucleation is attributed to the active site with the largest nucleation rate.

In the two-parameter nucleation rate relation described in Eq. (7), the nucleation barrier, *B*, is exponentiated, and as such, the nucleation rate is much more sensitive to *B* than it is to the pre-exponential factor, *A*. A theoretical extreme value analysis of nucleation performed by Sear^{16,17} held the pre-exponential factor constant and treated the nucleation barrier as the extreme variable. This approach is convenient since it reduces the problem to a single variable; however, it is not consistent with the common experimental observation that the spread of nucleation temperatures is rather constant regardless of the temperature at which nucleation occurs.^{55} Both the expected value and standard deviation of the nucleation temperature for a constant cooling rate experiment, as computed from the first and second moments of the probability distributions, are dependent on pre-exponential and nucleation barrier parameters. Therefore, both nucleation parameters must vary for the standard deviation to remain the same as the nucleation temperature changes.

*B*, as extreme quantities. These parameters are intimately related, with higher nucleation barriers yielding lower nucleation temperatures. As described in Sec. II B, these variables are formally related through Poisson statistics by the average nucleation temperature, which is defined as the first central moment of the constant cooling rate probability distribution function,

However, from this equation, we can see that the nucleation temperature is not only dependent on the nucleation barrier, *B*, but also on the nucleation kinetic factor, *A*. Each of the three nucleation parameters (namely the average nucleation temperature, ⟨*T*_{f}⟩; the nucleation kinetic factor, *A*; and the nucleation barrier, *B*) are random variables, and this equation will be utilized in the bivariate model to constrain their formal relationship.

An additional limitation of the classic singular model is that it does not capture the cooling rate dependence of the nucleation process, only prescribing a single characteristic nucleation temperature. In reality, the nucleation temperature is dependent on the cooling rate, with faster cooling rates yielding lower nucleation temperatures. This limitation is overcome in the proposed hybrid model since the expression relating the extreme variables, ⟨*T*_{f}⟩ and *B*, to the nucleation rate pre-exponential factor, *A*, in Eq. (14) also contains the cooling rate.

^{56–58}

*x*and

*y*, and

*F*

_{1}(

*x*) and

*F*

_{2}(

*y*) are the univariate marginal extreme value distributions. The quantity

*m*is a measure of the dependence between the random variables and is defined as

*ρ*is the correlation coefficient. A value of

*m*= 1 corresponds to the independence of the two variables and leads to the joint distribution $Gx,y=F1xF2(y)$. The joint probability density function follows similarly from the univariate definition and is given by

The bivariate extreme value distribution describes the variability and correlation of the two extreme variables: the nucleation temperature, ⟨*T*_{f}⟩, and the nucleation barrier, *B*. This can then be used in conjunction with Eq. (14) to solve for the distribution of the nucleation kinetic factor, *A*.

#### 2. Determining the relevant extreme value distribution

Now that we have identified the extreme parameters and determined the form of the bivariate distribution, the next task is to identify the type of GEV distribution that describes each of the marginal distributions. Although the asymptotic theory predicts that the process will converge to one of the three types of GEV distributions shown in Table I, we cannot immediately determine which one this is without information about the underlying process or distribution. If the underlying distribution of active sites is known, then the distribution of maxima is also known, and an extreme value analysis is moot. In the worst-case scenario, we cannot make any physical suppositions and are, therefore, left to fit the data directly to the full three-parameter GEV distribution. By considering some properties of the GEV sub-classes, however, we may be able to rule out some cases. For instance, we can consider the tail behavior of the distributions.

Both the nucleation temperature and the nucleation barrier are physically bound. The nucleation temperature cannot be higher than the equilibrium melting point or lower than the homogeneous limit (roughly −40 °C). In bulk systems that are not purified extensively, heterogeneous nucleation usually results in freezing occurring closer to the equilibrium melting point, making the Weibull distribution potentially a natural fit. Similarly, the nucleation barrier must be less than the homogeneous limit but also greater than zero. Since the nucleation barrier often varies many orders of magnitude in value, it is convenient to consider the logarithm of the value (i.e., ln *B*). The logarithm of the nucleation barrier does not inherently have a lower bound (since the extreme value analysis deals with maxima, we must consider −ln *B*, which conversely does not have an inherent upper bound). We can most likely rule out the Weibull distribution, leaving the Fréchet distribution as a plausible choice for the log-distributed nucleation barrier. In this case, the homogeneous nucleation barrier would provide a physically sensible lower limit.

Often, the best option for assessing which probability distribution best describes the distribution is to use the graphical method of the probability plot.^{22,58} A good fit is indicated by the formation of a straight line when plotting the quantiles of the empirical distribution against the quantiles of a reference distribution. The probability plots in Fig. 3 are linear in nature, indicating decent goodness of fit has been obtained for the measured nucleation parameters. Since there is uncertainty in the underlying active site distribution and little data in the literature on extreme value distributions for either nucleation temperatures or nucleation rate parameters, it is still recommended that each extreme value distribution be explored when studying a new system.

## III. RESULTS AND DISCUSSION

### A. Results from constant cooling rate experiments

The survival curves of the nucleation temperatures are shown in Fig. 2 for a series of 23 constant cooling rate experiments performed in the 5 ml system with PBS at 2 °C/min. While the spread of nucleation temperatures for individual trials is typically within 1 °C (average standard deviation of 0.33 °C), nucleation temperatures across the set of experiments range nearly 7 °C (standard deviation of average nucleation of 1.55 °C). This is consistent with nucleation experiments in the literature that find narrow spreads for individual systems and large spreads for experiments conducted on multiple systems.^{59,60} The stochastic behavior for individual nucleating catalysts is thus quite narrow, yet the characteristic temperature around which this purely stochastic behavior is centered can vary multiple degrees from system to system, a hallmark of non-self-averaging behavior that indicates the potential suitability of extreme value theory.

The method of maximum likelihood estimation is used to determine the cumulative distribution function from Eq. (7) and obtain the respective nucleation parameters for each of the survival curves. As a check of whether the observed nucleation process is well described by the proposed empirical nucleation rate $JT=Aexp\u2212B/T3\Delta T2$, we can linearize the constant cooling rate CDF (see Deubener and Schmelzer^{61}). The linearized nucleation spectra are shown in Fig. 1(b), and their linear form suggests the observed nucleation process is generally well described by the proposed nucleation rate.

### B. Distribution of extreme parameters

With the empirical survival curves and inferred nucleation parameters in hand, we can now evaluate the distributions of the extreme variables: average nucleation temperature, ⟨Δ*T*_{f}⟩, and nucleation barrier, ln *B*. Maximum likelihood estimation is used again for fitting the extreme value distributions, and the resulting distributions are shown in Fig. 3 for the 5 ml constant cooling rate experiments.

As discussed in Sec. II C 2, the Weibull distribution is a physically plausible choice for describing the distribution of nucleation temperatures between samples because of the melting temperature upper bound. A slight modification is first made to the Weibull distribution as defined by the corresponding expression in Table I. By considering the degree of supercooling upon freezing, Δ*T*_{f} = *T*_{m} − *T*_{f}, instead of the absolute nucleation temperature, the location of the upper bound is shifted from *T*_{m} to zero. This effectively reduces the number of parameters in the Weibull distribution from three to two. Figure 3(a) depicts a histogram of the average nucleation temperatures along with the probability density function of the Weibull fit, and Fig. 3(b) shows the Weibull probability plot of the average nucleation temperatures. Linearity in the probability plot indicates the ⟨Δ*T*_{f}⟩ data are well described by the Weibull distribution. We find an average degree of supercooling upon freezing of ⟨Δ*T*_{f}⟩ = 14.3 °C and a standard deviation of $\sigma \u27e8\Delta Tf\u27e9=$ 1.6 °C.

In the case of the nucleation barrier, the full three parameter GEV distribution is fit to the nucleation barrier data, and the best fit is found with the Fréchet distribution. This agrees with the intuition discussed in Sec. II C 2, that the homogeneous nucleation barrier represents a natural limit. Wood and Walton^{44} found a value of *B*_{hom} = 1.66 × 10^{12} K^{5} (ln *B*_{hom} = 28.14), and although this limit has not been strictly enforced when fitting the Fréchet distribution here, the probability density function evaluated at the homogeneous limit is on the order of 10^{−60} and is small enough to exclude any nonphysical contributions. Figure 3(c) depicts a histogram of the nucleation barriers along with the probability density function of the fitted Fréchet distribution. Figure 3(d) shows the Frechet probability plot, and the observed linearity suggests an appropriate fit. We find an average nucleation barrier of ⟨ln *B*⟩ = 25.4 and a standard deviation of *σ*_{lnB} = 0.7.

Having determined the marginal distributions, we can now consider the joint distribution. To that end, we first compute the correlation parameter for the two extreme variables using Eq. (16) and find a value of *m* = 1.32. The joint probability density function, $g\u27e8\Delta Tf\u27e9,lnB$, is then calculated from Eqs. (15) and (17) and is shown in Fig. 4(a). Next, we can perform a variable substitution in order to obtain the PDF in terms of the nucleation rate parameters *A* and *B*, using the definition of the mean nucleation temperature from Eq. (14). From the result shown in Fig. 4(b), we see that there is a narrow band of probable pairings, which indicates that nucleation parameters have a somewhat high degree of correlation and perhaps represents a fundamental property of heterogeneous nucleation.

*f*, which accounts for a modified contact area between the growing solid ice cluster and the catalyst particle onto which it nucleates,

^{62}

The shape factor, *f*_{het}, is a function of the contact angle, *θ*, between the ice embryo and nucleating surface and can vary between *f*(0°) = 0 (corresponding to complete wetting and no nucleation barrier) and $f180\xb0=1$ (corresponding to complete hydrophobicity and homogeneous nucleation). Hydrophilic surfaces, which have small contact angles and also smaller nucleation barriers, are often polar in nature, and polar surfaces are known to orient liquid water molecules in their close vicinity. Therefore, a potential explanation for the scaling behavior shown in Fig. 5(c) and the correlation shown in Fig. 4(b) could be that the reduction in mobility of liquid molecules caused by more potent hydrophilic nucleating surfaces results in a smaller pre-exponential diffusive coefficient.

Now that we have determined the distribution of the nucleation parameters from extreme value statistics, we can apply this model to useful ends. Since 5 ml systems are not a practical size for preserving anything beyond cell suspensions and small tissue constructs, we may be interested in seeing how the model can be used to predict nucleation behavior in larger, more relevant volumes. This will be covered in Sec. III C. Additionally, the experiments from which the model is constructed—constant cooling rate experiments—do not represent the modality of supercooling employed when actually preserving a biological specimen. In reality, preservation is conducted at a constant temperature and at temperatures warmer than the supercooling limits probed in constant cooling rate experiments. We may, therefore, be interested in evaluating how the model predicts nucleation behavior in isothermal systems and as a function of time. This will be covered in Sec. III D.

### C. Scaling of the nucleation process with system size

*n*

_{1}, then the distribution for an identical system of size

*n*

_{2}is $F1xn2/n1$. Following this logic, the size of the system can be readily incorporated into the GEV distribution, yielding

*V*

_{0}is the volume of the system from which the model was developed and

*V*is the volume of an arbitrary system. If a process were to scale with surface area rather than volume, then the relevant surface area quantity could be used instead. In systems where heterogeneous catalysts are intentionally added in order to study their nucleation behavior or to initiate nucleation, the relative scaling parameter would be the total nucleating surface area, which is then often converted to a concentration.

Incorporating the system size into the probability functions in this manner can help assess the validity of the statistical model developed thus far. For example, we can conduct a campaign of constant cooling rate experiments in a larger system and assess the level of agreement with the volume scaling predictions from the statistical model based on the 5 ml data.

The 5 and 20 ml systems employed in this study have a volume ratio of *V*_{20}:*V*_{5} = 4 and a surface area ratio of *S*_{20}:*S*_{5} = 2.2. Although we can hypothesize that the petrolatum coating removes the possibility of ice nucleation on the container surface since it is a slippery hydrocarbon-based material similar to oils used to study homogeneous nucleation in emulsions, we cannot decisively conclude whether nucleation is initiated on the container surface during the constant cooling rate experiments and, thus, expect nucleation behavior to scale with surface area or whether nucleation is initiated in the bulk of the fluid and, thus, expect nucleation to scale with volume.

*α*is the thermal diffusivity (∼0.12 mm

^{2}/s for supercooled water), Δ

*T*is the temperature difference (°C), and

*C*is the cooling rate (°C/s). For the 5 ml system and a cooling rate of 2 °C/min, we find that the center axis lags the wall temperature by ∼2.6 °C. For the 20 ml system and a cooling rate of 1 °C/min, the center lags the wall temperature by ∼5.2 °C. In the case of volume nucleation, if we assume that nucleation is occurring in the coldest part of the system, the entire volume likely would not contribute to nucleation but rather a portion of the volume within a certain distance from the inner wall.

Due to the radial temperature gradient within the chambers, there is a potential for buoyancy-driven natural convection to develop, which would increase heat transfer within the chamber and reduce the temperature gradient approximated by Eq. (3). A smaller temperature gradient would ultimately lead to effective scaling between the limiting scenarios of surface or volume scaling depicted in Fig. 5. Interestingly, one of water’s many anomalous properties is the density maximum that occurs around 4 °C. At this temperature, the coefficient of thermal expansion is zero, and there is no driving force for natural convection. Ultimately, this effect warrants further investigation, and future studies should consider incorporating convective transport effects with the aim of developing more accurate scaling predictions.

Shown in Fig. 5 are the three nucleation parameters ⟨Δ*T*_{f}⟩, ln *B*, and ln *A* computed as a function of volume, *V*. The solid lines represent the median value of each parameter, and the shaded regions are bounded by the 25% and 75% quartiles. The dependence on volume is predicted following the relation in Eq. (19), which incorporates volume into the GEV distribution. The nucleation temperatures are computed by incorporating volume into the Weibull distribution. Similarly, the scaling of the nucleation barrier, ln *B*, is computed by incorporating volume into the Fréchet distribution. The nucleation pre-exponential factor, ln *A*, is obtained using Eq. (14).

The average nucleation temperature [Fig. 5(a)] scales relatively linearly with the logarithm of the volume, leveling off at larger volumes. The decreasing trend of the nucleation temperature with volume is caused by an increased likelihood of more potent nucleating catalysts in larger samples. Whereas in very small systems (such as micro-emulsions whose droplet diameters are often on the order of single microns), it may be possible to exclude all or most insoluble impurities (other than the encapsulating oil phase), larger systems inherently will contain more impurities. This result is consistent with the early findings of Bigg,^{63,64} Langham and Mason,^{25} and Mossop^{65} and is in accordance with the singular model for nucleation. Linearity is expected to be a universal property for the nucleation of systems containing the same population of nucleating catalysts. The precise slope and position of this curve, however, are dependent on the particular source of water and the population of impurities that it contains. The scaling of the nucleation barrier, ln *B*, behaves similarly to the scaling of nucleation temperature, i.e., the larger the system is, the higher the likelihood of more potent nucleating agents and, thus, larger systems possess smaller characteristic nucleation barriers.

One result that would be difficult to predict without this analysis is how the nucleation pre-exponential factor, *A*, scales with volume. This factor is often assumed to be the same for heterogeneous nucleation and homogeneous nucleation since, in the context of CNT, it is related to the diffusion of liquid molecules to the surface of the growing solid nucleus and, thus, thought to be solely a property of the liquid and not of the nucleating catalyst. As discussed in Sec. II C 2, for consistency of the bivariate statistical model, this parameter cannot remain constant and is actually constrained by the values of the other two parameters, ⟨Δ*T*_{f}⟩ and *B*. Interestingly, the value of *A* is predicted to decrease with increasing volume [Fig. 5(c)], which is perhaps counterintuitive since a smaller *A* yields a lower nucleation rate for a constant value of *B*. This behavior is, in fact, consistent with many studies of heterogeneous nucleation and, as discussed in Sec. III B, could be a result of reduced molecular mobility in the vicinity of potent active sites that are somewhat polar and hydrophilic. Regarding the width of the distribution, it is quite wide for small volumes and becomes quite narrow for larger volumes. It is difficult to ascribe a physical justification for this behavior since, in the scope of this extreme value analysis, the parameter *A* is not an extreme variable, and as such, we can neither predict its distribution for the most potent active site nor comment on the value of *A* for the remainder of the active sites. Future studies should seek to validate this result and investigate potential physical origins.

In addition, shown in Fig. 5, are the median parameters from experiments performed in the 5 and 20 ml systems. The 20 ml data are displayed in two instances, corresponding to scaling with volume and surface area. The same general trend (increasing vs decreasing) is seen between the experiments and model predictions, which supports the underlying premises of the model. The volume scaling produces the best agreement, suggesting that nucleation is not necessarily occurring on the container’s inner surface. Future experimental studies should investigate this scaling behavior further.

### D. Freezing probability vs time in systems at constant temperature

*A*and

*B*, are random variables. The model developed in this study has treated the mean nucleation temperature and the nucleation barrier parameters as extreme variables, which enables the heterogeneous nucleation process to be described by the joint extreme value distribution in Eq. (15). To incorporate the dependence on the system volume, Eq. (15) can be modified in accordance with Eq. (19). By further transforming ⟨Δ

*T*

_{f}⟩ to

*A*using the definition of the mean nucleation temperature from Eq. (14), we can obtain an expression for the probability density function in terms of

*A*and

*B*: $g\u27e8\Delta Tf\u27e9,B\u2192gA,B$. Finally, we can obtain a generalized expression for the isothermal freezing probability as a function of volume and time by integrating over all possible values of

*A*and

*B*weighted by their joint probability density,

This relation is used to compute the freezing probability, shown in Fig. 6(a), for physiological saline in a 5 ml system with petrolatum-coated surfaces. We find that the logarithm of the probability scales approximately linearly with temperature for small probabilities (*p* < 0.5). The trend of freezing behavior with supercooled duration is also interesting. For a given freezing probability, varying the temperature by only a few degrees results in the corresponding induction time varying from one day to one year. This result may have the practical implication of extrapolating supercooled states observed to be stable on the order of days to be stable for much longer periods of time.

The results of the 5 ml isothermal experiments are also shown in Fig. 6(a). In each of these experiments, the effective quantity measured is the probability of the sample freezing within 2 h of being supercooled. Average freezing probabilities of 23% (n = 10), 48% (n = 6), and 62% (n = 18) are found for temperatures of −12, −13, and −14 °C, respectively. As anticipated, the freezing probability increases with decreasing temperature; however, the experimental values are consistently higher than those predicted by the model.

Considerable variation is observed for the isothermal trials (repeated experiments at individual temperatures yielded standard deviations of roughly +/−30%). Although this, as well as the relatively small number of repeated trials, likely contribute to the disagreement with the predictions, this general observation is, in fact, in accordance with predictions of the statistical model. Due to the relatively large variation in active site potency between experiments, a high likelihood is predicted of either observing almost no freezing events within 2 h or observing a freezing rate of nearly 100%. This highlights the difficulty of performing isothermal nucleation rate experiments as compared to constant cooling rate experiments as well as the difficulty of interpreting results from individual nucleation experiments.

Figure 5(b) depicts the freezing probability as a function of temperature and volume for a supercooling duration of 24 h. This heat map can be of immediate use to the applied cryobiologist by informing the choice of preservation temperature based on the size of their system and the appropriate freezing risk level. For example, accepting a 1% probability compared to a 0.1% chance of freezing would enable the storage temperature to be reduced by roughly 3–4 °C. Accepting a 10% chance of freezing could enable the storage temperature to be reduced by an additional 3–4 °C.

## IV. CONCLUSION

Supercooling, which describes the phenomenon of water remaining in a metastable liquid state below its equilibrium melting temperature, is an attractive premise for preserving biological material at low temperatures in the absence of ice. Being a metastable process, however, there is a nonzero probability at all times for freezing to occur, and this reality represents a significant challenge for the clinical translatability of any supercooling technique. Just as reactor engineers designing nuclear power plants need to know the probability of a meltdown occurring and civil engineers designing a suspension bridge need to know the likelihood of a high-magnitude earthquake, transplantation surgeons need certain assurance that their supercooled organ will not freeze during the storage period. Without methods to arrest ice growth before damage is imparted, quantifying freezing probability is expected to be central to enabling the widespread implementation of supercooling.

Ultimately, heterogeneous nucleation is a complex problem that is both incompletely understood from a molecular level and difficult to characterize from an experimental standpoint. Random active sites, existing as minute insoluble particles floating in solution or adhered to container walls, as well as certain water-soluble macromolecules, can remain nearly undetectable in solution and produce seemingly unpredictable freezing behavior across repeated experiments. Predicting nucleation behavior in the presence of uncharacterized impurities remains an unsolved problem with broad implications.

In this study, we leverage the observation that nucleation often occurs repeatedly on the single most potent active site present in a system in order to approach the problem using the statistical theory of extreme values. This enables us to reduce the scope of the problem by circumventing the need to characterize every potential active site. We develop a joint singular and stochastic model based on data from constant cooling rate experiments in order to quantify the sample-to-sample variability and time-dependent intra-sample stochasticity. By capturing the variability of the kinetic and thermodynamic parameters governing heterogeneous nucleation, the model is able to predict the probability of nucleation as a function of temperature, volume, and time. Together, this approach constrains the multi-faceted probabilistic nature of heterogeneous ice nucleation and enables the rational design of supercooled biopreservation protocols.

## ACKNOWLEDGMENTS

This research received financial support from the National Science Foundation (NSF) Graduate Research Fellowship under Grant No. DGE 1752814 as well as from the NSF Engineering Research Center for Advanced Technologies for Preservation of Biological Systems (ATP-BIO) under NSF EEC Grant No. 1941543.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

A.N.C., M.J.P.-P., and B.R. conceived of study. A.N.C. and Y.O. conducted experiments. A.N.C. performed statistical analyses and wrote the first draft of the manuscript. All authors contributed to revision of the manuscript.

**Anthony N. Consiglio**: Conceptualization (equal); Data curation (equal); Formal analysis (lead); Investigation (equal); Methodology (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (equal). **Yu Ouyang**: Data curation (equal); Investigation (equal); Methodology (equal); Writing – review & editing (equal). **Matthew J. Powell-Palm**: Conceptualization (equal); Methodology (equal); Writing – review & editing (equal). **Boris Rubinsky**: Conceptualization (equal); Funding acquisition (lead); Methodology (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## REFERENCES

*Microstructure of Atmospheric Clouds and Precipitation*

*Water and Aqueous Solutions at Subzero Temperatures*

_{2}O and H

_{2}O-in-oil emulsions

_{3}/H

_{2}SO

_{4}/H

_{2}O solutions at stratospheric temperatures: Nucleation statistics and experiments

*Nucleation Basic Theory with Applications*

*Continuous Bivariate Distributions*

*Statistics of Extremes: Theory and Applications*