An extreme value statistics model of heterogeneous ice nucleation for quantifying the stability of supercooled aqueous systems

The propensity of water to remain in a metastable liquid state at temperatures below its equilibrium melting point holds significant potential for cryopreserving biological material such as tissues and organs. The benefits conferred are a direct result of progressively reducing metabolic expenditure due to colder temperatures while simultaneously avoiding the irreversible damage caused by the crystallization of ice. Unfortunately, the freezing of water in bulk systems of clinical relevance is dominated by random heterogeneous nucleation initiated by uncharacterized trace impurities, and the marked unpredictability of this behavior has prevented implementation of supercooling outside of controlled laboratory settings and in volumes larger than a few milliliters. Here, we develop a statistical model that jointly captures both the inherent stochastic nature of nucleation using conventional Poisson statistics as well as the random variability of heterogeneous nucleation catalysis through bivariate extreme value statistics. Individually, these two classes of models cannot account for both the time-dependent nature of nucleation and the sample-to-sample variability associated with heterogeneous catalysis, and traditional extreme value models have only considered variation of the characteristic nucleation temperature. We conduct a series of constant cooling rate and isothermal nucleation experiments with physiological saline solutions and leverage the statistical model to evaluate the natural variability of kinetic and thermodynamic nucleation parameters. By quantifying freezing probability as a function of temperature, supercooled duration, and system volume, while accounting for nucleation site variability, this study also provides a basis for the rational design of stable supercooled biopreservation protocols.


I. INTRODUCTION
Understanding how solid phases nucleate from the melt is a complex problem of wide-spread relevance.The nucleation of ice, in particular, governs the formation of clouds in our atmosphere 1 , plays a key role in the survival of organisms living in extreme climates 2,3 , and determines the quality of our refrigerated and frozen foods 4,5 .
Ice nucleation also sets the current limit for the preservation of live saving organs for transplantation 6 .The clinical standard for ex vivo storage of organs involves placing an organ on ice (at roughly 0-4°C) and only enables brief periods of preservation (4-6 hours for the heart and 12-18 hours for the liver 7 ) before the organ expires and is unfit for transplantation.Due to the Arrhenius-like nature of metabolic reactions, reducing the preservation temperature further could extend these durations, and as result, remove significant logistical hurdles, improve immunological matching and expand access to life-saving organs 8 .Critically however, the freezing point of water (0°C) represents a fundamental barrier, as freezing of biological systems is often fatal.Reducing the temperature of preservation below 0°C without ice formation thus holds significant implications for global health.
Despite manifesting at the human and even global scale, the fundamental process of nucleation involves the coordinated dynamics of a handful of molecules occurring at or even below the nanoscale 1,9 .This immense separation of scales results in making predictions about how nucleation occurs extremely difficult.Additionally, nucleation occurs almost always heterogeneously, which means that not only the physics of the liquid itself must be understood but also the interactions with all surfaces it finds itself in contact with.The result is that, depending on the catalyzing efficiency of the nucleating surfaces present in a given volume of water, ice may nucleate anywhere from just below the equilibrium melting point of 0°C down to the homogeneous nucleation limit around -40°C (pure water at atmospheric pressure) 9 .Likewise, when held at a constant subcooled temperature, water may freeze within seconds or remain in a metastable liquid state for years at a time.
In theory, if one were to characterize every surface present in a system, then the nucleation behavior could be predicted for that system and subsequent systems of the same heterogeneous composition.This has been the approach of many researchers, for example, in the atmospheric sciences, who have thoroughly quantified the nucleation behavior of many common aerosol particulate materials in order to understand larger phenomena such as cloud formation 10 .These studies often seek to describe the time dependent nature and inter-sample variability observed in freezing experiments, capturing active site variability through distributions of contact angle, surface area, or combinations thereof [11][12][13][14] .This scenario is uncommon however, since the typical engineer seeking to predict the freezing behavior of arbitrary systems cannot realistically characterize each and every system, and especially not in situ 15 .What's more, nucleation is known to be sensitive to variation in impurities to such an extent that the nucleation behavior of samples taken from a single source may even vary significantly from one another 16 .This phenomenon is known as non-self-averaging behavior and results from catalyst sites possessing a wide distribution of nucleation rates [16][17][18] .Since the nucleation rate is proportional to the exponential of the nucleation barrier, large variations in characteristic nucleation efficiency between active sites can result in one or a few active sites making up the entirety of the total system nucleation rate.This is thought to be responsible for the common experimental observation of nucleation occurring repeatably at individual sites on a surface 19,20 .
The recognition that the single most potent active site may be responsible for nucleation is equivalent to the so-called weakest-link hypothesis common in the study of fracture mechanics 21 (i.e., a chain is only as strong as its weakest link, and similarly, a material placed under stress will fracture due to its weakest defect).The weakest link analogy is an example of the statistical theory of extreme values 22 , which describes situations with a large number of random variables, , for which we are interested in the probability distribution of the largest (or smallest) of the  random variables.Extreme value statistics is also used to model, for example, extreme weather and financial events 22 .
In the study of nucleation, the smallest nucleation barrier, highest nucleation temperature, or shortest nucleation induction time are relevant extreme variables.Extreme value theory can thus be recognized as the foundation of the classic singular model for heterogeneous nucleation which describes nucleation by a single characteristic nucleation temperature 23,24 .The singular model has been found to describe the distribution of nucleation temperature among a collection of transiently cooled droplets and to predict how nucleation temperature changes with the size of droplets 25,26 .
Although this formulation of the classic singular model does not describe a time dependence of nucleation, extreme value theory has also been applied to experiments that measure the time until nucleation occurs for droplets held at a constant temperature.For systems exhibiting non-selfaveraging behavior, the measured nucleation rate of a collection of droplets strays from the exponential decay predicted by Poisson statistics, instead becoming a stretched exponential 27 .
Ultimately, the randomness of nucleation represents a far-reaching scientific challenge and specifically poses significant barriers to the deployability of supercooling for the preservation of biological systems in clinical settings.While the inherent stochastic nature of nucleation results in a finite probability for nucleation at all times when in a supercooled metastable state, at low undercoolings, and when adequately characterized, the probability is vanishingly small 28 .On the other hand, the heterogeneous catalysis of nucleation by random active sites adds an additional layer of unpredictability to supercooling stability.In even the purest volumes of water, the presence of minute insoluble impurities as well as certain soluble macromolecules can result in widely varying freezing behavior even in samples from the same source.
In this study, we develop a statistical model of nucleation rooted in both the observed intersample variability and the intra-sample stochasticity.The time and temperature dependent stochasticity is captured by conventional Poisson statistics, and the active site variability is captured by a new bivariate extreme value model describing variability of both the kinetic and thermodynamic heterogeneous nucleation parameters.By characterizing both sources of randomness, this study uncovers fundamental properties of heterogeneous nucleation and provides a basis for the rational design of safe and robust supercooled biopreservation protocols.

A. Supercooling experiments
The principal goal of supercooled biopreservation is to stably hold biological material in a supercooled liquid state in order to suppress metabolic activity and achieve a certain degree of suspended animation.Nucleation is a random process and directly characterizing supercooling stability would require conducting experiments under the experimentally relevant timescales (days, possibly weeks) and repeating experiments many times in order to build statistical confidence.Because this is generally impractical, we may alternatively seek to indirectly predict supercooling stability by characterizing the system nucleation rate.The nucleation rate describes the rate of formation of critical ice clusters in a supercooled liquid and is related to nucleation probability under the framework of Poisson statistics (see Section II.B).
Experiments for characterizing nucleation rates generally fall under two categories: 1) isothermal and 2) constant cooling rate.In isothermal experiments, a sample is quickly quenched to and held at a constant supercooled temperature, and the time at which nucleation occurs is recorded.In constant cooling rate experiments, the temperature of a sample is continually decreased at a constant rate, and the temperature at which nucleation occurs is recorded.Through the application of Poisson statistics, equivalent information (namely the nucleation rate) can be obtained from these two techniques.In this study, constant cooling rate experiments were chosen for collecting the bulk of nucleation data since they allow data to be collected more quickly.Since isothermal supercooling experiments replicate the scenario employed during supercooled biopreservation, an additional campaign of isothermal experiments was performed at various temperatures and compared to the statistical predictions of the model formed from the constant cooling rate data.
The experimental setup employed in this study has been previously described by Consiglio et al. 29 and consists of a rigid Al-7075 chamber with petrolatum coated surfaces (Vaseline, Unilever, UK).The chamber is filled completely with liquid, excluding all bulk gas.The combination of hydrophobic surfaces and rigid confinement has been found to enhance the stability of supercooled solutions [29][30][31][32] and is employed here since the practical objective of this work is designing robust supercooled biopreservation protocols 28,33 .We also hypothesize that these conditions suppress nucleation from the container surface and from external perturbations and thus isolate the effect of nucleation catalysts present in the solution.The petrolatum coating, which is freshly applied before each experiment, has the additional benefit of providing maximal consistency across repeated experiments and devices.
The primary set of experiments, from which the statistical models are developed, were conducted in a 5ml chamber (=5.3ml,=19cm 2 ) with a cooling rate of 2°C/min.A second set of experiments were conducted in a 20ml chamber (=12.2ml,=41.9cm 2 ) with a cooling rate of 1°C/min.Within each constant cooling rate experiment, an average of 80 freezing temperatures were recorded.In order to study the variability of heterogeneous catalysts present across the samples, these experiments were repeated a total of 23 times in the 5ml systems and 10 times in the 20ml systems.Isothermal experiments were conducted in the 5ml systems at temperatures of -12°C, -13°C, and -14°C.A cutoff of two hours was imposed in order to maintain experimental throughput.All experiments were conducted with phosphate-buffered saline (PBS (1X) ID: 10010023, Thermo Fisher Scientific, USA), the melting point of which is approximately -0.5°C.

B. Poisson statistics of nucleation processes
Nucleation, considered here as the steady state formation of stable solid clusters, is a stochastic process with individual nucleation events being independent of each other and of previous events.From a probabilistic standpoint, it can be shown that nucleation in an ensemble of supercooled molecules is governed by Poisson statistics 34 .This yields the result that, for a volume of uniform temperature and composition, the probability of observing  nucleation events in the time interval [0, ] is given by the Poisson distribution: where  is a rate parameter [ −1 ]and  is time [].Since the rate parameter is constant here, Equation 1 describes a homogeneous Poisson process (note that this is distinct from the homogeneous descriptor of nucleation occurring in the bulk of a fluid and not at an interface).
From this, we can find that the probability of zero nucleation events ( = 0) occurring in a system with constant nucleation rate  [ −1 ] follows an exponential decay: This probability distribution is shown in Figure 1a as a function of time for three different nucleation rates.If the nucleation rate is not constant but instead varies with time, we have a nonhomogeneous Poisson process, and the probability of zero nucleation events becomes In certain scenarios, the temperature is lowered at a constant cooling rate,  ̇= |/|, and we can define the probability as a function of temperature instead of time by integrating the nucleation rate in temperature, beginning at the equilibrium melting temperature,  m .
In statistical terms, this probability distribution is known as the survival function, , which is defined as   () = Pr( ≥ ) and represents the probability that a random variable, , will take on a value greater than or equal to .In the context of nucleation, the survival function represents the probability that nucleation does not occur before a given time (Equation 2 for constant temperature) or degree of supercooling (Equation 4 for constant cooling rate).The survival function is related to the cumulative distribution function (CDF),   (), by   () = 1 −   ().
The probability density function (PDF) is another useful statistical function and is defined as   () =     ().Since the probability for a continuous random variable to take on any one particular value is zero, in the context of nucleation, the PDF represents the relative likelihood that nucleation will occur around a given temperature (for constant cooling rate) or around a certain time (at constant temperature).For a constant cooling rate process, the PDF is given by In order to solve for the freezing probability distribution, a relation for the nucleation rate () is needed.Classical nucleation theory (CNT) provides a framework for describing the nucleation process 35 and can be used to describe the heterogeneous nucleation rate [m -2 s -1 ] as 36,37 where  s is the number of water molecules per unit area in contact with the ice nucleus,  B is Boltzmann's constant, ℎ is Planck's constant, Δ diff is the energy of activation for a water molecule to cross the water/ice interface, Δ hom * is the homogeneous free energy of formation for a critical ice embryo, and  het is a geometric compatibility factor describing the reduction of the free energy barrier due to the presence of an ice nucleating surface.
Many of the terms in Equation 6are independent of or only weakly dependent on temperature, and the nucleation free energy barrier, Δ hom , often dominates the variation of the nucleation rate with temperature 38 .By grouping together the kinetic terms and approximating the temperature dependence of the free energy barrier as Δ hom ~(Δ) −2 , we can arrive at an oft employed twoparameter relation 39,40 : where Δ =  m −  is the degree of supercooling, and  and  are constants.The parameter  is often referred to as the pre-exponential factor or kinetic term [s -1 ], and the parameter  as the nucleation barrier or thermodynamic term [K 5 ].With this relation, we arrive at cumulative distribution and probability density functions, respectively, for a constant cooling rate process: The cumulative distribution and probability density functions for a characteristic system with  m = 0°C and nucleation parameters  = 1.5 × 10 19 s −1 and  = 1.5 × 10 11 K 5 are shown in Figure 1b.The average nucleation temperature and standard deviation for this process are -13.7°Cand 0.2°C, respectively.12°C (red), computed from Equations 2 and 7. b) Cumulative distribution (blue) and probability density (red) functions for same system as in panel a) cooled at a constant rate of 1°C/min, computed from Equations 8 and 9, respectively.c) Example Type I: Gumbel (blue), Type II: Fréchet (red), and Type III: Weibull (green) generalized extreme value probability density functions (Equation 13) for  =  and  = .
The moments of a probability distribution provide useful quantitative measures about the shape of the distribution.The first moment describes the expected value (i.e., mean), the second moment is related to the variance, and the third moment is related to the skewness.The -th central moment is described generally by The goal of nucleation experiments, to quantify the nucleation rate, is achieved by fitting the cumulative distribution function from Equation 8to an empirical distribution of nucleation temperatures from a constant cooling rate experiment.This nucleation rate can then be used with Equation 3 to predict the probability that this system would freeze for an arbitrary thermal history as a function of time.
Although this procedure can be used to accurately characterize individual samples, its extension to the prediction of freezing probabilities for arbitrary systems (of nominally equivalent composition) is complicated by the fact that nucleation in real bulk systems is a heterogeneously catalyzed process.Samples of water, even when extensively purified 41 , inevitably contain varying amounts and populations of heterogeneous nucleating agents, and this often results in different nucleation rates between seemingly identical samples 16,18 .
In order to characterize the freezing probability for heterogeneous nucleation, one would ideally know properties of every nucleating agent present.The total system nucleation rate could then be expressed as the sum of the individual rates: where  is the total number of active sites and   and   are the nucleation parameters of the -th active site.Active sites are in some instances considered zero dimensional entities, such as the case for a wedge or crevice, and in this case, their surface area is not a relevant parameter.Otherwise, the term   would contain the total surface area of the active site.Equation 11 can be used in place of the single nucleation rate from Equation 7 in the above statistical relations.This approach has been widely adopted by the atmospheric science community to characterize the nucleating behavior of specific aerosol materials 37 .In these experiments, precise quantities of rigorously characterized material are added to the system.The nucleation rate of this system can then be measured, and any observed difference can be attributed to the added material if the nucleating agents result in nucleation at higher temperatures than the base solution [42][43][44] .
Unfortunately, this methodology cannot be applied if the purity of a solution or exact properties of all active heterogeneous catalysts are not known (such as the case with a bottle of standard laboratory-grade phosphate-buffered saline).Additionally, if the nucleation rate is strongly dependent on surface features as well as surface chemistry, the observed nucleation rate between samples of the same absolute composition may not be consistent 20,45 .Very few tools are at our disposal for predicting nucleation in these instances.In the following section, we leverage the observation that nucleation often occurs at the single most potent active site 18,24,46 , enabling the application of methods from extreme value statistics for describing the effect of random impurities and circumventing the need of fully characterizing every impurity.

C. Extreme value statistics of heterogeneous nucleation
Any model of heterogeneous nucleation must describe, in some manner, the inherent variability of nucleation active sites that contributes to the total system nucleation rate as described in Equation 11 47 .When specified quantities of characterized impurities are added to a solution, the nucleation parameters are known (i.e., can be characterized).Conversely, in the case of a system containing uncharacterized impurities, the total number of active sites, , as well as the nucleation parameters,  and , become random variables 18 .
A common experimental observation is that ice nucleates repeatably at individual sites, such as a scratch on a container's surface 19,41 .As discussed earlier, this behavior is characteristic of quenched disorder and often leads to non-self-averaging behavior.The result is a nucleation behavior varying randomly between seemingly identical samples.
The statistical theory of extreme values has been proposed to describe this nucleation phenomenon and is at the core of first statistical theory of heterogeneous nucleation 23,24,26 .The theory is described generally as follows: a series of independent and identically distributed random variables,  1 , … ,   , is described by the cumulative distribution function,   (), and maximum,   = max( 1 , … ,   ).Although the distribution of the maximum value of this set is exactly given by Pr(  ≤ ) = [()]  , the distribution   () is often not known such as in the case of heterogeneous nucleation with a distribution of active sites.In the asymptotic limit of  → ∞ however, the distribution of the maximum is described, regardless of the underlying distribution   (), by a finite family of distributions known as the generalized extreme value (GEV) distribution 22,48 : Here, () is the cumulative distribution function, () is the probability density function, and , , and  are the location, scale, and shape parameters, respectively.The specific distribution is determined by the shape parameter, which governs the tail behavior.While the Gumbel distribution is unbounded in both limits, the Weibull distribution has an upper bound and the Fréchet distribution a lower bound.The sub-classes of the GEV distribution are summarized in Table 1 and depicted in Figure 1c for location and scale parameters  = 0 and  = 1, respectively.Even though we cannot realistically know the complete distribution of impurities present in a solution, in the asymptotic limit, the distribution of most potent sites is predicted to follow one of these three distributions.
In order for extreme value theory to be applied to heterogeneous nucleation experiments, a few additional assumptions must first be introduced: 1) the variability in the number of active sites between systems is much smaller than the total number of active sites, and 2) the underlying distribution of active sites is the same between systems.These assumptions are reasonable for individual samples taken from the same source.

Determining the relevant extreme variables
The specific physical quantity that is chosen as the extreme variable confers additional constraint on the problem.For example, the classic singular model of nucleation prescribes a characteristic activation temperature to each active site, and thus the nucleation temperature is the extreme variable.In reality, nucleation is stochastic and does not occur at a single temperature but rather over a distribution of temperatures, even for a single active site.For extreme value statistics to be a good descriptor of the nucleation temperature, the variability of nucleation temperature for individual active sites should be small in comparison to the variability of characteristic nucleation temperature between systems.This is the case for the system studied here, as seen in Figure 2 and Figure 3, where the standard deviation of nucleation temperatures for a single sample is on average about 0.33°C while the standard deviation of nucleation temperatures between samples is approximately 1.5°C.
The classic singular model is typically applied to data from constant cooling rate experiments in which nucleation temperature is the measured quantity.Alternatively, in the case of isothermal experiments, the relevant extreme variable is the nucleation induction time, and the active site with the minimum characteristic induction time is responsible for nucleation 49 .For extreme value statistics to have reasonable predictive power here, the spread of induction times for individual active sites caused by inherent stochasticity should likewise be small compared to the spread of induction times between samples.These two applications of extreme value statistics are the most readily applied since the extreme variable is a measured quantity.Alternatively, we can consider the molecular mechanisms of nucleation and instead of attributing nucleation to the active site with the highest nucleation temperature or shortest nucleation induction time, we can describe the potency of the active site by the nucleation rate itself.In this way, nucleation is attributed to the active site with the largest nucleation rate.In the two-parameter nucleation rate relation described in Equation 7, the nucleation barrier, , is exponentiated, and as such, the nucleation rate is much more sensitive to  than it is to the pre-exponential factor, .A theoretical extreme value analysis of nucleation performed by Sear 16,18 held the pre-exponential factor constant and treated the nucleation barrier as the extreme variable.This approach is convenient since it reduces the problem to a single variable, however it does not agree well with the common experimental observation that the spread of nucleation temperatures is rather constant regardless of the temperature at which nucleation occurrs 50 .Both the expected value and standard deviation of the nucleation temperature for a constant cooling rate experiment, as computed from the first and second moments of the probability distributions, are dependent on pre-exponential and nucleation barrier parameters.Therefore, both nucleation parameters must vary for the standard deviation to remain the same as the nucleation temperature changes.
Since none of the singular extreme value models can suitably capture the time and temperature dependence of nucleation nor the variability of the nucleation rate, we propose a hybrid scheme that treats both the nucleation temperature and the nucleation barrier parameter, , as extreme quantities.This bivariate approach is then able to account for the variability of the nucleation parameter, , by leveraging the definition of the average nucleation temperature obtained as the first central moment of the constant cooling rate probability distribution function.
An additional limitation of the classic singular model, which prescribes a single activation temperature to nucleation sites, is that it does not capture the cooling rate dependence of the nucleation process.This limitation is overcome in this hybrid model since the expression relating the extreme variables, ⟨ f ⟩ and , to the nucleation rate pre-exponential factor, , in Equation 17incorporates the cooling rate.
The proposed hybrid scheme is a case of bivariate analysis, with each extreme variable described by separate univariate distributions.The nucleation temperature and nucleation barrier are not completely independent variables however, so the problems cannot be treated separately.Instead, we must define a joint probability distribution, considering the nucleation temperature and nucleation barrier as independent yet correlated variables.Formulation of bivariate extreme value distributions with arbitrary marginal distributions is in general difficult due to the construction of the dependence structure and the absence of a finite parametric family, which exists for univariate extreme value distributions.The two most common bivariate models are the mixed and logistic models.Here, we implement the logistic model [51][52][53] : where (, ) is the bivariate extreme value distribution describing both random extreme variables,  and , and  1 () and  2 () are the univariate marginal extreme value distributions.The quantity  is a measure of the dependence between the random variables and is defined as where  is the correlation coefficient.A value of  = 1 corresponds to independence of the two variables and leads to the joint distribution (, ) =  1 () 2 ().The joint probability density function follows similarly from the univariate definition and is given by

Determining the relevant extreme value distribution
Now that we have identified the extreme parameters and determined the form of the bivariate distribution, the next task is to identify the type of GEV distribution that describes each of the marginal distributions.Although the asymptotic theory predicts that the process will converge to one of the three types of GEV distributions shown in Table 1, we cannot immediately determine which one this is without information about the underlying process or distribution.If the underlying distribution of active sites is known, then the distribution of maximums is also known, and an extreme value analysis is moot.In the worst-case scenario, we cannot make any physical suppositions and are therefore left to fit data directly to the full three-parameter GEV distribution.By considering some properties of the GEV sub-classes however, we may be able to rule out some cases.For instance, we can consider the tail behavior of the distributions.
Both the nucleation temperature and the nucleation barrier are physically bounded.The nucleation temperature cannot be higher than the equilibrium melting point and cannot be lower than the homogeneous limit (roughly -40°C).In bulk systems that are not purified extensively, heterogeneous nucleation usually results in freezing occurring closer to the equilibrium melting point, making the Weibull distribution potentially a natural fit.Similarly, the nucleation barrier must be less than the homogeneous limit but must also be greater than zero.Since the nucleation barrier often varies many orders of magnitude in value, it is convenient to consider the logarithm of the value (i.e., ln ).The logarithm of the nucleation barrier does not inherently have a lower bound (since extreme value analysis deals with maximums, we must consider − ln  which conversely does not have an inherent upper bound).We can most likely rule out the Weibull distribution, leaving the Fréchet distribution as a plausible choice for the log-distributed nucleation barrier.In this case, the homogeneous nucleation barrier would provide a physically sensible lower limit.
Often the best option for assessing which probability distribution best describes the distribution is using the graphical method of the probability plot 22,53 .A good fit is indicated by the formation of a straight line when plotting the quantiles of the empirical distribution against the quantiles of a reference distribution.The probability plots in Figure 3 are linear in nature, indicating a decent goodness of fit has been obtained for the measured nucleation parameters.Since there is uncertainty in the underlying active site distribution and little data in the literature on extreme value distributions for either nucleation temperatures or nucleation rate parameters, it is still recommended that each extreme value distribution be explored when studying a new system.

A. Results from constant cooling rate experiments
The survival curves of the nucleation temperatures are shown in Figure 2 for a series of 23 constant cooling rate experiments performed in the 5ml system with PBS at 2°C/min.While the spread of nucleation temperatures for individual trials is typically within 1°C (average standard deviation of 0.33°C), nucleation temperatures across the set of experiments ranges nearly 7°C (standard deviation of average nucleation of 1.55°C).This is consistent with nucleation experiments in the literature that find narrow spreads for individual systems and large spreads for experiments conducted on multiple systems 54,55 .The stochastic behavior for individual nucleating catalysts is thus quite narrow, yet the characteristic temperature around which this purely stochastic behavior is centered can vary multiple degrees from system to system: a hallmark of non-self-averaging behavior, which indicates the potential suitability of extreme value theory.
The method of maximum likelihood estimation is used to determine the cumulative distribution function from Equation 7and obtain the respective nucleation parameters for each of the survival curves.As a check of whether the observed nucleation process is well described by the proposed empirical nucleation rate () =  exp(−/ 3 Δ 2 ), we can linearize the constant cooling rate CDF (see Deubener and Schmelzer 56 ).The linearized nucleation spectra are shown in Figure 1b, and their linear form suggests the observed nucleation process is generally well described by the proposed nucleation rate.

B. Distribution of extreme parameters
With the empirical survival curves and inferred nucleation parameters in hand, we can now evaluate the distributions of the extreme variables: average nucleation temperature, ⟨Δ f ⟩, and nucleation barrier, ln .Maximum likelihood estimation is used again for fitting of the extreme value distributions and the resulting distributions are shown in Figure 3 for the 5ml constant cooling rate experiments.
As discussed in Section II.C.2, the Weibull distribution is a physically plausible choice for describing the distribution of nucleation temperatures between samples because of the melting temperature upper bound.A slight modification is first made to the Weibull distribution as defined in Equation 16.By considering the degree of supercooling upon freezing, Δ f =  m −  f , instead of the absolute nucleation temeprature, the location of the upper bound is shifted from  m to zero.This effectively reduces the number of parameters in the Weibull distribution from three to two. Figure 3a depicts a histogram of the average nucleation temperatures along with the probability density function of the Weibull fit, and Figure 3b shows the Weibull probability plot of the average nucleation temperatures.Linearity in the probability plot indicates the ⟨Δ f ⟩ data are well described by the Weibull distribution.We find an average degree of supercooling upon freezing of ⟨Δ f ⟩ = 14.3°C and standard deviation of  ⟨Δ f ⟩ = 1.6°C.
In the case of the nucleation barrier, the full three parameter GEV distribution is fit to the nucleation barrier data and a best fit is found with the Fréchet distribution.This agrees with the intuition discussed in Section II.C.2 that the homogeneous nucleation barrier represents a natural limit.Wood and Walton 39 found a value of  hom = 1.66 × 10 12 K 5 (ln  hom = 28.14), and though this limit has not been strictly enforced when fitting the Fréchet distribution here, the probability density function evaluated at the homogeneous limit is on the order of 10 −60 and is small enough to exclude any nonphysical contributions.Figure 3c depicts a histogram of the nucleation barriers along with the probability density function of the fitted Fréchet distribution.Figure 3d shows the Frechet probability plot and the observed linearity suggests an appropriate fit.We find an average nucleation barrier of ⟨ln ⟩ = 25.4 and standard deviation of  ln  = 0.7.
Having determined the marginal distributions, we can now consider the joint distribution.To that end, we first compute the correlation parameter for the two extreme variables using Equation 19and find a value of  = 1.32.The joint probability density function, (⟨Δ f ⟩, ln ), is then calculated from Equations 18 and 20 and is shown in Figure 4a.Next, we can perform a variable substitution in order to obtain the PDF in terms of the nucleation rate parameters,  and , using the definition of the mean nucleation temperature form Equation 17.From the result, shown in Figure 4b, we see that the there is a narrow band of probable pairings, which indicates that nucleation parameters have a somewhat high degree of correlation and perhaps represents a fundamental property of heterogeneous nucleation.
In the context of classical nucleation theory (and shown in Equation 6), the heterogeneous nucleation barrier, Δ het * , is the product of the homogeneous nucleation barrier, Δ hom * , and a heterogeneous shape factor, , which accounts for a modified contact area between the growing solid ice cluster and the catalyst particle onto which it nucleates 57 : The shape factor,  het , is a function of the contact angle, , between the ice embryo and nucleating surface and can vary between (0°) = 0 (corresponding to complete wetting and no nucleation barrier) and (180°) = 1 (corresponding to complete hydrophobicity and homogeneous nucleation).Hydrophilic surfaces, which have small contact angles and also smaller nucleation barriers, are often polar in nature, and polar surfaces are known to orient liquid water molecules in their close vicinity.Therefore, a potential explanation for the scaling behavior seen in Figure 5c and correlation seen in Figure 4b could be that the reduction in mobility of liquid molecules caused by more potent hydrophilic nucleating surfaces results in a smaller preexponential diffusive coefficient.16.A narrow distribution indicates a strong correlation between the kinetic and thermodynamic nucleation parameters.Now that we have determined the distribution of the nucleation parameters from extreme value statistics, we can apply this model to useful ends.Since the 5ml systems are not a practical size for preserving anything beyond cell suspensions and small tissue constructs, we may be interested in seeing how the model can be used to predict nucleation behavior in larger, more relevant volumes.This will be covered in the following section.Additionally, the experiments from which the model is constructed, constant cooling rate experiments, do not represent the modality of supercooling employed when actually preserving a biological specimen.In reality, preservation is conducted at a constant temperature and at temperatures warmer than the supercooling limits probed in constant cooling rate experiments.We may therefore be interested in evaluating how the model predicts nucleation behavior in isothermal systems and as a function of time.This will be covered in Section III.D.

C. Scaling of the nucleation process with system size
A useful property of probability distributions is that the probability of independent events occurring together is the product of the independent probabilities (i.e., ( and ) = () ⋅ ()).If we consider a large system being comprised of a collection of smaller independent systems, we can generalize this rule to predict how probability distributions scale with system size.For example, if () is the cumulative distribution function for a system of size  1 , then the distribution for an identical system of size  2 is [ 1 ()]  2 / 1 .Following this logic, the size of the system can be readily incorporated into the GEV distribution yielding: where  0 is the volume of the system from which the model was developed, and  is the volume of an arbitrary system.If a process were to scale with surface area rather than volume, then the relevant surface area quantity could be used instead.In systems where heterogeneous catalysts are intentionally added in order to study their nucleation behavior or to initiate nucleation, the relative scaling parameter would be the total nucleant surface area which is then often converted to a concentration.
Incorporating the system size into the probability functions in this manner can help assess the validity of the statistical model developed thus far.For example, we can conduct a campaign of constant cooling rate experiments in a larger system and assess the level of agreement with the volume scaling predictions from the statistical model based on the 5ml data.
The 5ml and 20ml systems employed in this study have a volume ratio of  20 :  5 = 4 and a surface area ratio of  20 :  5 = 2.2.Although we can hypothesize that the petrolatum coating removes the possibility of ice nucleation on the container surface since it is a slippery hydrocarbonbased material similar to oils used to study homogeneous nucleation in emulsions, we cannot decisively conclude whether nucleation is initiated on the container surface during the constant cooling rate experiments and thus expect nucleation behavior to scale with surface area, or whether nucleation is initiated in the bulk of the fluid and thus expect nucleation to scale with volume.
An additional factor that must be considered is the temperature gradient that persists within the cylindrical volumes while being cooled at a constant rate.The 5ml chamber has an inner radius of 6.35mm and the 20ml chamber has an inner radius of 12.7mm.From a simple scaling analysis, we can approximate the temperature difference from the wall to the center axis of the cylindrical chamber, as well as the radial thermal penetration depth within which the temperature is approximately uniform.The radial thermal penetration depth for a constant wall temperature is given by  r = √4 and during constant cooling becomes: where  is the thermal diffusivity (~0.12 mm 2 /s for supercooled water), Δ is the temperature difference (°C), and  is the cooling rate (°C/s).For the 5ml system and a cooling rate of 2°C/min, we find that the center axis lags the wall temperature by ~2.6°C.For the 20ml system and a cooling rate of 1°C/min, the center lags the wall temperature by ~5.2°C.In the case of volume nucleation, if we assume that nucleation is occurring in the coldest part of the system, the entire volume likely would not contribute to nucleation but rather a portion of the volume within a certain distance from the inner wall.Ultimately, the uncertainty of where nucleation occurs represents a primary limitation for analysis of nucleation in these systems and future studies should investigate this more closely.Shown in Figure 5 are the three nucleation parameters ⟨ f ⟩, ln , and ln  computed as a function of volume, .The solid lines represent the median value of each parameter, and the shaded regions are bounded by the 25% and 75% quartiles.The nucleation temperatures are computed by incorporating volume into the Weibull distribution fit to the 5ml data.Similarly, scaling of the nucleation barrier, ln , is computed by incorporating volume into the Fréchet distribution fit to the 5ml data.The nucleation pre-exponential factor, ln , is obtained using the relation in Equation 17.
The average nucleation temperature (Figure 5a) scales relatively linearly with the logarithm of the volume, levelling off at larger volumes.The decreasing trend of the nucleation temperature with volume is caused by an increased likelihood for more potent nucleating catalysts in larger samples.Whereas in very small systems (such as micro-emulsions whose droplet diameters are often on the order of single microns) it may be possible to exclude all or most insoluble impurities (other than the encapsulating oil phase), larger systems inherently will contain more impurities.This result is consistent with the early findings of Bigg 58,59 , Langham and Mason 25 , and Mossop 60 and is in accordance with the singular model for the nucleation.The linearity is expected to be a universal property for nucleation of systems containing the same population of nucleating catalysts.The precise slope and position of this curve, however, is dependent on the particular source of water and the population of impurities that it contains.The scaling of the nucleation barrier, ln , behaves similarly to the scaling of nucleation temperature, i.e., the larger the system is, the higher the likelihood of more potent nucleating agents, and thus larger systems possess smaller characteristic nucleation barriers.
One result that would be difficult to predict without this analysis is how the nucleation preexponential factor, , scales with volume.This factor is often assumed to be the same for heterogeneous nucleation and homogeneous nucleation since in the context of CNT it is related to the diffusion of liquid molecules to the surface of the growing solid nucleus, and thus thought to be solely a property of the liquid and not of the nucleating catalyst.As discussed in Section II.C.2, for consistency of the bivariate statistical model, this parameter cannot remain constant and is actually constrained by the values of the other two parameters, ⟨Δ f ⟩ and .Interestingly, the value of  is predicted to decrease with increasing volume (Figure 5c), which is perhaps counterintuitive since a smaller  yields a lower nucleation rate for a constant value of .This behavior is in fact consistent with many studies of heterogeneous nucleation, and, as discussed in Section III.B, could be a result of reduced molecular mobility in the vicinity of potent active sites that are somewhat polar and hydrophilic.Regarding the width of the distribution, it is quite wide at small volumes and becomes quite narrow for larger volumes.It is difficult to ascribe a physical justification for this behavior since in the scope of this extreme value analysis, the parameter  is not an extreme variable, and as such, we can neither predict its distribution for the most potent active site nor can we comment on the value of  for the remainder of the active sites.Future studies should seek to validate this result and investigate potential physical origins.Also shown in Figure 5 are the median parameters from experiments performed in the 5ml and 20ml systems.The 20ml data is displayed in two instances, corresponding to scaling with volume and surface area.The same general trend (increasing vs. decreasing) is seen between the experiments and model prediction which supports the underlying premises of the model.The volume scaling produces the best agreement, suggesting that nucleation is not necessarily occurring on the container's inner surface.Future experimental studies should investigate this scaling behavior further.

D. Freezing probability versus time in systems at constant temperature
Having developed a statistical model that describes the random distribution of the kinetic and thermodynamic nucleation parameters and describes how they vary with system size, we can now address the practical objective of this research: determining the freezing probability of supercooled aqueous systems in order to identify conditions under which biological material may be held in a stable supercooled liquid state.The isothermal freezing probability, resulting from the homogeneous Poisson distribution of Equation 2 and the nucleation rate from Equation 6, is given by For systems with uncharacterized impurities, the nucleation parameters in this expression,  and , are random variables.The model developed in this study has treated the mean nucleation temperature and the nucleation barrier parameters as extreme variables, which enables the heterogeneous nucleation process to be described by the joint extreme value distribution in Equation 18.To incorporate the dependence on system volume, Equation 18 can be modified in accordance with Equation 22.By further transforming ⟨Δ f ⟩ to  using the definition of the mean nucleation temperature from Equation 17, we can obtain an expression for the probability density function in terms of  and : (⟨ f ⟩, ) → (, ).Finally, we can obtain a generalized expression for the isothermal freezing probability as a function of volume and time by integrating over all possible values of  and  weighted by their joint probability density.

25
This relation is used to compute the freezing probability, shown in Figure 6a, for physiological saline in a 5ml system with petrolatum-coated surfaces.We find that the logarithm of the probability scales approximately linearly with temperature for small probabilities ( < 0.5).The trend of freezing behavior with supercooled duration is also interesting.For a given freezing probability, varying the temperature by only a few degrees results in the corresponding induction time varying from one day to one year.This result may have the practical implication of extrapolating supercooled states observed to be stable on the order of days to being stable for much longer periods of time.The results of the 5ml isothermal experiments are also shown in Figure 6a.In each of these experiments, the effective quantity measured is the probability of the sample freezing within two hours of being supercooled.Average freezing probabilities of 23% (n=10), 48% (n=6), and 62% (n=18) are found for temperatures of -12°C, -13°C, and -14°C, respectively.As anticipated, the freezing probability increases with decreasing temperature, however the experimental values are consistently higher than those predicted by the model.Considerable variation is observed for the isothermal trials (repeated experiments at individual temperatures yielded standard deviations of roughly +/-30%).Though this, as well as the relatively small number of repeated trials, likely contribute to the disagreement with predictions, this general observation is in fact in accordance with predictions of the statistical model.Due to relatively large variation in active site potency between experiments, a high likelihood is predicted of either observing almost no freezing events within 2 hours or observing a freezing rate of nearly 100%.This highlights the difficulty of performing isothermal nucleation rate experiments as compared to constant cooling rate experiments as well as the difficulty of interpreting results from individual nucleation experiments.
Figure 5b depicts the freezing probability as a function of temperature and volume for a supercooling duration of 24 hours.This heat map can be of immediate use to the applied cryobiologist by informing the choice of preservation temperature based on the size of their system and appropriate freezing risk level.For example, accepting a 1% probability compared to a 0.1% chance of freezing would enable the storage temperature to be reduced by roughly 3-4°C.Accepting a 10% chance of freezing could enable the storage temperature to be reduced an additional 3-4°C.

IV. CONCLUSION
Supercooling, which describes the phenomenon of water remaining in a metastable liquid state below its equilibrium melting temperature, is an attractive premise for preserving biological material at low temperature yet in the absence of ice.Being a metastable process however, there is a nonzero probability at all times for freezing to occur, and this reality represents a significant challenge for the clinical translatability of any supercooling technique.Just as reactor engineers designing nuclear power plants need to know the probability of a meltdown occurring and civil engineers designing a suspension bridge need to know the likelihood of a high-magnitude earthquake, transplantation surgeons need certain assurance that their supercooled organ will not freeze during the storage period.Without methods to arrest ice growth before damage is imparted, quantifying freezing probability is expected to be central to enabling the widespread implementation of supercooling.
Ultimately, heterogeneous nucleation is a complex problem that is both incompletely understood from a molecular level and difficult to characterize from an experimental standpoint.Random active sites, existing as minute insoluble particles floating in solution or adhered to container walls as well as certain water-soluble macromolecules, can remain nearly undetectable in solution and produce seemingly unpredictable freezing behavior across repeated experiments.Predicting nucleation behavior in the presence of uncharacterized impurities remains an unsolved problem with broad implications.
In this study, we leverage the observation that nucleation often occurs repeatedly on the single most potent active site present in a system in order to approach the problem using the statistical theory of extreme values.This enables us to reduce the scope of the problem by circumventing the need to characterize every potential active site.We develop a joint singular and stochastic model based on data from constant cooling rate experiments in order to quantify the sample-to-sample variability and time-dependent intra-sample stochasticity.By capturing the variability of the kinetic and thermodynamic parameters governing heterogeneous nucleation, the model is able to predict the probability of nucleation as a function of temperature, volume, and time.Together, this approach constrains the multi-faceted probabilistic nature of heterogeneous ice nucleation and enables the rational design of supercooled biopreservation protocols.

Figure 2 :
Figure 2: Constant cooling rate nucleation temperature data.a) Empirical cumulative distribution functions for a series of 23 constant cooling rate experiments performed in the 5ml system with PBS at 2°C/min.b) Survival curves from panel a) linearized with respect to the

Figure 3 :
Figure 3: Extreme value distributions of mean nucleation temperature and nucleation barrier.a) Histogram and Weibull PDF of average nucleation temperatures from 5ml constant cooling rate experiments (n=23).b) Weibull probability plot of average nucleation temperatures.c) Histogram and GEV PDF for nucleation barrier from same experiments.d) Fréchet probability plot of nucleation barrier parameter.

Figure 4 :
Figure 4: Bivariate extreme value distributions.a) Joint probability density function, (⟨Δ f ⟩, ln ) computed from the marginal univariate extreme value distributions for the mean freezing temperature, ⟨Δ f ⟩, and nucleation barrier, .The distribution is centered at ⟨Δ f ⟩ = 14.3°C and ln  = 25.4.b) Probability density function from panel a) with the mean freezing temperature transformed to ln  using the definition of the mean nucleation temperature for a constant cooling rate experiment from Equation 16.A narrow distribution indicates a strong correlation between the kinetic and thermodynamic nucleation parameters.

Figure 5 :
Figure 5: Scaling of nucleation parameters with system volume.Scaling of a) nucleation temperature, ⟨Δ f ⟩, b) nucleation barrier, ln , and c) nucleation kinetic factor, ln , computed from the individual extreme value model based on 5ml data.Kinetic parameter, , is computed from the mean nucleation and nucleation barrier with the relation in Equation 16.Data points in each panel are from constant cooling rate experiments in 5ml (•) and 20ml (▲) systems.Location of the 20ml data corresponds to pure volume scaling (same color as 5ml data), and surface area scaling (gray).Solid curve denotes median value.Error bars and shaded regions denote 25%/75% quartiles.

Figure 6 :
Figure 6: Isothermal freezing probabilities.a) Freezing probability for a rigidly confined 5ml system with physiological saline and petrolatum-coated surfaces as a function of temperature and for various durations, computed from Equation 25.Data points correspond to average freezing probabilities from the campaign of isothermal nucleation experiments with a cut-off time of two hours.Error bars correspond to +/-one standard deviation.b) Probability of freezing within a time of 24 hours as a function of system volume and temperature, computed from Equation 25.