Recognition and binding of ice by proteins, crystals, and other surfaces is key for their control of the nucleation and growth of ice. Docking is the state-of-the-art computational method to identify ice-binding surfaces (IBS). However, docking methods require a priori knowledge of the ice plane to which the molecules bind and either neglect the competition of ice and water for the IBS or are computationally expensive. Here we present and validate a robust methodology for the identification of the IBS of molecules and crystals that is easy to implement and a hundred times computationally more efficient than the most advanced ice-docking approaches. The methodology is based on biased sampling with an order parameter that drives the formation of ice. We validate the method using all-atom and coarse-grained models of organic crystals and proteins. To our knowledge, this approach is the first to simultaneously identify the ice-binding surface as well as the plane of ice to which it binds, without the use of structure search algorithms. We show that biased simulations even identify surfaces that are too small or too weak to heterogeneously nucleate ice. The biasing simulations can be used to identify of IBS of antifreeze and ice nucleating proteins and to equilibrate ice seeds bound to an IBS for the calculation of heterogeneous ice nucleation rates using classical nucleation theory.
Molecular recognition of ice by ice-binding molecules (IBMs) is key for sustaining life in sub-freezing conditions,1–3 cold storage of cells and organs as well as for the frozen food industry.4–9 Molecules that bind ice create a metastable curved ice–liquid interface when ice tries to grow under supercooled conditions, halting the growth of ice.2,10,11 Ice-binding is not only central for molecular recognition12,13 and the control of ice growth and recrystallization11,14 but also for the nucleation of ice.15,16 Surfaces that bind ice can decrease the cost of creating an interface for the ice nucleus, lowering the free energy barrier for ice nucleation.17 A better understanding of how different molecules bind ice and what is the ice face to which they bind could support innovative strategies for applications ranging from cryopreservation of organs to climate modeling.
Ice-binding molecules are chemically and structurally diverse, encompassing inorganic surfaces, such as AgI, kaolinite, and carbon surfaces,18–22 organic molecules, such as alcohol monolayers,18 phloroglucinol dihydrate,23 and biological molecules, such as antifreeze and ice nucleating proteins.24–27 Most of these substances reconstruct the order of water at the ice interface upon binding, creating distinct interfacial transition layers (ITLs) that do not have the order of water in ice.28,29 The large diversity in the chemistry, structure, and size of the ice-binding molecules and surfaces, as well as the specific ice-planes to which they bind, calls for a robust and efficient method to identify their mode of binding to ice.
Different approaches are used to identify ice-binding surfaces (IBS). These include experimental site-directed mutagenesis,30–32 computational docking methods,33,34 and algorithms based on geometric and dynamic properties.35 In recent work, Kozuch et al. identified the IBS of a variety of antifreeze proteins (AFP) in simulations by searching for flat regions in the protein surface—which are optimal for binding flat ice surfaces—followed by their ranking according to their area and the lifetime of hydrogen-bonds in their hydration shell.35 Although this method has been successful at identifying binding surfaces of several molecules, it does not provide information on the mode of binding of the IBS to ice nor what is the ice face to which it binds. Molecular docking is the state-of-the-art computational technique used to identify the ice-binding sites of proteins and the ice faces to which they bind.34,36 Docking relies on finding the best complementary fit between ice and the IBS. The fit is evaluated geometrically—through a shape correlation function37—and chemically—through the scoring of binding energies or free energies.33,34,38 The assumption of a lock and key mechanism of binding between proteins and ice led to numerous docking studies of proteins to various ice–vacuum interfaces.38,39 These studies neglected the contribution of liquid water to the binding free energy of the proteins to ice and the reordering of interfacial water at the protein–ice interface. The reordering of the interface to produce a distinct interfacial transition layer is essential for the recognition of ice by several classes of antifreeze proteins,29 as well as in the nucleation of ice by proteins29 and several organic28,40 and inorganic crystals.41 A random structure search algorithm combined with molecular dynamics (MD) simulations has been recently proposed to circumvent the limitations of the absence of reconstruction at the IBS–ice interface.42 That algorithm efficiently samples the configurations of water at the crystal–ice interface. However, same as other docking procedures, the algorithm requires the user to input the candidate ice-binding planes. Then, the best binding plane among those tested is assigned after performing seeding growth simulations.42 To date, there is no computationally efficient methodology to simultaneously identify the ice-binding surface of a molecule, the plane of ice to which it binds, and the structure of the interfacial water between ice and the surface.
Strong binding of ice-nucleating molecules to ice is a requisite for their effectiveness as ice-nucleating agents.15 The stronger the binding of the molecule to ice, the more stabilized is the ice nuclei and higher the heterogeneous freezing temperature.15,18 Therefore, spontaneous ice nucleation can be used as a means to identify the ice-binding site of ice-nucleating molecules. However, ice nucleation is a stochastic process with characteristic times orders of magnitude longer than the times accessible through all-atom molecular simulations. Therefore, the use of efficient coarse-grained models at deep supercooling or rare event sampling is necessary to overcome the large free energy barriers for the nucleation of ice.43
Umbrella Sampling (US)44,45 has been used to compute the free energy of ice nucleation along order parameters that favor the crystallization of water.46,47 To our knowledge, US has been used only once to compute the free energy barrier for heterogeneous ice nucleation to address the question of whether clathrate hydrates can nucleate ice.47 That study used a harmonic restraint on the global bond-order parameter Q6 to form ice on a clathrate surface. As US calculations require sampling of multiple biasing simulations along the order parameter, they are typically expensive.
Here, we show that a single biasing simulation with an order parameter that drives the formation of ice suffices to simultaneously identify the ice-binding surface of a molecule or crystal, the plane of ice to which it binds, and the structure of interfacial water in the binding motif. The procedure is based on the assumption that ice will form where it is stabilized by interaction with an ice-binding surface. The size of the largest ice crystallite would be the ideal order parameter to drive the biasing simulations because it is the reaction coordinate for homogeneous48 and heterogeneous49–51 ice nucleation. Here, we drive the formation of ice with the global bond-order parameter Q6 as that order parameter is already implemented for biasing simulations with the open source massively parallelized molecular dynamics software lammps.52 We test and validate the biasing approach for the identification of the ice-binding surface and ice plane for extended and finite ice-binding surfaces, discuss the best selection of the parameters for the biasing simulations and the range of validity of the method, and compare its computational cost with alternative state-of-the-art methodologies. We validate our approach for all-atom and coarse-grained models of organic surfaces and proteins and demonstrate that biasing simulations can successfully identify the IBS and mode of binding of molecules that bind ice through a distinct interfacial transition layer with order different from that of ice, as well as the IBS of surfaces that are too small or too weak to favor heterogeneous nucleation over homogeneous nucleation of ice in cooling simulations. We demonstrate how to apply the biasing simulations to identify multiple coexisting ice-binding surfaces in a system. We present a thorough discussion of the selection of the simulation parameters and the computational cost of the method and conclude with a discussion of the advantages, limitations, and uses of the biasing approach for the study of molecular recognition of ice and ice nucleation.
A. All-atom models
We model phloroglucinol dihydrate with the OPLS force field53–55 and water with TIP4P/Ice.56 The charges on the sites of the phloroglucinol molecule are calculated from the Mulliken electronic population at the B3LYP/6-31G(d,p) level of theory using the Gaussian 09 package.57 The Lennard-Jones (LJ) parameters53–56 and charges for atoms in phloroglucinol are given in the supplementary material, Table S1. The cross interactions between different atoms are computed using the Lorentz–Berthelot combining rule.
We build the phloroglucinol dihydrate surface by cleaving the experimental crystal structure of phloroglucinol dihydrate58 along the (010) face with a slab thickness of ∼1.4 nm using the visualization program VESTA.59 Then, we place a water slab on the surface with a lateral box dimension of ∼5.4 × 5.7 nm2. We enlarge the simulation cell along the normal to the water slab to 10 nm to create a slab of vacuum that allows water to expand as it freezes. We keep the atoms of phloroglucinol (1,3,5-trihydroxybenzene) fixed during the simulations on the experimental lattice sites, except for the hydrogen atom of the hydroxyl groups, which are free to move according to the OPLS potential. The simulation cell contains a total of 13 704 atoms, of which 10 344 are from water molecules. To calculate the average value of the global bond-order parameter Q6 of bulk water, we equilibrate 4096 TIP4P/Ice water molecules in a periodic simulation cell at 260 K and 1 bar.
B. Coarse-grained models
Water is modeled with the monatomic mW model,60 which has been widely used to reproduce the structure and thermodynamic properties of water and ice, as well as their interfaces.3,48,60–76 The equilibrium melting temperature of hexagonal ice in the mW model is Tm = 273 ± 0.5 K.68 Phloroglucinol is modeled at the united atom (UA), i.e., without explicit hydrogen atoms. We model the non-bonded interaction between a –C– and –CH– fragment of the phloroglucinol molecule and mW water by a 12-6 Lennard-Jones (LJ) potential with ε = 0.17 kcal mol−1 and σ = 0.3536 nm, previously parameterized for the interaction between united atom alkanes and mW water.73,77 The LJ interactions are switched off with a cutoff distance of 1.4 nm. We consider that the interaction of the hydroxyl group of phloroglucinol with mW water is identical to that between mW water molecules. We do not compute the interactions between phloroglucinol molecules in the surface of the phloroglucinol dihydrate crystal because the positions of the phloroglucinol molecules are restrained at the experimental lattice positions.
To compute Q6 of bulk ice modeled with mW, we use a box of 512 molecules for hexagonal ice with dimensions of 2.3 × 2.6 × 2.8 nm3 and 516 molecules for cubic ice and cell dimensions 2.4 × 2.4 × 2.4 nm3. The structure of stacking disordered ice depends on the temperature at which it is grown.68 Here, we compute Q6 of large-scale simulation cells of stacking disordered ice grown in Ref. 68 with the mW water model at 270 K and 210 K to completely fill simulation cells with dimensions of 9.2 × 10.5 × 34.6 nm3 containing 112 600 water molecules. Q6 for the coarse-grained liquid water was computed using a simulation box with 9216 water molecules and dimensions of 5.4 × 5.7 × 8 nm3. The simulation cell used for the crystallization of ice starting from liquid contains a total of 41 784 atoms and has dimensions of 10.3 × 10.3 × 10.3 nm3.
To perform the biasing simulations of ice crystallization with a phloroglucinol dihydrate surface, we prepare a simulation cell that contains the (010) face of phloroglucinol dihydrate exposed to liquid water. The cell contains a total of 40 192 particles, of which 24 064 are water molecules, in a 10.8 × 11.3 × 10.0 nm3 simulation box. The z direction includes a slab of vacuum that allows water to expand as it freezes. To compare with the results of all-atom simulations, we also prepare a coarse-grained system with the same surface dimensions and thickness of the all-atom phloroglucinol surface, as well as the same number of water molecules in the simulation cell.
Cholesterol is modeled with the united atom model using the same interaction parameters as in phloroglucinol. We perform biasing simulations of simulation cells in which the (001) face of cholesterol monohydrate is exposed to water. The periodic simulation cell is triclinic with surface dimensions of ∼9.9 × 9.7 nm2; the z dimension includes a slab of vacuum to allow for the expansion of water as it freezes. The cell contains 20 736 water molecules.
The antifreeze protein of the beetle Tenebrio molitor, TmAFP, is modeled as a rigid body at the united atom level compatible with mW water, as in Ref. 29. Fully flexible all-atom simulations indicate that the protein binding site does not have conformational flexibility, that it is quite rigid, and that it binds to ice through the same anchored clathrate motif as the rigid coarse-grained model.29,78 Moreover, the rigid coarse-grained model of TmAFP nucleates ice15 with the same efficiency as TmAFP in experiments.16 The also rigid model ice nucleating proteins N-loop TmINP are made as in Ref. 15 by repeating N = 2–8 times the 12 residue loop sequence TCTNSQHCVKAN of TmAFP from the crystal structure 1EZG, at the same distances that these loops have, in average, in that crystal structure. Simulation cells of TmAFP and the N-loop TmINP contain a single rigid protein placed in a box of mW water. Simulation cells with TmAFP have a total of 5223 water molecules and dimensions of 4 × 4 × 4 nm3. Simulation cells used to test the ice formation on the IBS for different N-loop TmINP contain 43 985 water molecules with box dimensions of 13 × 13 × 7 nm3.
We model polyvinyl alcohol (PVA) with the united atom (UA) model compatible with the mW model, introduced in Ref. 3 and previously validated for the study of ice recognition,3 nucleation,79 and recrystallization inhibition.9,11 We prepare two systems containing a single PVA with degree of polymerization 20 immersed in liquid water: (a) a single flexible PVA molecule with a starting configuration from Ref. 3 and (b) a rigid extended PVA extracted from a configuration from Ref. 9 in which the molecule was bound to the ice–liquid interface. The flexible and rigid PVA are in periodic simulation cells of dimensions 7.4 × 5.4 × 5.7 nm3 that contain 4096 mW and 4252 mW molecules, respectively. Liquid water in the initial configuration forms a ∼5 nm width slab periodic in y and z with a slab of vacuum to allow for the expansion of the liquid in the x direction as it forms ice in NVT simulations.
To investigate whether the biasing approach can be used to identify competing ice-binding sites, we prepare a periodic simulation cell with dimensions 6.6 × 7.8 × 12.7 nm3. The cell contains a slab of 7725 mW water molecules (dimensions 6.6 × 7.8 × 7.7 nm3) in contact with a surface parallel to the xy plane that contains two ice-nucleating patches. Each ice-nucleating patch is a rigid alcohol monolayer with a radius of 1 nm (ice-binding area of 3.14 nm2). Each patch has a lattice mismatch to mW ice18 δa = 0.4% and δb = 0.3% and a three-carbon hydrocarbon tail. The distance between the centers of the patches is 4.2 nm; the ice-nucleating patches are surrounded by a coplanar non-ice nucleating disordered hydrocarbon surface. The hydrocarbon and the alcohols are modeled with the UA model of Ref. 18, tuning the strength εWOH of the water–OH interaction to be 90%, 100%, and 110% of the strength of the water–water attraction εWW in the mW model.60 We call patches with εWOH = 90%, 100%, and 110% of εWW IBS-1, IBS-2, and IBS-3, respectively. We prepare systems in which both patches are of the same type and cool these systems to determine the temperature at which ice nucleates for patches of a given εWOH. We prepare systems in which one patch is IBS-1 (εWOH 90% of εWW) and the other is either IBS-2 (100%) or IBS-3 (110%) to investigate the identification of different nucleation sites with the biasing simulations.
C. Simulation settings
Molecular dynamics simulations are performed using lammps.52 The equations of motion are integrated with the velocity Verlet algorithm with a time step of 2 fs in all-atom simulations and 5 fs in coarse-grained simulations. Except for the simulations to test energy conservation, which are performed in the NVE ensemble, the simulations are performed in the NVT ensemble with periodic boundary conditions. The temperature is controlled through a Nosé–Hoover thermostat with a relaxation time of 0.1 ps in the all-atom simulations and 2 ps for the coarse-grained simulations.
The non-bonded interactions in all-atom simulations are calculated with a long-range van der Waals tail correction.80 Long-range Coulomb interactions are computed using the Particle-Particle-Particle-Mesh (PPPM) TIP4P K-space solver of lammps52 with a precision of 10−4. A cutoff distance of 1 nm is used for both the LJ and Coulombic terms. The bonds and angles of the TIP4P/Ice water molecules are constrained to the equilibrium values using the SHAKE algorithm.81
In all the ice-biasing simulations studies, we first equilibrate the simulation cell at 300 K in the absence of a bias. We then use biasing simulations to perform isothermal crystallization of water on the phloroglucinol dihydrate surface at 260 K for all-atom and at 260 K and 265 K for coarse-grained simulations. The united atom simulations with cholesterol monohydrate are evolved at 265 K, those with proteins are carried out at three different temperatures, 230 K, 245 K, and 260 K, and those of PVA in water are carried out at 230 K. The biasing simulations with the two competing circular ice-nucleating alcohol monolayer surfaces are performed at 230 K, which is significantly above the temperature of heterogeneous nucleation of ice in these surfaces. The ice nucleation temperatures of these patches are obtained by cooling the systems with two identical patches from 220 K to 200 K at a rate of 1 K ns−1 and detecting the temperature of formation of ice with the decrease in the potential energy and the CHILL+ algorithm.66
D. Biasing the formation of ice with a harmonic restraint on the global bond order parameter Q6
Spherical harmonics are widely used in the literature to identify crystal phases and to compute free energy landscapes and nucleation barriers.29,82,83 In this study, we bias ice formation through a harmonic potential U on the global bond-order parameter Q6 as follows:
where KQ is the force constant of the harmonic restraint, Nw is the number of water molecules to be restrained (all water in the simulation cell), and Q6o is the target value of the order parameter in the restrained simulations. The global bond-order parameter Q6 is defined as83,84
where Nw is the number of water molecules in the system, Ym6(θij, φij) are the complex spherical harmonic functions of order 6, and (θij, φij) are spherical coordinates that indicate the orientation of the vector rij between the positions of the water i and its water neighbor j. The position of the water molecules in the TIP4P/Ice model is given by the location of the O atom. σ(rij) is a switch function that smoothly decays from 1 to 0 in the interval between 3.0 Å and 3.5 Å to ensure that only the first neighbor shell of water molecules is considered in the evaluation of the order parameter and that the bias can be applied smoothly in molecular dynamics simulations. Nb(i) is the number of neighbors of molecule i, defined by
The normalization by the sum of the number of neighbors of all water molecules makes Q6 an intensive, per particle, variable. We apply the harmonic restraint on Q6 to molecular dynamics simulations in lammps using a code developed by Clemens Moritz and Andreas Singraber. This implementation has been previously applied to form ice at clathrate hydrate interfaces47 and to control the amount of ice in multi-phase ice–liquid simulations.29,78,85 We apply the harmonic restraint to the simulations to promote the formation of ice, which we detect with the CHILL+ algorithm.66 When we report simulation of a system at various values Q6o, we perform the simulations sequentially from lower to higher values of Q6o, except when otherwise indicated, advancing Q60 in increments of 0.02 for the simulations with phloroglucinol dehydrate and in steps of 0.01 for the simulations with TmAFP, TmINP, and PVA.
In Sec. III, we first investigate the effect of KQ from 10 kcal mol−1 to 500 kcal mol−1 on the formation of ice in the presence of phloroglucinol dihydrate and then perform the final simulations with KQ = 50 kcal mol−1 or 100 kcal mol−1 for the range of Q6o indicated in Table I. The simulation time per target Q6o was 5 ns or 6 ns, except for the flexible PVA molecule, the same as for the simulations used for testing of KQ values.
|System .||KQ (kcal mol−1) .||Q6o range .|
|Rigid alcohol patches||50||0.05|
|System .||KQ (kcal mol−1) .||Q6o range .|
|Rigid alcohol patches||50||0.05|
III. RESULTS AND DISCUSSION
A. Selection of parameters of the biased simulations
In this study, we use biasing simulations with a harmonic restraint to the global bond order parameter Q6 to promote the formation of ice on ice-binding organic surfaces and biological molecules. Table II shows the equilibrium value of Q6 per molecule in simulations of water without a restraint: the values of the order parameter for hexagonal and cubic ice are the same and quite different from those of liquid water. This is a necessary condition for an order parameter that can drive the formation of ice.
|Phase .||Water model .||⟨Q6⟩ .|
|Liquid||TIP4P/Ice||0.01 ± 0.001|
|Liquid||mW||0.01 ± 0.001|
|Cubic ice||mW||0.53 ± 0.005|
|Hexagonal ice||mW||0.53 ± 0.004|
|Phase .||Water model .||⟨Q6⟩ .|
|Liquid||TIP4P/Ice||0.01 ± 0.001|
|Liquid||mW||0.01 ± 0.001|
|Cubic ice||mW||0.53 ± 0.005|
|Hexagonal ice||mW||0.53 ± 0.004|
The variables of the biasing simulations are the number of water molecules Nw in the simulation cell, the simulation temperature and time for each biased sampling simulation, the target value of the order parameter Q60, and the harmonic force constant per molecule KQ. We discuss these here and in the context of the different types of ice-binding surfaces throughout Secs. III B–III E.
1. Number of water molecules Nw
The rate of homogeneous nucleation is proportional to the volume of water, while the rate of heterogeneous nucleation is proportional to the area of the IBS. To minimize the probability of homogeneous nucleation, Nw should be the smallest possible that does not interfere with the solvation structure of the molecule/surface and inhibits the development of bound ice. We recommend starting with at least 1.5 nm width of water around the surfaces and no more than ∼5 nm.
2. Simulation temperature T
The temperature should be high enough that the critical nucleus N* for homogeneous ice nucleation in the water model contains at least ∼500 molecules but preferably 1000 or more. As the heterogeneous nucleus is always smaller than the homogeneous one at a given temperature, this ensures that the biasing simulations form a crystallite with at least 500 molecules that is still subcritical to prevent the formation of a large amount of stacking disordered ice that, as we show in Sec. III C and supplementary material Table S2, breaks the linear relationship between Nice and Q6. In this study, we perform all biasing simulations at 230 K or higher because the homogeneous critical ice nucleus contains ∼500 molecules at 230 K.48
3. Simulation time t of the biasing simulation
The lower bound to the simulation time is set by the relaxation time of liquid water at the temperature of the simulations. If the IBS is flexible, the simulation time t should be larger than the one required for sampling its configurations. We show in Sec. III E that these times become impossibly long for fully flexible polymers such as PVA, making the biasing simulations impractical for the identification of their binding surfaces. In the simulations with rigid surfaces, we selected t to be ∼5 ns, which is much longer than the relaxation time of the water and also of the global order parameter Q6 (supplementary material, Fig. S6). We find that 5 ns–6 ns sampling for each biasing simulation is sufficient for the rigid ice-binding surfaces of this study (Table I) as these times are at least 40 times longer than the time τσ it takes for a water molecule to diffuse a molecular diameter for the selected model and temperature. For example, for the simulations with mW water, we use a temperature no lower than 230 K, for which τσ = 12 ps, and for TIP4P/Ice water, we perform simulations at 260 K, for which τσ = 120 ps (τσ = 2 ns for TIP4P/Ice at 230 K). Moreover, we find that 5 ns biasing simulations are sufficiently long to exchange at least 50% of the molecules of a crystallite containing ∼500 molecules with the surrounding liquid for both the mW and TIP4P/Ice models at the temperatures considered in this study (230 K–265 K for mW and 260 K for TIP4P/Ice). We consider the exchange of water between the crystallite and the liquid important for equilibrating the contact angle of the crystallite and its binding motif to the surface. Equilibration of the contact angle of larger crystallites may require longer simulation times under the bias. There could be simulations, however, in which although Q6 reaches a value consistent with the presence of a crystallite, no ice is formed in the system. Such metastable liquid states would not exist if the order parameter (OP) was the reaction coordinate. Although, in principle, it would be possible to leave these liquid metastable states with high Q6 and form a crystallite by performing a long simulation, we find that it is computationally more efficient to perform a separate simulation with a larger Q6o.
4. The strength of the harmonic bias KQ
To identify the range of values of KQ that can be used in the biased simulations, we perform simulations of water in contact with the (010) face of phloroglucinol dihydrate at 260 K using coarse-grained models. We bias the simulations with Q6o spanning from 0.03 to 0.15 to sample states that encompass from liquid to a mixture of liquid and ice. Figure 1(a) shows the fraction of ice in the simulation cell for KQ ranging from 5 kcal mol−1 to 500 kcal mol−1 Fig. 1(b) shows that the lower values of KQ result in significant fluctuations of the order parameter, resulting in large deviation of the actual Q6 from the target value. The systematically lower values of Q6 compared to Q6o = 0.11 in Fig. 1(b) are not unexpected as the system is trying to surmount a steep free energy barrier to form ice at 260 K, resulting in a force toward liquid-like values of the order parameter. The broad distribution of Q6 is not a fundamental issue and can be beneficial for the use of the harmonic restraint to compute the free energy as a function of the order parameter. However, for the purpose of this work of promoting the formation of ice on an ice-binding surface, a higher value of KQ is preferred because it allows for shorter sampling time to form ice and identify the binding surface. The upper limit of KQ that can be used is given by the need of energy conservation in NVE simulations with the selected time step. We perform a series of 25 ns long NVE simulations of the large coarse-grained simulation cell of phloroglucinol dihydrate in contact with water at 260 K, with and without the harmonic restraint with Q6o = 0.15 (supplementary material, Fig. S1). Although the energy conservation is excellent, within 0.004%, for KQ = 500 kcal mol−1, for this high value of KQ, we detect a small drift in the energy with time (supplementary material, Fig. S1). For KQ = 100 kcal mol−1, the energy is conserved within 0.002%, and there is no observable drift; hence, we consider that to be a safe value for the use in the bias simulations. While a lower value of KQ may be more efficient to optimize the number of windows sampled for free energy calculations along Q6 using umbrella sampling, our biasing simulations for crystalline surfaces (Sec. III B), proteins and other finite systems (Sec. III C), one-dimensional ice-binders (Sec. III D), and competing ice-binding surfaces (Sec. III E) indicate that KQ between 50 kcal mol−1 and 100 kcal mol−1 per water molecule is suitable to bias the simulation toward the formation of ice and typically results in a Q6 that is about 80% of the value of the target Q6o.
5. Target value of the order parameter Q6o
A central assumption of our methodology is that rising Q6 increases the amount of ice in the simulations. This is verified in Fig. 2, which shows the fraction of ice in biased simulations of water in contact with phloroglucinol modeled with the coarse-grained [Fig. 2(a)] and all-atom [Fig. 2(b)] models at 260 K with KQ = 100 kcal mol−1 as a function of the average value of Q6 along each biased simulation. We find that both the atomistic and coarse-grained simulations show an almost linear relationship between the fraction of ice and Q6, although the slopes are not identical. The latter is due to the existence in each simulation of a different amount of interfacial ice that covers the ice grains and has local order intermediate between liquid and ice (see empty blue symbols in Fig. 2). The overall linearity in the fraction of ice (hexagonal plus cubic ice, filled blue symbols) with Q6 indicates that the biased simulation responds by forming ice and not following a metastable path in which the order parameter is satisfied by deforming the structure of liquid water.
The biasing simulation should aim at forming an ice nucleus that is at least a few hundred molecules large, to elucidate its binding to the surface, but smaller than the critical nucleus size for homogeneous nucleation at the temperature of the simulations to avoid driving so strongly the formation of ice that may result in homogeneous nucleation or the formation of so much ice that it is difficult to assess whether the nucleation started at the surface or in the bulk of the liquid. As a single crystallite accounts for most of the ice molecules in each biased simulation (supplementary material, Fig. S7) and there is an almost linear relation between the fraction of water crystallized and the value of Q6 (Fig. 2), we can estimate the number of water molecules in the ice nucleus as Nice = (Nw × Q6)/Q6ice, where Nwater is the number of water molecules in the simulation cell and Q6ice = 0.53 (Table II). For this estimate, we can further assume that Q6 ≈ 0.8 × Q6o, as supplementary material, Fig. S3 shows that Q6 typically lags target value Q6o by about 20% when KQ = 100 kcal mol−1.
B. Ice binding by extended crystalline surfaces
We first validate our method by comparing ice formation in biased simulations against ice formation in spontaneous cooling simulations for water in contact with the organic crystal phloroglucinol dihydrate represented with the coarse-grained model. Phloroglucinol dihydrate is a potent ice-nucleating agent that spontaneously nucleates ice at 248 K in coarse-grained simulations at a cooling at a rate of 1 K ns−1.28 This organic crystal nucleates hexagonal ice bound to the surface through its secondary prismatic face.28 The binding between ice and the surface is not direct, but it is mediated by an interfacial transition layer (ITL) that does not resemble the structure of ice.28 As the orientational order between water molecules in the ITL does not resemble that in ice,28 this surface provides a challenge for the use of Q6 to promote heterogeneous ice nucleation.
We perform biased simulations at 265 K, which is more than 15 K higher than the heterogeneous freezing temperature of phloroglucinol determined by cooling at 1 K ns−1 the coarse-grained model.28 The nucleation barrier for mW ice on phloroglucinol dihydrate is ∼90 kBT at 265 K (supplementary material, Sec. B). This barrier cannot be surmounted in brute force simulations in the absence of a bias. Figure 3 shows the snapshots of ice nucleation on the (010) face of phloroglucinol dihydrate from a bias simulation at 265 K. The biased simulations correctly identify that ice forms heterogeneously at the surface [Fig. 3(a)] and not homogeneously in the bulk of the liquid. Hexagonal ice forms on the surface, bound to the organic crystal by its secondary prism face through a distinct interfacial transition layer (ITL) in which the water molecules form pentagons and boat-like deformed six-member rings with the hydroxyl groups of phloroglucinol [Fig. 3(b)]. The ice polymorph, its mode of binding to the surface, and the structure of the ITL between the surface and ice are identical to those in the unbiased cooling simulations of Ref. 28. These results indicate that the bias in Q6 does not alter the mode of binding of ice to the surface, even when the binding is mediated by a structure of interfacial water that is not ice-like.
The simulation cells used for the biasing simulations above are relatively large, with 24 064 water molecules in contact with a 10.8 × 11.3 nm2 periodic surface of phloroglucinol dihydrate. To understand whether the biasing simulation can identify the ice-binding motif and binding plane for smaller systems, we perform simulations with a cell that has 1 4 of the surface and 3448 water molecules. We find that the small surface forms hexagonal ice, bound through the secondary prismatic face through the same ITL as the large system (supplementary material, Figs. S2a and S2b show the evolution of the formation of ice in the small cell). These results indicate that even relatively small simulation cells can be used to identify the ice-binding face and its mode of binding to a surface.
The high computational cost of all-atom models, which are about 100 times more expensive than the coarse-grained models of this study, together with the steep increase in the relaxation times of water on cooling for these models, makes the heterogeneous formation of ice in unbiased cooling simulations extremely challenging, even for small simulation cells. Here, we use biased simulations to promote the formation of ice in the all-atom model of the phloroglucinol dihydrate surface. We use a simulation cell for the all-atom model with the same size as the small one used in the coarse-grained simulations above. Figure 4 shows that the biasing simulations result in heterogeneous nucleation of ice. The all-atom model forms hexagonal ice bound to the surface of the organic crystal through its secondary prismatic face through an ITL with the same structure as in the coarse-grained model (compare Figs. 3 and 5). These results not only show that the coarse-grained and atomistic models have the same mode of binding to ice but also—most importantly for the present study—that the biasing simulations can be used to form ice on a surface using all-atom models and identify the face and mode of binding of ice to the surface.
Biasing simulations along Q6 have been used before to compute the free energy of nucleation of ice on a clathrate surface.47 This required the sampling of multiple Q6o windows that resulted in overlapping distributions of Q6 to reconstruct the free energy landscape. Here, we are not interested in determining the free energy as a function of the order parameter but only in identifying the ice-binding surface and its mode of binding to ice in a computationally efficient manner. Hence, our goal is to identify the mode of binding through a single simulation that bias to an arbitrary value of Q6 consistent with the formation of an ice crystallite, starting from an initial configuration that has only liquid water.
To identify the mode of binding, we aim at forming a crystallite with 500–1000 water molecules. Using the relation between Q6o and Nice discussed above, we expect that Q6o = 0.13 would produce a crystallite with Nice ≈ (Nw × 0.8 × Q6)/Q6ice = 680 molecules. We test the approach by performing a single biased simulation with Q6o = 0.13 starting with the phloroglucinol surface in contact with liquid water using both the coarse-grained and all-atom models. In both cases, we find an ice crystallite with a core of ∼700 molecules that forms heterogeneously at the phloroglucinol surface and that the ice is bound through the same face (secondary prismatic) and the same distinct interfacial transition layer as in the simulations in which ice is built by slowly increasing the target order parameter. Moreover, the amount of ice formed on the surface is the same when Q6o is increased progressively to 0.13 (blue symbols in Fig. 2) or when Q6o = 0.13 is applied directly to the surface in contact with liquid water (red symbols in Fig. 2). These results show that it is possible to identify the mode of binding of a surface to ice with a single ice-biasing simulation.
We find, however, that if simulations starting with all liquid water are biased directly into a high Q6o value (e.g., 0.21, which would convert about half the water into ice), then ice sometimes forms heterogeneously and sometimes homogeneously, challenging the identification of the ice-binding motif and plane. These results suggest that a single simulation that aims to transform ∼500–1000 water molecules into ice may be sufficient to identify the binding surface, ice plane, and the structure of the binding motif.
We test the single biasing simulation approach for the identification of the binding site with a coarse-grained model of the (001) face of cholesterol monohydrate—which is a much less efficient nucleant than phloroglucinol dihydrate: the (001) face of cholesterol surface forms ice with the mW model at Thet = 210 K on unbiased cooling simulations at 1 K ns−1;95 i.e., it needs 38 K more supercooling than the (010) face of phloroglucinol dihydrate considered above. We first perform a single biased coarse-grained simulation with Q6o = 0.13 at 265 K for the (001) face of cholesterol monohydrate and find that about 50% of the water transforms into ice (of which 25% is cubic or hexagonal, the other 25% is interfacial ice). As that amount of ice is too much to conclusively establish that its formation is heterogeneous, we perform a second bias simulation with Q6o = 0.07 and find a clear ice nucleus bound to the ice surface (supplementary material, Fig. S4). The ice nucleus has the structure of cubic ice and is bound to the surface by the (100) plane. The face of ice bound to the cholesterol surface and the structure of the interfacial transition layer that connects it are the same as in the previous cooling simulations with the same coarse-grained model95 and all-atom Forward Flux Sampling (FFS) simulations.40 We conclude that a biased simulation that promotes the transformation of a relatively small fraction of liquid into ice suffices to identify the face of ice and its mode of binding to the surface.
The ice crystallites produced by the biasing simulations are equilibrium structures subject to the restraint in the amount of ice given by the value of Q6. We find that not only ice is bound to the ice-binding surface by the face that produces the most stable binding and through the proper interfacial binding motif but also that the contact angle between the ice nucleus and the surface can be equilibrated in the biased simulations. For example, we find that the contact angle the ice nucleus with the cholesterol surface is ∼120° ± 5°along the biased simulation (supplementary material, Fig. S4), consistent with ∼125° ± 5° previously obtained for this system with the coarse-grained model in cooling simulations95 and for the all-atom model in FFS simulations.40 We find that the contact angle equilibrates in about 10–15 ns through reconstruction of the structure of the crystal nucleus bound to the surface. The computationally efficient (see Sec. III F for computational costs) creation of equilibrated ice seeds properly bound to a surface makes the formation of ice through biasing simulations optimal to start unbiased seeding simulations for the calculation of heterogeneous ice nucleation rates.42
C. Ice binding by proteins and other finite surfaces
Above, we used biasing simulations to identify the mode of binding and ice face that binds to extended, periodic surfaces. Our aim in this section is to assess the ability of biasing simulations to identify the ice plane and mode of binding of finite, small surfaces of ice-binding proteins. These small surfaces limit the size of the ice crystal they can bind and stabilize. Molecular simulations and Classical Nucleation Theory (CNT) predict that the size of the critical ice nucleus is larger for nucleation at warmer temperatures, thus requiring a larger IBS to stabilize it.15 Hence, the formation of ice on finite surfaces and its dependence with temperature provides a stringent test for our biasing approach. In what follows, we assess the sensitivity of biased simulations to identify IBS of finite protein surfaces as a function of IBS size and simulation temperature.
We first evaluate whether biased simulations can identify the site and mode of binding of ice to the antifreeze protein (AFP) from the beetle Tenebrio molitor, TmAFP. The binding site of TmAFP contains a flat array of TxT repeats that exposes both hydroxyl and methyl groups from the threonine residues to water.25,29 Hudait et al. demonstrated that TmAFP binds ice via an anchored clathrate motif that connects the protein and the ice surface.29 The mode of binding and the structure of the anchored clathrate interfacial transition layer are the same when water and the protein are simulated with coarse-grained and all-atom models.29 Experiments and simulations indicate that TmAFP and other insect hyperactive antifreeze proteins bind to both the basal and prismatic faces of ice.16,29 The binding to the prismatic face is weaker than to the basal one.29 Consistent with the predictions of CNT and the assumptions of this study, TmAFP uses the same surface to bind ice and nucleate ice.29
Recent experiments16 and simulations15 indicate that TmAFP is a very weak ice-nucleating agent. TmAFP promotes ice formation at just 2 ± 1 K above the homogeneous ice nucleation temperature in both the experiment and the simulations with the rigid united atom model we use here.15,16 The weak ice nucleation enhancement by this protein, the small area of its IBS, and the larger area of its non-ice-binding surface provide an optimum testing ground for the ability of ice-biasing simulations to recognize the ice-binding site, the ice plane it binds to, and the structure of its peculiar interfacial transition layer.
We start by performing biasing simulations using the coarse-grained model of TmAFP in mW water at 260 K, 245 K, and 230 K. All these temperatures are high compared to the 204 K heterogeneous nucleation temperature for the same system in cooling simulations at 1 K ns−1.15 We first bias the system progressing with Q6o from 0.03 to 0.1 every 0.01 and follow the formation of ice along the ensemble of simulation trajectories. In all cases, ice nucleates on the known TxT ice-binding surface of TmAFP bound to the protein by its basal face through the anchored clathrate binding motif (Fig. 6) previously found in unbiased simulations.29 We conclude that biasing simulations provide a robust method to identify the ice-binding site and mode of binding even for small surfaces, such as proteins.
The biasing simulations result in the correct binding structure between TmAFP and ice, irrespective of the temperature of the simulations. However, we note two differences with the simulations for the extended crystalline surfaces of Sec. III B. The first difference is that the sequential set of biasing simulations of ice formation by this small protein displays a region of hysteresis in which the amount of ice is not the same when the order parameter is increased stepwise from the liquid (black circles in Fig. 7) as when it is decreased from a system in which water coexists with ice bound to the protein (red circles in Fig. 7). The hysteresis originates on a free energy barrier along Q6 between ice and the “deformed” liquid with the Q6 that would be expected for liquid with a crystallite. The use of an order parameter that directly counts the number of water molecules in the largest ice crystallite (the actual reaction coordinate for ice nucleation48–51) could be used to eliminate the hysteresis. When using Q6 to drive the phase transition, it would, in principle, be possible to eliminate the hysteresis with longer sampling at each Q6o, but—depending on the temperature and surface—this strategy may require long simulations that would jeopardize the computational efficiency of the method (see Sec. III F). For example, we find that ice does not form in 5 ns simulations that start from the protein immersed in liquid water at 230 K with Q6o = 0.035, but it forms when the simulation runs for 15 ns. Nevertheless, ice does not form even in 100 ns simulations with Q6o = 0.05 when the protein is immersed in water at 260 K. It is important to emphasize, however, that the existence of hysteresis does not impact the identification of the binding site and the mode of binding.
The second difference between the simulations of ice formation in the presence of the protein and the extended phloroglucinol surface is that the amount of ice formed by the protein does not seem to grow linearly with the bias in Q6, particularly for ice formed at the lowest temperatures (Fig. 7). We find that the distinct scaling in the amount of ice with Q6 is related to the formation of crystallites of stacking disordered ice and that the value of the order parameter is lower when the stacking disorder occurs in multiple orientations in the simulation cell. We surmise that the lower value of Q6 of stacking disordered ice reflects a cancellation of contributions of local orders to the global order parameter [see Eq. (2)]. Indeed, we find that Q6 of stacking disordered ice is in all cases smaller than expected based on the fraction of ice in the simulation cell and the value of Q6 of cubic or hexagonal ice (supplementary material, Table S2). For example, 99% of water in a simulation cell in which ice was grown at 260 K is crystalline, but the Q6 value of water for that cell is 83% of that of cubic or hexagonal ice. There is not a clear correlation between Q6 and the fraction of cubic layers in stacking disordered ice (supplementary material, Table S2). Ice formed in the biased simulations that render Q6 = 0.08 at 230 K in the presence of the protein is stacking disordered with 27% of cubic ice and multiple crystallites that account for 70% of the water in the cell, while ice formed in the biased simulations at 260 K is almost all cubic (89%), and there is a single crystallite in the simulation cell that accounts for 9.6% of the water (supplementary material, Fig. S5). We find that multiple orientations of stacking disordered crystallites in the cell results in lower values of Q6 (supplementary material, Table S2 and Fig. S5), breaking the linear relationship between the fraction of water in the ice phase and the order parameter.
To understand why the amount of ice formed in biased simulations at 230 K is larger than at 260 K for the same value of Q6, we note that the reaction coordinate for ice formation is not Q6 but the number of water molecules Nice in the crystallite.48–51 When the crystallites are subcritical, they minimize the free energy by decreasing the amount of ice under the restraint in the order parameter; this is achieved by growing a single crystallite that is all cubic or all hexagonal ice, which have the highest contribution to Q6 (Table II and supplementary material, Table S2). That is the case for the simulation of ice formation on TmAFP at all values of Q6 in Fig. 7 (the crystallites melt if they are evolved at 260 K after the restraint is removed). If the crystallite is supercritical in size, the free energy is minimized by increasing the size of the crystallite subject to the restraint in Q6. This can be achieved by forming multiple crystallites, typically of stacking disordered ice, in different directions. This indicates that the use of an order parameter that is not the reaction coordinate may result in a different polymorph as in the unbiased simulations. Nevertheless, this seems to impact mostly the ice that forms far from the binding plane as our simulations with and without a constraint show the same mode of binding of the protein to ice in simulations with and without a bias.
Despite the non-linearity in the amount of ice with the global order parameter when multiple crystallites or stacking disordered ice can form, we find that single biased simulations can still be used to identify the mode of binding of a surface to ice. The only caveat is that the target Q6o may need to be smaller for cases in which pervasive stacking disorder may allow the system to increase the amount of ice at a given global restraint (e.g., at high driving forces, when the critical ice nucleus is small). We find that a simulation of TmAFP initially in liquid water to which Q6o is set to 0.1 at 260 K results in a single crystallite bound by the (111) face of cubic ice (which is the same as the basal face of hexagonal ice) to the IBS through the same anchored clathrate motif as in unbiased cooling simulations. This Q6o = 0.1, however, is too large for the same system at 245 K and 230 K, resulting in ice formation far from the binding surface and requiring lower values of Q6o (0.08 at 245 K and 0.05 at 230 K). We advise to run the simulations that bias the amount of ice with Q6 at a relatively high temperature to ensure that the amount of ice formed is less than the size of the critical nucleus at the corresponding temperature.
We now focus on identifying the smallest size of the ice-binding surface that can be identified with biasing simulations. To test this, we use a set of model ice-binding proteins built by stacking 2–8 ice-binding loops of the antifreeze protein TmAFP. Following Ref. 15, we call these proteins N-loop TmINP. In a recent molecular simulation study, Qiu et al., showed that the ice nucleation efficiency of these proteins, the gap between their heterogeneous nucleation temperature, and the homogeneous nucleation temperature of water at the same cooling rate decrease as with N decreases from 17.5 K for N = 8 for coarse-grained cooling simulations at 1 K ns−1 and that TmINP with N < 4 does not nucleate ice in cooling simulations.15
Here, we perform biasing simulations to promote the ice formation on TmINP with N ranging from 2 to 8 loops at a temperature of 230 K. We find that ice forms on the IBS of the proteins, connected through an anchored clathrate binding motif, the same as in the cooling simulations. Most significantly, the bias simulations nucleate ice heterogeneously at the protein IBS even for N = 3 (Fig. 8), for which the cooling simulations of the same model indicate that the nucleation is homogeneous. This demonstrates that biasing simulations can successfully detect the IBS with high sensitivity compared to cooling simulations, allowing for the identification of ice-binding sites that are too small to lead to heterogeneous nucleation in the absence of a bias.
We find, however, that biasing simulations for the identification of tiny ice-binding surfaces can only be performed at relatively low temperatures: biasing simulations of the protein with N = 3 loops at 245 K and 260 K result in homogeneous nucleation, even when the target value of Q6o is increased slowly. This is probably a limitation of using an order parameter that is not the reaction coordinate for ice nucleation. For example, the system with the protein at 260 K does not form ice until Q6 = 0.069 (Q6o = 0.08), at which point it creates a large stacking disordered crystallite that encompassed ∼1/3 of the molecules in the simulation cell (Nice = 7526 molecules). The effect of the stabilization of this large crystallite by the tiny IBS becomes negligible, and the ratio of volume of water to area of the binding site tilts the balance in the nucleation rates in favor of the homogeneous mechanism. As we discuss in Sec. III F, when the result of the biased simulation is the homogeneous formation of ice, it is advisable to either decrease the amount of water in the system or decrease the temperature (as we did here, Fig. 8) and repeat the biasing simulation to favor the heterogeneous formation of ice on a weak or small ice-binding surface.
D. One-dimensional and flexible ice binders
Both the organic crystals and antifreeze proteins discussed above have rather rigid and two-dimensional ice binding surfaces. There are, however, ice-binding molecules that adopt one-dimensional (1D) configurations in binding to ice and/or are flexible. For example, the fish antifreeze protein wfAFP and the synthetic ice-recrystallization inhibitor polyvinyl alcohol (PVA) bind ice along a single line of water molecules in the ice surface. The latter is also a fully flexible molecule, with a very short persistence length of 0.7 nm.86 Here, we first address the ability of the biasing simulations to promote ice formation bound to surfaces that bind ice along a single line of water molecules, and then, we investigate ice-binding by a 1D binder that is also highly flexible.
To investigate whether biasing simulations can be used to form ice on one-dimensional binders, we use PVA, which binds ice through its hydroxyl groups, assisted by the lattice (distance) matching between the OH groups in the extended polymer and the water molecules along the c-axis of the primary prismatic face of ice.3,87 We note that although PVA can promote the nucleation of ice at temperatures close to the homogeneous nucleation temperature, its weak promotion of ice nucleation is not due to its binding to ice but rather due to its increase in the activity of water that destabilizes the liquid phase.79
To separately assess the flexibility and one-dimensional binding of PVA, we first perform biased simulations with a rigid PVA with degree of polymerization 20 in an extended configuration we extract from an equilibrium simulation in which the molecule is fully bound to the interface between the primary prismatic face of ice and liquid water.9 The simulation cell has Nw = 4254 water molecules that surround the rigid PVA molecule and is evolved at 230 K for 5 ns, biasing Q6 with KQ = 50 kcal mol−1 and Q6o = 0.05. We find that ice forms along the PVA, bound by its primary prismatic face (Fig. 9), the same as in the unbiased simulations of ice binding of Refs. 3 and 9. We conclude that biasing simulations can be used to identify the binding mode and ice face for 1D-ice binders that do not have conformational flexibility.
We now address whether biasing simulations can identify the mode of binding of fully flexible PVA. We start from an equilibrium coil configuration of PVA in liquid water (NW = 4096) and evolve the cell for 5 ns at 230 K biasing ice formation with Q6o = 0.05 with KQ = 50 kcal mol−1. We find that ice does not form, either homogeneously of heterogeneously. Moreover, ice does not form even when we extend this simulation for further 20 ns. If we increase Q6o to 0.06, ice forms homogeneously away from the coiled polymer within 5 ns. The fact that the extended PVA can nucleate ice in biased simulations with Q6o = 0.05 suggests that sufficiently long simulations could result in the formation of ice. However, as the relaxation times of the polymer are very long, the length of these simulations may not be competitive against unbiased ∼200 ns simulations of PVA of binding of this polymer to the ice–liquid interface.3,9 Although the biasing simulations remove the need for pre-determination of the ice face that binds the molecule, we expect that biased simulations would not be competitive in computational cost against unbiased binding simulations for the elucidation of the mode of binding of fully flexible polymers to ice. As most protein binding surfaces are quite rigid, this may not be a large obstacle for the identification of ice-binding surface of proteins.
E. Systems with multiple ice-binding surfaces
All systems discussed above have a single IBS. Some proteins, however, are able to bind ice by more than one face. That is the case of the ice nucleating protein of Ps. syringae29 and the antifreeze protein of the sea-ice diatom Fragilariopsis cylindrus (fcIBP).88 Here, we address, first, whether a single biasing simulation forms ice on the strongest IBS (i.e., the one with the highest ice nucleation efficiency ΔTf = Thet − Thom), and, second, how to apply biasing simulations to identify all IBS in a protein or other complex surface.
It may be expected that a slow, reversible increase in the amount of ice in the system will result in ice formation in the most stable location. However, the method we propose relies on a single biasing simulation that attempts to form a crystallite with several hundred molecules from the liquid state. We here address whether such ice-biasing simulation in a system with multiple IBS forms ice in the one that binds strongest to ice. We use as a test system the one with two coplanar ∼3 nm2 IBS of identical size and very close ice nucleation efficiencies ΔTf, surrounded in the same plane by an inert non-IBS surface (supplementary material, Fig. S8). We start with patches that, individually, can nucleate ice with ΔTf that is indistinguishable within their error bars: 3 ± 2 K (IBS-1) and 5 ± 2 K (IBS-2). We find that when we perform 10 independent biased simulations of this system with Q6o = 0.05 and KQ = 50 kcal mol−1 230 K, and Nw = 7725 for 5 ns, ice forms randomly on any of the two patches: 60% of these form ice in IBS-1 and 40% in IBS-2. We note that in 40% of these simulations, ice crystallites initially appear in both IBS, but throughout the 5 ns biasing simulation, the system evolves toward having a single crystallite on one of the patches, decreasing the free energy under the ice-forming bias. As ΔTf of IBS-1 and IBS-2 are statistically indistinguishable, we perform other 10 independent simulations with IBS-1 but replacing IBS-2 with IBS-3 that nucleates ice at a marginally higher temperature, ΔTf = 7 ± 2 K. In this case, we find 40% nucleate ice in the weaker IBS-1 and 60% in the marginally stronger IBS-3.50% of these simulations start with ice in both IBS, before the ice consolidates into a single crystallite in one of them. It is possible that further separation in the ice nucleation efficiencies (i.e., the ice-binding strength) of the competing IBS would result in a more pronounced preference for ice formation in the strongest binder. However, we expect that ice may still form stochastically on both IBS, as forming ice on any of these surfaces is thermodynamically favorable compared to forming ice homogeneously. This suggests that a single biased simulation that attempts to form a crystallite with a few hundred molecules is not appropriate to rank the strength of the binding surfaces.
Although ice-biasing simulations that produce relatively large crystallites cannot rank the strength of multiple IBS, they are appropriate to identify these individual IBS. To determine whether a protein or other system has multiple IBS, we follow a strategy of “cap and repeat.” In this strategy, we first perform a biased simulation in the unmodified system and detect an IBS. Then, we cap that IBS to prevent ice formation on it and perform a second biasing simulation to find out whether ice forms in other site of the molecule or homogeneously. We chose to cap the first identified IBS by immobilizing or restraining water molecules within 0.5 nm of the IBS in their liquid water configuration (i.e., we apply the constraint or restraint to their positions before the application of the bias). We test this method first to the system with two ∼3 nm2 alcohol monolayer patches described above, with the interaction strengths of IBS-2 (ΔTf = 5 ± 2 K) and IBS-3 (ΔTf = 5 ± 2 K). We perform a biasing simulation with the same parameters as above (T = 230 K, Q6o = 0.05, KQ = 50 kcal mol−1, t = 5 ns, and Nw = 7725) and find that an ice crystallite forms on IBS-2. We then perform a second simulation with the same parameters in which the water molecules within 0.5 nm of the OH groups of IBS-2 are restrained in a configuration from the liquid before applying the ice-forming bias. This second simulation results in the formation of a crystallite in IBS-3. We then cap that IBS following the same protocol as for IBS-2 and find the nucleation to be homogeneous. We recommend that this cap-and-repeat procedure be applied to all molecules that have multiple faces. For example, for TmAFP, we cap the IBS identified in Sec. III C by restraining the water molecules within 0.5 nm of the OH groups of the threonine residues in their liquid configuration at 230 K and then proceed to perform a second biasing simulation with the same parameters (T = 230 K, Q6o = 0.05, KQ = 50 kcal mol−1, t = 5 ns, and Nw = 5223). This second simulation does not result in ice nucleation. Hence, we repeat this simulation with Q6o = 0.06 and find that ice forms homogeneously. This is the expected result as we know that there are no further IBS in TmAFP. When investigating a molecule for which this is not known, after ice forms homogeneously, we advise to follow the procedures of Sec. III F to disfavor homogeneous nucleation before concluding that there are no other IBS in the system. False positives are not expected, as surfaces that have a positive binding free energy to ice form a disordered premelted layer of water between ice and the surface.74 Indeed, we have not found a single case of ice formation on non-ice binding surfaces in the biasing simulations.
F. Decision tree for selection of the parameters
The desired outcome of the biasing simulation is a crystallite of ice bound to the surface. When running the biased simulations, however, it is possible that even in the presence of an ice-binding surface, the simulations will not produce the desired outcome of a well-defined surface-bound ice crystallite. Alternative outcomes are that (i) ice does not form at all, (ii) ice forms homogeneously, and (iii) ice grows over most of the simulation cell, making it impossible to identify the binding surface. The four possible outcomes of a biased simulation are sketched in Fig. 10, along with a decision tree that summarizes how to tune the target value of the OP Q6o, the strength of the harmonic bias KQ, the simulation temperature T, the running time t, and the amount of water in the simulation cell Nw to either obtain ice formation on the IBS or conclusively determine that that there is no ice-binding surface in the system.
G. Computational cost for the bias simulations
We now compare the computational efficiency of the biasing approach to the cost of the heterogeneous seeding method that uses a random structure search algorithm to find the more stable location and structure of the ice seed on a surface.42 We compare the central processing unit (CPU) time for simulations that run in Intel Xeon E5-2680 v4 @2.4 GHz using nodes with 28 processors each. The CPU time is computed as the clock run time multiplied by the total number of processors. Table III shows the CPU time to evolve 5 ns simulations with and without applying a bias in Q6 to the water molecules for coarse-grained (CG) and all-atom (AA) models of the phloroglucinol surface in contact with water. Here, the CG and AA models are evolved with 5 fs and 2 fs steps, respectively. The use of a bias slows down the AA and CG simulations (see the ratio of run times tbias/tno bias in Table III); the relative cost of applying the bias increases—but not dramatically—with increasing size of the system (Table III). tbias in Table III is the cost of a single biased simulation that suffices to form ice when using an appropriate value of Q6o (e.g., 0.1). We note that the ∼1000 CPU hours cost of a single biased simulation for a cell with ∼14 000 atoms in the all-atom model is two orders of magnitude lower than the 105 CPU hours cost of the heterogeneous seeding method with a random structure search algorithm for an all-atom model with a total of ∼8000 atoms,42 the state-of-the-art method for identifying the mode of binding of a surface to ice. The latter is already an order of magnitude more efficient than the 106 CPU hours required to heterogeneously grow a ∼500 molecule ice nucleus using Forward Flux Sampling for the same simulation cell and model using graphical processing unit (GPU) acceleration.40 We conclude that the use of a bias that promotes the formation of ice provides a computationally efficient route to identify the mode of binding and face of ice to a surface.
|Model .||Total .||Atoms from .||tbias .||tno bias .||.|
|(No. CPUs) .||atoms .||water .||[CPU (h)] .||[CPU (h)] .||tbias/tno bias .|
|AA (112 CPUs)||13 704||10 344||834||488||1.7|
|CG (28 CPUs)||5 464||3 448||62||15||4.2|
|CG (112 CPUs)||40 192||24 064||645||114||5.6|
|Model .||Total .||Atoms from .||tbias .||tno bias .||.|
|(No. CPUs) .||atoms .||water .||[CPU (h)] .||[CPU (h)] .||tbias/tno bias .|
|AA (112 CPUs)||13 704||10 344||834||488||1.7|
|CG (28 CPUs)||5 464||3 448||62||15||4.2|
|CG (112 CPUs)||40 192||24 064||645||114||5.6|
IV. CONCLUSIONS AND OUTLOOK
In this study, we present and validate a robust and computationally efficient methodology based on biased simulations that simultaneously identify the ice-binding surface of molecules or crystals, the ice plane they bind to, and the structure of the ice–surface interface. To our knowledge, this is the first study to use biased sampling to identify the ice-binding site of both extended and finite ice-nucleating surfaces. In what follows, we list the pros and cons of the methodology, discuss possible improvements, and compare it with state-of-the-art methods to identify ice-binding surfaces and their mode of binding to ice.
The central advantage of the ice-biasing simulation approach is that it simultaneously identifies the ice-binding surface, the face of ice it binds to, and the mode of binding to it. Moreover, the biasing simulation method accounts for the competition between ice and liquid for the ice-binding surface, does not make assumptions about the structure, topography, or composition of the IBS, and correctly identifies the mode of binding of ice to a surface without the need for a structure search algorithm. We conclude that biasing simulations are advantageous over state-of-the-art docking methods, all of which assume the identity of the binding plane (or must sample many candidate faces) and either neglect the liquid phase,38,39 do not allow the reconstruction of water at the interface with ice,38,39 or use expensive methods to first reconstruct that interface and then identify the most stable mode of binding of the surface to ice.42
The ice-biasing simulation approach is computationally efficient. We find that in most cases, a single biased simulation suffices to identify the ice-binding surface, mode of binding, and ice plane. The computational cost of a biased simulation is hundred times lower than the cost of methods that identify the binding through a combination of docking with structure search algorithms and seeding molecular dynamics simulations of growth,42 and a thousand times lower than the cost of growing a small crystallite of ice with forward flux sampling simulations.40
The ice-biasing simulation approach identifies ice-binding surfaces even when they are too small (e.g., antifreeze proteins), one dimensional ice-binders (e.g., stretched PVA), or too weak in their binding to ice (e.g., clathrate hydrates47) to be able to compete against homogeneous nucleation on cooling. This makes this method particularly attractive for the identification of the ice-binding site of antifreeze and ice nucleating proteins. We note, however, that the biasing simulations identify only the face of ice to which the IBS binds the strongest (e.g.,. the basal face for TmAFP78) and not all the ice faces to which the surface can bind (e.g., the basal and prismatic ice face in the case of TmAFP78). The identification of all ice faces that bind to a particular molecule may be important in applications such as ice shaping.
We find that ice-biasing simulations are not computationally competitive against docking to predetermined ice–liquid interfaces for fully flexible polymers, such as PVA, which have long relaxation times.3,9 It is an open question whether biased simulations can identify the IBS and mode of binding of antifreeze glycoproteins,89 which are less flexible than PVA but more flexible than the hyperactive antifreeze and ice-nucleating proteins considered in this study and –different from the one-dimensional binding of PVA3-bind ice along multiple directions of the prismatic plane.89
We note that ice-biasing simulations can be implemented using an order parameter that promotes ice formation, such as Q6, even if that order parameter is not the reaction coordinate for ice nucleation. Q6 is proportional to the amount of ice when a crystallite is all hexagonal ice or all cubic ice, but not when ice is stacking disordered or there are multiple crystallites with different orientations in the simulations. Our analysis indicates that ice-biasing with Q6 robustly identifies the binding surface and its mode of binding to ice at low target values of the order parameter and relatively low supercooling that guarantees that a single crystallite forms and that its size is subcritical. Use of the number of water molecules in the largest ice crystallite Nice—the actual reaction coordinate for homogeneous48 and heterogeneous48 ice nucleation—as biasing order parameter would allow for a robust identification of the ice-binding surface and mode of binding at any temperature. Moreover, implementation of Nice as the biasing order parameter, e.g., using hybrid Monte Carlo (HMC),90 would allow for the calculation of the free energy barrier of nucleation. We note, however that knowledge of the barrier is not necessary to identify the IBS and its mode of binding to ice.
In this study, we first use ice-biasing simulations to identify the IBS of proteins that have a single ice-binding surface and then demonstrate how to use biasing simulations with a sequential “cap and repeat” approach to identify multiple IBS within model systems. We find that although this approach finds all ice-binding sites, it is not appropriate to rank them in terms of relative strength of binding. We expect that the use of this sequential procedure to reveal multiple binding sites will be important to identify IBS of proteins. This will help understand and manipulate the function of these biomolecules as well as to train and validate data-driven models that predict the thermal hysteresis of antifreeze proteins from properties of their IBS.35
Seeding methods have been extensively used for the calculation of the size of critical nuclei and rates of nucleation within the framework of classical nucleation theory.91–94 The use of seeding methods for heterogeneous nucleation has been hampered by the difficulty of producing properly equilibrated ice seeds bound to a surface.42 Biasing simulations remove that hindrance by providing equilibrated ice nuclei bound to a surface through the most stable ice plane, with the proper interfacial structure of water and contact angle between the nucleus and the surface. This makes the biased simulations optimal as a starting point for seeding simulations to compute the size of the critical nucleus and—using classical nucleation theory—the rate of heterogeneous nucleation of ice by a surface.
See the supplementary material for supporting tables (Sec. A) with the partial charges in the all-atom phloroglucinol model (supplementary material, Table S1) and the value of Q6 for several stacking disordered ices (supplementary material, Table S2), and supplementary material figures (Sec. A) showing the energy conservation with different strengths of the order parameter (supplementary material, Fig. S1), snapshots along ice-biasing simulations of ice formation on cholesterol monohydrate (supplementary material, Fig. S2), an example of a plot of Q6 vs Q6o (supplementary material, Fig. S3), the contact angle of ice on cholesterol monohydrate (supplementary material, Fig. S4), ice formation in biased simulations with the antifreeze protein TmAFP (supplementary material, Fig. S5), the time evolution of Q6 for two strengths of the harmonic bias KQ (supplementary material, Fig. S6), the total fraction of ice and fraction of ice in the largest ice cluster as a function of Q6 (supplementary material, Fig. S7), and formation of an ice crystallite in a biasing simulation of a system with two ice-binding surfaces (supplementary material, Fig. S8) and also for how we estimate the nucleation barrier for mW ice formation on the coarse-grained model of phloroglucinol at 265 K (Sec. B).
P.M.N. and A.K.M. contributed equally to this work.
The authors gratefully acknowledge Matias Factorovich and Yuqing Qiu for assistance and insightful discussions. We acknowledge the donors of the American Chemical Society Petroleum Research Fund for partial support of this research through New Directions Award No. 59303-ND6 and the Air Force Office of Scientific Research for partial support through MURI Award No. FA9550- 20-1-0351. The authors thank the Center of High Performance Computing at The University of Utah for technical support and a grant of computing time.
The data that support the findings of this study are available from the corresponding author upon reasonable request.