We study a three-species analogue of the Potts lattice gas model of nucleation from solution in a regime where partially disordered solute is a viable thermodynamic phase. Using a multicanonical sampling protocol, we compute phase diagrams for the system, from which we determine a parameter regime where the partially disordered phase is metastable almost everywhere in the temperature–fugacity plane. The resulting model shows non-trivial nucleation and growth behaviour, which we examine via multidimensional free energy calculations. We consider the applicability of the model in capturing the multi-stage nucleation mechanisms of polymorphic biominerals (e.g., CaCO3). We then quantitatively explore the kinetics of nucleation in our model using the increasingly popular “seeding” method. We compare the resulting free energy barrier heights to those obtained via explicit free energy calculations over a wide range of temperatures and fugacities, carefully considering the propagation of statistical error. We find that the ability of the “seeding” method to reproduce accurate free energy barriers is dependent on the degree of supersaturation, and severely limited by the use of a nucleation driving force Δμ computed for bulk phases. We discuss possible reasons for this in terms of underlying kinetic assumptions, and those of classical nucleation theory.
I. INTRODUCTION
Predicting the rate at which crystalline material precipitates from a supersaturated solution remains a significant challenge to atomistic and molecular simulation due to a large disparity between the physical time scales of the process and the time scales which are accessible to computational models.1–3 To date, very few studies have computed quantitative absolute nucleation rates from such simulations.4
A particularly challenging example where nucleation kinetics remain well beyond the reach of atomistic simulation is biomineralisation. Here nucleation and growth, mediated by organic components, appears to progress via a multistage process with a high degree polymorph control.5 This serves as a natural example of self-assembly — a phenomenon of great technological interest.6 The prototypical system for study of biomineralisation and polymorph selection is calcium carbonate (CaCO3), where some degree of controlled assembly has been demonstrated in a laboratory.7–9 Recent in situ transition electron microscopy (TEM) experiments10 have provided a striking visualisation of crystallisation of calcite — the most stable polymorph of CaCO3 at ambient conditions — via vaterite — a metastable polymorph whose molecular structure is characterised by a degree of structural disorder.11
Some important insights into parts of the CaCO3 nucleation pathway have been gained through molecular dynamics (MD) simulations.12–14 However, the expense of detailed atomistic models and the low solubility of CaCO3 prohibit modelling of the entire multi-stage process. More quantitative studies of multi-stage processes15–17 are possible using simple lattice models. In common with biomineral nucleation, these show existence of amorphous precursors to the assembly of anisotropic particles. To our knowledge, however, pathways proceeding via partially disordered phases or featuring the dissolution–regrowth mechanism10 have not been previously captured.
Lattice models remain a useful tool in nucleation theory, yielding insights into nonclassical phenomena,15–17 multistep pathways,18,19 heterogeneous nucleation,20,21 and limitations of calculation methodology.22,23
In the following contribution, we present a lattice model of nucleation from solution, where the solute may form disordered, ordered, or semi-ordered solids. We map the phase diagram of the system and show that, in the limit of slow growth, the transition from solvent rich to ordered crystalline state proceeds via the two metastable solute phases. The temperature dependence of the barrier to nucleation of these three phases indicates the existence of a parameter regime where heights of these barriers become comparable. However, we also find that the barriers to solid state transformation between the three solute phases, once nucleated, are too low to allow our model to capture the dissolution and regrowth pathway to crystallisation.
We then study the kinetics of nucleation in our model using two distinct choices of microscopic dynamics over a broad range of parameters. We use the increasingly popular “seeding” approach,4,24–26 which uses information extracted from simulations to parametrise a classical nucleation theory (CNT) expression for the nucleation rate. We consider the propagation of statistical error and critically examine the approximations made in this method. We show that the reconstructed nucleation free energy barrier is limited by the use of a CNT driving force parameter Δμ calculated for bulk phases rather than finite-size nuclei. As a result, barriers can be inconsistent with those obtained via explicit free energy calculation.
The definition of the lattice model is given in Sec. II. In Sec. III we specify the details of the free energy calculations presented in Sec. IV, where we examine the phase behaviour and the possible nucleation pathways in the model. In Sec. V we move on to the discussion of kinetics of nucleation under the different choices of microscopic dynamics (Sec. V A), developing an error estimation approach for the “seeding” method in Sec. V B and discussing generation of appropriately structured seeds in Sec. V C, before presenting the obtained by the method results in Secs. V D and V E. Finally, in Sec. VI we summarise and discuss the results presented in Sections IV and V.
II. MODEL
Our model is an extension of that introduced by Duff and Peters.16,27 We study a three component system of anisotropic particles on a cubic lattice, where each particle i is characterised by species si ∈ {1, 2, 3} and orientation qi ∈ {1, …, 24}. We label species 1 as solvent and species 2 and 3 as solute. The nearest neighbouring particles (i, j) interact isotropically with strength K(si, sj), while diagonally neighbouring particles (k, l) interact anisotropically with strength A(sk, sl), giving us the following lattice energy E:
where the first and second summations are performed, respectively, over the unique nearest and diagonal neighbour pairs, δ(., .) is the Kronecker delta function, and the lattice structure is primitive cubic with 6 nearest and 12 diagonal neighbours. We sample lattice configurations from the semigrand (μVT) ensemble, i.e., , where Ns is the number of particles of species s and N = L3 is the number of lattice sites, using the Metropolis algorithm with the moves: (1) Transmutation . (2) Reorientation . The two Monte Carlo (MC) moves are attempted with equal probability and accepted with probability,
where fss′ = exp[β(μs′ − μs)] is the ratio of fugacities of species s′ and s, β = (kBT)−1 is the inverse temperature of the system, and ΔE is the change in energy of the lattice configuration due to the proposed move. In Secs. V and VI of the article, we will refer to the MC move set defined here as transmutation-reorientation kinetics. For convenience, we define f to be the solute to solvent fugacity ratio f12.
We study the phase behaviour of the system as a function of temperature kBT and fugacity ratio f in the parameter regime where μ2 = μ3 with the following interaction strengths:
For c = 0.5, i.e., where anisotropic interactions of species 2 are stronger than those of species 3, we observe, for kBT ≤ 1.5, three distinct energetically stable solute rich states (Fig. 1) and a solvent rich state for low values of f.
Visualisations of bulk forms of the three solute phases on a cubic lattice of length L = 6, with solute species s = 2 and s = 3 drawn as cubes and tetrahedrons, respectively. Due to the nature of isotropic interaction, solute particles assemble into checkered structures. The shapes are colour coded according to the corresponding orientation label q.
Visualisations of bulk forms of the three solute phases on a cubic lattice of length L = 6, with solute species s = 2 and s = 3 drawn as cubes and tetrahedrons, respectively. Due to the nature of isotropic interaction, solute particles assemble into checkered structures. The shapes are colour coded according to the corresponding orientation label q.
III. FREE ENERGY METHODS
We employ a multicanonical “flat histogram” method, refined using the Wang-Landau recursion28 algorithm in conjunction with a histogram reweighting procedure.29,30 Our implementation of the method relies on constructing functions ηN1(N1) and ηE(E), which, when used as bias energies in our Metropolis acceptance criteria, allow our MC scheme to sample uniformly in N1 and E, where N1 is the total number of solvent particles in the system. It can be shown that uniform sampling is achieved when ηN1(N1) = lnℙ(N1) and ηE(E) = lnℙ(E), where ℙ(N1) and ℙ(E) are the equilibrium probability distributions of quantities N1 and E.
We proceed by segmenting the ranges of the two quantities into overlapping intervals of equal width, ranging between 40 and 80 bins. We use bin widths of 1 and 6 for N1 and E, respectively. Within each interval, the sampling scheme performs a biased Metropolis random walk while simultaneously refining the bias function using increments Δη ∈ {2−10, 2−11, …, 2−26} and generating a histogram h of the quantity of interest. Once the histogram achieves the flatness criteria and , where is the mean value of the histogram across all bins, the increment Δη is reduced and the histogram is discarded. After obtaining the estimates of ηN1(N1) and ηE(E), we sample histograms of N1 and E with the, now fixed, bias. The maximum likelihood estimates of lnℙ(N1) and lnℙ(E) are then recovered via weighted histogram analysis (WHAM).31–33 The obtained distributions are reweighed with respect to parameters f and β to produce approximate coordinates of the points where two or more states, e.g., solvent rich (N1 ≈ N) and solute rich (N1 ≈ 0), are equally probable. We rerun the scheme 20 times for each coexistence point to obtain error bars.
To explore the nucleation pathways in our model, we implement an equilibrium path sampling (EPS) algorithm,16,34 which produces an estimate of the equilibrium probability distribution of a desired quantity without biasing the MC. We first define an order parameter (n, υ, χ) which characterises the size and degree of orientational order within the largest cluster of solute particles on the lattice, where n is the cluster size, and υ and χ are cluster orientational order parameters defined as follows:
where Pi and pi are, respectively, the total number and the number of aligned diagonally neighbouring pairs of particles of species i, present in the cluster. To avoid singularities we only consider configurations where Pi ≥ 1. We argue that, for a cluster of n > 20, the (υ, χ) coordinate is sufficiently representative of the orientational ordering of the cluster to classify it as amorphous (I), semi-ordered (II), or crystalline (III).
We employ the standard “geometric” definition of clustering, where a solute particle is considered to be part of a cluster if it is a nearest neighbour of at least one other solute particle. The quantity n is taken as the size of the largest such cluster on the lattice. This definition is known to be problematic at high temperatures above the surface roughening transition;35 however, in this work we operate in the low temperature regime where the roughening effects are negligible.
We estimate the equilibrium joint probability distribution ℙ(n, υ, χ) for n ∈ [22, 322] by, once again, segmenting the range of the order parameter into overlapping windows of 4 bins wide along each axis. We use a bin width of 1 along n, and 1/16 along υ and χ. In each window, the EPS algorithm performs path MC,36 where the state of the Markov chain is some path , with σi ∈ {(s, q)}N being a lattice configuration. In our implementation, a path is generated from a seed lattice state by propagating it a random number m ∈ {1, …, τ − 1} of sweeps forward and τ − m − 1 sweeps backward, where a sweep is equivalent to N Metropolis MC moves and, for our purposes, the MC is time symmetric. We fix the path length τ = 7. The generated path is accepted with probability 1 if at least one of the comprising states falls within the range of (n, υ, χ), as specified by the window, and rejected otherwise. To generate a new path from an already accepted path , EPS selects a random snapshot σk of the old path and uses it as a seed for the new path. Upon seeding the path MC, we allow a relaxation stage which terminates after accepting 5 × 104 new configurations (up to 7 configurations per path). This is followed by a sampling stage, which aims to accept at least 105 new configurations.
All EPS calculations presented in this work were carried out in cubic systems with linear dimension of L = 32 lattice sites. For all conditions studied, this is sufficient to capture nuclei at the final stage of the nucleation pathway without self-interaction between periodic images. For computation of phase diagrams via multicanonical sampling with Wang-Landau recursion, the structural details of the mixed-phase region are unimportant allowing use of smaller system sizes L ∈ {4, 8} for rapid convergence of bias energies.
IV. PHASE BEHAVIOUR AND NUCLEATION PATHWAYS
With the help of the multicanonical sampling method, as described in Sec. III, we verify that the solute-solvent, I–II and II–III phase transitions in our model are first-order, as is consistent with the current understanding of the Q ≥ 3 3D Potts models37–39 as well as previous studies of the Potts Lattice Gas (PLG) model.16 We obtain phase diagrams (Fig. 2) of our system for c ∈ {0.5, 1}, showing that the stability of the partially disordered solute phase is determined by the relative strength of anisotropic interactions of solute species. By considering the solute rich state of our model as an interweave of two decoupled Q = 24 state Potts lattices with 12 nearest neighbours (due to diagonal interactions), we show (Fig. 2(c)) that the coordinates β† of solute phase coexistence points are in close agreement with the existing mean field result,40
where R is the strength of anisotropic interaction between aligned particles and Nnbr is the maximum number of neighbours with which a solute particle can interact anisotropically. In our case, Nnbr = 12 due to diagonal interactions. While we observe finite widths of the stability region of the partially disordered phase for c = 1 on small lattices, we find that the proximity of the two solute phase coexistence lines increases with system size. Hence, in accordance with the mean field prediction, we expect the partially ordered phase to be metastable everywhere but the disorder–order coexistence line in the thermodynamic limit.
Phase diagrams for different anisotropic interaction strengths, where f is the parameter controlling the solute saturation. Coexistence lines shown: (i) II and III. (ii) I and II. (iii) Solute and solvent. In (a) and (b), the regions of stability of the solvent rich state are marked by an asterisk (∗). Shown in (c) is the comparison of mean field predictions (A) and (B) as given by (6), and computational results for the coordinates of the solute phase coexistence points. The equilibrium distributions of nuclei were sampled at parameter values indicated by black crosses in (b). Lines (i) and (ii) were estimated via the multicanonical approach described in Sec. III applied to the model on a cubic lattice with a linear dimension L = 8. Lines (iii) were estimated using the same approach but in a cubic system with L = 4. In (b), line (iii) is in good agreement with the mean field approximation (7) shown by (C).
Phase diagrams for different anisotropic interaction strengths, where f is the parameter controlling the solute saturation. Coexistence lines shown: (i) II and III. (ii) I and II. (iii) Solute and solvent. In (a) and (b), the regions of stability of the solvent rich state are marked by an asterisk (∗). Shown in (c) is the comparison of mean field predictions (A) and (B) as given by (6), and computational results for the coordinates of the solute phase coexistence points. The equilibrium distributions of nuclei were sampled at parameter values indicated by black crosses in (b). Lines (i) and (ii) were estimated via the multicanonical approach described in Sec. III applied to the model on a cubic lattice with a linear dimension L = 8. Lines (iii) were estimated using the same approach but in a cubic system with L = 4. In (b), line (iii) is in good agreement with the mean field approximation (7) shown by (C).
We derive an approximation to the solute–solvent coexistence line for c = 1 by considering the lattice adaptation41 of the Widom expressions42,43 for solute and solvent chemical potentials in the semigrand ensemble. Assuming that, at conditions of solute–solvent coexistence, the free energy change due to insertion of one solvent particle into a system with N solute particles is equal to that due to insertion of one solute particle into a system with N solvent particles, we arrive at the ideal gas like approximation for the coordinates f∗(β) of solute–solvent coexistence points,
where determines the Q dependent value β∗(Q) — the inverse temperature of the order–disorder transition. The estimates of solute–solvent coexistence points, obtained via the multicanonical sampling of lnℙ(N1) in a cubic L = 4 system, appear in good agreement with Eq. (7) (Fig. 2(b)). Additionally, we find that β∗(Q) is in reasonable agreement with Eq. (6) for Q ≥ 24, converging to β†(Q) as a power law for large Q.
We study the energetics of transition of our model between solvent rich and solute rich states at conditions where the system is most stable in the crystalline phase (Fig. 2(b)). Using EPS along with nearest neighbour interpolation based on Delaunay triangulation, we obtain free energy surfaces ℱ(n, υ, χ) = − kBTlnℙ(n, υ, χ) for kBT ∈ {0.6, 0.65, 0.7} (respectively, 33%, 28%, and 22% undercooling with respect to the order–disorder transition), at a constant value of the fugacity ratio f = 2.25 (167%, 160%, and 155% supersaturation for the respective temperatures). By examining the local minima , we note that the preferred orientational ordering of the solute nuclei varies with nucleus size. Thus, we argue that thermodynamic transition pathways, i.e., those limited to slow nuclei growth, starting in the solvent rich phase and leading to the solute crystal, proceed via amorphous and partially disordered precursors. Defining the critical nucleus size n‡ as the maximum of free energy , we further show that the preferred orientational ordering (υ, χ)‡(n‡) of the critical nuclei is temperature dependent (Fig. 3).
Free energy surfaces for critical nuclei n‡ at three different temperatures and constant fugacity ratio f = 2.25. Contour lines are drawn at intervals of 1.0kBT and the coordinates (υ, χ)‡(n‡) of the local minima are indicated by the crosses. The plots demonstrate the temperature dependence of preferred ordering (υ, χ)‡(n‡) of critical nuclei n‡. The three regimes shown are as follows: (a) critical nuclei most stable when ordered; (b) critical nuclei most stable when partially ordered; and (c) critical nuclei most stable when disordered.
Free energy surfaces for critical nuclei n‡ at three different temperatures and constant fugacity ratio f = 2.25. Contour lines are drawn at intervals of 1.0kBT and the coordinates (υ, χ)‡(n‡) of the local minima are indicated by the crosses. The plots demonstrate the temperature dependence of preferred ordering (υ, χ)‡(n‡) of critical nuclei n‡. The three regimes shown are as follows: (a) critical nuclei most stable when ordered; (b) critical nuclei most stable when partially ordered; and (c) critical nuclei most stable when disordered.
We compute the conditional probability distributions ℙ(n|{υ}, {χ}) = ∫{υ}∫{χ}dυdχℙ(n, υ, χ) to obtain free energies ℱX(n) = − kBTlnℙ(n|{υ}X, {χ}X) of nuclei of the three solute phases, where X is the phase label [disordered (I), partially ordered (II), or ordered (III)]. To do so, we define three basins in the (υ, χ) plane: (1) {υ}I = [0, 0.5], {χ}I = [0, 1]. (2) {υ}II = (0.5, 1], {χ}II = [0, 0.5]. (3) {υ}III = (0.5, 1], {χ}III = (0.5, 1]. Plots of the three free energy curves at different temperatures are shown in Fig. 4. Here we argue that at lower temperatures (kBT ≤ 0.6), the free energy barrier to nucleation of the ordered phase is lowest. The picture changes, however, at temperatures kBT ≥ 0.7, where the ordered phase has the highest free energy barrier. Hence, we anticipate a crossover regime at intermediate temperatures 0.6 < kBT < 0.7, where the heights of all three free energy barriers are comparable. This hints at the existence of a parameter regime where nuclei of all three solute phases can nucleate on similar time scales, with nuclei of the most stable phase growing at the expense of the others via dissolution and reprecipitation. However, as illustrated by the sparsity of contours between minima in Fig. 3, we find that barriers to solid state transformation between the solute phases are typically comparable to thermal energy at the critical nucleus size, or absent entirely, in either case vanishing completely for large n. This implies that the most probable route to reaching the thermodynamically stable phase is via transformations within the solute rich phase.
Numerically obtained free energy curves (markers) for nuclei of the three phases at f = 2.25, showing the corresponding CNT least squares fits (lines). For reference, shown by the dashed lines are estimates of the free energies obtained via a one-dimensional analogue of the EPS procedure described in Sec. III. We find both variants of the EPS approach in excellent agreement with respect to the estimates of .
Numerically obtained free energy curves (markers) for nuclei of the three phases at f = 2.25, showing the corresponding CNT least squares fits (lines). For reference, shown by the dashed lines are estimates of the free energies obtained via a one-dimensional analogue of the EPS procedure described in Sec. III. We find both variants of the EPS approach in excellent agreement with respect to the estimates of .
The vanishing of the barriers to solid state transformation within the larger nuclei is not surprising, since we study the model in a temperature regime far below the thermodynamic and kinetic (for L ≥ 32 cubic systems) metastability limits of bulk disordered and partially disordered solute states. In addition, we are able to qualitatively reproduce the temperature dependence of the preferred orientational ordering within the small nuclei by studying the system in the solute rich state on small (L < 10) cubic lattices with reflecting boundary conditions, demonstrating that the relative thermodynamic stability of the three solute phases is strongly size dependent. Thus, the stability of disordered and partially disordered structures in small nuclei can be interpreted as analogous to a finite size effect.
The shapes of the three barriers in Fig. 4, with the exception of faceting effects, are well fitted by the classical nucleation theory (CNT) expression of the form F(n) = F0 + An2/3 − Bn, where A and B are related to, respectively, the surface and volume free energy densities of the nucleus and F0 relaxes the fit by allowing the free energy of the metastable solvent-rich phase to deviate from zero.16 The obtained least squares fits yield parameter values which are consistent with the energetics of our model—the cost of forming solute-solvent interface and the gain associated with the growth of solute domain both increase with the degree of solute ordering.
We observe a dramatic reduction in the quality of the CNT fits if setting F0 = 0, i.e., fitting the commonly used expression ΔF(n) = F(n) − F(0), consistent with other reports of poor quantitative performance of the functional form ΔF(n) = An2/3 − Bn in representing explicitly computed free energy barriers.22,44,45 Fitting ΔF(n) to the overall nucleation barriers in the range n ∈ [n‡ − 50, n‡ + 50], we find the barrier heights adequately captured by the CNT fits, suggesting that the standard CNT framework alone is sufficient to formulate an effective description of the energetics of nucleation in the present model, despite the evident multi-stage character of the process. This poses an interesting test case for the increasingly popular “seeding” method for nucleation barrier and rate estimation, which aims to parametrise a CNT based model of nucleation kinetics by extracting the necessary quantities from trajectories of the nucleus size coordinate. Below, we will assess the capability of the “seeding” method to formulate an effective description of the multi-stage nucleation process under different kinetic regimes, thus evaluating the sensitivity of the method to variations in kinetic growth pathways and violation of the Markov assumption on the kinetics of the nucleus size coordinate.
V. SEEDING METHOD
In the framework of CNT, the nucleation rate J is given by1,46
where ρ is the solute monomer density, is the rate of monomer attachment to the critical cluster, F(n) = − kBTlnℙ(n) is the free energy of nucleus of size n, ΔF(n) = F(n) − F(0), and is the Zeldovich factor proportional to the square root of the second n derivative F″ of F(n) evaluated at n† — the size of the critical nucleus, i.e., n† = argmaxn{F(n)}. We distinguish between the empirical value n‡ of the critical nucleus, as used in Sec. IV, and the value n† maximising the standard CNT formula for F(n) since the two may not agree in general.
A closely related expression47 to (8) can be derived by assuming the continuum approximation to time evolution of nucleus size n(t) to obey the Langevin equation in the overdamped limit: , where ξn(t) is a random process satisfying 〈ξn(t)〉 = 0 and 〈ξn(t) ξn(t′)〉 = 2Dnδ(t − t′), with δ(t) being the Dirac delta function, and Dn—the diffusivity of the nucleus size coordinate at the top of the barrier ΔF(n) — is taken to be equal to the monomer attachment rate .
The “seeding” approach to nucleation rate calculations builds on the above equations and has been reported to yield accurate rate estimates from atomistic simulations when applied to nucleation from the melt.48 Seeding, in this context, amounts to preparing the initial state of the system to contain a nucleus of size n0 and observing the time evolution n(t) with the initial condition n(0) = n0. Taking the ensemble average on both sides of the overdamped Langevin equation yields
which, in principle, allows one to reconstruct the free energy profile F(n) from the initial drift velocities . In practice, however, accurate estimation of drift velocities for the whole range of the relevant n0 is often computationally expensive and the free energy barrier ΔF(n†) is instead estimated by fitting the n derivative of the CNT expression,
where Δμ can be defined as the difference per particle between the free energies of solute and solvent rich states and γ is the product of the nucleus shape factor and the free energy per unit area of the nucleus surface, to a handful of estimates of in the vicinity of n†, where Dn is assumed to be approximately independent of n. In both cases, the mean initial drift velocities must be expressed in units of βDn, leading to propagation of error in estimates of Dn to estimates of the barrier height, which is exponentially amplified in the rate calculation. We, therefore, pay particular attention to error analysis in this work.
A. Microscopic kinetics
In principle, the outcome of the nucleation barrier reconstruction method should be independent of the microscopic kinetics, provided that the relevant statistical ensemble of the system’s configurations is sampled appropriately and the solute chemical potentials are correctly maintained. The transmutation–reorientation (TR) MC move set, as defined in Sec. II, provides kinetics which correctly maintain the chemical potentials of all particle species at the expense of incorrectly representing mass transport due to the unphysical transmutation move. A more realistic representation of mass transport can be provided by replacing the transmutation move, in the TR move set, with nearest neighbour particle exchange move, i.e., analogous to Kawasaki dynamics in the kinetic Ising model,49,50 while keeping the reorientation move and the move attempt probabilities unchanged. We will refer to this alternative move set as diffusion–reorientation (DR) kinetics. Due to the conservation of particle counts under the DR kinetics, solute depletion effects can occur during the nucleation process; however, as detailed in Sec. V C, part of the “seeding” approach is to ensure that these depletion effects are negligible.
Apart from being a more realistic model of particle transport, the DR kinetics differ from TR in two ways: (1) The Markov assumption on n(t), as made by the “seeding” method, is violated more strongly due to the stronger memory effects in local particle density fluctuations.23,51 (2) The rates of nuclei growth are much lower, due to the diffusion limited character of the particle attachment process, allowing substantially greater time scales for structural relaxation of the growing nuclei between successive solute attachment events. By applying the “seeding” approach under both sets of kinetics, we are, therefore, able to assess the sensitivity of the barrier reconstruction method to deviations of n(t) from the assumed Langevin dynamics description as well as differences in the observed nuclei growth pathways.
B. Fitting and error estimation procedures
In order to estimate the quantities and Dn, one typically samples M independent trajectories nj(t) : nj(0) = n0 at some number W of uniformly spaced time points ti = iΔt, i ∈ {1, …, W} for a range of initial cluster sizes n0. The estimator for the mean drift velocity is given simply by the gradient of the least squares fit to the time series of the mean displacement 〈Δn(n0, t)〉 = 〈n(t) − n0〉. In accordance with CNT, one expects that trajectories n(t) : n(0) = n†, starting at the critical cluster size, will yield 〈Δn(n†, t)〉 = 0, allowing the diffusivity Dn to be estimated via the halved gradient of the linear least squares fit to the time series of the mean squared displacement. In practice, however, deviations from the expected zero drift behaviour are common, and average squared deviation from the mean 〈SDn(n0, t)〉 = 〈[n(t) − 〈n(t)〉]2〉 is used instead. Both 〈Δn(n0, t)〉 and 〈SDn(n0, t)〉 can be expected to satisfy 〈Δn(n0, 0)〉 = 0 and 〈SDn(n0, 0)〉 = 0, allowing the use of a linear model g(t) ∝ t with zero vertical offset for extraction of gradients. The mean drift velocity and diffusivity estimators are then given by, respectively, vn(n0) and ζn(n†), which, if writing out the averaging and least squares fitting operations explicitly, can be written as
where Δnj(n0, t) and are obtained from independent trajectories nj(t) with nj(0) = x0.
Assuming the CNT form of the nucleus free energy, as given by Eq. (10), we can extract the CNT critical nucleus size n† and the height of the free energy barrier ΔF(n†),
by estimating γ via a fit derived from Eq. (9),
For known Δμ, the fitting procedure can be simplified by rearranging Eq. (14) for γ. If Brownian motion is a good approximation to n(t) and Eq. (10) holds, one should find that γ†(n0) = γ, where
hence the least squares estimator γLS of γ is given by an average over the range n0 ∈ [a, b],
where kγ = 3/(2(b − a + 1)) and . For normally distributed estimates vn(n0) of with known parameters, the parameters of the normal distribution of the sum can be deduced via the scaling and addition properties of normal distributions. If estimates of Dn also follow a normal distribution, then the quantity follows the distribution of the quotient of noncentral normal variates,52,53 for which the cumulative distribution function (CDF) is known and can be computed accurately by appropriately integrating the bivariate normal probability density function.54 Thus, knowing the parameters of the normal distributions of ζn(n†) and vn(n0), we can compute the confidence interval on γLS as well as on ΔF(n†).
The above error estimation approach makes a number of assumptions, in particular, that (1) estimators vn(n0) and are unbiased and normally distributed and (2) estimators vn(n0) and are independent. While (2) is straightforward to guarantee by using independent sets of trajectories for computation of vn(n0) and , we find that (1) is not guaranteed even if the dynamics of n(t) exactly follow the overdamped limit of the Langevin equation. In the Appendix, we show that may not follow the normal distribution for small M and can be a biased estimator of Dn if n(t) is a trajectory of a 1D Brownian particle in a bistable potential. We, therefore, recommend use of M ≥ 100 for estimation of Dn; however, explicit verification of normality of for large M may be infeasible in many applications of the “seeding” method.
C. Seed generation
We employ the “seeding” method in our lattice model to estimate the nucleation barrier heights in the range of kBT ∈ {0.6, 0.65, 0.7} and f ∈ {2, 3, …, 7}. At each parameter point, we sample M = 102 independent trajectories of the largest cluster size nj(t) at time values ti = iΔt, i ∈ {1, …, W} with nj(0) = n0 ∈ [3, 498], Δt = 10 MCS and W = 102, where a unit of time represents a single MC sweep (MCS) of the whole lattice.
The initial configurations for each independent trajectory were generated by equilibrating a randomly grown compact cluster of solute particles in pure solvent. The initial compact clusters were generated iteratively by, starting with a single solute particle on the lattice, placing a solute particle into a randomly chosen neighbouring site of a randomly chosen member particle of the cluster until the desired cluster size is reached. The term “equilibration” here is used in reference to a statistical ensemble of configurations where the number of solute particles belonging to the constructed cluster is conserved. To achieve correct sampling from the said ensemble, we restricted the MC to the set of sites S, which included solute sites belonging to the grown solute cluster and solvent sites in the cluster’s nearest neighbourhood. The equilibration procedure employed transmutation and reorientation moves on solute sites in S as well as non-local particle swap moves on pairs of solute and solvent sites in S, with all move types having equal attempt probabilities. Acceptance of a non-local particle swap move may lead to a change in S, which we took into account in the calculation of the move acceptance probability to assure detailed balance. Moves leading to a change in the size of the grown cluster were rejected with probability 1.
We find that after a burn-in period of 103n iterations of the procedure, the nuclei attain compact shapes and the corresponding distributions of internal orientational order parameter are consistent with those obtained via the three-dimensional EPS approach applied to the μVT ensemble. The agreement between the two methods of sampling distributions of cluster orientational order parameter is not surprising, since the difference between the effects of pure solvent and saturated solution on the structure of the nuclei is negligible due to the short range nature of particle interactions and the low counts of solute particles in the supersaturated solution. Upon equilibration of the nucleus, we fill the lattice uniformly with Nρ(kBT, f) solute particles, where ρ(kBT, f) is the fugacity and temperature dependent solute concentration, which we compute numerically as the mean fraction of solute particles in the system in the metastable state.
Following seeding, the trajectories nj(t) were generated under composition conserving unconstrained DR dynamics,49,50 i.e., equally probable reorientation and nearest neighbour particle exchange moves, as well as the composition nonconserving TR dynamics described in Sec. II. Since DR dynamics samples system configurations from the NV T ensemble, conserving the solute particle count, the resultant evolution of nucleus size can be affected by depletion or augmentation of solute—an unphysical effect in open systems.4 To avoid this effect in our sample of trajectories nj(t), we employ the larger cubic lattice sizes of L = 64. In addition, we verify that the maximum change in nucleus size, along all obtained trajectories for DR dynamics, is negligible in comparison to the total number of the available solute particles in the system.
D. Drift velocities and diffusivities
Typical realisations of 〈Δn(n0, t)〉 = 〈n(t) − n0〉 are shown in Fig. 5, where we see that, consistent with the intuition of Sec. V A, nuclei growth rates for TR kinetics are approximately two orders of magnitude faster than those for DR. We observe roughly linear behaviour of 〈Δn(n0, t)〉 on short time scales t ≤ 100MCS; however, based on 102 observations, we expect considerable relative error margins on initial drift velocities measured under DR dynamics. Estimates vn(n0) of initial drift velocities along the cluster size coordinate were calculated by fitting straight lines with zero vertical offset (Eq. (11)) to time series 〈Δn(n0, t)〉 over the range t ≤ 100 MCS. Estimates ζn(n0) of cluster size dependent diffusivity Dn along n were extracted from time series 〈SDn(n0, t)〉 = 〈[n(t) − 〈n(t)〉]2〉, n(0) = n0 (Fig. 6(a)) in a similar fashion [Eq. (12)].
Plots of mean displacements 〈Δn(n0, t)〉 against time t for DR [(a) n∗ = 82] and TR [(b) n∗ = 78] dynamics at kBT = 0.7, f = 3. Error bars represent the 95% confidence intervals based on 102 observations assuming normal distribution. The three sets of points correspond to trajectories initialised with n0 given by (i) n∗ − 60, (ii) n∗, and (iii) n∗ + 102. Solid line segments represent the linear fits with gradients vn(n0) used as estimates of initial drift velocity along the cluster size coordinate.
Plots of mean displacements 〈Δn(n0, t)〉 against time t for DR [(a) n∗ = 82] and TR [(b) n∗ = 78] dynamics at kBT = 0.7, f = 3. Error bars represent the 95% confidence intervals based on 102 observations assuming normal distribution. The three sets of points correspond to trajectories initialised with n0 given by (i) n∗ − 60, (ii) n∗, and (iii) n∗ + 102. Solid line segments represent the linear fits with gradients vn(n0) used as estimates of initial drift velocity along the cluster size coordinate.
Plots of 〈SDn(n0, t)〉 for n0 = n∗ (a) and diffusion coefficient estimates ζn for n0 in the range n0 ∈ [n∗ − 20, n∗ + 20] (b) obtained under TR (main panels) and DR (insets) kinetics at kBT = 0.7, f = 3. In (a), the solid line segments represent linear fits used in estimation of cluster size diffusivity ζn(n∗). In (b), solid lines represent linear fits used for detrending the data for estimation of statistical uncertainty in ζn(n∗).
Plots of 〈SDn(n0, t)〉 for n0 = n∗ (a) and diffusion coefficient estimates ζn for n0 in the range n0 ∈ [n∗ − 20, n∗ + 20] (b) obtained under TR (main panels) and DR (insets) kinetics at kBT = 0.7, f = 3. In (a), the solid line segments represent linear fits used in estimation of cluster size diffusivity ζn(n∗). In (b), solid lines represent linear fits used for detrending the data for estimation of statistical uncertainty in ζn(n∗).
In accordance with CNT, trajectories starting at the critical cluster size n† are expected to yield vn(n†) ≈ 0. Computing the value n†, however, requires knowledge of the value γLS, which, in itself, requires the estimate and, hence, the coordinate of the barrier top. Thus, we estimate the coordinate of the barrier top as , which exploits the proportionality of dF(n)/dn to the mean initial drift velocity in Eq. (9). Again, it is important to stress that n∗ is an empirical value of the critical cluster size, which may differ from the fitted CNT value n†. We observe that within the range n0 ∈ [n∗ − 20, n∗ + 20] of initial cluster sizes n0, the gradient estimates ζn(n0) of the mean squared displacements vary slowly with n0, as shown in Fig. 6(b), which is consistent with the intuition that diffusivity along the n coordinate is approximately constant close to the top of the free energy barrier.1 We therefore use the data for ζn(n0) in the range n0 ∈ [n∗ − 20, n∗ + 20] to estimate the statistical uncertainty in our measurement ζn(n∗) of the diffusivity Dn along the cluster size coordinate.
E. Choice of Δμ
In our model the degree of saturation of the solution is dependent only on the difference between solute and solvent chemical potentials. Thus we can set μ1 = 0 without loss of generality, allowing the chemical potential μ2 = μ3 of solute to fully determine the chemical composition of the system. Therefore, the standard choice of the value of the CNT parameter Δμ, in our case, would be
with f∗(β) given by Eq. (7), corresponding to the difference between the bulk solute chemical potentials in states of metastable supersaturation and solute–solvent coexistence.1,4,55 It is, however, understood that usage of the bulk value Δμ = Δμcoex as the driving force to nucleation may be unsuitable for the description of the microscopic particle attachment process.56 An alternative value Δμfit(β, f) can be obtained via a two parameter fit of Eq. (14), allowing the appropriate value of Δμ to be determined by the microscopic dynamics of the system.
Due to the comparatively low computational cost of free energy calculations in our model, we are able to compute the nucleation barriers over the specified range of parameter values explicitly with high accuracy by utilizing the one-dimensional analogue of the EPS procedure, described in Sec. III, applied to cubic systems of length L = 32 sites. To assure absence of finite size effects in the obtained barrier estimates, we verify that the EPS data are reproducible in smaller systems with L = 16. We obtain data which are adequately described by Eq. (10) (Fig. 7), yielding the set of values ΔμEPS(β, f) via a two parameter fit of Eq. (10) to the EPS estimates of ΔF(n) over a range of 100 values of n near the top of the barrier (Fig. 7(d)).
Fits of Eq. (10) (dashed lines) to EPS data (solid lines) for (a) kBT = 0.6, (b) kBT = 0.65, and (c) kBT = 0.7. The six curves in (a), (b), and (c) are sampled at the six values of f ∈ {2, 3, …, 7} with higher barriers corresponding to lower values of f. The thicker portions of the solid lines in (a), (b), and (c) indicate the ranges of data used in the corresponding two parameter fits of Eq. (10). The corresponding fitted values ΔμEPS are plotted in (d) as against given by Eq. (17).
Fits of Eq. (10) (dashed lines) to EPS data (solid lines) for (a) kBT = 0.6, (b) kBT = 0.65, and (c) kBT = 0.7. The six curves in (a), (b), and (c) are sampled at the six values of f ∈ {2, 3, …, 7} with higher barriers corresponding to lower values of f. The thicker portions of the solid lines in (a), (b), and (c) indicate the ranges of data used in the corresponding two parameter fits of Eq. (10). The corresponding fitted values ΔμEPS are plotted in (d) as against given by Eq. (17).
We now consider the effect of the choice of Δμ on the capacity of the “seeding” method to reconstruct the explicitly computed free energy barriers by carrying out the fitting procedure [Eq. (16)] for each of the three possible choices: Δμcoex, Δμfit, and ΔμEPS. We first obtain estimates of Δμfit via two parameter fits of Eq. (14) to the scaled average initial drift velocities for DR and TR dynamics over a range of n0 ∈ [n∗ − 20, n∗ + 20]. Close to the top of the barrier, i.e., n ∈ [n∗ − 20, n∗ + 20], our data show reasonable agreement with γ†(n) = const. [Eq. (15)] for Δμ≠Δμcoex (Figs. 8(b) and 8(c)), while the usage of the bulk value Δμcoex often yields a clear n dependence of γ†(n) (Fig. 8(a)). Despite the comparable quality of the three fits of Eq. (14) (Fig. 8(d)), with occasionally exceptionally poor fits using Δμcoex (Fig. 8(e)), we find that barrier reconstruction via the “seeding” approach is highly sensitive to the choice of Δμ (Figs. 8(f)–8(h)), only yielding a consistent agreement with EPS for the two sets of dynamics if using Δμ = ΔμEPS.
Fits of CNT to scaled mean drift velocity data via Eq. (16) for kBT = 0.6, f = 4. The shown data points were obtained under DR (circles) and TR (squares) dynamics. Error bars represent the 95% confidence intervals obtained via the CDF of noncentral normal quotients. In (a) (Δμ = Δμcoex), (b) (Δμ = Δμfit), and (c) (Δμ = ΔμEPS), we plot the data used for fitting of γLS via Eq. (16), with dashed and solid lines corresponding to the fits to TR and DR data, respectively, over the range of n0 ∈ [n∗ − 20, n∗ + 20]. In (d) and (e), we show the corresponding fits to the estimates of [Eq. (14)] for a wider range of n0, with data points outside the fitting range shown as small markers with . The curves (A), (B), and (C), in (d) and (e), correspond, respectively, to the three choices of Δμ values: Δμcoex, Δμfit, and ΔμEPS. Finally in (f) (Δμ = Δμcoex), (g) (Δμ = Δμfit), and (h) (Δμ = ΔμEPS), we show the resultant shapes of the CNT free energy barriers in comparison to the ΔF(n) obtained via EPS (solid line).
Fits of CNT to scaled mean drift velocity data via Eq. (16) for kBT = 0.6, f = 4. The shown data points were obtained under DR (circles) and TR (squares) dynamics. Error bars represent the 95% confidence intervals obtained via the CDF of noncentral normal quotients. In (a) (Δμ = Δμcoex), (b) (Δμ = Δμfit), and (c) (Δμ = ΔμEPS), we plot the data used for fitting of γLS via Eq. (16), with dashed and solid lines corresponding to the fits to TR and DR data, respectively, over the range of n0 ∈ [n∗ − 20, n∗ + 20]. In (d) and (e), we show the corresponding fits to the estimates of [Eq. (14)] for a wider range of n0, with data points outside the fitting range shown as small markers with . The curves (A), (B), and (C), in (d) and (e), correspond, respectively, to the three choices of Δμ values: Δμcoex, Δμfit, and ΔμEPS. Finally in (f) (Δμ = Δμcoex), (g) (Δμ = Δμfit), and (h) (Δμ = ΔμEPS), we show the resultant shapes of the CNT free energy barriers in comparison to the ΔF(n) obtained via EPS (solid line).
To illustrate the sensitivity of the “seeding” method further, we plot all barrier height estimates in Fig. 9. For f ≥ 3, the EPS barrier height estimates fall on a straight line (Fig. 9(a)) as is consistent with CNT.1 Closer to coexistence (f = 2) for kBT ∈ {0.65, 0.7}, however, the linear trend does not hold, which cannot be accounted for by statistical errors or finite size effects in our free energy calculations, and, therefore, may be due to the errors in the mean field approximation to f∗(β) in Eq. (7). We find that usage of the bulk values of Δμ yields a systematic error of at least a factor of 2 in the barrier height estimates due to the “seeding” method (Fig. 9(b)). Although the approach employing a two parameter fit of Eq. (14) can yield reasonable agreement between EPS and the “seeding” method data, usage of Δμ = Δμfit leads to barrier estimates which vary greatly with temperature, saturation, and the choice of the system’s kinetics (Fig. 9(c)). Setting Δμ = ΔμEPS, on the other hand, recovers barrier height estimates which are in excellent agreement with the explicit free energy calculations for f ≥ 3 independent of the system’s kinetics, yet deviate by, roughly, 43% for kBT ∈ {0.6, 0.65}, f = 2 (Fig. 9(c)). Thus, we argue that, even for a well chosen Δμ, the “seeding” method cannot guarantee an accurate estimate of the nucleation barrier height for low supersaturations in our model. In our case, particularly, the 43% error in barrier height can lead to underestimation of the nucleation rate by up to 10 orders of magnitude when using Eq. (8).
Comparison of nucleation barrier height estimates obtained via EPS and the “seeding” method. In (a) we plot the EPS estimates of barrier heights against , yielding good agreement with the linear trend for f ≥ 3 at the three values of kBT. In (b)–(d), we show the barrier height estimates obtained via the “seeding” method under DR (circles) and TR (squares) dynamics, taking the Δμ values as, respectively, Δμcoex, Δμfit, and ΔμEPS, in relation to the linear fits for f ≥ 3 to the EPS data. Error bars represent the 95% confidence intervals obtained via the CDF of noncentral normal quotients.
Comparison of nucleation barrier height estimates obtained via EPS and the “seeding” method. In (a) we plot the EPS estimates of barrier heights against , yielding good agreement with the linear trend for f ≥ 3 at the three values of kBT. In (b)–(d), we show the barrier height estimates obtained via the “seeding” method under DR (circles) and TR (squares) dynamics, taking the Δμ values as, respectively, Δμcoex, Δμfit, and ΔμEPS, in relation to the linear fits for f ≥ 3 to the EPS data. Error bars represent the 95% confidence intervals obtained via the CDF of noncentral normal quotients.
The kinetic prefactor ρDnZ of Eq. (8) can readily be obtained from our calculations, with estimates ζn(n∗) of Dn being, as to be expected, the only significant kinetics dependent contribution to the rate J for Δμ = ΔμEPS. For the explicitly computed nucleation barriers, the Zeldovich factor can be estimated via a parabolic fit near n†, which allows us to compare rate estimates due to EPS and the “seeding” method. Taking into account the statistical errors in our estimates, we arrive at the same observations analysing J as a function of Δμcoex as outlined above, obtaining reasonable agreement with the CNT prediction .
VI. CONCLUSIONS
We have introduced a novel multicomponent lattice model, in which the slow growth limited solute crystallisation pathway proceeds via the metastable disordered and partially disordered solute phases. We have shown that the heights of the barriers to nucleation of the metastable phases in relation to that of the stable crystal vary with temperature, leading to a parameter regime where the free energy barriers to nucleation of all three phases are equal. Due to the low barriers to solid state transformation between the three solute phases, we argue that the present model cannot expect to favour the dissolution–regrowth pathway relevant to homogeneous nucleation of calcite from solution.10 Design of minimal models to capture this process needs to incorporate large energy barriers to direct transformation between solute-rich phases. We hope to report on such models in a future communication.
Turning our attention to nucleation kinetics and the “seeding” method, given that the nucleation barrier height is exponentiated in the CNT rate expression, the largest contribution to error in rate estimates via the “seeding” method lies in the barrier reconstruction. We have shown how to estimate statistical uncertainties in this procedure, and hence demonstrated statistically significant deviation of nucleation barrier height estimates from those obtained via explicit free energy calculations, subject to the definition of the CNT parameter Δμ. We found that the discrepancies between the two barrier estimation methods vanish over a broad range of parameter space, with the exception of conditions of lower supersaturation, if using values of Δμ informed by the shape of the explicitly computed free energy barriers. At lower supersaturations, however, the discrepancies between the two methods remain significant, which cannot be explained by the errors in our calculations. A possible source for these discrepancies lies in our choice of reaction coordinate. It is known that the choice of the reaction coordinate plays a major role in quantitative treatment of nucleation.57 While we cannot completely rule out the possibility of having chosen a suboptimal definition of n, results of the committor test58 suggest the quality of our definition is at worst comparable to those used in off-lattice simulations.
Additional explanations for the failure of the “seeding” method to correctly reconstruct F(n) at weak supersaturation lie in the underlying assumptions of CNT. The functional form of ΔF(n) remains a subject of debate.22,59 In this work we have not made any assumptions regarding the shape of nuclei except that surface area scales as n2/3. The single cluster assumption underlying this statement is potentially problematic close to n = 0 where the entropic benefit of multiple clusters may offset the increased interfacial free energy. This trade-off is system size dependent and not corrected for in our free energy calculations.
The “seeding” method also relies on the assumption that the dynamics of n(t) can be approximated by the over-damped Langevin equation. This assumption enters into the calculation of the barrier height as well as the kinetic prefactor. The validity of the Markov assumption for n(t) is expected to depend on the choice of microscopic kinetics. It is known that TR dynamics yield only approximately Markovian n(t), while the behaviour of n(t) under the more realistic mass transport of DR dynamics completely violates the Markov assumption.23 Despite the apparent differences between the two sets of kinetics, we observed a remarkable agreement between the two corresponding free energy barrier reconstructions, particularly at higher supersaturations, suggesting that the Markov assumption does not play a great role here. The agreement also suggests that the “seeding” method can recover consistent barrier estimates in different regimes of nuclei growth, provided that the structures of the initial seeds are correctly prepared. In future communications, we will report on direct calculations of nucleation rates via path sampling methods, with the hope to examine more closely the role of the Markov assumption in computing accurate nucleation rates.
Acknowledgments
We would like to thank Stefan Grosskinsky (Centre for Complexity Science, University of Warwick), Carina Karner (Computational Physics, University of Vienna), and the authors of Ref. 16 for useful discussions and advice. We are also grateful for support from EPSRC, Grant Nos. EP/E501311/1 and EP/H00341X/1. Additionally, we gratefully acknowledge the use of facilities provided by the University of Warwick Centre for Scientific Computing and Queen Mary’s MidPlus computational facilities, supported by QMUL Research-IT, and funded by EPSRC Grant No. EP/K000128/1. We thank the anonymous referee for providing detailed and constructive feedback for this article. Data associated with this work is available via the Warwick Research Archive Portal at http://wrap.warwick.ac.uk/81418/.
APPENDIX: SEEDING METHOD FOR BROWNIAN MOTION
We now illustrate the “seeding” approach by computing barrier crossing rates jx of a one-dimensional Brownian particle in the bistable potential U(x) = x4 − 2x2. The equivalent to the CNT rate expression [Eq. (8)] for this system is the Kramers’60 rate,
where Dx = kBT is the diffusivity of the Brownian particle, ΔU = U(0) − U(−1) is the potential barrier height, and U″(x) = 12x2 − 4. Employing the BAOAB-limit method,61 we estimate the initial drifts and Dx taking x0 ∈ { − 0.9, − 0.8, …, 0.9} at kBT ∈ {0.15, 0.175, …, 0.25}. We employ the estimators vx(x0) and ζx(x0), computed in the same fashion as shown by Eqs. (11) and (12), taking Δt = 10−4 and W ∈ {5, 10, 100}.
Assuming that on short time scales the coordinate x(t) evolves according to , it is straightforward to obtain the exact distributions of Δx(x0, t) and SDx(x0, t) as, respectively, normal — , and gamma — Γ(0.5, 4Dxt). While the estimator vx is clearly normally distributed, the exact form of the distribution of ζx is much more complex,62 however we find that for M ≥ 100 the normal distribution gives a reasonable approximation (Fig. 10(a)). This is to be expected, since the probability distribution of the sum is Γ(M/2, 4MDxt), which is well approximated by the normal distribution for large M.
Typical distributions of the estimator ζx at kBT = 0.15, W = 10 and varying M (markers), and the corresponding normal distribution fits (solid lines). The deviation from normality is clear for low M; however, it is negligible at M = 100. The M = 1 estimator corresponds to fitting linear trends to individual trajectories and the resultant distribution is highly asymmetric.
Typical distributions of the estimator ζx at kBT = 0.15, W = 10 and varying M (markers), and the corresponding normal distribution fits (solid lines). The deviation from normality is clear for low M; however, it is negligible at M = 100. The M = 1 estimator corresponds to fitting linear trends to individual trajectories and the resultant distribution is highly asymmetric.
In light of the computational expense of obtaining large numbers of trajectory sets, it might be tempting to estimate the error in ζx by fitting least squares lines to individual realisations of . While this approach will recover the correct mean value, the distribution of estimates will drastically deviate from normal (Fig. 10(b)) yielding a wide and highly asymmetric confidence bound. In fact, for low M the estimator ζx(x†), where x† = 0 is the coordinate of the barrier top, does not correspond to the maximum likelihood estimator of Dx.
We find that both 〈Δx(x0, t)〉 and 〈SDx(x0, t)〉 are well fitted by the linear model over the chosen range of times and temperatures (Fig. 11). While the estimates of drift velocities match the gradient of the potential well (Fig. 12(a)), we observe a parabolic deviation of the diffusion coefficient estimates ζx from the known value Dx = kBT (Fig. 12(b)). In particular, at the top of the barrier, we expect values of ζx between 1% and 4% higher than kBT. Factoring in the error of barrier height and Dx estimates due to increasingly high drift contributions to 〈SDx(x0, t)〉 away from the barrier top, we observe good agreement of Eq. (A1) with brute force estimates of inverse mean first passage times63 (Fig. 12(c)). It is important to highlight that the derivation of Eq. (A1) relies on harmonic approximations of the potential U(x) near the top and bottom of the barrier. These approximations, in our case, lead to small errors (Fig. 12(c)) which are cancelled by the systematic errors in estimates of the diffusion coefficient. A more accurate value of jx can be obtained by numerically integrating the Kramers’60 expression for the stationary current: , out of the quasi-stationary state near x = − 1, of the appropriate Fokker–Planck equation.
Mean displacements (a) and mean squared displacements (b) for x(t) against time at kBT = 0.15, M = W = 102. Error bars represent the 95% confidence intervals, the width of which is typically smaller than that of the markers. The solid lines in (a) are drawn with the gradient given by ∇xU(x0). In (b), the solid lines represent the linear least squares fits to the data.
Mean displacements (a) and mean squared displacements (b) for x(t) against time at kBT = 0.15, M = W = 102. Error bars represent the 95% confidence intervals, the width of which is typically smaller than that of the markers. The solid lines in (a) are drawn with the gradient given by ∇xU(x0). In (b), the solid lines represent the linear least squares fits to the data.
Mean drift velocity (a), diffusion coefficient (b), and rate (c) estimates for the Brownian dynamics in U(x) = x4 − 2x2. In (a), the mean drifts are in excellent agreement with the gradient of the potential shown as a solid line. In (b), the estimates of the diffusion coefficient vary parabolically in the vicinity of the barrier top—the solid line shows a parabolic fit to the data to guide the eye. In (c), the points represent brute force computed inverse mean first passage time estimates, dashed lines show the range of rate estimates obtained via the procedure analogous to the “seeding” method, while the solid grey line gives the more accurate values of jx computed via numerical integration of the Kramers’ stationary current equation. The dashed–dotted line in (c) is the plot of Eq. (A1) taking βDx = 1.
Mean drift velocity (a), diffusion coefficient (b), and rate (c) estimates for the Brownian dynamics in U(x) = x4 − 2x2. In (a), the mean drifts are in excellent agreement with the gradient of the potential shown as a solid line. In (b), the estimates of the diffusion coefficient vary parabolically in the vicinity of the barrier top—the solid line shows a parabolic fit to the data to guide the eye. In (c), the points represent brute force computed inverse mean first passage time estimates, dashed lines show the range of rate estimates obtained via the procedure analogous to the “seeding” method, while the solid grey line gives the more accurate values of jx computed via numerical integration of the Kramers’ stationary current equation. The dashed–dotted line in (c) is the plot of Eq. (A1) taking βDx = 1.