We present a strategy to construct guiding distribution functions (GDFs) based on variance minimization. Auxiliary dynamics via GDFs mitigates the exponential growth of variance as a function of bias in Monte Carlo estimators of large deviation functions. The variance minimization technique exploits the exact properties of eigenstates of the tilted operator that defines the biased dynamics in the nonequilibrium system. We demonstrate our techniques in two classes of problems. In the continuum, we show that GDFs can be optimized to study the interacting driven diffusive systems where the efficiency is systematically improved by incorporating higher correlations into the GDF. On the lattice, we use a correlator product state ansatz to study the 1D weakly asymmetric simple exclusion process. We show that with modest resources, we can capture the features of the susceptibility in large systems that mark the phase transition from uniform transport to a traveling wave state. Our work extends the repertoire of tools available to study nonequilibrium properties in realistic systems.
I. INTRODUCTION
Large deviation theory (LDT) is a framework to extend the formalism of equilibrium statistical mechanics to nonequilibrium systems.1 Much of LDT is concerned with summarizing the dynamics of the system, as expressed via the fluctuations of typical trajectories. Ensembles of rare trajectories display a fascinating behavior reminiscent of phase transitions and criticality. Recent work illustrates such a dynamical behavior both in lattice systems, such as simple exclusion processes,2–9 constrained kinetic models,10,11 models of self-assembly,12–14 and dissipative hydrodynamics,15,16 and in continuum systems in the form of driven or active Brownian particles,14,17–21 as well as open quantum systems.22–24 Recently, LDT has also been shown to offer a route to calculating nonlinear transport coefficients in model and realistic systems.25,26
Accessing properties of interest, in all but the simplest systems, requires numerical tools. In the context of LDT, this takes the form of sampling techniques such as the cloning algorithm or diffusion Monte Carlo (DMC)27–32 and transition path sampling (TPS) or path integral Monte Carlo,33,34 or representing the non-equilibrium distribution with an explicit ansatz, for example, matrix product states.9,11,35 The primary challenge in sampling methods is the problem of exponential variance in the estimator for the large deviation function as a function of the bias. Several approaches have been suggested to ameliorate this variance problem that can be interpreted as different forms of importance sampling. Nemoto et al. demonstrated the use of control forces that can be parameterized and optimized via a feedback loop.10,32 Klymko et al. formulated a probability-conserving reference model to umbrella sample trajectories for certain growth processes.13 A different approach has been formulated by Jacobson and Whitelam to directly compute the rate function via an optimizable variational ansatz.36 In previous work, we showed that guiding distribution functions (GDFs), as introduced for quantum DMC calculations,37 define an auxiliary dynamics that can be used to importance sample computations of large deviation functions under bias20 and highlighted the connection to the generalized Doob’s transform.38–40 In the current work, we describe a practical numerical technique to generate good guiding distribution functions in both lattice and continuum simulations of large deviation functions using the idea of variance minimization. This again draws from the quantum field and, in particular, the methods of variational Monte Carlo (VMC).41 There has been other recent work on optimizing auxiliary dynamics by using a variational estimator to determine a GDF for a continuum system;42 our work provides a different perspective based on different techniques that are less computationally expensive than performing a DMC based optimization.
II. THEORY
In the current work, the quantity of interest in large deviation theory is the cumulant generating function (CGF) ψ(λ) which is analogous to the free energy of equilibrium statistical mechanics. It is computed via an ensemble average over trajectories given by
where λ is a field conjugate to the observable , with o being an arbitrary function of configurations () at adjacent times, t+ and t−, and tN is the final trajectory time. is the likelihood of a given trajectory . The trajectories are generated by the master equation , where is the probability of a configuration of the system, , at time t, and is a linear operator. It can be shown that ψ(λ) is the largest eigenvalue of a “tilted” operator , i.e., , where ⟨Ξ| is the corresponding dominant left eigenvector.43 The effect of the tilt is to reweight the transition probabilities of . In the discrete case, , where is the exit rate.
is not a Markovian operator (i.e., the sum of transition probabilities is not normalized). Consequently, when calculating ψ(λ) via Monte Carlo techniques, it is necessary to track this additional normalization constant or weight, whose variance grows exponentially with |λ|. It is desirable, thus, to instead consider an auxiliary dynamics generated by a modified operator (related to the generalized Doob’s transformation18,38) from which the CGF can be obtained as
where the diagonal matrix is constructed from the dominant left eigenvector of [i.e., ], is the uniform left vector, and |p0⟩ is the initial distribution of configurations. The normalization of is completely independent of the configuration, and the variance due to the bias is removed; thus, the transformation by carries out a form of importance sampling. Although propagation with requires the knowledge of the exact eigenvectors, in the GDF approach, we simply approximate these eigenvectors with guiding functions of our own construction and carry out dynamics with using the diffusion Monte Carlo algorithm.20 The quality of the GDF importance sampling then depends on the degree of overlap between the approximate and the exact left eigenvector of . The problem is therefore reduced to finding a high quality GDF in order to compute the CGF and its associated cumulants with good statistical efficiency.
As mentioned earlier, we can determine GDFs using ideas that originate from quantum diffusion Monte Carlo calculations, where the analogous problem is to determine a guiding function that best approximates the ground-state of a quantum Hamiltonian.41 This is termed a variational Monte Carlo (VMC) calculation. While energy minimization is commonly used for this purpose,44,45 is not Hermitian, and thus, its spectrum is not necessarily bounded. However, we can use an associated property of eigenstates, viz., the variance of the quantity,
must vanish for eigenstates. This quantity, which we call the local CGF, is the analog of the local energy for which variance minimization has previously been explored in quantum Monte Carlo.41 The nonequilibrium variance minimization problem thus corresponds to minimizing
where is the average of the local CGF, {p} are the parameters used to characterize the GDF, , and is a normalized sampling distribution. The calculation involves minimizing Eq. (4) with respect to {p} over a fixed set of configurations . Note that (4) can be sampled from any , but for , we obtain
which is an estimator for ψ(λ) in the sense that the approximate sign is replaced by an equality for a GDF, that is, the exact eigenstate. Despite the absence of a bounded variational principle, this estimator of ψ(λ) in practical terms is often useful, and we refer to this as the VMC estimator for ψ(λ), in complete correspondence to its quantum counterpart. Alternate forms of could be used to control the statistical error of the VMC estimator σ2({p}, λ), but we have not explored this dependence.
Whereas a strictly zero variance is guaranteed for eigenstates, we emphasize that a smaller (non-zero) variance of the local CGF does not strictly imply a better GDF. A more rigorous metric is the actual reduction of the standard deviation in the estimate of the CGF [ε(ψ)] from DMC (or indeed TPS, which can easily be adapted to use a GDF). Decreasing ε(ψ), thus, defines a better GDF, and this is reflected in statistical independence of samples (the trajectories which are being generated). For the DMC algorithm, the indicator of statistical independence is the fraction of independent walkers fI, which is the ratio of the number of unique walkers at a given simulation time (t) to the total number of walkers.20,32 These walkers are ancestors for all walkers at simulation time t′ > t. Generally, walkers at later times are descended from a smaller set of (unique) walkers at previous times, leading to a growth in correlated walkers as the simulation proceeds. Empirically, fI is seen to be less susceptible to statistical noise than ε(ψ) but (for a given λ) still bears a monotonic relationship with it, and we will, therefore, primarily use fI as the measure of quality of a GDF. In the case of perfect importance sampling, using the exact auxiliary dynamics, fI is equal to 1 at all times. In the other limit, if all walkers are correlated, fI = 0. Because fI(t) measures the correlation among walkers as a function of time, it must be smallest at t = 0,14,20 which is what we will report. It is important to note that although improvements in fI yield improvements in ε(ψ), the relationship between the two is not linear.
III. RESULTS
To demonstrate our procedure, we carry out simulations on a continuum system and a lattice model. For the continuum system, we consider the prototypical driven Brownian walker, where the observable of interest is the entropy production. This system consists of N = 10 particles (at location R = {ri}) moving in a periodic potential [V(r) = 0 cos(2πr)] on a ring of size L = 1.0 under the influence of an external driving force (f) and a fluctuating field represented by Gaussian white noise. These particles also interact via a pairwise repulsive force [ and ]. The transformed tilted propagator that includes auxiliary dynamics due to the GDF is given by20
where . The adjoint operator represents the norm-breaking term that is handled via branching. Trajectories for are generated from Langevin dynamics , where the random force, ηi, satisfies ⟨ηi(t)⟩ = 0 and ⟨ηi(t)ηi(t′)⟩ = 2δ(t − t′). The entropy production s(t) is reflected in the biasing term exp[λtNs(t)] with which is absorbed into the dynamics of the tilted Fokker–Planck operator.14,20,46
For this system, we parameterize the GDF as , where the single-particle function, ϕ(ri), is obtained from the non-interacting eigenstate of [] generated with M = 101 plane wave modes20 and . The parameters are estimated by minimizing (4).
Shown in Fig. 1(a) is the large deviation function and fI computed using the no guiding function, the non-interacting guiding function (J = 1), and the variational form above (a similar form has been explored independently42). We parameterized J(k1, k2) with 241 parameters (21 plane waves per particle), and we minimized the variance such that σ2({p}, λ) < 4.0 × 10−3 for all λs we considered. The minimization procedure is started at small |λ| using an initial guess for parameters that produces a uniform state since we know that for λ = 0, the exact ⟨Ξ| is uniform. The optimized parameters are used as an initial guess for the next (nearby) λ. A trace of the minimization is shown in Fig. 2 at λ = −0.5 obtained using a simple simplex algorithm. For this system, the minimization required 10 000 configurations [sampled from the guessed GDF as the distribution ] for −0.3 ≤ λ ≤ 0.3 (λ ≠ 0) in order to avoid getting stuck in local minima (which produced a poor GDF). For |λ| > 0.3, 2000 configurations were sufficient.
(a) Large deviation function for the entropy production of N = 10 driven Brownian particles in a periodic potential with v0 = 2 and f = 1.0. The Gaussian interaction force is given by rc = 0.10 and α = 10.0. The main figure shows the CGF as a function of λ computed using DMC using the guiding function and directly from the GDF via the VMC estimator [Eq. (5)]. The DMC calculations are done with different GDFs [no GDF, non-interacting (NI) GDF, and variational form in main text], but all converge to the same estimate although the efficiencies are different. The inset shows these efficiencies measured via the fraction of independent walkers (fI). (b) Improvement in the fraction of independent walkers (fI) relative to calculations done without any auxiliary dynamics () for the same 1D Brownian problem. The inset shows the corresponding improvement in the standard deviation of the CGF [ϵ(ψ)]. Error bars are smaller than symbol sizes. No error estimate is available for the standard deviation (see text).
(a) Large deviation function for the entropy production of N = 10 driven Brownian particles in a periodic potential with v0 = 2 and f = 1.0. The Gaussian interaction force is given by rc = 0.10 and α = 10.0. The main figure shows the CGF as a function of λ computed using DMC using the guiding function and directly from the GDF via the VMC estimator [Eq. (5)]. The DMC calculations are done with different GDFs [no GDF, non-interacting (NI) GDF, and variational form in main text], but all converge to the same estimate although the efficiencies are different. The inset shows these efficiencies measured via the fraction of independent walkers (fI). (b) Improvement in the fraction of independent walkers (fI) relative to calculations done without any auxiliary dynamics () for the same 1D Brownian problem. The inset shows the corresponding improvement in the standard deviation of the CGF [ϵ(ψ)]. Error bars are smaller than symbol sizes. No error estimate is available for the standard deviation (see text).
VMC minimization trace of the variance for the driven Brownian walker system for λ = 0.5. The inset shows how the local CGF changes as the minimization proceeds (see text).
VMC minimization trace of the variance for the driven Brownian walker system for λ = 0.5. The inset shows how the local CGF changes as the minimization proceeds (see text).
From the reduction in fI [inset of Fig. 1(a)], it is evident that continuum calculations can be made much more efficient with appropriate GDFs. As noted earlier, the improvement in fI estimated at t = 0 can imply greater efficiency gains when the full trajectory space is considered. In Fig. 1(b) (inset), we show the improvement in the corresponding standard deviation in the subsequent DMC calculation, which can be reduced by a large factor, although statistical noise in this measure means that it is difficult to give a precise estimate of the factor. Additionally, although we used the same observation time (tN) to compute the CGF and cumulants for all types of sampling, we find that the results converge much faster with tN for simulations done with auxiliary dynamics.
We now consider an interacting non-equilibrium problem on a lattice, namely, the current fluctuations of a periodic weakly asymmetric simple exclusion process (WASEP).47 The WASEP models the transport of N particles on a lattice with L sites. Here, N is chosen as 0.3L (rounded to the nearest integer). The configuration of the particles is defined by a set of occupation numbers with hard-core constraints, ni = {0, 1}, e.g., . The tilted propagator for a current bias λ,
yields particles hopping to the right with rate and to the left with rate , where the 1/L factor in hopping gives the weakly asymmetric limit, whose large scale behavior (universality class) is described by the Edwards–Wilkinson equation.4,48 Here, () creates (destroys) particles on site i, and ni () counts the number of particles (holes). For subsequent calculations, we set E = 10 and use periodic boundary conditions, i = L + 1 → 1 and i = 0 → L.
Unlike the continuum system, where the soft-core interaction means the non-interacting solution is a sensible starting point to construct the GDF, the hard-core interaction requires a different treatment. Here, we consider a GDF that is a product purely of n-particle correlation factors, e.g., for n = 2, , where rp, rq denote the positions of the particles in the configuration and for n = 3, we use J3(rp, rq, rs), and the variational parameters are the values [i.e., J2(rp, rq)]. This form is sometimes referred to as a correlator product state (CPS) in quantum systems.50,51 To enforce periodic boundary conditions (PBC), we use J2(rpq), J3(rpq, rqr, rqs) where the inter-particle distances are defined with a minimum image convention, i.e., rpq = min(|rp − rq|, L − |rp − rq|), and J2, J3 are symmetric under cyclic permutations of their arguments (cyclic permutation symmetry, rather than full symmetry, was used to reflect the handedness of hopping around in the model). To minimize the number of variational parameters in the large calculations below, for rpq > Rcut, we set J3 = 1, where Rcut is a cutoff distance. For comparison, we have also considered a GDF of the cluster mean-field (CMF) form described in the work by Ray et al.20
Figure 3(a) shows a comparison of fI as a function of λ for a L = 16 model with different GDFs. In the WASEP, the short-range CMF is unable to capture the long-range correlations present in the system, and therefore, we do not get much improvement in efficiency using this GDF. However, with the correlator product state, we can obtain large improvements [e.g., we see from Fig. 3(b) that using J3, ε(ψ) is improved by two orders of magnitude].
(a) Fraction of independent walkers (fI) for different types of GDF [no GDF, non-interacting GDF, and optimized variational form (see main text)] for an L = 16 WASEP model. The most flexible GDF is provided by the J3 ansatz. We find that the efficiency gains from using the GDF systematically improve from the CMF ansatz to the J2 and J3 ansatz. (b) Improvement in the fraction of independent walkers (fI) relative to calculations done without any auxiliary dynamics (). The inset shows the corresponding improvement in the standard deviation of the CGF [ϵ(ψ)]. Error bars are smaller than symbol sizes. No error estimate is available for the standard deviation (see text).
(a) Fraction of independent walkers (fI) for different types of GDF [no GDF, non-interacting GDF, and optimized variational form (see main text)] for an L = 16 WASEP model. The most flexible GDF is provided by the J3 ansatz. We find that the efficiency gains from using the GDF systematically improve from the CMF ansatz to the J2 and J3 ansatz. (b) Improvement in the fraction of independent walkers (fI) relative to calculations done without any auxiliary dynamics (). The inset shows the corresponding improvement in the standard deviation of the CGF [ϵ(ψ)]. Error bars are smaller than symbol sizes. No error estimate is available for the standard deviation (see text).
In order to illustrate the flexibility of this ansatz, we have further performed calculations for different system sizes L = 24–96. Due to the reduction in standard deviation, despite the large system size, we needed only a modest number of walkers in the DMC procedure Nw = 10 000–120 000, which was sufficient to estimate the susceptibility (computed as a correlation function). For these calculations, in order to reduce the number of parameters, we used a cutoff distance of Rcut = 16 for J3. The variance minimization was carried out using 4000–8000 fixed configurations sampled from the GDF to optimize 85–155 parameters depending on the system size. The error of the VMC estimator for ψ(λ), as a measure of the GDF quality, is shown in Fig. 4(a), where we see that the relative error grows with |λ| as more particles become correlated. Note that for these calculations, we did not spend a lot of effort to optimize the more extreme λ values as we were interested in the range over which the susceptibility peaks, where the VMC error is <5%. In this model, the trajectories undergo a continuous phase transition from a uniform state to a traveling wave at some λc, provided that , where ρ = N/L.3,49 This can be detected from the growing susceptibility of the system shown in Fig. 4(b). Macroscopic Fluctuation Theory (MFT) suggests that (for ρ = 0.3). However, quantitative accuracy of MFT is not guaranteed, and although a Bethe ansatz formulation exists, the precise value of λc for L → ∞ is not explicitly known.4 Our DMC calculations suggest a critical point near λc ∼ −3.9 for the largest system sizes (L = 64 and 96). We note that our results for ψ(λ/L) are in excellent agreement with the largest system size (L = 64) considered in the work of Espigares et al.,49 also plotted in the inset of Fig. 4(a).
(a) Relative error of the CGF obtained from VMC as compared to DMC. Note that as the system size is increased and more particles are involved, the GDF is unable to capture the full extent of the correlations. Nonetheless, the DMC calculations still benefit from the GDF in terms of statistical efficiency. The inset shows a comparison of DMC and VMC results for the CGF [ψ(λ/L)]. Also included are data from Espigares et al.49 for L = 64 indicated as DMC (H). (b) Susceptibility [χ(λ/L)] of the 1D WASEP for different system sizes. The dashed line at λ = 3.9 indicates the possible location of the continuous phase transition in the limit of L → ∞.
(a) Relative error of the CGF obtained from VMC as compared to DMC. Note that as the system size is increased and more particles are involved, the GDF is unable to capture the full extent of the correlations. Nonetheless, the DMC calculations still benefit from the GDF in terms of statistical efficiency. The inset shows a comparison of DMC and VMC results for the CGF [ψ(λ/L)]. Also included are data from Espigares et al.49 for L = 64 indicated as DMC (H). (b) Susceptibility [χ(λ/L)] of the 1D WASEP for different system sizes. The dashed line at λ = 3.9 indicates the possible location of the continuous phase transition in the limit of L → ∞.
IV. CONCLUSIONS
In this article, we showed how to compute guiding distribution functions using the technique of variance minimization originating in variational Monte Carlo calculations of quantum systems. This provides a systematic route to statistically efficient Monte Carlo computation of large deviation functions using the diffusion Monte Carlo or cloning algorithm, which we demonstrated in the continuum for the problem of the Brownian walker, as well as on the lattice for the WASEP model. The very general nature of variance minimization means that the possibilities for different guiding distribution functions are limited only by one’s imagination and they can be adapted to very complex systems. The form of the GDF should typically be motivated by properties of the system. For instance, we have demonstrated that for the continuum system, the repulsive nature of the interactions can be accommodated by a Jastrow type of GDF. Similarly, the long-range order present in the WASEP case can be accounted for by the CPS form. Other types of forms may be motivated by strategies employed in quantum ground state calculations, as the Backflow ansatz,52 and, additionally, different forms can be combined. Finally, we note that obtaining a good form for the guiding distribution function, much like obtaining a compact wavefunction in a quantum problem, is likely to provide important analytic insights into the behavior of the non-equilibrium system of interest.
ACKNOWLEDGMENTS
The authors would like to thank Rob Jack, Vivien Lecomte, Juan P. Garrahan, and David Limmer for fruitful and engaging discussions. U.R. was supported by the Simons Collaboration on the Many-Electron Problem and the California Institute of Technology. G.K.-L.C. is a Simons Investigator in Theoretical Physics and was supported by the California Institute of Technology and the U.S. Department of Energy, Office of Science, via Grant No. DE-SC0018140. These calculations were performed with CANSS, available at https://github.com/ushnishray/CANSS.
APPENDIX A: FOKKER–PLANCK OPERATOR FOR THE CONTINUUM
In the main text, we described how the computation of the cumulant generating large deviation function (CGF) can be greatly enhanced with the appropriate form of auxiliary dynamics. The GDF structure that we have introduced in the 1D case is of the form . To carry out simulations with this GDF, we require the adjoint of the transformed Fokker–Planck operator. For the Brownian walker in the main text, we find
where the non-interacting single-particle operator is given by
where γi(ri) = f − ∂iV(ri) is the effective single-particle force acting on a particle. This expression provides the norm-breaking term for a subsequent DMC calculation and can be generalized straightforwardly to higher dimensions.
APPENDIX B: STATISTICAL INDEPENDENCE AND EFFICIENCY IMPROVEMENT WITH AUXILIARY DYNAMICS
In the main text, we have mentioned that an improvement of the GDF is indicated by the improvement in the fraction of independent walkers (fI) estimated from DMC (this would be reflected in a corresponding decrease in the autocorrelation time for TPS), and we reported fI evaluated for t = 0. Here, we show the full fI(t) in Fig. 5 for calculations using different types of dynamics at λ = −5.0 for the 1D WASEP system. Note that fI(1) = 1, but as one goes back in simulation time, the walkers are descended from a smaller and smaller set of ancestors. The GDFs J2 and use the same J2 ansatz in the main text but illustrate the effect of optimizing for more steps in the variance minimization of J2, leading to an improved fI(t). This is also reflected in Table I, where the VMC estimator of the CGF appears to converge with an increase in the flexibility of the GDF, while the standard deviation is systematically improved.
Fraction of independent walkers (fI) as a function of normalized observation time for different types of auxiliary dynamics for λ = −5.0 (see text).
Fraction of independent walkers (fI) as a function of normalized observation time for different types of auxiliary dynamics for λ = −5.0 (see text).
CGF estimate and standard deviation for different auxiliary dynamics at λ = −0.5 (see text).
GDF . | ψ(λ) . | ε(λ) . |
---|---|---|
No IS | −0.501 2 | 1.3 × 10−4 |
−0.501 13 | 9.2 × 10−5 | |
J2 | −0.501 06 | 2.9 × 10−5 |
J3 | −0.501 049 | 4.9 × 10−6 |
GDF . | ψ(λ) . | ε(λ) . |
---|---|---|
No IS | −0.501 2 | 1.3 × 10−4 |
−0.501 13 | 9.2 × 10−5 | |
J2 | −0.501 06 | 2.9 × 10−5 |
J3 | −0.501 049 | 4.9 × 10−6 |