We present a strategy to construct guiding distribution functions (GDFs) based on variance minimization. Auxiliary dynamics via GDFs mitigates the exponential growth of variance as a function of bias in Monte Carlo estimators of large deviation functions. The variance minimization technique exploits the exact properties of eigenstates of the tilted operator that defines the biased dynamics in the nonequilibrium system. We demonstrate our techniques in two classes of problems. In the continuum, we show that GDFs can be optimized to study the interacting driven diffusive systems where the efficiency is systematically improved by incorporating higher correlations into the GDF. On the lattice, we use a correlator product state ansatz to study the 1D weakly asymmetric simple exclusion process. We show that with modest resources, we can capture the features of the susceptibility in large systems that mark the phase transition from uniform transport to a traveling wave state. Our work extends the repertoire of tools available to study nonequilibrium properties in realistic systems.

## I. INTRODUCTION

Large deviation theory (LDT) is a framework to extend the formalism of equilibrium statistical mechanics to nonequilibrium systems.^{1} Much of LDT is concerned with summarizing the dynamics of the system, as expressed via the fluctuations of typical trajectories. Ensembles of rare trajectories display a fascinating behavior reminiscent of phase transitions and criticality. Recent work illustrates such a dynamical behavior both in lattice systems, such as simple exclusion processes,^{2–9} constrained kinetic models,^{10,11} models of self-assembly,^{12–14} and dissipative hydrodynamics,^{15,16} and in continuum systems in the form of driven or active Brownian particles,^{14,17–21} as well as open quantum systems.^{22–24} Recently, LDT has also been shown to offer a route to calculating nonlinear transport coefficients in model and realistic systems.^{25,26}

Accessing properties of interest, in all but the simplest systems, requires numerical tools. In the context of LDT, this takes the form of sampling techniques such as the cloning algorithm or diffusion Monte Carlo (DMC)^{27–32} and transition path sampling (TPS) or path integral Monte Carlo,^{33,34} or representing the non-equilibrium distribution with an explicit ansatz, for example, matrix product states.^{9,11,35} The primary challenge in sampling methods is the problem of exponential variance in the estimator for the large deviation function as a function of the bias. Several approaches have been suggested to ameliorate this variance problem that can be interpreted as different forms of importance sampling. Nemoto *et al.* demonstrated the use of control forces that can be parameterized and optimized via a feedback loop.^{10,32} Klymko *et al.* formulated a probability-conserving reference model to umbrella sample trajectories for certain growth processes.^{13} A different approach has been formulated by Jacobson and Whitelam to directly compute the rate function via an optimizable variational ansatz.^{36} In previous work, we showed that guiding distribution functions (GDFs), as introduced for quantum DMC calculations,^{37} define an auxiliary dynamics that can be used to importance sample computations of large deviation functions under bias^{20} and highlighted the connection to the generalized Doob’s transform.^{38–40} In the current work, we describe a practical numerical technique to generate good guiding distribution functions in both lattice and continuum simulations of large deviation functions using the idea of variance minimization. This again draws from the quantum field and, in particular, the methods of variational Monte Carlo (VMC).^{41} There has been other recent work on optimizing auxiliary dynamics by using a variational estimator to determine a GDF for a continuum system;^{42} our work provides a different perspective based on different techniques that are less computationally expensive than performing a DMC based optimization.

## II. THEORY

In the current work, the quantity of interest in large deviation theory is the cumulant generating function (CGF) *ψ*(*λ*) which is analogous to the free energy of equilibrium statistical mechanics. It is computed via an ensemble average over trajectories given by

where *λ* is a field conjugate to the observable $O=\u2211t=1tNo(Ct+,Ct\u2212)$, with *o* being an arbitrary function of configurations ($Ct$) at adjacent times, *t*+ and *t*−, and *t*_{N} is the final trajectory time. $P[C(tN)]$ is the likelihood of a given trajectory $C(tN)={C0,C1,\u2026,CtN}$. The trajectories are generated by the master equation $\u2202tpt(C)=Wpt(C)$, where $pt(C)$ is the probability of a configuration of the system, $C$, at time *t*, and $W$ is a linear operator. It can be shown that *ψ*(*λ*) is the largest eigenvalue of a “tilted” operator $W\lambda $, i.e., $\u2009\Xi |W\lambda =\u2009\Xi |\psi (\lambda )$, where ⟨Ξ| is the corresponding dominant left eigenvector.^{43} The effect of the tilt is to reweight the transition probabilities of $W$. In the discrete case, $W\lambda (C,C\u2032)=W(C,C\u2032)e\u2212\lambda o(C,C\u2032)(1\u2212\delta C,C\u2032)\u2212R(C)\delta C,C\u2032$, where $R(C)=\u2211C\u2260C\u2032W(C,C\u2032)$ is the exit rate.

$W\lambda $ is not a Markovian operator (i.e., the sum of transition probabilities is not normalized). Consequently, when calculating *ψ*(*λ*) via Monte Carlo techniques, it is necessary to track this additional normalization constant or weight, whose variance grows exponentially with |*λ*|. It is desirable, thus, to instead consider an auxiliary dynamics generated by a modified operator $W\u0303\lambda =\Xi ^W\lambda \Xi ^\u22121$ (related to the generalized Doob’s transformation^{18,38}) from which the CGF can be obtained as

where the diagonal matrix $\Xi ^=\u2211C\Xi \u0303(C)|C\u2009\u2009C|$ is constructed from the dominant left eigenvector of $W\lambda $ [i.e., $\Xi \u0303(C)=\u27e8\Xi |C\u27e9$], $\u20091|$ is the uniform left vector, and |*p*_{0}⟩ is the initial distribution of configurations. The normalization of $W\u0303\lambda $ is completely independent of the configuration, and the variance due to the bias is removed; thus, the transformation by $\Xi ^$ carries out a form of importance sampling. Although propagation with $W\u0303\lambda $ requires the knowledge of the exact eigenvectors, in the GDF approach, we simply approximate these eigenvectors with guiding functions of our own construction and carry out dynamics with $W\u0303\lambda $ using the diffusion Monte Carlo algorithm.^{20} The quality of the GDF importance sampling then depends on the degree of overlap between the approximate and the exact left eigenvector of $W\lambda $. The problem is therefore reduced to finding a high quality GDF in order to compute the CGF and its associated cumulants with good statistical efficiency.

As mentioned earlier, we can determine GDFs using ideas that originate from quantum diffusion Monte Carlo calculations, where the analogous problem is to determine a guiding function that best approximates the ground-state of a quantum Hamiltonian.^{41} This is termed a variational Monte Carlo (VMC) calculation. While energy minimization is commonly used for this purpose,^{44,45} $W\lambda $ is not Hermitian, and thus, its spectrum is not necessarily bounded. However, we can use an associated property of eigenstates, viz., the variance of the quantity,

must vanish for eigenstates. This quantity, which we call the *local CGF*, is the analog of the *local energy* for which variance minimization has previously been explored in quantum Monte Carlo.^{41} The nonequilibrium variance minimization problem thus corresponds to minimizing

where $\u27e8\Lambda ({p})\u27e9\u2261\u2211Cw(C)\Lambda ({p},C)$ is the average of the local CGF, {*p*} are the parameters used to characterize the GDF, $\Xi \u0303({p},C)$, and $w(C)$ is a normalized sampling distribution. The calculation involves minimizing Eq. (4) with respect to {*p*} over a fixed set of configurations ${C}$. Note that (4) can be sampled from any $w(C)$, but for $w(C)=\Xi \u0303({p},C)\Vert \Xi ^({p})\Vert $, we obtain

which is an estimator for *ψ*(*λ*) in the sense that the approximate sign is replaced by an equality for a GDF, that is, the exact eigenstate. Despite the absence of a bounded variational principle, this estimator of *ψ*(*λ*) in practical terms is often useful, and we refer to this as the VMC estimator for *ψ*(*λ*), in complete correspondence to its quantum counterpart. Alternate forms of $w(C)$ could be used to control the statistical error of the VMC estimator *σ*^{2}({*p*}, *λ*), but we have not explored this dependence.

Whereas a strictly zero variance is guaranteed for eigenstates, we emphasize that a smaller (non-zero) variance of the local CGF does not strictly imply a better GDF. A more rigorous metric is the actual reduction of the standard deviation in the estimate of the CGF [*ε*(*ψ*)] from DMC (or indeed TPS, which can easily be adapted to use a GDF). Decreasing *ε*(*ψ*), thus, defines a better GDF, and this is reflected in statistical independence of samples (the trajectories which are being generated). For the DMC algorithm, the indicator of statistical independence is the fraction of independent walkers *f*_{I}, which is the ratio of the number of unique walkers at a given simulation time (*t*) to the total number of walkers.^{20,32} These walkers are ancestors for all walkers at simulation time *t*′ > *t*. Generally, walkers at later times are descended from a smaller set of (unique) walkers at previous times, leading to a growth in correlated walkers as the simulation proceeds. Empirically, *f*_{I} is seen to be less susceptible to statistical noise than *ε*(*ψ*) but (for a given *λ*) still bears a monotonic relationship with it, and we will, therefore, primarily use *f*_{I} as the measure of quality of a GDF. In the case of perfect importance sampling, using the exact auxiliary dynamics, *f*_{I} is equal to 1 at all times. In the other limit, if all walkers are correlated, *f*_{I} = 0. Because *f*_{I}(*t*) measures the correlation among walkers as a function of time, it must be smallest at *t* = 0,^{14,20} which is what we will report. It is important to note that although improvements in *f*_{I} yield improvements in *ε*(*ψ*), the relationship between the two is not linear.

## III. RESULTS

To demonstrate our procedure, we carry out simulations on a continuum system and a lattice model. For the continuum system, we consider the prototypical driven Brownian walker, where the observable of interest is the entropy production. This system consists of *N* = 10 particles (at location **R** = {*r*_{i}}) moving in a periodic potential [*V*(*r*) = $v$_{0} cos(2*πr*)] on a ring of size *L* = 1.0 under the influence of an external driving force (*f*) and a fluctuating field represented by Gaussian white noise. These particles also interact via a pairwise repulsive force [$f^(ri,rj)=\u2212f^(rj,ri)$ and $|f^|=\alpha \u2061exp[\u2212(|ri\u2212rj|/rc)2]$]. The transformed tilted propagator that includes auxiliary dynamics due to the GDF $\Xi \u0303(R)$ is given by^{20}

where $Fi(R)=fx^\u2212\u2202iV(ri)x^+\u2211j\u2260if^(ri,rj)$. The adjoint operator $W\lambda \u2020=\u2211i\u2202i2+(Fi(R)\u22c5x^+2f\lambda )\u2202i+f\lambda (f\lambda +Fi(R)\u22c5x^)$ represents the norm-breaking term that is handled via branching. Trajectories for $W\u0303\lambda $ are generated from Langevin dynamics $\u2202tri=Fi(R)\u22c5x^+2f\lambda +2\u2202i\u2061ln\Xi \u0303(R)+\eta i$, where the random force, *η*_{i}, satisfies ⟨*η*_{i}(*t*)⟩ = 0 and ⟨*η*_{i}(*t*)*η*_{i}(*t*′)⟩ = 2*δ*(*t* − *t*′). The entropy production *s*(*t*) is reflected in the biasing term exp[*λt*_{N}*s*(*t*)] with $s(tN)tN=\u2211i\u222b0tNfri\u0307(\tau )d\tau $ which is absorbed into the dynamics of the tilted Fokker–Planck operator.^{14,20,46}

For this system, we parameterize the GDF as $\Xi \u0303(R)=\u220fi\varphi (ri)\u220fj<i[J(ri,rj)]$, where the single-particle function, *ϕ*(*r*_{i}), is obtained from the non-interacting eigenstate of $W\u0303\lambda $ [$f^(ri,rj)=0$] generated with *M* = 101 plane wave modes^{20} and $J(ri,rj)=\u2211k1,k2J\u0303(k1,k2)eik1r1+ik2r2$. The parameters $J\u0303(k1,k2)$ are estimated by minimizing (4).

Shown in Fig. 1(a) is the large deviation function and *f*_{I} computed using the no guiding function, the non-interacting guiding function (*J* = 1), and the variational form above (a similar form has been explored independently^{42}). We parameterized *J*(*k*_{1}, *k*_{2}) with 241 parameters (21 plane waves per particle), and we minimized the variance such that *σ*^{2}({*p*}, *λ*) < 4.0 × 10^{−3} for all *λ*s we considered. The minimization procedure is started at small |*λ*| using an initial guess for parameters that produces a uniform state since we know that for *λ* = 0, the exact ⟨Ξ| is uniform. The optimized parameters are used as an initial guess for the next (nearby) *λ*. A trace of the minimization is shown in Fig. 2 at *λ* = −0.5 obtained using a simple simplex algorithm. For this system, the minimization required 10 000 configurations [sampled from the guessed GDF as the distribution $w(C)$] for −0.3 ≤ *λ* ≤ 0.3 (*λ* ≠ 0) in order to avoid getting stuck in local minima (which produced a poor GDF). For |*λ*| > 0.3, 2000 configurations were sufficient.

From the reduction in *f*_{I} [inset of Fig. 1(a)], it is evident that continuum calculations can be made much more efficient with appropriate GDFs. As noted earlier, the improvement in *f*_{I} estimated at *t* = 0 can imply greater efficiency gains when the full trajectory space is considered. In Fig. 1(b) (inset), we show the improvement in the corresponding standard deviation in the subsequent DMC calculation, which can be reduced by a large factor, although statistical noise in this measure means that it is difficult to give a precise estimate of the factor. Additionally, although we used the same observation time (*t*_{N}) to compute the CGF and cumulants for all types of sampling, we find that the results converge much faster with *t*_{N} for simulations done with auxiliary dynamics.

We now consider an interacting non-equilibrium problem on a lattice, namely, the current fluctuations of a periodic weakly asymmetric simple exclusion process (WASEP).^{47} The WASEP models the transport of *N* particles on a lattice with *L* sites. Here, *N* is chosen as 0.3*L* (rounded to the nearest integer). The configuration of the particles is defined by a set of occupation numbers with hard-core constraints, *n*_{i} = {0, 1}, e.g., $C={0,1,\u2026,1,1}$. The tilted propagator for a current bias *λ*,

yields particles hopping to the right with rate $p=12eE/L$ and to the left with rate $q=12e\u2212E/L$, where the 1/*L* factor in hopping gives the weakly asymmetric limit, whose large scale behavior (universality class) is described by the Edwards–Wilkinson equation.^{4,48} Here, $b^i\u2020$ ($b^i$) creates (destroys) particles on site *i*, and *n*_{i} ($1\u2212ni$) counts the number of particles (holes). For subsequent calculations, we set *E* = 10 and use periodic boundary conditions, *i* = *L* + 1 → 1 and *i* = 0 → *L*.

Unlike the continuum system, where the soft-core interaction means the non-interacting solution is a sensible starting point to construct the GDF, the hard-core interaction requires a different treatment. Here, we consider a GDF that is a product purely of *n*-particle correlation factors, e.g., for *n* = 2, $\Xi \u0303(C)=\u220fp<qJ2(rp,rq)$, where *r*_{p}, *r*_{q} denote the positions of the particles in the configuration $C$ and for *n* = 3, we use *J*_{3}(*r*_{p}, *r*_{q}, *r*_{s}), and the variational parameters are the values [i.e., *J*_{2}(*r*_{p}, *r*_{q})]. This form is sometimes referred to as a correlator product state (CPS) in quantum systems.^{50,51} To enforce periodic boundary conditions (PBC), we use *J*_{2}(*r*_{pq}), *J*_{3}(*r*_{pq}, *r*_{qr}, *r*_{qs}) where the inter-particle distances are defined with a minimum image convention, i.e., *r*_{pq} = min(|*r*_{p} − *r*_{q}|, *L* − |*r*_{p} − *r*_{q}|), and *J*_{2}, *J*_{3} are symmetric under cyclic permutations of their arguments (cyclic permutation symmetry, rather than full symmetry, was used to reflect the handedness of hopping around in the model). To minimize the number of variational parameters in the large calculations below, for *r*_{pq} > *R*_{cut}, we set *J*_{3} = 1, where *R*_{cut} is a cutoff distance. For comparison, we have also considered a GDF of the cluster mean-field (CMF) form described in the work by Ray *et al.*^{20}

Figure 3(a) shows a comparison of *f*_{I} as a function of *λ* for a *L* = 16 model with different GDFs. In the WASEP, the short-range CMF is unable to capture the long-range correlations present in the system, and therefore, we do not get much improvement in efficiency using this GDF. However, with the correlator product state, we can obtain large improvements [e.g., we see from Fig. 3(b) that using *J*_{3}, *ε*(*ψ*) is improved by two orders of magnitude].

In order to illustrate the flexibility of this ansatz, we have further performed calculations for different system sizes *L* = 24–96. Due to the reduction in standard deviation, despite the large system size, we needed only a modest number of walkers in the DMC procedure *N*_{w} = 10 000–120 000, which was sufficient to estimate the susceptibility $\chi (\lambda /L)=d2\psi (\lambda /L)d(\lambda /L)2$ (computed as a correlation function). For these calculations, in order to reduce the number of parameters, we used a cutoff distance of *R*_{cut} = 16 for *J*_{3}. The variance minimization was carried out using 4000–8000 fixed configurations sampled from the GDF to optimize 85–155 parameters depending on the system size. The error of the VMC estimator for *ψ*(*λ*), as a measure of the GDF quality, is shown in Fig. 4(a), where we see that the relative error grows with |*λ*| as more particles become correlated. Note that for these calculations, we did not spend a lot of effort to optimize the more extreme *λ* values as we were interested in the range over which the susceptibility peaks, where the VMC error is <5%. In this model, the trajectories undergo a continuous phase transition from a uniform state to a traveling wave at some *λ*_{c}, provided that $E>Ec=\pi /\rho (1\u2212\rho )$, where *ρ* = *N*/*L*.^{3,49} This can be detected from the growing susceptibility of the system shown in Fig. 4(b). Macroscopic Fluctuation Theory (MFT) suggests that $\lambda c=\u2212E+E2\u2212Ec2\u2009=\u22122.7198$ (for *ρ* = 0.3). However, quantitative accuracy of MFT is not guaranteed, and although a Bethe ansatz formulation exists, the precise value of *λ*_{c} for *L* → *∞* is not explicitly known.^{4} Our DMC calculations suggest a critical point near *λ*_{c} ∼ −3.9 for the largest system sizes (*L* = 64 and 96). We note that our results for *ψ*(*λ*/*L*) are in excellent agreement with the largest system size (L = 64) considered in the work of Espigares *et al.*,^{49} also plotted in the inset of Fig. 4(a).

## IV. CONCLUSIONS

In this article, we showed how to compute guiding distribution functions using the technique of variance minimization originating in variational Monte Carlo calculations of quantum systems. This provides a systematic route to statistically efficient Monte Carlo computation of large deviation functions using the diffusion Monte Carlo or cloning algorithm, which we demonstrated in the continuum for the problem of the Brownian walker, as well as on the lattice for the WASEP model. The very general nature of variance minimization means that the possibilities for different guiding distribution functions are limited only by one’s imagination and they can be adapted to very complex systems. The form of the GDF should typically be motivated by properties of the system. For instance, we have demonstrated that for the continuum system, the repulsive nature of the interactions can be accommodated by a Jastrow type of GDF. Similarly, the long-range order present in the WASEP case can be accounted for by the CPS form. Other types of forms may be motivated by strategies employed in quantum ground state calculations, as the Backflow ansatz,^{52} and, additionally, different forms can be combined. Finally, we note that obtaining a good form for the guiding distribution function, much like obtaining a compact wavefunction in a quantum problem, is likely to provide important analytic insights into the behavior of the non-equilibrium system of interest.

## ACKNOWLEDGMENTS

The authors would like to thank Rob Jack, Vivien Lecomte, Juan P. Garrahan, and David Limmer for fruitful and engaging discussions. U.R. was supported by the Simons Collaboration on the Many-Electron Problem and the California Institute of Technology. G.K.-L.C. is a Simons Investigator in Theoretical Physics and was supported by the California Institute of Technology and the U.S. Department of Energy, Office of Science, via Grant No. DE-SC0018140. These calculations were performed with CANSS, available at https://github.com/ushnishray/CANSS.

### APPENDIX A: FOKKER–PLANCK OPERATOR FOR THE CONTINUUM

In the main text, we described how the computation of the cumulant generating large deviation function (CGF) can be greatly enhanced with the appropriate form of auxiliary dynamics. The GDF structure that we have introduced in the 1D case is of the form $\Xi \u0303(R)=\u220fi\varphi (ri)\u220fj<i[J(ri,rj)]$. To carry out simulations with this GDF, we require the adjoint of the transformed Fokker–Planck operator. For the Brownian walker in the main text, we find

where the non-interacting single-particle operator is given by

where *γ*_{i}(*r*_{i}) = *f* − *∂*_{i}*V*(*r*_{i}) is the effective single-particle force acting on a particle. This expression provides the norm-breaking term for a subsequent DMC calculation and can be generalized straightforwardly to higher dimensions.

### APPENDIX B: STATISTICAL INDEPENDENCE AND EFFICIENCY IMPROVEMENT WITH AUXILIARY DYNAMICS

In the main text, we have mentioned that an improvement of the GDF is indicated by the improvement in the fraction of independent walkers (*f*_{I}) estimated from DMC (this would be reflected in a corresponding decrease in the autocorrelation time for TPS), and we reported *f*_{I} evaluated for *t* = 0. Here, we show the full *f*_{I}(*t*) in Fig. 5 for calculations using different types of dynamics at *λ* = −5.0 for the 1D WASEP system. Note that *f*_{I}(1) = 1, but as one goes back in simulation time, the walkers are descended from a smaller and smaller set of ancestors. The GDFs *J*_{2} and $J\u03032$ use the same *J*_{2} ansatz in the main text but illustrate the effect of optimizing for more steps in the variance minimization of *J*_{2}, leading to an improved *f*_{I}(*t*). This is also reflected in Table I, where the VMC estimator of the CGF appears to converge with an increase in the flexibility of the GDF, while the standard deviation is systematically improved.