The intriguing behavior of a wide variety of physical systems, ranging from amorphous solids or glasses to proteins, is a direct manifestation of underlying free energy landscapes riddled with local minima separated by large barriers. Exploring such landscapes has arguably become one of statistical physics’s great challenges. A new method is proposed here for uniform sampling of rugged free energy surfaces. The method, which relies on special Green’s functions to approximate the Dirac delta function, improves significantly on existing simulation techniques by providing a boundary-agnostic approach that is capable of mapping complex features in multidimensional free energy surfaces. The usefulness of the proposed approach is established in the context of a simple model glass former and model proteins, demonstrating improved convergence and accuracy over existing methods.

Structural and morphological transitions, controlled by free energy, are increasingly explored in molecular simulations to provide quantitative predictions of thermodynamic observables from first-principles. A wide variety of methods exist, including (but not limited to) thermodynamic integration,^{1,2} parallel tempering coupled to histogram reweighting,^{3–5} density-of-states sampling,^{6–10} non-equilibrium steered dynamics that exploit Jarzynski’s identity,^{11,12} adaptive bias potentials,^{13–16} expanded ensemble,^{17–19} and flux-based meth-ods.^{20–22} Flat-histogram methods, including Wang–Landau sampling (WLS),^{6,7,23–25} metadynamics (MetaD),^{18,19,21,26–32} and ExEDOS,^{8,33–37} are useful in this context for their ability to provide a free energy landscape in terms of a set of reduced collective variables (CVs). Specifically, these methods can be cast in the framework of an *expanded ensemble*, where the standard ensemble (e.g., the canonical *NVT* ensemble) is augmented by a set of state functions (collective variables) that partition similar structural properties. These collective variables effectively project the *αN* classical degrees of freedom of a particle-based system (where *α* depends on the dimensionality and orientability of particles) into an *M*-dimensional space defined by the collective variables.

The appropriate choice of the *M* degrees of freedom [$ \xi \u2192 = ( \xi 1 , \u2026 , \xi M ) $] should differentiate in some way among relevant states, be they different states of matter,^{38} cluster morphologies,^{39} hybridized and denatured DNA,^{40} or folded protein morphologies.^{41} The free energy may be represented by an *M* + 1 dimensional graph $ ( \xi \u2192 , F ( \xi \u2192 ) ) $, whose height *F* corresponds to the free energy surface (FES). Flat histogram methods seek to obtain a biasing force $V ( \xi \u2192 ) $ such that the combined free energy surface $ ( \xi \u2192 , F ( \xi \u2192 ) + V ( \xi \u2192 ) ) = ( \xi \u2192 , 0 ) $ renders all collective states equally probable, resulting in uniform sampling. As such a function is not known *à priori*, it must be obtained through adaptive algorithms that define a sequence $ V n ( \xi \u2192 ) $ of bias functions converging asymptotically to $V ( \xi \u2192 ) $. Arguably, the most widely used of these methods originate with WLS^{6} and MetaD.^{18} As originally proposed, however, such methods suffer from slow convergence and problems at boundaries. MetaD is of particular interest here, as complex systems with rough free energy surfaces, such as proteins, are almost exclusively studied using molecular dynamics (MD). Importantly, MD methods benefit greatly from parallel computation techniques, and many commercial and extensible open-source codes are available to perform large-scale MD simulations.

Without loss of generality, we can examine the fundamental equations for bias accumulation in one dimension. Standard Wang–Landau and related Monte Carlo techniques rely on the accumulation of histograms and discrete bias potentials which are sums of Kronecker deltas. If the space of interest is subdivided into bins, the resulting bias in bin *i* at time *t* is

Here, *t*^{∗} denotes the time steps where biases have been deposited, and *j*(*t*^{∗}) is the location of the system at time *t*^{∗}. Each deposition is proportional to a (potentially time-dependent^{26}) weight *W*(*t*^{∗}). The discrete nature of such bias potentials necessarily leads to problems in MD simulations, which require that forces—derivatives of these potentials—be computed. Continuous Wang–Landau sampling (CWLS) methods^{23} circumvent this problem by replacing the *δ _{ij}* terms with a distribution peaked at the current position and distributed over multiple bins,

where *ρ*(*i* − *j*(*t*^{∗})) is a normalized distribution with mean value *j*(*t*^{∗}). Shrinking the bins to infinitesimal width, one is left with a distribution suitable for sampling continuous collective variables. These distributions can be differentiated easily, yielding forces suitable for MD implementation.

MetaD^{18} applies exactly this idea. Utilizing Gaussian biases, one may construct an on-the-fly bias directly analogous to Eq. (2),

In this equation, *σ* defines the width of the Gaussian and determines the length scale over which features may be sampled. Note that this can be made time dependent, *σ*(*t*^{∗}), as in the adaptive scheme of Ref. 27. One can show^{42} that in the limit *σ* → 0, Eq. (3) becomes identical to Eq. (1), with the Kronecker *δ*_{i,j(t∗)} terms replaced by Dirac functions *δ*(*ξ* − *s*(*t*^{∗})). While an infinite sum over Dirac functions will eventually yield a flat histogram, it is intractable in finite computational time. It is also incapable of defining useful forces for MD simulations. The finite-width Gaussian distributions of MetaD solve this problem, and thus, MetaD may be viewed as a way to faithfully represent the reduced partition function *Z*(*ξ*) as a sum over discretely visited states *s*(*t*^{∗}).

Despite these advantages, MetaD does have drawbacks. First, Gaussian biases are incommensurate with intrinsic boundaries on the collective variables. Without recently proposed analytic corrections,^{43} for example, the pileup of Gaussian peaks near a boundary creates deep wells that lead to dramatic over-sampling in simulations. Second, some variants of MetaD exhibit convergence problems that can be difficult to overcome.^{21} In previous work,^{42} we have shown that it is possible to sample continuously without Gaussian functions (or related functions^{23}) by defining an adaptive bias constructed through projection onto orthogonal basis functions. Here, we adopt that perspective and build on Eq. (3) to create a powerful new variant of MetaD/CWLS using continuous, differentiable Green’s function biases. Importantly, the method can be adapted on-the-fly to higher order expansions (analogous to adaptive-bias MetaD^{27}), leaving only two important parameters—*W*, the energy scale of biases (analogous to MetaD), and *τ*, the time interval between application of biases. As it may be defined readily on an expansion of terms suitable for the finite or infinite domain of interest, the method is boundary-agnostic. Further, in contrast to metadynamics, which constantly accrues energy in the bias and is analogous to filling a valley by depositing sand hills, this method sculpts the bias to fit the FES. In what follows, we will demonstrate the wide applicability of such a method (denoted Green’s function sampling [GFS]) and suggest it as an ideal candidate to sample sharp, craggy free energy surfaces such as those encountered in complex fluids, glassy materials, and protein folding.

The crux of the idea is that there is more than one way to represent a Dirac function *δ*(*x*). Indeed, any symmetric function *η*_{ϵ}(*x*) that is equal to zero outside the interval [ − *ϵ*, *ϵ*] and whose integral is equal to unity as *ϵ* → 0 suffices. Normalized Gaussians meet these criteria. One may also construct delta functions as points in a Hilbert space from orthonormal sets of basis functions. Given such a set of (potentially complex-valued) functions, *ϕ _{n}*(

*x*), one effectively has infinite-dimensional normal vectors, which can thus resolve the identity matrix,

Computing the inner product of *I* with a function *f* on domain Ω,

shows its properties to be equal to those of the *δ* function. We may thus construct an approximant to *δ*(*x*) in another way—rather than being a limit of a single function; here, it is the *N* → ∞ limit of the sequence

This equation leads to a simple relation for the accrual of biases, in analogy to Eq. (3),

which becomes

Importantly, this means that the evolution of bias is controlled completely by updating the coefficients of a basis function expansion. Thus, the number of terms that must be evaluated does not change with time. The converged approximation to the FES, −*V*(*ξ*, *t*^{∗}), is accurate up to the (*N* + 1)th orthogonal function in the series. The resulting expression resembles that of the radial basis interpolations built via the single-sweep method,^{44} though here coefficients of orthogonal functions are determined by convergence to a flat histogram rather than by least-squares fitting to the mean force obtained at a discrete set of points.

Choosing a basis set appropriate for the domain Ω, one may obtain an arbitrarily accurate expansion for *δ*(*x*) formed by continuous, differentiable functions suitable for MD. Further, by noting that each orthonormal expansion contains a constant function *ϕ*_{0}(*x*) = *C*, we may separate the level bias and driving bias^{29} and avoid adding a net energy into the system^{43} by summing only terms with *n* ∈ {1, …, *N*}. This is a consequence of the identity

with *V*_{0} the constant bias contribution. The bias potential resulting from this method is analogous to that in Ref. 45, but crucial differences arise in the update of the coefficients.

Figure 1 illustrates the contrast between metadynamics and the Green’s function method. MetaD simulations involve the sequential deposition of Gaussians, for which we adopt the compact notation *G*_{σ}(*ξ*) = exp(−*ξ*^{2}/2*σ*^{2}). GFS, by contrast, utilizes sequential biases of the form *δ _{N}*(

*x*) −

*C*

^{2}, where

*C*is the constant term previously defined. In Figure 1, the orthonormal functions are defined by the scaled Legendre polynomials $ L \u0303 n ( x ) = ( 2 n + 1 ) / 2 L n ( x ) $, where the

*L*(

_{n}*x*) are defined in the standard way, with

*L*(1) = 1. One can immediately identify two differences. First, the approximate Green’s functions

_{n}*δ*(

_{N}*x*) oscillate around the position of the bias, whereas the Gaussians are positive everywhere. The instantaneous biases in this method are thus inherently long-ranged; this is the result of the projection of the free energy perturbation onto a basis set. Second, whereas Gaussian biases are static with respect to the defined boundaries, the basis expansion changes in height and shape as one nears the bounding wall. This is similar to the behavior of heuristic,

^{28}diffusive,

^{27}and the more appropriate analytical

^{43}corrections recently proposed for MetaD in the presence of coordinate boundaries. In Fig. 1,

*δ*(

_{N}*x*) has been scaled by

*N*to give comparable weight to the Gaussians—as

*δ*(0) increases approximately linearly with the number of terms, this practice enables energies and forces to remain tractable.

_{N}In addition to boundary corrections, a challenge for MetaD simulations is to resolve complex free energy surfaces. Deposition of biases always promotes the escape of a system from deep wells having sharp features. When resolving the underlying FES, the choice of Gaussian widths is a strong practical limitation. When sharp features exist on length scales smaller than *σ*, standard MetaD has difficulty resolving them. It is thus difficult to obtain accurate free energies in the context of protein folding and aggregation, for example, where free energy surfaces are inherently rough and behavior *a priori* unknown. Recent developments in diffusion-based adaptation^{27} have done much to improve this, but still require prior knowledge of relevant simulation time scales. GFS offers an alternative, where a converged basis expansion may be simply refined, adding higher order terms in the expansion to achieve an *exact* *N*th order description of the physical system. This is in contrast to approximation of a free energy surface with Gaussian biases, where optimal representation is difficult to obtain.^{42} We may identify these regions by monitoring a histogram *H*(*t*) computed in a window around late-stage time *t* and augment the expansion to improve accuracy.

To illustrate the usefulness of the proposed GFS method, we implement it in a simple one dimensional Monte Carlo scheme meant to imitate the rough density of states that is typically encountered in glassy systems.^{46} The analytical free energy surface is a sum of 50 G, with heights *h* ∈ [ − 40, 10], widths $\sigma \u2208 [ 0 , 2 10 ] $, and positions *ξ* ∈ [ − 1, 2]. Figure 2 serves as a proof-of-concept for convergence of the method, using an *N* = 200 expansion of Legendre polynomials. Approximate convergence proceeds relatively quickly, with −*V*(*ξ*) ≈ *F*(*ξ*) after only ten iterations. After this point, the bias potential fluctuates around the FES, in direct analogy to the behavior of untempered MetaD at late times;^{21} we thus construct an accurate estimate for the FES by averaging out these fluctuations. The resulting free energy is almost indistinguishable from the analytical function, differing only near points of sharp variation in *F*(*ξ*). Comparing the analytical and approximate profiles shows these regions as strong oscillations, seen in Fig. 3, which correlate with the positions of particularly narrow wells (cf. Table S1 in the supplemental material).^{46} In cases where the analytical free energy surface is not known, a histogram computed at a late stage of simulation is seen to mirror these oscillations, suggesting an on-the-fly test for bias refinement. For comparison, we have also examined the behavior of MetaD through addition of Gaussian biases which match the height and width of the central peak in *δ*_{200}(0). The resulting FES generated using MetaD converges (cf. Fig. 2) more slowly and is compromised by significant oversampling at the boundaries.

Commonly, in molecular dynamics simulations, one will not simulate on a finite interval, but rather an “infinite” one, where the driven sampling is unconstrained. The example above utilized a finite interval with known boundaries, which can be easily scaled to fit the domain of the (e.g.) Legendre functions. If unconstrained sampling is required, one has two options. The first is to utilize a set of orthogonal functions, such as Hermite polynomials, which are defined on the unbounded interval [ − ∞, ∞]. The second is to utilize soft walls (e.g., *U*(*ξ*) = *k*(*ξ* − *ξ _{b}*)

^{α}, where

*ξ*is the boundary point and

_{b}*α*a tunable exponent) to constrain sampling within the interval. If one can place reasonable bounds on the interval, the latter solution is preferable as it mitigates undersampling problems that can hamper convergence. A full FES may then be constructed using multiple intervals and histogram reweighting.

We test this latter method in simulations of a 15-residue poly-alanine molecule dissolved in water, which has been previously studied in the context of flat histogram methods.^{47} Figure 4(a) shows a comparison of constrained simulations using MetaD and GFS. Our simulations utilize as a collective variable the end-to-end distance (*R*_{ee}) in nanometers. For this example, the FES is approximately flat apart from extended conformations. Sets of five simulations for each method are performed on the interval [1.6, 3.2], with soft walls placed at *ξ _{b}* = 1.7 and

*ξ*= 3.1. An

_{b}*N*= 50 Legendre expansion is used and compared to a converged umbrella-sampling calculation and well-tempered metadynamics simulations. The resulting free energy surfaces after 20 ns of simulation show that GFS better matches the total FES than does MetaD, with deviations that reduce upon further sampling [Fig. 4(b)]. It should be noted that all deviations of GFS from the true FES in this case are within 1

*k*

_{B}

*T*of the umbrella sampling calculation and, if needed, can easily be refined by diffusive correction, transition matrix methods,

^{48,49}or flux-tempered metadynamics (FTM).

^{20,21}Further, one may divide up the region of interest to explore underlying narrow minima, as demonstrated for a neutral 12-residue miniprotein in the supplementary material.

^{46}

The GFS method is fully generalizable to multiple dimensions, as *M*-dimensional orthogonal functions may be created as products of one-dimensional orthogonal functions. Such a property is well-known, being essential to the solution of partial differential equations by separation of variables. To demonstrate multidimensional GFS, we perform a set of simulations on the alanine dipeptide in vacuum using a stochastic integrator. The torsional angles *ϕ* and *ψ* between the alanine residue and the N- and C-termini commonly serve as a benchmark for new free energy calculations. Here, to highlight the ability of GFS to capture boundary effects appropriately, we study the cosines of these angles. Simulations were performed in GROMACS v4.6.5,^{50} using the CHARMM27 forcefield, following the calculations of Ref. 43. A grid-based version of GFS was implemented into the PLUMED2 suite.^{51} Bias was constructed from an *N* = 10 Legendre polynomial expansion along each CV (resulting in a total of 120 non-constant terms). For comparison, a corrected FES using the bias obtained as a result of MetaD simulations^{43} is plotted in Fig. 5(a). Approximate convergence proceeds rapidly (cf. Fig. 6), and averaging over all recorded biases in a 300 ns simulation results in the free energy surface of Fig. 5(c), which deviates from the true FES only in the infrequently sampled states near the boundary. We also plot in Fig. 5(b) the result of a 300 ns simulation using MetaD with adaptive Gaussian biases.^{27} The Gaussian area in this simulation is set by *σ* = 0.1, which approximately matches the peak-width of *δ*_{10}(0). We observe that the basis function approximation converges quite closely to the exact surface in Fig. 5(a), with small deviations confined to rarely sampled states near the CV boundary. In contrast, MetaD simulations retain substantial roughness in the FES at late times. This result shows that multivariate basis expansions can accurately obtain free energy surfaces in bounded domains, further highlighting their potential application to protein simulations, where *α*-helical and *β*-strand content, for example, are often sampled as fundamental morphological signatures.^{41}

To quantify the inherent error of our method and determine the speed of convergence, in Fig. 6 we plot the free energy surfaces obtained by MetaD and GFS as a function of time at timepoints *t*^{∗} ∈ {20, 64, 200} ns, averaged over five independent runs. After simulating for 20 ns, we observe that the GFS method achieves approximate convergence quickly, with good matching of the FES away from the highest free energy states. While MetaD also achieves a decent approximation of the FES quickly, substantially larger errors including unsampled high free-energy states persist at late times. As the simulation progresses, GFS experiences slight refinements, serving primarily to better approximate the higher free energies on the surface, eventually achieving error saturation due to truncation of the expansion. MetaD at this state retains substantial roughness, particularly near the boundaries of the interval, which persist to late times. This result highlights the effectiveness of GFS relative to MetaD in obtaining quick, accurate estimates of the FES.

The preceding results show the robustness of GFS to undulations in the free energy surface and to boundaries on collective variables. GFS is, however, not designed to overcome choices in collective variable where the slow degrees of freedom do not coincide with the sampled reaction coordinates. Bias-exchange and tempering methods^{52,53} offer a promising solution for coupling to GFS methods. We note that the set of tempering schemes utilized by metadynamics routines should also be fully applicable to the new GFS method. Further, polynomial expansions provide a simple method for extracting thermodynamic properties and elastic constants as well,^{42} and this approach is easily modified to that purpose. Along with the successes outlined above, we expect GFS to be adopted in a wide variety of free energy calculations for atomistic and coarse-grained simulations.

This work is supported by the Department of Energy, Basic Energy Sciences, Materials Research Division. J.K.W. acknowledges support from startup funds at the University of Notre Dame. A.M.F. acknowledges support from the National Science Foundation through Grant No. DGE-0718123. We gratefully acknowledge the computing resources provided on “Fusion,” a 320-node computing cluster operated by the Laboratory Computing Resource Center at Argonne National Laboratory. We acknowledge the University of Chicago Research Computing Center for use of the Midway cluster and support of this work. An award of computer time was provided by the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-06CH11357.