Fast and accurate sampling methods are in high demand in order to bridge the large gap between molecular dynamics simulations and experimental observations. Recently, the integrated tempering enhanced sampling (ITS) method has been proposed and successfully applied to various biophysical examples, significantly accelerating conformational sampling. However, a mathematical explanation for its effectiveness has not yet been given. Here we show that the integrated tempering enhanced sampling method can be viewed as a reformulation of the infinite switching limit of the simulated tempering method over a mixed potential. Moreover, based on the large deviation principle for empirical distributions, we demonstrate that the efficiency of simulated tempering molecular dynamics improves as the frequency of switching between temperatures increases. Together, these results provide a theoretical justification of the advantage of ITS. Finally, we illustrate the utility of the infinite switching simulated tempering method through several numerical examples.

Molecular dynamics simulation is a powerful method for investigating microscopic biochemical systems. However, when the system contains energetic or entropic barriers, direct molecular dynamics simulation incurs a very high computational cost. One therefore turns to enhanced sampling methods, sacrificing the true dynamical information in order to capture the desired Gibbs distribution. To date, sampling methods have been successfully applied in multiple disciplines including statistical physics, chemical physics, Bayesian statistics, machine learning, and related areas.

The efficiency of a sampling approach highly depends on its convergence rate to thermodynamic equilibrium. Tempering schemes, such as simulated tempering molecular dynamics (STMD)1,2 and replica exchange molecular dynamics (REMD),3–5 were developed to overcome metastability and to enhance the convergence to equilibrium, and they are among the most popular such methods thanks to their effectiveness and ease of use. The basic idea of both is to use one or more artificial high temperatures to accelerate the exploration of configuration space, while using the physical temperature to sample the desired physical observables. The interaction between the different temperatures is designed so as to guarantee the unbiasedness of the estimation. These two methods are quite comparable to each other: STMD delivers the system between different temperature states at a higher rate, as well as a higher rate of traversing the energy space,6 but it requires an estimate of the partition functions first.

Another tempering algorithm, named "integrated tempering enhanced sampling" (ITS) and recently introduced by Gao,7 uses a temperature-biased effective potential energy to run the molecular dynamics (MD) simulation and has been successfully applied to various biochemical examples.8,9 Since ITS also uses auxiliary temperatures to accelerate the convergence, it is desirable to compare it with the earlier methods and to understand why it performs better in many cases, which, to the best of our knowledge, has not been studied theoretically.

As one of the main results of this paper, we discover that the integrated tempering enhanced sampling method is in fact a reformulated version of the infinite switching limit of STMD, that is, the limit of simulated tempering as the attempted switching frequency goes to infinity. Moreover, using the large deviation principle we show that as the switching rate in STMD increases, the empirical measure converges faster toward the stationary distribution, and thus the sampling efficiency of STMD increases with the switching frequency. Combining the two theoretical findings, we justify the efficiency of ITS over conventional STMD. Finally, we compare the infinite switching STMD against standard STMD on two numerical examples: an artificial high dimensional system and a more realistic Lennard-Jones example10 with 16 atoms. The numerical results validate our theoretical findings.

Our study of the switching rate of STMD is closely related to recent progress in understanding the swapping frequency of tempering schemes, starting with replica exchange methods. For REMD, it has been observed in numerical examples that the sampling efficiency increases as the frequency of temperature swaps is pushed up toward infinity;11 but directly increasing the swapping rate to reach this limit is computationally infeasible, as many swaps would be needed per MD step. A breakthrough was made in Refs. 12 and 13, which proposed an explicit way to reach this limit and also proved that the sampling efficiency indeed increases in the limit. A natural reformulation of infinite swapping REMD was later proposed in Ref. 14, which leads to an easy implementation as a simple patch to conventional molecular dynamics. For multiple temperatures, an efficient implementation based on a multiscale integrator was also proposed for infinite swapping REMD,15,16 which leads to practical applications in sampling the configurational space of large biomolecules. We note that increasing the switching rate in simulated tempering has been mentioned in Refs. 16 and 17 under a general framework that views tempering schemes as an MD process augmented by a jump process, but without much detail, in particular regarding the connection with ITS.

In the remainder of this article, we first recall simulated tempering molecular dynamics as a stochastic process in Sec. II. We justify using a large switching frequency via the large deviation principle for the empirical distribution in Sec. III, with a detailed derivation given in the Appendix. The infinite switching limit of STMD and its reformulation are derived and generalized in Secs. IV and V. The identification of ITS with infinite switching STMD is given and discussed in Sec. VI, followed by numerical examples in Sec. VII. Some concluding remarks are given in Sec. VIII.

We begin by recalling the simulated tempering molecular dynamics. For the sake of simplicity, we will first discuss the algorithm for a system governed by overdamped Langevin equations (i.e., when inertia can be neglected). More general dynamics will be discussed in Sec. V. Consider

$\dot{x} = f(x) + \sqrt{2\beta^{-1}}\,\eta,$
(1)

where $x \in \mathbb{R}^{3n}$ denotes the configuration of the system with n particles, f(x) = −∇V(x) is the force associated with the potential V(x), β = 1/k_BT is the inverse temperature, and η is a 3n-dimensional white noise with independent components. We choose units so that the friction coefficient equals one for notational simplicity. Equation (1) is consistent with the Boltzmann equilibrium probability density

$\rho_\beta(x) = Z_\beta^{-1}\,e^{-\beta V(x)},$
(2)

where the normalization constant $Z_\beta = \int_{\mathbb{R}^{3n}} e^{-\beta V(x)}\,dx$ is the partition function.

Assuming we only take two temperatures in simulated tempering (the generalization to more temperatures will be considered in Sec. V), we replace (1) by

$\dot{x} = f(x) + \sqrt{2\beta^{-1}(t)}\,\eta,$
(3)

where the inverse temperature β(t) now attempts to switch between the physical and the artificial temperature with frequency ν, and an attempted switch from β₁ to β₂ is accepted with probability

$g_{\beta_1\to\beta_2}(x) = \min\!\left(\frac{n_2\,e^{-\beta_2 V(x)}}{n_1\,e^{-\beta_1 V(x)}},\,1\right),$
(4)

where n1 and n2 are some weighting parameters that will be further discussed below.

Note that, in the simulated tempering overdamped dynamics, both the configuration x and the inverse temperature β are dynamical variables: x follows (3), β follows a jump process, and the two dynamics are coupled. To understand the dynamics, it is useful to consider its infinitesimal generator, given by (we use β′ to denote the temperature other than β, namely, β′ = β2 if β = β1 and vice versa)

$(\mathcal{L}^\nu u)(x,\beta) = -\nabla V(x)\cdot\nabla_x u(x,\beta) + \beta^{-1}\Delta_x u(x,\beta) - \nu\,g_{\beta,\beta'}(x)\,u(x,\beta) + \nu\,g_{\beta,\beta'}(x)\,u(x,\beta')$
(5)

for a smooth function $u:\mathbb{R}^{3n}\times\{\beta_1,\beta_2\}\to\mathbb{R}$. This means that the evolution of a physical observable u under the dynamics is given by

$\mathbb{E}\big(u(x(t+\delta t),\beta(t+\delta t))\,\big|\,x(t),\beta(t)\big) = u(x(t),\beta(t)) + \delta t\,(\mathcal{L}^\nu u)(x(t),\beta(t)) + o(\delta t),$
(6)

or equivalently, the density ρ of (x, β) evolves under the adjoint operator

$\partial_t\rho(t,x,\beta) = (\mathcal{L}^\nu)^{*}\rho(t,x,\beta) = \Delta V(x)\,\rho(t,x,\beta) + \nabla V(x)\cdot\nabla_x\rho(t,x,\beta) + \beta^{-1}\Delta_x\rho(t,x,\beta) - \nu\,g_{\beta,\beta'}(x)\,\rho(t,x,\beta) + \nu\,g_{\beta',\beta}(x)\,\rho(t,x,\beta').$
(7)

For the infinitesimal generator, the first two terms on the right-hand side of (5) correspond to the overdamped dynamics (3), while the last two terms correspond to the jump process of β: ν g_{β,β′}(x) is the rate of switching from temperature β to β′, i.e., the attempted switching frequency adjusted by the acceptance probability; correspondingly, in (7) the term ν g_{β′,β}(x) ρ(t, x, β′) represents the probability flux from temperature β′ into β.

It is straightforward to check that the equilibrium distribution of the coupled dynamics of (x, β), i.e., the stationary solution of (7), is given by

$\varrho(x,\beta) = \frac{n_1\,e^{-\beta_1 V(x)}\,\delta_{\beta,\beta_1} + n_2\,e^{-\beta_2 V(x)}\,\delta_{\beta,\beta_2}}{n_1 Z_{\beta_1} + n_2 Z_{\beta_2}},$
(8)

where δ_{β,β₁} denotes the Kronecker delta: δ_{β,β₁} = 1 if and only if β = β₁. This is nothing but a weighted average of the Boltzmann densities at temperatures β₁ and β₂, with weights n₁Z_{β₁} and n₂Z_{β₂}, respectively. At each fixed temperature, the distribution is generated by ordinary molecular dynamics and thus follows the Boltzmann distribution; meanwhile, at each fixed configuration, the proportion of the two temperatures is given by

$\omega_j(x) = \frac{n_j\,e^{-\beta_j V(x)}}{n_1\,e^{-\beta_1 V(x)} + n_2\,e^{-\beta_2 V(x)}},\qquad j = 1,2,$
(9)

as determined by the acceptance probability (4). Therefore, (3) together with the jump process of β samples ϱ(x, β) as the equilibrium distribution.

Summing ϱ(x, β) over β then gives the marginal equilibrium density for the configuration alone,

$\varrho(x) = \varrho(x,\beta_1) + \varrho(x,\beta_2) = \frac{n_1\,e^{-\beta_1 V(x)} + n_2\,e^{-\beta_2 V(x)}}{n_1 Z_{\beta_1} + n_2 Z_{\beta_2}},$
(10)

which is a weighted average of the Boltzmann densities at the two temperatures β1 and β2. As a result the ensemble average at the physical temperature (β1) of any observable A(x) can be estimated from

$\langle A\rangle_{\beta_1} \equiv \int_{\mathbb{R}^{3n}} A(x)\,\rho_{\beta_1}(x)\,dx = \frac{n_1 Z_{\beta_1} + n_2 Z_{\beta_2}}{n_1 Z_{\beta_1}}\int_{\mathbb{R}^{3n}} A(x)\,\omega_1(x)\,\varrho(x)\,dx = \left(\int_{\mathbb{R}^{3n}}\omega_1(x)\,\varrho(x)\,dx\right)^{-1}\int_{\mathbb{R}^{3n}} A(x)\,\omega_1(x)\,\varrho(x)\,dx \approx \frac{\lim_{T\to\infty}\frac{1}{T}\int_0^T A(x(t))\,\omega_1(x(t))\,dt}{\lim_{T\to\infty}\frac{1}{T}\int_0^T \omega_1(x(t))\,dt},$
(11)

where we have used that the ensemble average equals the time average thanks to ergodicity. Note that, in principle, one should use two independent realizations to estimate the numerator and the denominator on the right-hand side of (11) to obtain an unbiased estimator; in practice, the bias is well controlled and the variance of the estimator usually dominates the error.
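As a concrete illustration of the estimator (11), the following minimal NumPy sketch computes the reweighted time average from a stored trajectory; the function name and its arguments are illustrative and assume that the potential energies V(x(t)) along the trajectory have been saved.

```python
import numpy as np

def st_average(A_traj, V_traj, betas, n_weights, j=0):
    """Estimate <A> at the physical inverse temperature betas[j] from a
    simulated tempering trajectory via the ratio of time averages in (11).

    A_traj    : observable values A(x(t)) along the trajectory
    V_traj    : potential energies V(x(t)) along the trajectory
    betas     : inverse temperatures, betas[j] being the physical one
    n_weights : weighting parameters n_k
    """
    A_traj, V_traj = np.asarray(A_traj), np.asarray(V_traj)
    # log(n_k exp(-beta_k V)) for every frame and temperature, normalized
    # via a log-sum-exp to avoid overflow when beta_k * V is large
    log_terms = np.log(n_weights)[None, :] - np.outer(V_traj, betas)
    log_norm = np.logaddexp.reduce(log_terms, axis=1)
    omega_j = np.exp(log_terms[:, j] - log_norm)   # omega_j(x(t)), Eq. (9)
    return np.sum(A_traj * omega_j) / np.sum(omega_j)
```

Using the same realization for the numerator and the denominator introduces the small bias mentioned above; two independent runs would remove it.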

Note that the above holds for arbitrary positive weighting factors n1 and n2, while the sampling efficiency of the method depends on the choice. The conventional wisdom in the simulated tempering method is to choose n1 and n2 according to the partition functions,

$n_1 = Z_{\beta_1}^{-1}\quad\text{and}\quad n_2 = Z_{\beta_2}^{-1}.$
(12)

With this choice, we have

$\varrho(x,\beta) = \tfrac{1}{2}\,\rho_{\beta_1}(x)\,\delta_{\beta,\beta_1} + \tfrac{1}{2}\,\rho_{\beta_2}(x)\,\delta_{\beta,\beta_2}$
(13)

and thus the sampler spends an equal amount of time at the two temperatures. It is also possible to choose different nᵢ to emphasize one of the temperatures. Of course, the partition functions Z_{βᵢ} are not known a priori, and hence the usual approach is to start with some initial guess of the weighting factors and then adaptively adjust them on the fly using an iterative method; see, e.g., Refs. 7 and 17–19. In what follows, we mainly focus on the choice of the switching frequency ν; thus, for simplicity of discussion, we assume that the nᵢ are fixed during the sampling process and only make some remarks about adjusting the nᵢ afterwards.

As discussed above, for any choice of the switching frequency ν, the trajectory (x^ν, β^ν) (we use the superscript ν to emphasize the ν dependence) of the simulated tempering overdamped dynamics can be used to sample the equilibrium distribution (8). In particular, the empirical distribution of the trajectory, defined below,

$\lambda_T^\nu := \frac{1}{T}\int_B^{B+T}\delta_{(x^\nu(t),\,\beta^\nu(t))}\,dt,$
(14)

will converge to the equilibrium distribution ϱ as T → ∞. Here δ_{(x,β)} denotes the Dirac delta, that is, the point mass at (x, β), and B is any fixed burn-in time (which, for simplicity, will be taken as B = 0 in the numerical experiments).

To discuss the choice of ν, it is important to quantify the speed of convergence of the empirical distribution (14) to the equilibrium: a better ν corresponds to faster convergence and thus a shorter trajectory. To quantify this convergence for STMD, the theoretical tool we use is the large deviation principle for the empirical measure of stochastic processes. We discuss the idea and the conclusion of the large deviation principle below, while deferring the rigorous definition and derivation to the Appendix.

When T becomes large, the empirical distribution is expected to be very close to the equilibrium. The large deviation principle quantifies the probability that the empirical distribution is still far away from the equilibrium: more specifically, the probability that the empirical distribution is close to a given μ is of order $O(e^{-I^\nu(\mu)T})$, where the rate functional I^ν(μ) ≥ 0 (its specific form is given below) vanishes only when μ is the equilibrium distribution ϱ. This tells us, in particular, that as T → ∞, the likelihood that the empirical distribution deviates from the equilibrium is exponentially small, and the functional I^ν quantifies the rate of the exponential decay. Hence a larger rate functional indicates a faster rate of convergence.

To specify the rate functional, let μ be a probability measure on $\mathbb{R}^{3n}\times\{\beta_1,\beta_2\}$ with a smooth density and define θ(x, β) ≔ (dμ/dϱ)(x, β), the ratio of the density of μ to that of the equilibrium distribution. For the simulated tempering process with switching frequency ν, the large deviation rate functional for the convergence of the empirical measure to the stationary one is given by

$I^\nu(\mu) = J_0(\mu) + \nu\,J_1(\mu),$
(15)

where

$J_0(\mu) = \sum_\beta \frac{1}{4}\int \theta(x,\beta)^{-2}\,\beta^{-1}\,\big|\nabla_x\theta(x,\beta)\big|^2\,\mu(dx,\beta),$
(16)
$J_1(\mu) = \frac{1}{2}\sum_\beta\int g_{\beta\beta'}(x)\left[1 - \sqrt{\frac{\theta(x,\beta')}{\theta(x,\beta)}}\,\right]^2\mu(dx,\beta).$
(17)

Note that J₁ is non-negative, and hence the large deviation rate functional is pointwise monotonically increasing in ν. We thus conclude that a larger switching rate ν corresponds to faster convergence of the empirical distribution to equilibrium.

As shown in Sec. III, the higher the switching rate ν, the faster the convergence of the empirical distribution, and thus the faster the sampling of the simulated tempering overdamped dynamics. On the other hand, a large value of ν requires one to make many switching attempts, which slows down the actual simulation. The key to resolving this tension is that the limit ν → ∞ can be taken explicitly. This observation was first made for replica exchange dynamics in Ref. 13. For simulated tempering, as the switching frequency between the temperatures ν → ∞, the dynamics (3) converges to

$\dot{x} = f(x) + \sqrt{2\big(\beta_1^{-1}\omega_1(x) + \beta_2^{-1}\omega_2(x)\big)}\,\eta,$
(18)

where the effective temperature $T(x) = \beta_1^{-1}\omega_1(x) + \beta_2^{-1}\omega_2(x)$ is a weighted average of $T_1 = \beta_1^{-1}$ and $T_2 = \beta_2^{-1}$, with the weights as functions of x given by (9). Intuitively, this can be understood as follows: at a fixed configuration x, the proportion of time the trajectory spends at temperature β_j is given by ω_j(x), and thus the effective temperature is the corresponding weighted average of the two temperatures.

The Fokker-Planck equation for the probability density of x(t) corresponding to (18) can be written as

$\partial_t\rho(t) = \operatorname{div}\!\Big(B\,\big(\rho(t)\,\operatorname{grad}U + k_B T_1\,\operatorname{grad}\rho(t)\big)\Big),$
(19)

where we have defined the effective potential

$U(x) = -\beta_1^{-1}\ln\varrho(x)$
(20)

and the mobility

$B(x) = \beta_1\,T(x) = \omega_1(x) + \beta_1\beta_2^{-1}\,\omega_2(x).$
(21)

It is easy to check that the stationary solution of (19) is exp(−β1U(x)), which is just ϱ(x) by definition of U.

Note that this point of view also leads to a further simplification of (18), in the spirit of Lu and Vanden-Eijnden.14 We may replace B in the Fokker-Planck equation (19) by a constant, which, for convenience, we simply take to be one. This substitution does not affect the stationary distribution, and hence preserves the sampling property, but it changes the overdamped Langevin dynamics associated with the Fokker-Planck equation. It is easy to see that the new dynamics is given by

$\dot{x} = \big(\omega_1(x) + \beta_2\beta_1^{-1}\,\omega_2(x)\big)\,f(x) + \sqrt{2\beta_1^{-1}}\,\eta.$
(22)

Compared with (18), the noise term in (22) is additive, and (22) still samples the desired stationary distribution ϱ. Note that (22) is very similar to the original overdamped Langevin dynamics (1); the only change is a scaling factor in front of the force term, which involves the auxiliary (inverse) temperature β₂ as well as the weights ω_j, j = 1, 2. In practice, (22) can therefore be implemented as an easy patch to existing molecular dynamics codes.
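For concreteness, a minimal Euler-Maruyama sketch of one step of (22) is given below; V, grad_V, the step size, and the random number generator are placeholders supplied by the user, and a more sophisticated integrator could of course be substituted.

```python
import numpy as np

def infinite_switching_step(x, V, grad_V, beta1, beta2, n1, n2, dt, rng):
    """One Euler-Maruyama step of (22): overdamped dynamics at beta1 with
    the force rescaled by omega_1(x) + (beta2/beta1) * omega_2(x)."""
    # weights omega_j(x) of Eq. (9), computed in log space for stability
    a1 = np.log(n1) - beta1 * V(x)
    a2 = np.log(n2) - beta2 * V(x)
    m = max(a1, a2)
    w1, w2 = np.exp(a1 - m), np.exp(a2 - m)
    omega1, omega2 = w1 / (w1 + w2), w2 / (w1 + w2)
    scale = omega1 + (beta2 / beta1) * omega2        # scaling factor in (22)
    noise = rng.standard_normal(np.shape(x))
    return x + dt * scale * (-grad_V(x)) + np.sqrt(2.0 * dt / beta1) * noise
```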

We refer to this infinite switching simulated tempering method as the ITS method.

First, the idea in Sec. IV can be extended naturally to more than two temperatures, in which case the dynamics in the infinite switching limit is given by

$\dot{x} = \beta_1^{-1}\Big(\sum_k \beta_k\,\omega_k(x)\Big)\,f(x) + \sqrt{2\beta_1^{-1}}\,\eta,$
(23)

where we define the weight here as

$\omega_k(x) = \frac{n_k\,e^{-\beta_k V(x)}}{\sum_j n_j\,e^{-\beta_j V(x)}},\qquad k = 1,\ldots,N,$
(24)

where N is the number of temperatures used. Similarly, the dynamics (23) can be viewed as overdamped dynamics with the effective potential given by (20), where the corresponding marginal equilibrium density becomes

$\varrho(x) = \frac{\sum_j n_j\,e^{-\beta_j V(x)}}{\sum_j n_j\,Z_{\beta_j}},$
(25)

as we have

$-\nabla_x U(x) = \beta_1^{-1}\,\frac{\nabla_x\sum_k n_k\,e^{-\beta_k V(x)}}{\sum_k n_k\,e^{-\beta_k V(x)}} = -\beta_1^{-1}\,\frac{\sum_k \beta_k\,n_k\,e^{-\beta_k V(x)}\,\nabla V(x)}{\sum_k n_k\,e^{-\beta_k V(x)}} = \beta_1^{-1}\sum_k\beta_k\,\omega_k(x)\,f(x).$
(26)

Thus the negative gradient of the effective potential U(x) is exactly the force term in (23).
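The weights (24) and the force-scaling factor β₁⁻¹∑_k β_k ω_k(x) appearing in (23) are cheap to evaluate. The sketch below (an assumed helper, not code from Ref. 7) computes them in log space, which matters when β_k V(x) is large.

```python
import numpy as np

def its_force_scale(V_x, betas, n_weights):
    """Return the weights omega_k(x) of Eq. (24) and the scalar
    beta_1^{-1} * sum_k beta_k * omega_k(x) multiplying the force in (23).

    V_x       : potential energy V(x) at the current configuration
    betas     : inverse temperatures, betas[0] being the physical one
    n_weights : weighting parameters n_k
    """
    betas = np.asarray(betas, dtype=float)
    log_terms = np.log(n_weights) - betas * V_x   # log(n_k e^{-beta_k V(x)})
    log_terms = log_terms - log_terms.max()       # log-sum-exp shift
    omegas = np.exp(log_terms)
    omegas /= omegas.sum()                        # omega_k(x), Eq. (24)
    scale = float(np.dot(betas, omegas)) / betas[0]
    return omegas, scale
```

For two temperatures this reduces to the factor ω₁ + (β₂/β₁)ω₂ of (22).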

The infinite switching simulated tempering can also be generalized to the (underdamped) Langevin equation rather than the overdamped Langevin dynamics (1); other thermostats can also be used. Recall the Langevin equation

$\dot{x} = m^{-1}p,\qquad \dot{p} = f(x) - \gamma p + \sqrt{2\gamma m\beta^{-1}}\,\eta,$
(27)

where m denotes the mass and γ the friction coefficient. The generalization of (22) reads

$\dot{x} = m^{-1}p,\qquad \dot{p} = \beta_1^{-1}\Big(\sum_k\beta_k\,\omega_k(x)\Big)\,f(x) - \gamma p + \sqrt{2\gamma m\beta_1^{-1}}\,\eta.$
(28)

The structure of the equation is rather similar to the overdamped case (22); the only modification compared to (27) is the scaling factor in front of the force term, which amounts to a weighted average of the inverse temperatures.
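A minimal sketch of one step of (28), reusing the hypothetical its_force_scale helper from the previous sketch; a splitting integrator such as BAOAB would be preferable in practice, but the simple Euler-Maruyama update below already shows where the force rescaling enters.

```python
import numpy as np

def langevin_its_step(x, p, V, grad_V, betas, n_weights, mass, gamma, dt, rng):
    """One Euler-Maruyama step of the scaled-force Langevin dynamics (28)."""
    _, scale = its_force_scale(V(x), betas, n_weights)
    beta1 = betas[0]                               # physical inverse temperature
    force = scale * (-grad_V(x))                   # rescaled force in (28)
    x_new = x + dt * p / mass
    noise = rng.standard_normal(np.shape(p))
    p_new = p + dt * (force - gamma * p) + np.sqrt(2.0 * gamma * mass * dt / beta1) * noise
    return x_new, p_new
```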

The infinite switching limit of the simulated tempering sampling scheme is very closely related to the integrated tempering enhanced sampling (ITS) algorithm originally proposed in Ref. 7. The ITS algorithm introduces a temperature biased effective potential energy as

$V_{\text{eff}}(x) = -\beta_1^{-1}\ln\sum_k n_{\beta_k}\,e^{-\beta_k V(x)}$
(29)

and runs the molecular dynamics simulation on this surface. Here n_{β_k} is a weighting factor chosen for the temperature β_k. Note that this is identical to the effective potential (20) with the marginal equilibrium ϱ(x) given by (25), up to an additive constant, which is immaterial for a potential energy.

In conclusion, the ITS algorithm can be viewed as the infinite switching limit of the simulated tempering algorithm. As discussed in Sec. III, the sampling efficiency of the simulated tempering method increases as ν → ∞. Thus, as a corollary, the ITS algorithm is more efficient in sampling than the simulated tempering algorithm at any finite switching rate. This will be further demonstrated by the numerical examples in Sec. VII.

We first consider a system in D dimensions moving on the following potential, with x = (x₀, …, x_{D−1}):

$V(x) = (1 - x_0^2)^2 - \tfrac{1}{4}x_0 + \sum_{j=1}^{D-1}\tfrac{1}{2}\lambda_j\,x_j^2,$
(30)

where λ₁, λ₂, …, λ_{D−1} are parameters controlling the stiffness of the harmonic potential in the x₁, …, x_{D−1} directions. We take the physical (lowest) temperature with β₀ = 25 and five artificial temperatures with β_k = 25 × 2^{−k}, k = 1, 2, 3, 4, 5. For each β_k, the weighting parameter n_k is set to the inverse of the corresponding partition function, i.e., $Z_{\beta_k}^{-1}$, which varies with the number of dimensions. When simulating with STMD, we use

$\dot{x} = \beta_0^{-1}\beta(t)\,f(x) + \sqrt{2\beta_0^{-1}}\,\eta,$
(31)

so that its infinite switching limit coincides with (23) [rather than with a multi-temperature version of (18) for the original overdamped dynamics (3)]. At a finite switching rate, we only consider switching attempts of β(t) from some β_k to an adjacent value, β_{k−1} (if k > 0) or β_{k+1} (if k < 5); in each attempt, we first decide whether to switch up or down with equal probability when 0 < k < 5. The total simulation time is T_tot = 2.5 × 10^6 with time step dt = 0.025, so that the total number of steps is N_tot = 10^8. As mentioned previously, we take no burn-in period when estimating the average of the physical observable.
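The following sketch mirrors the switching rule just described for the finite-rate STMD runs; the helper name and interface are illustrative, and between attempts (made with frequency ν, i.e., on average ν·dt attempts per time step) x evolves by (31) at the current temperature.

```python
import numpy as np

def attempt_switch(k, V_x, betas, n_weights, rng):
    """Attempt one temperature switch: propose an adjacent index (up or down
    with equal probability at interior indices) and accept with the
    Metropolis probability (4), evaluated in log space."""
    K = len(betas)
    if k == 0:
        k_new = 1
    elif k == K - 1:
        k_new = K - 2
    else:
        k_new = k + rng.choice([-1, 1])
    log_ratio = (np.log(n_weights[k_new]) - betas[k_new] * V_x) \
              - (np.log(n_weights[k]) - betas[k] * V_x)
    if np.log(rng.uniform()) < min(log_ratio, 0.0):
        return k_new
    return k
```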

To compare the algorithms, we calculate the asymptotic variance of the observable V(x) using a batch estimation,

$\mathrm{AV} = \operatorname{Var}\bigg(\sum_{t=(j-1)\,\mathrm{WS}+1}^{j\,\mathrm{WS}} V(x(t))\bigg),\qquad j = 1,\ldots,\frac{N_{\text{tot}}}{\mathrm{WS}},$
(32)

where AV stands for the asymptotic variance, WS stands for the window size of the batch, and the variance is taken over the batch index j. The results are shown in Figs. 1 and 2 for D = 1 and D = 10, respectively. We observe that, at finite switching rates, STMD at frequency ν = 1 has a lower asymptotic variance than at ν = 0.1, and the infinite switching limit has an even lower asymptotic variance. We also observe that, for this example, simulated tempering with ν = 1 is already quite close to the infinite switching limit.
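A minimal batch estimator corresponding to our reading of (32): the time series of the observable is split into non-overlapping windows of length WS, and the variance of the per-window sums across batches is reported.

```python
import numpy as np

def batch_asymptotic_variance(obs, window):
    """Batch estimate of the asymptotic variance as in (32)."""
    obs = np.asarray(obs)
    n_batches = len(obs) // window                  # N_tot / WS full batches
    batches = obs[:n_batches * window].reshape(n_batches, window)
    return np.var(batches.sum(axis=1), ddof=1)      # variance over batch index j
```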

FIG. 1. Asymptotic variance of V(x) as in (30) (D = 1, thus a double well potential) for simulated tempering algorithms at different switching frequencies and the infinite switching limit.

FIG. 2. Asymptotic variance of V(x) as in (30) (D = 10) for simulated tempering algorithms at different switching frequencies and the infinite switching limit.

Now, to test the performance of the algorithm on a more realistic example, we apply (28) to a dimer-in-solvent model as considered in Ref. 10. This system consists of N two-dimensional particles in a periodic box with side length l. All particles have the same mass m and interact with each other through the Weeks-Chandler-Andersen (WCA) potential defined as

$V_{\mathrm{WCA}}(r) = 4\epsilon\big[(\sigma/r)^{12} - (\sigma/r)^{6}\big] + \epsilon$
(33)

if r ≤ r_WCA = 2^{1/6}σ and V_WCA(r) = 0 otherwise, except for one tagged pair of particles, which interacts via the double well potential

$V_{\mathrm{dW}}(r) = h\left[1 - \frac{(r - r_{\mathrm{WCA}} - \omega)^2}{\omega^2}\right]^2.$
(34)

We take N = 16, l = 4.4, σ = 1, h = 1, ω = 0.5, and ϵ = 1 in the simulation. The physical temperature is T₀ = 0.2, and we use one artificial temperature T₁ = 1.0. The total simulation time is set to T_tot = 1.0 × 10^5 with time step dt = 0.001, so that the total number of steps is N_tot = 10^8. The quantity of interest is the free energy associated with the distance between the pair of particles interacting via the double well potential.
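For reference, the two pair potentials (33) and (34) with the parameters quoted above read as follows in a minimal sketch; the total potential energy additionally requires minimum-image distances in the periodic box, which is omitted here.

```python
import numpy as np

EPS, SIGMA, H, OMEGA = 1.0, 1.0, 1.0, 0.5
R_WCA = 2.0 ** (1.0 / 6.0) * SIGMA

def v_wca(r):
    """Weeks-Chandler-Andersen pair potential (33), zero beyond r_WCA."""
    r = np.asarray(r, dtype=float)
    v = 4.0 * EPS * ((SIGMA / r) ** 12 - (SIGMA / r) ** 6) + EPS
    return np.where(r <= R_WCA, v, 0.0)

def v_dw(r):
    """Double well potential (34) acting on the tagged dimer pair."""
    r = np.asarray(r, dtype=float)
    return H * (1.0 - (r - R_WCA - OMEGA) ** 2 / OMEGA ** 2) ** 2
```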

In this numerical example, we would still like to choose the weighting parameters as the inverse partition functions, which, however, are not known explicitly or easily obtained a priori. Therefore, we use the following method, similar to that of Ref. 7, to approximate the partition functions and then use the estimates in the weighting factors n_k. First, a set of initial guesses Z_k^{(0)} is chosen and used to run the dynamics. Then, from a relatively short trajectory, we add up the corresponding weights to obtain the "proportion" of each temperature, normalized so that the proportions sum to one. Since these proportions would approach 1/2 if the guess were accurate and the trajectory infinitely long, we adjust the guess accordingly. More specifically, if the proportions are w_k, k = 0, 1, the weighting factors are updated as

$Z_k^{\text{new}} = \begin{cases} Z_k^{\text{old}}, & \text{if } 2w_k\in I,\\ Z_k^{\text{old}}\times 2w_k, & \text{otherwise},\end{cases}$
(35)

where I is a small neighborhood of 1 that accounts for the fluctuation, due to the short simulation trajectory, in the estimate of w_k. We iteratively update the partition function estimates $\{Z_k^{(l)}\}$, l = 1, 2, …, until the proportions w_k are satisfactorily close to 1/2. Here we take I = [0.35, 1.5], initial guess $(Z_0^{(0)}, Z_1^{(0)}) = (1, 10^8)$, and the number of iterations l_max = 10 with a trajectory of 10^7 steps for each iteration.
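A sketch of one iteration of the weight adjustment, under our reading of (35) in which the estimate is left unchanged whenever 2w_k already lies in the tolerance interval I; estimating the proportions w_k from the short trajectory (by accumulating the weights ω_k along the run and normalizing) is assumed to be done separately.

```python
import numpy as np

def update_partition_estimates(Z, proportions, interval=(0.35, 1.5)):
    """One iteration of the adjustment (35): rescale Z_k by 2*w_k unless
    2*w_k already lies in the tolerance interval I around 1."""
    Z_new = np.array(Z, dtype=float)
    for k, w_k in enumerate(proportions):
        ratio = 2.0 * w_k              # equals 1 at the target w_k = 1/2
        if not (interval[0] <= ratio <= interval[1]):
            Z_new[k] *= ratio
    return Z_new
```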

The performance of STMD with various switching rates is shown in Fig. 3. It can be seen that the performance of STMD improves as the switching rate increases from 0.25 to 25, and the asymptotic variance converges to that of the infinite switching limit as ν → ∞. This provides strong numerical validation of our theoretical results. We also remark that, as this example is more complicated than the previous one, the switching rate needed by STMD to get close to the infinite switching limit becomes much higher compared to the results in Fig. 2. It is expected that for a larger system, and when more temperatures are used, it will become even more difficult for STMD to approach its infinite switching limit by increasing the switching rate. It is thus more advantageous to take the limit explicitly and operate at the infinite switching rate by using, e.g., the ITS algorithm.

FIG. 3. Asymptotic variance of the potential energy V(x) for the WCA example for simulated tempering algorithms at different switching frequencies and the infinite switching limit.

We have justified that the sampling efficiency of the simulated tempering method increases with the switching rate of the temperature, using both a theoretical analysis based on the large deviation principle for the empirical distribution and numerical tests on two examples. This motivates taking the infinite switching limit of the simulated tempering dynamics, which, under a natural reformulation, recovers the integrated tempering enhanced sampling method. The limiting dynamics can be implemented as a patch to standard molecular dynamics by adding a scaling factor to the force term based on a weighted average of all the temperatures involved. This leads to a practical scheme with higher sampling efficiency than standard simulated tempering. Throughout this work, we have assumed that the weight factors n_k are given and have only considered the efficiency of the algorithm as the parameter ν varies. For practical use of simulated tempering algorithms, it is also essential to choose an appropriate set of weights for each temperature or to design an adaptive algorithm to learn a set of optimal weights. A thorough understanding of these issues is beyond the scope of the current paper and is an important direction for future investigation.

H.G. is supported in part by the National Science Foundation of China via Grant No. 11622101. J.L. is supported in part by the National Science Foundation via Grant No. DMS-1454939. J.L. and H.G. would also like to thank Yiqin Gao, Zhiqiang Tan, Eric Vanden-Eijnden, and Zhennan Zhou for helpful discussions.

To formally state the large deviation principle, let us first introduce some definitions. Let S be a Polish space and P(S) the space of probability measures on S, equipped with the topology of weak convergence. Under this weak topology, P(S) is itself a Polish space. Note that the empirical measure λ_T^ν is a P(S)-valued random variable.

The large deviation principle for empirical distributions can be stated as follows:20–22 a sequence of random probability measures {γ_T} is said to satisfy a large deviation principle with rate function I: P(S) → [0, ∞] if, for all open sets O ⊂ P(S),

$\liminf_{T\to\infty}\frac{1}{T}\log\mathbb{P}(\gamma_T\in O) \geq -\inf_{\mu\in O} I(\mu),$
(A1)

for all closed sets C ⊂ P(S),

$\limsup_{T\to\infty}\frac{1}{T}\log\mathbb{P}(\gamma_T\in C) \leq -\inf_{\mu\in C} I(\mu),$
(A2)

and for all M < ∞ the level set {μ: I(μ) ≤ M} is compact in P(S). In particular, once the rate functional is known, (A1) and (A2) quantify how unlikely it is for the empirical distribution to be far from the equilibrium when T is large, which is what we used in this work to justify the infinite switching limit.

To find the rate functional I^ν for the simulated tempering overdamped dynamics with switching rate ν, we split the infinitesimal generator into two parts,

$(\mathcal{L}^\nu u)(x,\beta) = (\mathcal{L}_{\text{diff}}\,u)(x,\beta) + \nu\,(\mathcal{L}_{\text{jump}}\,u)(x,\beta),$
(A3)

where

$\mathcal{L}_{\text{diff}}\,u(x,\beta) = -\nabla_x V(x)\cdot\nabla_x u(x,\beta) + \beta^{-1}\Delta_x u(x,\beta),\qquad \mathcal{L}_{\text{jump}}\,u(x,\beta) = g_{\beta,\beta'}(x)\,\big[u(x,\beta') - u(x,\beta)\big].$
(A4)

The rate functional thus has the additive structure (15), with the rate functionals J₀ and J₁ corresponding to $\mathcal{L}_{\text{diff}}$ and $\mathcal{L}_{\text{jump}}$, respectively.

To obtain explicit formulas for the rate functionals, we first consider J₀. Let $f(x,\beta) = \sqrt{\theta(x,\beta)}$; then, following Donsker and Varadhan,20 and using the choice of weights (12) so that ϱ(dx, β) = ½ρ_β(dx), we have

$J_0(\mu) = -\sum_\beta\int f\,(\mathcal{L}_{\text{diff}} f)\,\varrho(dx,\beta)$
$= -\sum_\beta\int\big[-(\nabla_x V(x)\cdot\nabla_x f)\,f + (\beta^{-1}\Delta_x f)\,f\big]\,\varrho(dx,\beta)$
$= -\sum_\beta\tfrac{1}{2}Z_\beta^{-1}\Big[-\int(\nabla_x V(x)\cdot\nabla_x f)\,f\,e^{-\beta V(x)}\,dx + \beta^{-1}\int(\Delta_x f)\,f\,e^{-\beta V(x)}\,dx\Big]$
$= -\sum_\beta\tfrac{1}{2}Z_\beta^{-1}\Big[-\int(\nabla_x V(x)\cdot\nabla_x f)\,f\,e^{-\beta V(x)}\,dx - \beta^{-1}\int\nabla_x f\cdot\nabla_x\big(f\,e^{-\beta V(x)}\big)\,dx\Big]$
$= \sum_\beta\tfrac{1}{2}Z_\beta^{-1}\beta^{-1}\int\big|\nabla_x f\big|^2 e^{-\beta V(x)}\,dx$
$= \sum_\beta\tfrac{1}{2}\int\beta^{-1}\big|\nabla_x\sqrt{\theta(x,\beta)}\big|^2\,\rho_\beta(dx)$
$= \sum_\beta\tfrac{1}{8}\int\theta(x,\beta)^{-1}\beta^{-1}\big|\nabla_x\theta(x,\beta)\big|^2\,\rho_\beta(dx).$
(A5)

Next we consider the rate functional J₁ corresponding to the jump process of the temperature with generator $\mathcal{L}_{\text{jump}}$. From the definition of θ(x, β) we have

$\theta(x,\beta) = \frac{2\,\mu(x,\beta)}{\rho_\beta(x)},$
(A6)

and therefore

$\frac{\mu(x,\beta')}{\mu(x,\beta)} = \frac{\theta(x,\beta')}{\theta(x,\beta)}\,\frac{\rho_{\beta'}(x)}{\rho_\beta(x)} = \frac{\theta(x,\beta')}{\theta(x,\beta)}\,\frac{g_{\beta,\beta'}(x)}{g_{\beta',\beta}(x)}.$
(A7)

Combined with the Cauchy-Schwarz inequality, we obtain that, for any choice of positive u(x, β),

$\int g_{\beta,\beta'}(x)\,\frac{u(x,\beta')}{u(x,\beta)}\,\mu(dx,\beta) + \int g_{\beta',\beta}(x)\,\frac{u(x,\beta)}{u(x,\beta')}\,\mu(dx,\beta') \;\geq\; 2\int\sqrt{g_{\beta,\beta'}(x)\,g_{\beta',\beta}(x)\,\frac{\mu(x,\beta')}{\mu(x,\beta)}}\;\mu(dx,\beta) \;=\; 2\int g_{\beta,\beta'}(x)\,\sqrt{\frac{\theta(x,\beta')}{\theta(x,\beta)}}\;\mu(dx,\beta),$
(A8)

where the equality is attained if and only if, for each x ∈ ℝ^{3n} and each β, it holds that

$\frac{u(x,\beta')}{u(x,\beta)} = \sqrt{\frac{g_{\beta',\beta}(x)\,\mu(x,\beta')}{g_{\beta,\beta'}(x)\,\mu(x,\beta)}} = \sqrt{\frac{\rho_{\beta}(x)\,\mu(x,\beta')}{\rho_{\beta'}(x)\,\mu(x,\beta)}}.$
(A9)

In particular, this gives the choice of u(x, β) that makes the equality in (A8) hold.

Using the variational characterization of the large deviation rate functional as in Ref. 20, we have

$J_1(\mu) = \sup_{u>0}\sum_\beta\int-\frac{(\mathcal{L}_{\text{jump}}\,u)(x,\beta)}{u(x,\beta)}\,\mu(dx,\beta)$
$= \sup_{u>0}\sum_\beta\int\Big[g_{\beta,\beta'}(x) - g_{\beta,\beta'}(x)\,\frac{u(x,\beta')}{u(x,\beta)}\Big]\,\mu(dx,\beta)$
$= \frac{1}{2}\sum_\beta\Big[\int g_{\beta,\beta'}(x)\,\mu(dx,\beta) + \int g_{\beta',\beta}(x)\,\mu(dx,\beta')\Big] - \frac{1}{2}\inf_{u>0}\sum_\beta\Big[\int g_{\beta,\beta'}(x)\,\frac{u(x,\beta')}{u(x,\beta)}\,\mu(dx,\beta) + \int g_{\beta',\beta}(x)\,\frac{u(x,\beta)}{u(x,\beta')}\,\mu(dx,\beta')\Big]$
$\overset{\text{(A8)}}{=} \frac{1}{2}\sum_\beta\int g_{\beta,\beta'}(x)\Big[1 + \frac{\theta(x,\beta')}{\theta(x,\beta)}\Big]\,\mu(dx,\beta) - \frac{1}{2}\sum_\beta\int 2\,g_{\beta,\beta'}(x)\,\sqrt{\frac{\theta(x,\beta')}{\theta(x,\beta)}}\,\mu(dx,\beta)$
$= \frac{1}{2}\sum_\beta\int g_{\beta\beta'}(x)\Big[1 - \sqrt{\frac{\theta(x,\beta')}{\theta(x,\beta)}}\,\Big]^2\,\mu(dx,\beta).$
(A10)

Therefore, we arrive at the explicit formula for the large deviation rate functionals. We remark that a rigorous proof of the rate functional derived from the above calculation can be found in Ref. 23.

1. E. Marinari and G. Parisi, Europhys. Lett. 19, 451 (1992).
2. W. Kerler and P. Rehberg, Phys. Rev. E 50, 4220 (1994).
3. U. H. Hansmann, Chem. Phys. Lett. 281, 140 (1997).
4. Y. Sugita and Y. Okamoto, Chem. Phys. Lett. 314, 141 (1999).
5. D. J. Earl and M. W. Deem, Phys. Chem. Chem. Phys. 7, 3910 (2005).
6. C. Zhang and J. Ma, J. Chem. Phys. 129, 134112 (2008).
7. Y. Q. Gao, J. Chem. Phys. 128, 064105 (2008).
8. L. Yang, Q. Shao, and Y. Q. Gao, J. Chem. Phys. 130, 124111 (2009).
9. L. Yang, C. W. Liu, Q. Shao, J. Zhang, and Y. Q. Gao, Acc. Chem. Res. 48, 947 (2015).
10. C. Dellago, P. G. Bolhuis, and D. Chandler, J. Chem. Phys. 110, 6617 (1999).
11. D. Sindhikara, Y. Meng, and A. E. Roitberg, J. Chem. Phys. 128, 024103 (2008).
12. N. Plattner, J. D. Doll, P. Dupuis, H. Wang, Y. Liu, and J. E. Gubernatis, J. Chem. Phys. 135, 134111 (2011).
13. P. Dupuis, Y. Liu, N. Plattner, and J. D. Doll, Multiscale Model. Simul. 10, 986 (2012).
14. J. Lu and E. Vanden-Eijnden, J. Chem. Phys. 138, 084105 (2013).
15. T.-Q. Yu, J. Lu, C. F. Abrams, and E. Vanden-Eijnden, Proc. Natl. Acad. Sci. U. S. A. 113, 11744 (2016).
16. J. Lu and E. Vanden-Eijnden, "Methodological and computational aspects of parallel tempering methods in the infinite swapping limit," preprint arXiv:1712.06947 (2017).
17. A. Martinsson, J. Lu, B. Leimkuhler, and E. Vanden-Eijnden, "Simulated tempering method in the infinite switch limit with adaptive weight learning" (unpublished).
18. S. Park and V. S. Pande, Phys. Rev. E 76, 016703 (2007).
19. Z. Tan, J. Comput. Graph. Stat. 26, 54 (2017).
20. M. D. Donsker and S. Varadhan, Commun. Pure Appl. Math. 28, 1–47 (1975).
21. J. D. Deuschel and D. W. Stroock, Large Deviations, Pure and Applied Mathematics Vol. 137 (Academic Press, New York, 1989).
22. A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications (Springer, 2010).
23. P. Dupuis and Y. Liu, Ann. Probab. 43, 1121 (2015).